How testing could transform higher education

I think the online education movement is amazing. College classes, like youth, are wasted on the young. College students are usually getting their first chance to live away from their parents and figure out who they are, which, understandably, seems more immediate and alive than 19th century English lit or linear algebra or whatever class material they happen to study. So for a lot of students college is just the last in a series of hoops they have to jump through before they are allowed to start their own lives. So I think classes full of people who actually want to learn the subject are a great idea.

The most interesting phenomenon in this transformation is that education is changing from a non-scalable medium like theater or opera performances to a scalable medium like film or mp3s. This has the usual effect: average quality rises dramatically, price drops, and demand rises. It is all well and good to debate whether the Stanford students who took the Machine Learning class in person got a better experience than those who took the class online, but very likely all but a handful of those who took the online class wouldn’t have been admitted as Stanford students at all.

I do think this will transform the university. With a non-scalable medium, taking a class from the third best algorithms professor in the world is a great opportunity, but with a scalable medium it isn’t clear why anyone wouldn’t just want the best. So expect huge pressure to improve the quality of teaching (pressure which is completely lacking today). And expect these top professors to be treated much more like rock stars, and to make a lot of money from teaching. (This isn’t surprising: the value created by a class of 200k people is so much higher than that of a class of 150 that it would be surprising if the professor couldn’t capture at least some of it.)

But that isn’t what I want to talk about. I want to talk about testing and how I think testing, done right, could have an equal impact on higher education.

People say that these online classes will never replace colleges, and that may be true in some sense. I think if you break up the value of college into its constituent pieces there are really three parts:

  1. Learning. This gets a lot of the press though, as I said, it may not be the most central aspect for many students.
  2. Certification. It is not enough to know something, you need to prove to others that you know it. Companies don’t have the time to devise deep evaluations of everything you were supposed to learn in school so they use silly heuristics like your grades or the reputation of your school. For many people this is why they go to school, to get a degree.
  3. “The college experience.”

Let’s go through each of these.

Learning is what all articles about MOOCs and online education cover, and though I have a lot to add, I will save it for another time.

By “the college experience” I mean all the non-educational aspects of college. This is the friends, late night conversations, sex, drugs, alcohol, and all that. This is the first time many kids have to move out on their own, away from family and friends who have known them since grade school, and kind of start fresh. For many people who start a career directly after college this may be both the first and the last chance they have to reinvent themselves. But college administrators have no particular expertise in providing this, and fortunately the college experience isn’t that hard to replicate. I think you just need dorms (i.e. housing where lots of young people are close together) and you need kids to move away from where they grew up, and the rest takes care of itself. The housing could probably be cheap and nice too if it weren’t provided by universities, which, whatever their strengths, are not exactly the most efficient landlords.

So that leaves certification. That is what I really want to talk about.

Unlike this article, I think online education companies will make lots of money. The reasons are simple. Education takes a lot of time, so people will pay for better quality. If you are going to spend a few hundred hours in a class you want it to be good, and you would pay a few hundred dollars to get something 10% better for your time. And that doesn’t even address the more irrational reasons. People are used to paying a lot for education, and I think there is an irrational part of human nature that tends to assess prices relative to what they are used to.

But I don’t think producing classes is the best business in this space, and it may not even be the most transformative for education. The best business is certification or testing.

I think people can’t see how important this is because certification and testing is so broken now. How is it broken?

First, it has become customary that colleges get to assess their own students, which, since colleges are paid by the students, has led to the inevitable grade inflation.

Second, colleges have no motivation to make grades good—they don’t benefit by making grades comparable or accurate. How does a 3.3 average from Berkeley compare to a 3.5 from Georgia Tech? Nobody knows and I doubt either Berkeley or Georgia Tech would want to answer that question if they could.

Third, GPA, the usual summary measurement of grades, is a terrible statistic. It averages together lots of subjects in a way that likely isn’t meaningful for anything in particular other than measuring how much, on average, the candidate cared about getting good grades. It is easily manipulated by taking easy classes, which is exactly the wrong thing to reward. And it values consistency over all else (since getting an A is pretty easy these days, getting a high GPA is more about never screwing up than being particularly good at anything).

I have done a fair amount of hiring, which lets you look at GPAs and then do an in-person assessment. GPAs aren’t worthless but neither are they worth much.

In short, colleges do a terrible job at assessment, which has made people who hire rely on grades less than they should be able to.

Outside of grades, most tests kind of suck. The normal “standardized” tests are overly general (one 3 hour test may cover your whole high school or college education). They also try to test subjects like English that are hard to test. Boiling your high school education down to an SAT score or your college education down to a GRE score is silly.

Interestingly, the concept of “certification” has arisen in the professional context apart from schools. This is your “Microsoft Certified Systems Engineer” and the like. These certifications have a bad reputation purely because they are pass/fail and aimed at fairly low-end jobs. Having an MCSE is kind of like putting on your resume that you passed high school algebra. It’s not a bad thing, but if you have to say so (or you have to ask) that isn’t good. Harder certifications (an MD, for example) have a better reputation. But any pass/fail test will be aimed at preventing very bad quality rather than finding very good quality (having an MD, after all, doesn’t indicate a good doctor).

But imagine a group of people who care deeply about data working seriously on the idea of assessing skills. First, your score would have to be closer to a percentile rank than to pass/fail, and that rank would have to be against other people taking the test. Percentiles are wonderful because you know exactly what they mean (a 99.9 means the candidate was better than 99.9 percent of all test takers), whereas an ‘A’ doesn’t come with that. There are plenty of hard problems to get right: you have to randomize the question pool to avoid cheating, but you also have to guarantee a fixed level of difficulty (you can’t have people lucking out and getting all easy questions).
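To make the percentile idea concrete, here is a minimal sketch in Python. The function name and the sample scores are made up for illustration, and I am assuming the simplest definition (percent of test takers who scored strictly lower); a real testing organization would have to decide how to treat ties and how to compare people who got different randomized questions.

```python
from bisect import bisect_left


def percentile_rank(score, all_scores):
    """Percent of test takers who scored strictly below `score`.

    Hypothetical helper: assumes every score in `all_scores` is on the
    same scale, i.e. the randomized tests were equally hard.
    """
    if not all_scores:
        raise ValueError("need at least one test taker")
    ordered = sorted(all_scores)
    # bisect_left returns how many entries are strictly less than score
    below = bisect_left(ordered, score)
    return 100.0 * below / len(ordered)


# Made-up raw scores for ten test takers
scores = [55, 62, 70, 70, 81, 90, 94, 97, 99, 100]
print(percentile_rank(97, scores))  # 70.0: seven of ten scored lower
```

Unlike an ‘A’, the number that comes out carries its own meaning: it says how many of the other test takers you beat.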

The other problem with existing tests is that they are too general. This makes studying for them stressful. Tests should cover a single specific area (e.g. “Linear Algebra”), not a general field (“math”). One can always create scores for general areas by combining the scores from a set of tests for core subjects in that area.
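As a rough sketch of that last point, a general “math” score could be computed from the specific tests. Everything here is an assumption for illustration: the function name, the choice of core subjects, and the use of a plain average (a real system might weight subjects or use something smarter).

```python
def area_score(subject_percentiles, core_subjects):
    """Average the percentiles for the core subjects of a general area.

    Returns None if the candidate has not taken every core test, since
    a partial average would not be comparable across candidates.
    """
    if not core_subjects:
        raise ValueError("an area needs at least one core subject")
    if any(s not in subject_percentiles for s in core_subjects):
        return None
    total = sum(subject_percentiles[s] for s in core_subjects)
    return total / len(core_subjects)


# Made-up percentiles for one candidate
taken = {"Linear Algebra": 90.0, "Calculus": 80.0, "Probability": 70.0}
math_core = ["Linear Algebra", "Calculus", "Probability"]
print(area_score(taken, math_core))  # 80.0
```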

I think this kind of testing would need to be done in person over a period of a day or so per subject. This sounds ridiculously time consuming compared to short tests like the SATs, but I think that is not an unreasonable percentage of time to spend on assessment and it would stand in for the “final” since this would be a much better, more comparable test.

I think it is easy to miss how powerful this is. To see it, you have to think about the hiring process as it works today. Let’s say there are three types of hiring: unskilled, skilled but inexperienced, and skilled and experienced. Unskilled hiring is not complicated (“have you ever been convicted of a felony?”). Hiring skilled, experienced people is generally done based on what they have accomplished; if they are really good and have been working for a while, then they will have done some big things and have people who will vouch for them. This is going to be better than any test. In other words, LinkedIn can solve that problem. But hiring skilled, inexperienced people is pretty hard because they haven’t had an opportunity to do anything yet.

Let me illustrate this by describing the process for hiring new college graduates for a programming job. These are people who have specialized skills (programming) but no real work experience to judge. This process is pretty standard for good Silicon Valley companies and goes something like this. First you eliminate candidates from all but the best schools. You know that the best programmers at the worst schools are better than the worst programmers at the best schools, but this kind of blunt heuristic is all you have. Then you interview these candidates essentially at random (maybe you look at their projects or internships or whatever, but it is done so quickly it is basically random). The first round of interviews is usually a one-hour phone screen. Assessing a person’s skill set in one hour over the phone is, of course, totally impossible. So you reject lots of good people for silly reasons like taking a little longer to answer the one question you had time for. Interviewers are generally poorly calibrated against one another, so unless you are a complete failure it matters almost as much whether you get an easy interviewer as how well you answer. The phone screen, if successful, will be followed up by a series of in-person one-hour interviews. Refining this process, standardizing the question set, avoiding cheating, and calibrating the scores of your interview process is a huge amount of work and is usually done wrong.

But the real inefficiency is this. Once you have invested a few dozen hours in assessment of a candidate, what happens to that assessment? Well, for most candidates, say 95%, the answer is “no hire”. This means that another company does exactly the same thing (pick a good school, ask simplistic questions, etc). Basically all the information gained in the first interview is thrown away. In total a candidate may go through 40 hours of interviews at different companies, but the coverage is terrible since all the companies ask the same questions and don’t share the results.

This problem doesn’t just impact companies, it impacts candidates. Candidates who are fantastic programmers but who lack the right degree, or went to the wrong school will not be given an interview at all. It is just too expensive to risk it because the rejection rate for that group, on average, is a little higher. My friend just went through this. He has a math degree from Harvard, and taught himself programming. After four years working as a concert cellist, he wanted to get into software engineering. The problem was, how to convince companies that you know what you know? They don’t have the time to let you prove that, and most won’t even return your calls. Meanwhile anyone with a Stanford CS degree has companies stalking them at home. This is an inefficient market. The Stanford CS kids are better, on average, but they aren’t that much better.

Now imagine that there is a central organization that does assessment. Let’s say that this assessment is granular (say one test per few classes) and let’s say that it is put together by real data people with a lot of effort to make the test material and the ranking itself high quality.

Naive thinking would lead you to believe that companies would hire by taking resumes, applying their random filtering, and then requesting test scores from this central organization for those resumes. But of course that isn’t how it would work at all. Instead you would just query and rank ALL candidates who met your standards on the skills you cared about.
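The “query and rank” idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Candidate` record, the `query` function, the candidates, and the choice to rank by summed percentile are all hypothetical, and a real service would sit on top of an actual database.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    percentiles: dict  # subject name -> percentile rank on that test


def query(candidates, minimums):
    """Return every candidate meeting all minimums, best first.

    `minimums` maps subject -> lowest acceptable percentile. A candidate
    who has not taken a required test is treated as not qualified.
    """
    qualified = [
        c for c in candidates
        if all(c.percentiles.get(subject, -1.0) >= floor
               for subject, floor in minimums.items())
    ]
    # Rank by total percentile across the skills the company cares about
    return sorted(
        qualified,
        key=lambda c: sum(c.percentiles[s] for s in minimums),
        reverse=True,
    )


# Made-up candidates; note that no school or degree appears anywhere
pool = [
    Candidate("A", {"Algorithms": 99.0, "Linear Algebra": 85.0}),
    Candidate("B", {"Algorithms": 92.0, "Linear Algebra": 95.0}),
    Candidate("C", {"Algorithms": 97.0}),
]
wanted = {"Algorithms": 90.0, "Linear Algebra": 80.0}
print([c.name for c in query(pool, wanted)])  # ['B', 'A']
```

The point is that the filter runs over everyone who has taken the tests, not over a stack of resumes that were pre-screened by school.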

This is the key to why testing is such a good business. It isn’t about charging test takers and competing with ETS. It is about being the sole entity with the ability to query a database that has all the skills of all the professionals, and having a deep and well-calibrated assessment of those skills. There is no reason this testing should be limited to things currently taught in college classes; I think it could extend to any quantifiable skill.

If you believe that education will become very cheap as it moves from a non-scalable to a scalable model then this will result in lots of people who can learn things from these classes, but without the usual ability to certify their knowledge (e.g. a degree from Stanford). Of course the online education providers can try to provide this, but what does it mean to have a degree from Coursera or Udacity? I think this is just a digital imitation of the current (bad) practice.

Obviously testing like this wouldn’t replace interviews. But it would replace the part of interviews that companies are bad at (well calibrated assessment of basic skills) and give more time for assessing fit and personality.

Likely people would resent being tested and scored. But people don’t like being interviewed either, and at least this kind of model would mean shorter lower pressure interviews and the ability to “do over” if you happen to get tested on a bad day. Because the “query model” changes, the tests effectively apply you to all companies, rather than having to interview at each one.

This idea is only really easy to scale for quantitative areas. For math, engineering, and the sciences, testing, when done right, can be very high quality. For these areas there is no reason for silly pre-computer relics like multiple choice; you can give open-ended problems without a pre-defined set of answers. In computer programming you could actually have the person write a computer program.

Non-quantitative disciplines like English are harder to scale, but they are assessable. I think writing can be graded, and I think a really good writing certification could be a lot more useful than college literature classes (which focus on literary critique more than writing) for most uses. So the humanities could be assessed as well, but it would cost more since a human being would need to read the writing.

It’s worth noticing the impact this would have on the world if it succeeded. First of all, I think having a well-calibrated measurement would put a lot more focus on learning the material and much less on how you learned it. No one will care too much which book you read, which online class you took, or what exercises you did. Those are all just ways to learn. Second, this would truly democratize education: the self-taught Bangladeshi is on equal footing with the legacy Harvard admittee.

Another way to say this is that testing, if it is good enough, commoditizes education. This sounds bad, since somehow commodities have gotten a bad rap. But I mean this in the technical sense. A commodity is something that has a measurement of quality which is accurate enough that the details of the good’s production are irrelevant. If you know the ratings for a diamond’s size, clarity, and color, you don’t need to worry about what mine it came from. In the absence of good measurements we are forced to rely on brands or other heuristics. But this is exactly how learning should work. If you learned linear algebra it shouldn’t somehow count for more because that learning happened in a Yale classroom instead of in your bedroom. All that should matter is how well you learned it.

Actually doing this would be a lot of work. Hopefully someone will steal this idea and do it.

