Tuesday, February 15, 2011

Elementary, My Dear Watson?

As many of you know, this week IBM's computer system, Watson, is competing on Jeopardy against the show's two strongest performers, Ken Jennings and Brad Rutter. At the time of this post, their first match has finished, and Watson is ahead by a large margin.  Watson's lead is so great that it would be quite surprising if Ken or Brad were to catch up.  Given that I do machine learning research (though I'd call the Jeopardy task AI more than ML), I couldn't resist posting about this match.

Ever since witnessing humanity's line fall when Deep Blue defeated Kasparov over a decade ago, we humans have become accustomed to computers outperforming us at various tasks.  Computers were first built precisely to do computations more quickly and more accurately than we could hope to.  And even though winning at chess takes a lot more than brute-force computing power (if a computer really tried to calculate all possible chess move sequences, it would take more than the current age of the universe for it to finish), chess seems like one of those activities that computers should be good at.  Actually, the best humans can still easily defeat the best computer systems at Go, but few of us will be surprised when, in a couple of years, this ceases to be the case.
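
To put a rough number on the "age of the universe" remark, here is a back-of-the-envelope sketch.  Every constant in it is my own assumption rather than anything measured: Shannon's classic estimate of about 10^120 possible chess games, a hypothetical machine that checks 10^18 move sequences per second, and a universe roughly 13.8 billion years old.

```python
# Back-of-the-envelope estimate; every constant below is an assumption.
SHANNON_GAME_TREE = 10**120       # Shannon's rough estimate of the chess game tree
SEQUENCES_PER_SECOND = 10**18     # hypothetical, wildly optimistic machine
AGE_OF_UNIVERSE_YEARS = 13.8e9    # approximate age of the universe
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def universe_ages_needed(tree_size, rate):
    """Return how many ages of the universe a full enumeration would take."""
    seconds_needed = tree_size / rate
    age_in_seconds = AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR
    return seconds_needed / age_in_seconds


ratio = universe_ages_needed(SHANNON_GAME_TREE, SEQUENCES_PER_SECOND)
print(f"about {ratio:.1e} ages of the universe")
```

Even with a far faster machine than anything that exists, the answer stays absurdly large, which is why chess programs prune and evaluate rather than enumerate.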

Jeopardy is another thing altogether.  One might at first ask what possible hope there is of beating a computer at trivia.  Watson can download pretty much the entire internet into its memory (and has), but the problem is what to do with all this information.  And equally difficult is understanding the Jeopardy questions (or "answers" as they call them) in the first place.  Yesterday, Watson had a lot of trouble with the "name the decade" category precisely because it didn't know the answer had to be a decade.  Or in naming the murderer of Snape and others, Watson couldn't properly rank Voldemort over Harry Potter because, even though it had the entire book in its memory, it had no idea of the concept of murder -- only of word associations, and Harry and murder appear rather frequently together (you-know-who's fault).**
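
To see how pure word association can pick the wrong wizard, here is a toy sketch.  It is not how Watson actually works (I don't know its internals); the sentences are invented for illustration, and the scoring rule, counting sentences in which a name and a keyword appear together, is simply the crudest association measure I could think of.

```python
from collections import Counter

# Toy corpus invented for illustration; it is not text from the books.
sentences = [
    "harry saw the murder of cedric",
    "harry dreamed about the murder again",
    "harry was blamed for the murder",
    "voldemort ordered the murder of snape",
    "harry heard about the murder of snape",
]


def cooccurrence_scores(sentences, keyword, candidates):
    """Count, per candidate, the sentences mentioning both it and the keyword."""
    scores = Counter()
    for sentence in sentences:
        words = set(sentence.split())
        if keyword in words:
            for name in candidates:
                if name in words:
                    scores[name] += 1
    return scores


scores = cooccurrence_scores(sentences, "murder", ["harry", "voldemort"])
best = max(scores, key=scores.get)
print(scores, "->", best)
```

In this made-up corpus Harry co-occurs with "murder" far more often than Voldemort does, so a ranker that only counts associations puts Harry first, even though no sentence says he murdered anyone.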

So the difficult part for Watson is exactly the easy part for humans (and vice versa).  Watson can easily store the name of every general, country, and battle, but has a lot more difficulty trying to figure out what is being asked.  Sure, there are keywords like "he" or "this date," but when there's any ambiguity, it's quite a challenge.  And even if Watson figures out the answer is a date, it still does massive lookups, searches for correlations, runs machine learning algorithms, etc.  At any point, something can go wrong because Watson doesn't "understand" the way we do.  (Interestingly enough, as Luis von Ahn observes on Twitter, the Final Jeopardy question about city airport names was pretty hard to answer even for us humans using a search engine, though it wasn't so hard for Ken or Brad.  Still, no human would answer Toronto for "U.S. cities.")  Ken and Brad need no help interpreting the questions, but for them, remembering the ridiculous amount of information is the hard part.

That being said, the IBM team has done a great job with Watson.  Watson's performance thus far already heralds other incredibly useful applications, which are not so unlike the Star Trek TNG computer systems -- only they'll come well before the 24th century.  Don't get me wrong; there's still quite a way to go.  Real-life speech doesn't come in the form of nicely formatted Jeopardy riddles (typed to Watson, who still doesn't have speech recognition), but this demonstration shows we've passed a major hurdle.

Finally, I should say that even though I believe that these technological changes are ultimately for the better, I am rooting for Ken and Brad*.  Once Watson can beat us, there's no going back.  In 10 more years, your typical personal computer, in whatever phone/laptop/terminal/watch shape it is, will run circles around Watson.

Yet, whatever discomfort I have will probably soon go away, and I'll become accustomed to computers being better than we are at yet another thing.  I'll take comfort in my personal better-than-Watson computer.  I'll enjoy the efficiency this brings.  I'll celebrate the new scientific and medical breakthroughs we'll make with the help of (and closer interaction with) ever more powerful computers.  I, for one, will eventually welcome our trivia overlords.

Update (2/16/11): Watson won today, as expected. And Daniel Reeves has some interesting ideas on how to change the rules of the game.

*I also felt that the game was a bit rigged against Ken and Brad -- the computer can always buzz in faster, so Ken and Brad have to compete for who wins the more human questions.  Were the game 1 on 1, it would have been a lot closer.  And if it were two Watsons vs just Ken or Brad, I'm sure the humans would win.

**Showing, once more, that humans are decidedly more human, as we can easily tell the difference between Harry and Voldemort.

The image is of IBM's Watson Avatar and taken from Wikipedia.  It is posted for commentary under "fair use."


  1. It may be hindsight bias to say "chess seems like one of those activities that computers should be good at". Isn't there one of Asimov's "I, Robot" stories where robots could run around and give kids piggyback rides and whatnot but were not so advanced that they could play chess?

    Great write-up of the Jeopardy match, by the way!

    But speaking of hindsight bias, although the buzzer seems to be turning out to be an unfair advantage for Watson, the humans have a related unfair advantage: They can anticipate the moment the buzzers are activated (when Trebek finishes reading the question) and buzz in faster than is possible for even Watson to react. Of course if they buzz in too early they're locked out for a fraction of a second.

    What's really needed is a rule tweak: Everyone who buzzes in within the first fraction of a second after the buzzers are activated is considered to have buzzed in simultaneously and the tie is broken randomly. Or, simpler, just eliminate the lock-out -- buzzing in before the buzzers are activated is just treated as buzzing in at the exact moment the buzzers are activated.

  2. Thanks, Dan. Sounds like an interesting rule tweak. Ken and Brad have a lot of experience buzzing and they almost never beat Watson. But actually I think the buzzer is fair game. I just have a problem with the 2 humans vs 1 computer ratio.

    As for hindsight bias, I somewhat agree. I tried to address this point -- that soon we'll get used to computers doing whatever they do next and it will become natural. Funnily enough, many people still consider Go a "human game" and will be completely shocked when a computer can win.

  3. I'd like to see a computer beat a comedian at telling jokes.

  4. Exactly! I actually think that Star Trek's musings on what Data could and couldn't do weren't terribly far off.

  5. ...except for that bogus inability to use contractions =)

  6. Lev: Watson has an unfair (in terms of what the competition is really about) advantage in reaction time and Ken and Brad have an unfair advantage in hearing Trebek speak, and thus timing their button press to beat Watson's reaction time (rather rarely, it turns out).
    It would be quite a coincidence if those advantages perfectly canceled out and yielded a fair game, right?

    But maybe it's not so different from the other stark asymmetry here: Humans are vastly better at actually understanding what is being asked, and vastly worse at knowing the answer when Watson is lucky enough to understand the question.

    Put Watson and a human (any human) *together* and you really have a quantum leap forward in question answering. (The human, for example, would nix Watson's pathetic "Toronto" as an answer to a "What U.S. city..." question.)

  7. Anonymous, 10:32 AM

    Unrelated to the present post, but Google found for me on your twitter the answer to the question I was wondering about when rereading Valiant's theory of the learnable: "who coined the term PAC?" Thanks for your answer! /Jan