Watson Aims to Outfox Humans at Their Own Game
IBM Researchers Conjure Right Question from Wild, Wooly Data
By Roger L. Kay
I joined a small horde of analysts in Rye Brook, NY, on Nov. 15 to hear IBM’s vision for the future of
its Systems and Technology Group (STG). The lead-off speakers ratified the absorption of STG
into the company’s Software Group (SWG), with Rod Adkins, the head of STG, warming up the
stage for Steve Mills, the boss at SWG.
Mills didn’t stick around long, which makes sense: even though STG now officially “reports” to
software — a sea change long in coming — the division deals primarily with hardware,
concerning itself with issues such as semiconductor and electronics design, component
sourcing, and the like. As such, STG still deserves its own forum. So, the confab was a “better
together” fest, which also gave STG a chance to update the analyst community on progress on
various hardware fronts.
But what caught my attention was the luncheon demo of Watson, the special-purpose computer
that IBM scientists are rapidly turning into a Jeopardy champion. The Watson project is a follow-on
to Deep Blue, the chess maven that researchers at IBM eventually taught to beat Garry
Kasparov, the Russian grandmaster and former World Chess Champion, who fell to Deep Blue in
1997.
The Jeopardy challenge is quite different from the chess problem. For example, although the
number of possible chess positions is large (10^47 by one estimate), they are all contained within
a deterministic framework: a set of legal moves in a clearly demarcated territory with a fairly
simple set of rules. Excellence at chess involves understanding the consequences of various
sets of moves, making probabilistic projections of likely outcomes, and choosing the best next
legal move. Excellence at Jeopardy, on the other hand, relies on a whole set of subtle, squishy
linguistic relationships, not at all the sort of thing computers are supposed to be good at.
For TV luddites or anyone who missed Jeopardy, contestants are given a sometimes complex
statement, which is the answer to the question they must derive. So, “A
thieving villainess and former ACME agent who hides in a wide range of geographies and time
periods” should elicit “Who is Carmen Sandiego?”
Despite the complexity of the challenge, IBM scientists set out to make a Jeopardy champion, and
over time they have developed one that, while not infallible, has begun to rack up records like
previous human grand champions. Watson has even competed against real people of steadily
increasing skill and acquitted itself admirably.
At the moment, Watson is playing a series of matches against human champions, with results
to be published later, but I am confident, having seen the statistics, that Watson will
prove that a computer can master a task once thought uniquely human: dealing with
indeterminate information in linguistic form and teasing subtle meanings to light.
A key rule of Jeopardy is that right answers add points, but wrong answers take them away. So,
sometimes it pays just to stay in bed in the morning and not even bother to get up. In other words,
you may be better off keeping your mouth shut. Thus, confidence in the answer is critical. IBM
data showed that high scorers knew the correct answer only some (~60%) of the time but
consistently had high confidence (~90%) in the answers that they did give.
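The arithmetic behind that caution fits in a few lines. This is a simplified sketch that assumes a clue worth a fixed number of points, v, and ignores wagers and game state; the function name is hypothetical. Answering with confidence p gains p*v - (1 - p)*v = (2p - 1)*v on average, which turns positive only once p passes 50%.

```python
def expected_gain(confidence: float, clue_value: int) -> float:
    """Average score change from buzzing in on a clue.

    Assumes a right answer adds clue_value and a wrong one subtracts it;
    staying silent always scores 0.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return confidence * clue_value - (1.0 - confidence) * clue_value


for p in (0.4, 0.5, 0.6, 0.9):
    print(f"confidence {p:.0%}: expected gain {expected_gain(p, 1000):+.0f} points")
```

At the ~90% confidence reported for high scorers, a 1,000-point clue is worth about +800 points on average; at 40% confidence, answering costs about 200.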
Programming Watson started off simply enough. First came keyword matching. Researchers
fed the system dictionaries, encyclopedias, and other reference texts to make it at least recognize
words. Then, they began creating basic “wisdom” consisting of a simple syntax or set of rules for
constructing sentences. These most easily took the form of Subject followed by Verb and
perhaps an Object or other modifier (e.g., Carmen Sandiego travels the globe).
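Keyword matching of this kind can be sketched as word-overlap scoring. This is a minimal illustration, not IBM’s code: the stopword list, the two toy reference passages, and the function names are all hypothetical.

```python
import re

# Hypothetical, tiny stopword list; a real system would use a far larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "who", "that", "to"}


def keywords(text: str) -> set[str]:
    """Lowercase the text, split it into words, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def best_passage(clue: str, passages: dict[str, str]) -> tuple[str, int]:
    """Return the passage title sharing the most keywords with the clue, plus that count."""
    clue_words = keywords(clue)
    overlap = {title: len(clue_words & keywords(body)) for title, body in passages.items()}
    best = max(overlap, key=overlap.get)
    return best, overlap[best]


# Toy reference texts standing in for the dictionaries and encyclopedias.
passages = {
    "Carmen Sandiego": "Carmen Sandiego is a thieving villainess and former ACME agent "
                       "who travels the globe and through many time periods.",
    "Deep Blue": "Deep Blue is an IBM chess computer that beat Garry Kasparov in 1997.",
}
clue = ("A thieving villainess and former ACME agent who hides in a wide range "
        "of geographies and time periods")
print(best_passage(clue, passages))  # prints ('Carmen Sandiego', 7)
```

The weakness shows as well: the score rewards only shared surface words, so a clue phrased without the reference text’s vocabulary scores zero.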
Thus, the core data grew to a vast set of true statements (e.g., Carmen Sandiego is a woman),
which must be matched up with the implicit as well as explicit statements in the clue. This is quite
a trick in itself, but in addition the program has to derive the actor(s) and action(s) from the right
set of true statements in its gigantic store and turn them into the correct question.
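Matching a clue against a store of Subject-Verb-Object statements can be sketched with plain tuples. This assumes the clue has already been broken into (verb, object) constraints by hand, which is exactly the hard part described above; the facts, including the Waldo distractor, are toy data and the function name is hypothetical.

```python
from collections import Counter
from typing import Optional

# Toy fact store: each true statement is a (subject, verb, object) triple.
FACTS = {
    ("Carmen Sandiego", "is", "woman"),
    ("Carmen Sandiego", "is", "thief"),
    ("Carmen Sandiego", "was", "ACME agent"),
    ("Carmen Sandiego", "travels", "globe"),
    ("Waldo", "is", "man"),
    ("Waldo", "travels", "globe"),
}


def answer_as_question(constraints: list[tuple[str, str]]) -> Optional[str]:
    """Pick the subject satisfying the most constraints and phrase it Jeopardy-style."""
    matches = Counter(
        subject for subject, verb, obj in FACTS if (verb, obj) in constraints
    )
    if not matches:
        return None
    subject, _count = matches.most_common(1)[0]
    # Always phrased as "Who is"; a fuller version would choose "What is" for things.
    return f"Who is {subject}?"


# Constraints a parser would have to derive from the Carmen Sandiego clue.
clue_constraints = [("is", "woman"), ("is", "thief"), ("was", "ACME agent"), ("travels", "globe")]
print(answer_as_question(clue_constraints))  # prints Who is Carmen Sandiego?
```

Carmen Sandiego satisfies all four constraints and Waldo only one, so the ranking is clear here; real clues rarely decompose this cleanly.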
Based on IBM's DeepQA technology, which lines up the ability to generate hypotheses, gather a
massive amount of evidence, perform analysis, and score the results, Watson applies years of
IBM experience in the areas of natural language processing, information retrieval, knowledge
representation, reasoning, and machine learning to the problem of answering questions in an
open domain. Not at all chess-like, that open domain. And despite these smarts, Watson, at this
point, was losing against pretty much any informed human.
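The article does not say how DeepQA folds its many pieces of evidence into a single score. One common approach, used here purely as an assumption, is a logistic function over weighted evidence scores, with the weights learned from past clues in a trained system. The feature names, weights, and candidate scores below are all invented for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical weights; a trained system would learn these from past clues.
WEIGHTS = {"keyword_overlap": 2.0, "type_match": 1.5, "source_reliability": 1.0}
BIAS = -2.0


@dataclass
class Hypothesis:
    answer: str
    features: dict[str, float]  # each evidence scorer's output, scaled to 0..1


def confidence(h: Hypothesis) -> float:
    """Merge the evidence scores into one confidence with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in h.features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(hypotheses: list[Hypothesis]) -> list[tuple[str, float]]:
    """Score every hypothesis and sort from most to least confident."""
    scored = [(h.answer, confidence(h)) for h in hypotheses]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    Hypothesis("Who is Carmen Sandiego?",
               {"keyword_overlap": 0.9, "type_match": 1.0, "source_reliability": 0.8}),
    Hypothesis("Who is Waldo?",
               {"keyword_overlap": 0.3, "type_match": 0.0, "source_reliability": 0.8}),
]
for answer, conf in rank(candidates):
    print(f"{answer}: {conf:.0%}")
```

With these made-up numbers, Carmen Sandiego comes out near 89% and Waldo near 35%. The logistic form keeps every result between 0 and 1, so it can be read as a confidence percentage.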
Puns and literary references were added to disambiguate. Slang also turned out to be important.
Some improvement, but still not enough.
The key breakthrough, it turns out, proved to be tuning up the confidence levels. The scientists
programmed the system to generate trial hypotheses, each with its own confidence percentage.
These hypotheses were ranked by confidence, and, if the highest one fell below 50%, Watson
would keep its yap clammed. Watson started producing world-class results but took hours to
do what a human might do in a few seconds.
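The rank-then-abstain rule described here is simple to express. This is a minimal sketch, assuming each hypothesis already carries a confidence between 0 and 1; the function name and sample confidences are hypothetical, while the 50% cutoff comes from the description above.

```python
from typing import Optional


def choose_response(hypotheses: dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Rank candidate responses by confidence; answer only if the best reaches the threshold."""
    if not hypotheses:
        return None
    ranked = sorted(hypotheses.items(), key=lambda item: item[1], reverse=True)
    best_response, best_confidence = ranked[0]
    if best_confidence < threshold:
        return None  # stay silent rather than risk losing the clue's value
    return best_response


print(choose_response({"Who is Carmen Sandiego?": 0.87, "Who is Waldo?": 0.09}))
print(choose_response({"Who is Waldo?": 0.42, "Who is Carmen Sandiego?": 0.31}))
```

The first call answers and the second returns None. Moving the threshold trades how often the system answers against how often it is wrong.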
The finishing touch came from a redesigned hardware architecture that emphasized parallel
computing (tip of the hat to the STG hardware team). The complex task stream was broken into
many threads that could run at the same time, interact, and be recomputed based on results from
other threads. A hardware grid of thousands of processors was created to execute all these
threads. At this point, Watson began to produce big-league results, beating not only real people,
but real Jeopardy champs by “hitting the buzzer” before they could.
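The payoff from parallelism can be shown in miniature: scoring work for different candidates is independent, so it can run at the same time. This sketch uses Python threads, with a short sleep standing in for a slow evidence lookup; the scores and names are hypothetical, and it says nothing about how Watson’s own grid was actually programmed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scores; a real scorer would search a fact store instead.
TOY_SCORES = {"Carmen Sandiego": 0.9, "Waldo": 0.4, "Deep Blue": 0.1}


def slow_score(candidate: str) -> tuple[str, float]:
    """Stand-in for a slow evidence scorer: waits 0.2 s, then looks up a toy score."""
    time.sleep(0.2)
    return candidate, TOY_SCORES.get(candidate, 0.0)


def score_in_parallel(candidates: list[str], workers: int = 8) -> dict[str, float]:
    """Run one scoring task per candidate concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(slow_score, candidates))


start = time.perf_counter()
scores = score_in_parallel(list(TOY_SCORES))
print(scores, f"in {time.perf_counter() - start:.2f} s")
```

Run one after another, the three lookups would take about 0.6 s; run together they finish in roughly the time of one. Threads help here only because sleeping releases Python’s global interpreter lock; CPU-heavy scoring would need separate processes (for example, ProcessPoolExecutor) or, at Watson’s scale, many machines.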
The word is not yet official, but it looks as if another IBM creation will be consistently able to outdo
just about any human before long, this time in a highly delicate and human discipline. John Henry
is rolling over in his grave.
Like all pure research projects, Watson raises the question of whether all this expenditure to
beat a TV game show is worthwhile. Although the goal itself may seem trivial, many of the
technologies discovered and developed during Watson’s genesis will ultimately lead to better
commercial products: computers that “understand” natural language, that can make reasoned
judgments from information hidden in a huge mass of unstructured data, and that mimic with near-perfect
fidelity the thinking of actual humans.
© 2010 Endpoint Technologies Associates, Inc. All rights reserved.
Who's in Jeopardy Now?