How Did IBM Watson AI Really Win Jeopardy?

Photo by Atomic Taco.

On 14th January 2011, a match of Jeopardy was played between three contestants: two humans and IBM Watson. It was the first time a computer-based system had competed on the show. The winner was IBM Watson, which used data analytics and artificial intelligence to beat its human opponents.

IBM Watson is a cognitive system that combines information retrieval and natural language processing (NLP). Basically, it is a question-answering computer system: it takes questions framed in natural language and returns answers. To do so, it also draws on knowledge representation, automated reasoning, and machine learning.

Watson uses IBM’s ‘DeepQA’ software and the Apache UIMA (Unstructured Information Management Architecture) framework. It is data-driven.


A question is given to IBM Watson as input in natural language. Watson then analyses the question and conducts a primary search over its data using various algorithms. The search results yield a number of candidate answers. Each candidate is scored to find the most accurate one, and the best-scoring candidates are filtered from the scored list of answers. Supporting evidence is then gathered for these candidates, followed by further merging and ranking. Finally, an answer is produced using trained models, along with Watson's confidence in that answer.

Answer scoring is done in two ways: context-dependent and context-independent. The final synthesis relies on evidence retrieval and deep evidence scoring. In short, the process has six steps:

  1. Question analysis
  2. Primary search
  3. Candidate hypothesis generation
  4. Answer scoring
  5. Supporting evidence
  6. Merging candidate answers and scoring the confidence

Question analysis

Watson analyses every word in the question to work out what kind of output is required. It determines whether the answer should be a place, a date, a person’s name, a book title, or something else.

It then marks a ‘focus’: a word, or a phrase of two or three words, that has the strongest relation to the answer. For example, in a question asking for the name of a poet, ‘this poet’ can be the focus. The remaining words of the question are marked as ‘keywords’.
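The focus and keyword split can be illustrated with a small sketch. This is a toy rule, not Watson's actual parser: the function name `analyse_question`, the stopword list, and the 'this/these + next word' heuristic are all assumptions made for illustration.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "was", "to", "and"}

def analyse_question(question):
    """Return (focus, keywords) using a toy 'this <word>' rule."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    focus = None
    for i in range(len(tokens) - 1):
        # Assumption: the focus is 'this' or 'these' plus the word after it.
        if tokens[i] in ("this", "these"):
            focus = tokens[i] + " " + tokens[i + 1]
            break
    focus_words = set(focus.split()) if focus else set()
    # Everything that is neither a stopword nor part of the focus is a keyword.
    keywords = [t for t in tokens if t not in STOPWORDS and t not in focus_words]
    return focus, keywords

focus, keywords = analyse_question("This poet wrote 'The Raven' in 1845")
print(focus)     # this poet
print(keywords)  # ['wrote', 'raven', '1845']
```

A real system would rely on full syntactic parsing rather than a fixed word pattern, but the output has the same shape: one focus phrase and a list of keywords to search with.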

IBM Watson has many sources of data available to search, and it conducts a primary search across all of them. The primary search results are parsed to build candidate answers from titles, anchor texts, and passages and their parts, and the candidates are checked against constraints.

Primary Search

The keywords are used to search over millions of documents for relevant hits. Watson retrieves the documents and passages that contain these hits, using tools like Indri passage search and Lucene passage search. The document search results and passage search results are then ranked.
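As a rough picture of what ranking passages by keyword hits means, here is a minimal sketch. It is a stand-in for a real search engine and does not use Indri or Lucene; the function `rank_passages` and the sample passages are invented for illustration.

```python
def rank_passages(keywords, passages):
    """Rank passages by how many distinct keywords each one contains."""
    wanted = set(keywords)
    scored = []
    for pid, text in passages.items():
        words = set(text.lower().split())
        hits = len(wanted & words)
        if hits:  # drop passages with no keyword at all
            scored.append((pid, hits))
    # Highest hit count first; ties broken by passage id for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))

# Hypothetical mini-corpus.
passages = {
    "p1": "edgar allan poe wrote the raven in 1845",
    "p2": "the raven is a large black bird",
    "p3": "walt whitman published leaves of grass in 1855",
}
ranked = rank_passages(["wrote", "raven", "1845"], passages)
print(ranked)  # [('p1', 3), ('p2', 1)]
```

Production search engines weight terms by rarity and passage length instead of counting raw overlaps, but the result is the same kind of ranked list that the later steps consume.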

Candidate Hypothesis Generation

The possible answers to the question are referred to as ‘candidate answers’, and they are identified in each search result. Watson first looks at the titles of the documents, along with a collection of title variants and expansions. It then looks for possible answers in the text of documents and passages, such as named entities, noun phrases, anchor text, dates, and places. The candidate answers are given their first evidence feature scores from their corresponding document search rank and passage search rank.
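The idea of turning document titles into candidates, with a first score derived from search rank, might look like the sketch below. The reciprocal-rank score (1 / rank) and the handling of parenthetical title variants are assumptions chosen for simplicity, not Watson's documented formula.

```python
def generate_candidates(ranked_titles):
    """Build candidate answers from document titles, best match first.

    Each candidate gets a first feature score of 1 / rank (an assumed formula).
    """
    candidates = {}
    for rank, title in enumerate(ranked_titles, start=1):
        # Treat 'Name (qualifier)' as a variant of 'Name'.
        name = title.split(" (")[0].strip()
        score = 1.0 / rank
        # Keep the best score seen for each candidate.
        candidates[name] = max(candidates.get(name, 0.0), score)
    return candidates

cands = generate_candidates(
    ["Edgar Allan Poe", "The Raven (poem)", "Edgar Allan Poe (writer)"]
)
print(cands)  # {'Edgar Allan Poe': 1.0, 'The Raven': 0.5}
```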

Scoring is done with a wide variety of parameters and is responsible for establishing the confidence in each answer. PRISMATIC (relationship search) and semantic relation (DBpedia) indexes are used. More than 50 scoring components are considered, a few of which are gender, geospatial (location), temporal, taxonomic, name consistency, source reliability, and theory consistency. Both context-dependent and context-independent scoring are done.

Answer Scoring

A sizable number of answer scoring analytics are used to score the candidate answers. One example is the Type Coercion (TyCor) scorers. These analytics use only the candidate answer and the question, along with a lot of general background knowledge. The TyCor scores estimate the likelihood that a candidate answer is an instance of the Lexical Answer Type (LAT) in the question.
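A heavily simplified view of type coercion is a lookup that asks whether a candidate is known to be an instance of the LAT. In the sketch below, the `TYPES` table stands in for 'general background knowledge' and the three-valued score is an assumption; the real scorers draw on many knowledge sources and return graded likelihoods.

```python
# Hypothetical background knowledge: entity -> set of known types.
TYPES = {
    "Edgar Allan Poe": {"poet", "author", "person"},
    "The Raven": {"poem", "work"},
    "Baltimore": {"city", "place"},
}

def tycor_score(candidate, lat):
    """Score how well a candidate fits the lexical answer type (LAT).

    1.0 = known instance of the LAT, 0.0 = known entity of another type,
    0.5 = unknown entity (no evidence either way).
    """
    types = TYPES.get(candidate)
    if types is None:
        return 0.5
    return 1.0 if lat in types else 0.0

print(tycor_score("Edgar Allan Poe", "poet"))  # 1.0
print(tycor_score("The Raven", "poet"))        # 0.0
print(tycor_score("Walt Whitman", "poet"))     # 0.5
```

Note that the score depends only on the candidate and the question's LAT, not on any retrieved passage, which is what makes this kind of scoring context-independent.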

Supporting Evidence

This step is basically a passage search. It is similar to the primary search, except that it also uses the candidate answer as a search term. The candidate answers are then scored further to check that they fit the context. The scoring is based on passage term match, text alignment, logical form answer candidate scoring, skip-bigram, etc.
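Of the scorers named above, skip-bigram is the easiest to sketch. The version below works on surface tokens: it collects ordered word pairs that are at most `max_skip` words apart and measures how many of the question's pairs also occur in the passage. This is a simplified token-level variant written for illustration; the function names and the `max_skip` parameter are assumptions.

```python
def skip_bigrams(tokens, max_skip=2):
    """Return ordered token pairs with at most max_skip tokens between them."""
    pairs = set()
    for i, first in enumerate(tokens):
        last = min(i + 2 + max_skip, len(tokens))
        for j in range(i + 1, last):
            pairs.add((first, tokens[j]))
    return pairs

def skip_bigram_score(question, passage, max_skip=2):
    """Fraction of the question's skip-bigrams that also occur in the passage."""
    q = skip_bigrams(question.lower().split(), max_skip)
    p = skip_bigrams(passage.lower().split(), max_skip)
    return len(q & p) / len(q) if q else 0.0

# The candidate answer has been substituted for the focus of the question.
score = skip_bigram_score("poe wrote the raven", "poe wrote the famous raven")
print(round(score, 3))  # 0.833
```

Here five of the six question pairs survive the extra word in the passage, which shows why skip-bigrams tolerate small insertions better than exact phrase matching.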

Candidate answers are also merged, because when the number of candidates is high, duplicates are likely. Merging removes them, and it requires normalizing the scores of each feature so that they can be combined.
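Normalizing per feature and then merging duplicates can be sketched as follows. Min-max normalization, the hand-written `canonical` mapping, and the 'keep the maximum' merge rule are all assumptions for illustration; other normalizations and merge rules are equally plausible.

```python
def normalise(features_by_candidate):
    """Min-max normalise each feature across candidates to the range [0, 1]."""
    names = {f for feats in features_by_candidate.values() for f in feats}
    result = {cand: {} for cand in features_by_candidate}
    for name in names:
        values = [feats.get(name, 0.0) for feats in features_by_candidate.values()]
        lo, hi = min(values), max(values)
        span = hi - lo
        for cand, feats in features_by_candidate.items():
            value = feats.get(name, 0.0)
            result[cand][name] = (value - lo) / span if span else 0.0
    return result

def merge(features_by_candidate, canonical):
    """Fold variants onto a canonical name, keeping the maximum of each feature."""
    merged = {}
    for cand, feats in features_by_candidate.items():
        key = canonical.get(cand, cand)
        slot = merged.setdefault(key, {})
        for name, value in feats.items():
            slot[name] = max(slot.get(name, 0.0), value)
    return merged

# Hypothetical raw feature scores on different scales.
raw = {
    "Edgar Allan Poe": {"search": 10.0, "type": 1.0},
    "E. A. Poe": {"search": 4.0, "type": 1.0},
    "The Raven": {"search": 6.0, "type": 0.0},
}
merged = merge(normalise(raw), {"E. A. Poe": "Edgar Allan Poe"})
print(sorted(merged))  # ['Edgar Allan Poe', 'The Raven']
```

Normalizing first matters because a raw search score of 10.0 and a type score of 1.0 are not on comparable scales until each feature has been mapped to a common range.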

After merging, the ranking is done. Machine learning (ML) and IBM SPSS are used over the training data to create a model to rank future results. Several techniques, such as linear and logistic regression, are used. A continuous cycle of teaching, training, and execution is run over more than 10,000 training questions and 2,000 test questions.

Merging candidate answers and scoring the confidence

Finally, variants of the same answer are identified and their feature scores are merged together. Then the final confidence scores for the candidate answers are calculated using a series of machine learning models. These models weigh all of the feature scores to produce the final confidence scores. The answer with the highest confidence is the final answer to the question.
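To make 'weighing all of the feature scores' concrete, here is a minimal logistic-regression-style sketch. The weights and bias are made-up numbers, where a trained model would learn them from the training questions, and the feature names are hypothetical.

```python
import math

# Hypothetical weights; a real model would learn these from training data.
WEIGHTS = {"search": 1.5, "type": 2.0, "skip_bigram": 1.0}
BIAS = -2.0

def confidence(features):
    """Combine feature scores into a confidence between 0 and 1 (logistic function)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def best_answer(candidates):
    """Return the candidate with the highest confidence."""
    return max(candidates, key=lambda cand: confidence(candidates[cand]))

# Hypothetical merged, normalised feature scores.
candidates = {
    "Edgar Allan Poe": {"search": 1.0, "type": 1.0, "skip_bigram": 0.8},
    "The Raven": {"search": 0.33, "type": 0.0, "skip_bigram": 0.5},
}
answer = best_answer(candidates)
print(answer)  # Edgar Allan Poe
```

Because the output is a confidence rather than just a ranking, a Jeopardy player built this way can also decide not to answer when even the best candidate scores too low.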


IBM Watson relies on many complex algorithms and systems, and its win at Jeopardy was the result of a very large effort. It can fairly be called a successful machine that used artificial intelligence to outperform humans. The version of Watson built to win Jeopardy was only a first step, as the technology has a lot of potential in other industries, especially data-driven ones. In fact, it is now being developed for use in industries such as healthcare and finance.
