The Oracle(ID:8403/)

Early QA system 


Early question-answering system by Anthony Phillips, MIT, 1960.

Famous for its fail message "(THE ORACLE DOES NOT KNOW)".


References:
  • Phillips, A.V., "A Question-Answering Routine," Masters Thesis, Mathematics Department, MIT, Cambridge, Mass; 1960.
  • Bobrow, D.G. "Natural Language Input for a Computer Problem Solving System", Report MAC-TR-1, Project MAC, M.I.T., Cambridge, Mass., June 1964
    Abstract: The STUDENT problem solving system, programmed in LISP, accepts as input a comfortable but restricted subset of English which can express a wide variety of algebra story problems. STUDENT finds the solution to a large class of these problems. STUDENT can utilize a store of global information not specific to any one problem, and may make assumptions about the interpretation of ambiguities in the wording of the problem being solved. If it uses such information, or makes any assumptions, STUDENT communicates this fact to the user. The thesis includes a summary of other English language question-answering systems. All these systems, and STUDENT are evaluated according to four standard criteria. The linguistic analysis in STUDENT is a first approximation to the analytic portion of a semantic theory of discourse outlined in the thesis. STUDENT finds the set of kernel sentences which are the base of the input discourse, and transforms this sequence of kernel sentences into a set of simultaneous equations which form the semantic base of the Student system. STUDENT then tries to solve this set of equations for the values of requested unknowns. If it is successful it gives the answers in English. If not, STUDENT asks the user for more information, and indicates the nature of the desired information. The STUDENT system is a first step toward natural language communication with computers. Further work on the semantic theory proposed should result in much more sophisticated systems.
    Extract: Introduction
    The aim of the research reported here was to discover how one could build a computer program which could communicate with people in a natural language within some restricted problem domain. In the course of this investigation, I wrote a set of computer programs, the STUDENT system, which accepts as input a comfortable but restricted subset of English which can be used to express a wide variety of algebra story problems. The problems shown in Figure 1 illustrate some of the communication and problem solving capabilities of this system.
    In the following discussion, I shall use phrases such as "the computer understands English". In all such cases, the "English" is just the restricted subset of English which is allowable as input for the computer program under discussion. In addition, for purposes of this report I have adopted the following operational definition of understanding. A computer "understands" a subset of English if it accepts input sentences which are members of this subset, and answers questions based on information contained in the input. The STUDENT system understands English in this sense.
    Extract: BASEBALL
    Baseball is a question-answering system designed and programmed at Lincoln Laboratories by Green, Wolf, Chomsky and Laughery (19). It is a data base system in which the data is placed in memory in a prestructured tree format. The data consists of the dates, location, opposing teams and scores of some American League baseball games. Only questions to the system can be given in English, not the data.
    Questions must be simple sentences, with no relative clauses, logical or coordinate connectives. With these restrictions, the program will accept any question couched in words contained in a vocabulary list quite adequate for asking questions about baseball statistics. In addition, the parsing routine, based on techniques developed by Harris (21), must find a parsing for the question.
    The questions must pertain to statistics about baseball games found in the information store. One cannot ask questions about extrema, such as "highest" score or "fewest" number of games won. The parsed question is transformed into a standard specification (or spec) list and the question-answering routine utilizes this canonical form for the meaning of the question. For example, the question "Who beat the Yankees on July 4th?" would be transformed into the "spec list":
    Team (losing) = New York
    Team (winning) = ?
    Date = July
    Because Baseball does not utilize English for data input, we cannot talk about deductions made from information implicit in several sentences. However, Baseball can perform operations such as counting (the number of games played by Boston, for example) and thus, in the sense that it is utilizing several separate data units in its store, it is performing deductions.
    Baseball's abilities can only be extended by extensive re-programming, though the techniques utilized have some general applicability. Because the parsing program has a very complete grammar, and the vocabulary list is quite comprehensive for the problem domain, the user needs no knowledge of the internal structure of the Baseball program. No provision for interaction with the user was made.
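    As an illustration (not part of the quoted report), the spec-list idea can be sketched in a few lines of Python. Everything here is an assumption made for the example: the game records are invented, the field names and the `answer` and `count_games` helpers are hypothetical, and a flat list stands in for the tree-structured store that Baseball actually used.

```python
# Invented game records for illustration only; Baseball kept its data
# in a prestructured tree, not in a flat list like this one.
GAMES = [
    {"date": "July 4", "place": "New York", "winner": "Boston", "loser": "New York"},
    {"date": "July 5", "place": "Boston", "winner": "New York", "loser": "Boston"},
    {"date": "July 6", "place": "Chicago", "winner": "Boston", "loser": "Chicago"},
]

UNKNOWN = "?"  # marks the attribute the question asks about


def answer(spec, games):
    """Return the values of the '?' attributes for every game that
    agrees with all the known attributes of the spec list."""
    known = {k: v for k, v in spec.items() if v != UNKNOWN}
    wanted = [k for k, v in spec.items() if v == UNKNOWN]
    results = []
    for game in games:
        if all(game.get(k) == v for k, v in known.items()):
            results.append(tuple(game[k] for k in wanted))
    return results


def count_games(team, games):
    """Counting combines several data units, e.g. games played by one team."""
    return sum(1 for g in games if team in (g["winner"], g["loser"]))


# Hand-built spec list for "Who beat the Yankees on July 4th?"
spec = {"loser": "New York", "winner": UNKNOWN, "date": "July 4"}
print(answer(spec, GAMES))
print(count_games("Boston", GAMES))
```

    The parsing step that turns an English question into the spec list is not modeled; the spec is written by hand, which is the part of Baseball that this sketch leaves out.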
    Extract: Advice-taker
    McCarthy's Advice-taker, though not designed to accept English input, would make an excellent base for a question-answering system. Fischer Black has programmed a system which can do all of McCarthy's Advice-Taker problems, and can be adapted to accept a very limited subset of English. The deductive system in Black's program is equivalent to the propositional calculus.
    Extract: Phillips QAS
    One of the earliest question-answering systems was written in 1960 at MIT by Anthony Phillips. It is a data base system which accepts sentences which can be parsed by a very simple context-free phrase structure grammar, of the type defined by Chomsky. Additional syntactic restrictions require that each word must be in only one grammatical class, and that a sentence has exactly one parsing.
    A parsed sentence is transformed into a list of five elements, the subject, verb, object, time phrase, and place phrase in the sentence. All other information in the sentence is disregarded. Questions are answered by matching the list from the transformed question against the list for each input sentence. When a match is found, the corresponding sentence is given as an answer.
    Phillips' system has no deductive ability and adding new abilities would require reprogramming the system. A questioner must be aware that the system utilizes a matching process which does not recognize synonyms, and therefore the sentence "The teacher eats lunch at noon." will not be recognized as an answer to the question "What does the teacher do at twelve o'clock?" When Phillips' system cannot find an answer, it reports only "(THE ORACLE DOES NOT KNOW)". It provides for no further interaction with the user.
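    As an illustration (not part of the quoted report), the five-element matching described above might look roughly like the following Python sketch. It assumes the parsing has already been done: the `Frame` type, the `matches` and `ask` helpers, and the hand-coded frames are all hypothetical, and the original system was not written this way.

```python
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    """The five elements kept from a parsed sentence (names are hypothetical)."""
    subject: Optional[str]
    verb: Optional[str]
    obj: Optional[str]
    time: Optional[str]
    place: Optional[str]


FAIL = "(THE ORACLE DOES NOT KNOW)"


def matches(question: Frame, fact: Frame) -> bool:
    # A None slot in the question is the element being asked about (or simply
    # unspecified); every other slot must be literally identical.
    return all(q is None or q == f for q, f in zip(question, fact))


def ask(question: Frame, corpus) -> str:
    """Return the first stored sentence whose frame matches, else the fail message."""
    for sentence, frame in corpus:
        if matches(question, frame):
            return sentence
    return FAIL


# Frames are written by hand here; the grammar and parser are not modeled.
corpus = [
    ("The teacher eats lunch at noon.",
     Frame("teacher", "eats", "lunch", "noon", None)),
]

# "What does the teacher eat at noon?" leaves the object slot open.
print(ask(Frame("teacher", "eats", None, "noon", None), corpus))

# "What does the teacher do at twelve o'clock?" fails: no synonym handling,
# so "twelve o'clock" is not recognized as "noon".
print(ask(Frame("teacher", None, None, "twelve o'clock", None), corpus))
```

    Because the comparison is plain string equality, the second question falls through to the fail message, which is the limitation the extract points out.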
  • Simmons, R.F., "Answering English Questions by Computer - A Survey" SDC Report SP-1536 Santa Monica, Calif.; April, 1964
  • Simmons, R. F. "Answering English questions by computer: a survey" p53-70
    Abstract: Fifteen experimental English language question-answering systems which are programmed and operating are described and reviewed. The systems range from a conversation machine to programs which make sentences about pictures and systems which translate from English into logical calculi. Systems are classified as list-structured data-based, graphic data-based, text-based and inferential. Principles and methods of operations are detailed and discussed.

    It is concluded that the data-base question-answerer has passed from initial research into the early developmental phase. The most difficult and important research questions for the advancement of general-purpose language processors are seen to be concerned with measuring meaning, dealing with ambiguities, translating into formal languages and searching large tree structures.
    Extract: The Oracle
    As a Master's thesis under John McCarthy, then at MIT, A. V. Phillips programmed an experimental system to answer questions from simple English sentences (1960). Its mode of operation is to produce a syntactic analysis of both the question and of a corpus of text which may contain an answer. This analysis transforms both the question and the sentence into a canonical form which shows the subject, the verb, the object, and nouns of place and time. The system was written in Lisp which simplified the programming task.
    Its principle of operation can be appreciated by following the example in Figure 1. The example sentence is analyzed into subject, verb and (essentially) object. The analysis is limited to simple sentences and breaks down if the sentence has two or more subjects or objects. The first stage of analysis is to look up each word in a small dictionary to discover its word class assignment. At this point such words as school, park, morning, etc., are also coded as time or place nouns. During the analysis the question is transformed into declarative order and auxiliary verbs are combined with their head verbs so that both question and potential answering statement are in the canonical form, subject-verb-object, as shown in Figure 1.

    A comparison is then made to determine if the elements of the sentence match those of the question. In the example all three elements match and the program would print out "to school" followed by the entire sentence. Had the input been a complete question, i.e., "Did the teacher go to school?" the Oracle would have modified its behavior to respond "Yes."
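    As an illustration (not part of the quoted survey), the dictionary lookup and comparison steps can be sketched in Python. The Oracle itself was written in Lisp; the dictionary entries, the sentence "The teacher went to school.", the `canonical` and `respond` helpers, and the use of the fail message for a mismatch are all assumptions made for this example. Questions are supplied already in canonical form, so the transformation into declarative order is not modeled.

```python
# Hypothetical mini-dictionary: word -> (syntactic class, semantic class).
# The semantic class marks time and place nouns; None means no special coding.
DICTIONARY = {
    "the": ("article", None),
    "teacher": ("noun", None),
    "went": ("verb", None),
    "to": ("preposition", None),
    "in": ("preposition", None),
    "school": ("noun", "place"),
    "park": ("noun", "place"),
    "morning": ("noun", "time"),
}

FAIL = "(THE ORACLE DOES NOT KNOW)"


def canonical(sentence):
    """Reduce a simple declarative sentence to subject, verb, place and time slots."""
    form = {"subject": None, "verb": None, "place": None, "time": None}
    preposition = None
    for word in sentence.lower().rstrip(".?").split():
        syntactic, semantic = DICTIONARY[word]  # KeyError for words outside the dictionary
        if syntactic == "preposition":
            preposition = word
        elif syntactic == "verb":
            form["verb"] = word
        elif semantic in ("place", "time"):
            # Keep the preposition so the answer can be printed as a phrase.
            form[semantic] = f"{preposition} {word}" if preposition else word
            preposition = None
        elif syntactic == "noun" and form["subject"] is None:
            form["subject"] = word
    return form


def respond(question, sentence):
    """question maps slots to a known value, or to "?" for the element asked about."""
    fact = canonical(sentence)
    for slot, value in question.items():
        if value != "?" and fact[slot] != value:
            return FAIL  # assumption: a mismatch is reported with the fail message
    asked = [slot for slot, value in question.items() if value == "?"]
    if not asked:
        return "Yes."  # complete question: every element matched
    if any(fact[slot] is None for slot in asked):
        return FAIL
    return " ".join(fact[slot] for slot in asked) + "\n" + sentence


SENTENCE = "The teacher went to school."
# Hand-coded canonical form of a "where" question about the teacher.
print(respond({"subject": "teacher", "verb": "went", "place": "?"}, SENTENCE))
# Hand-coded canonical form of "Did the teacher go to school?"
print(respond({"subject": "teacher", "verb": "went", "place": "to school"}, SENTENCE))
```

    Each dictionary entry carries both a syntactic and a semantic code, which is the double coding the survey goes on to discuss.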
    As an early question answerer, the Oracle is a competent example of the principle of answering questions by structural matching of syntactic-semantic codes. Within the range of very simple English structures the method is uncomplicated and easily achievable. The principle of double coding--for syntactic and semantic word class--will be seen to generalize to much more complicated structures than Oracle used.
    The conversation machine and the Oracle are two prototypes of question-answerers, which even in 1959 demonstrated that if statements could be coded semantically and syntactically they could be matched to discover how closely they resembled each other. For the conversation machine the match was against a coded data base and the selection of a reply to a remark was a function of the type of correspondence between the remark after coding and the program's coded knowledge. For the Oracle the comparison was between an English question and an English sentence, both of which were inputs.

          in [ACM] CACM 8(01) Jan 1965