The Oracle
Early question-answering system by Anthony Phillips, 1960, MIT. Famous for its fail message "(THE ORACLE DOES NOT KNOW)".

References:

Introduction
The aim of the research reported here was to discover how one could build a computer program which could communicate with people in a natural language within some restricted problem domain. In the course of this investigation, I wrote a set of computer programs, the STUDENT system, which accepts as input a comfortable but restricted subset of English which can be used to express a wide variety of algebra story problems. The problems shown in Figure 1 illustrate some of the communication and problem solving capabilities of this system. In the following discussion, I shall use phrases such as "the computer understands English". In all such cases, the "English" is just the restricted subset of English which is allowable as input for the computer program under discussion. In addition, for purposes of this report I have adopted the following operational definition of understanding. A computer "understands" a subset of English if it accepts input sentences which are members of this subset, and answers questions based on information contained in the input. The STUDENT system understands English in this sense.

Extract: BASEBALL
Baseball is a question-answering system designed and programmed at Lincoln Laboratories by Green, Wolf, Chomsky and Laughery (19). It is a data base system in which the data is placed in memory in a prestructured tree format. The data consists of the dates, location, opposing teams and scores of some American League baseball games. Only questions to the system can be given in English, not the data. Questions must be simple sentences, with no relative clauses, logical or coordinate connectives. With these restrictions, the program will accept any question couched in words contained in a vocabulary list quite adequate for asking questions about baseball statistics.
In addition, the parsing routine, based on techniques developed by Harris (21), must find a parsing for the question. The questions must pertain to statistics about baseball games found in the information store. One cannot ask questions about extrema, such as "highest" score or "fewest" number of games won. The parsed question is transformed into a standard specification (or spec) list and the question-answering routine utilizes this canonical form for the meaning of the question. For example, the question "Who beat the Yankees on July 4th?" would be transformed into the "spec list":
Team (losing) = New York
Team (winning) = ?
Date = July
Because Baseball does not utilize English for data input, we cannot talk about deductions made from information implicit in several sentences. However, Baseball can perform operations such as counting (the number of games played by Boston, for example) and thus, in the sense that it is utilizing several separate data units in its store, it is performing deductions. Baseball's abilities can only be extended by extensive re-programming, though the techniques utilized have some general applicability. Because the parsing program has a very complete grammar, and the vocabulary list is quite comprehensive for the problem domain, the user needs no knowledge of the internal structure of the Baseball program. No provision for interaction with the user was made.

Extract: Advice-taker
McCarthy's Advice-taker, though not designed to accept English input, would make an excellent base for a question-answering system. Fischer Black has programmed a system which can do all of McCarthy's Advice-Taker problems, and can be adapted to accept a very limited subset of English. The deductive system in Black's program is equivalent to the propositional calculus.

Extract: Phillips QAS
One of the earliest question-answering systems was written in 1960 at MIT by Anthony Phillips.
It is a data base system which accepts sentences which can be parsed by a very simple context-free phrase structure grammar, of the type defined by Chomsky. Additional syntactic restrictions require that each word must be in only one grammatical class, and that a sentence has exactly one parsing. A parsed sentence is transformed into a list of five elements: the subject, verb, object, time phrase, and place phrase in the sentence. All other information in the sentence is disregarded. Questions are answered by matching the list from the transformed question against the list for each input sentence. When a match is found, the corresponding sentence is given as an answer. Phillips' system has no deductive ability, and adding new abilities would require reprogramming the system. A questioner must be aware that the system utilizes a matching process which does not recognize synonyms, and therefore the sentence "The teacher eats lunch at noon." will not be recognized as an answer to the question "What does the teacher do at twelve o'clock?" When Phillips' system cannot find an answer, it reports only "(THE ORACLE DOES NOT KNOW)". It provides for no further interaction with the user.

It is concluded that the data-base question-answerer has passed from initial research into the early developmental phase. The most difficult and important research questions for the advancement of general-purpose language processors are seen to be concerned with measuring meaning, dealing with ambiguities, translating into formal languages and searching large tree structures.

Extract: The Oracle
The Oracle. As a Master's thesis under John McCarthy, then at MIT, A. V. Phillips programmed an experimental system to answer questions from simple English sentences (1960). Its mode of operation is to produce a syntactic analysis of both the question and of a corpus of text which may contain an answer.
This analysis transforms both the question and the sentence into a canonical form which shows the subject, the verb, the object, and nouns of place and time. The system was written in Lisp, which simplified the programming task. Its principle of operation can be appreciated by following the example in Figure 1. The example sentence is analyzed into subject, verb and (essentially) object. The analysis is limited to simple sentences and breaks down if the sentence has two or more subjects or objects. The first stage of analysis is to look up each word in a small dictionary to discover its word class assignment. At this point such words as school, park, morning, etc., are also coded as time or place nouns. During the analysis the question is transformed into declarative order and auxiliary verbs are combined with their head verbs so that both question and potential answering statement are in the canonical form, subject-verb-object, as shown in Figure 1. A comparison is then made to determine if the elements of the sentence match those of the question. In the example all three elements match and the program would print out "to school" followed by the entire sentence. Had the input been a complete question, i.e., "Did the teacher go to school?" the Oracle would have modified its behavior to respond "Yes." As an early question answerer, the Oracle is a competent example of the principle of answering questions by structural matching of syntactic-semantic codes. Within the range of very simple English structures the method is uncomplicated and easily achievable. The principle of double coding, for syntactic and semantic word class, will be seen to generalize to much more complicated structures than Oracle used. The conversation machine and the Oracle are two prototypes of question-answerers, which even in 1959 demonstrated that if statements could be coded semantically and syntactically they could be matched to discover how closely they resembled each other.
For the conversation machine the match was against a coded data base and the selection of a reply to a remark was a function of the type of correspondence between the remark after coding and the program's coded knowledge. For the Oracle the comparison was between an English question and an English sentence, both of which were inputs.
in [ACM] CACM 8(01) Jan 1965
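
The structural matching described in the extracts on Phillips' system can be illustrated with a short Python sketch. It is not a reconstruction of the original Lisp program. The toy lexicon, the verb-root table, the function names (`analyze`, `matches`, `ask`) and the output format are all assumptions made for illustration, and the "parser" is only a word-by-word slot filler, not a phrase structure grammar. What it does show is the idea the extracts describe: each word has exactly one class, both question and sentence are reduced to the five slots (subject, verb, object, time, place), and an answer is found only when the slots match literally.

```python
import re

FAILURE = "(THE ORACLE DOES NOT KNOW)"
SLOTS = ("subject", "verb", "object", "time", "place")

# Hypothetical toy dictionary: every word belongs to exactly one class.
LEXICON = {
    "the": "skip", "to": "skip", "at": "skip", "did": "skip", "does": "skip",
    "teacher": "noun", "lunch": "noun",
    "school": "place", "park": "place",
    "noon": "time", "midday": "time", "morning": "time",
    "went": "verb", "go": "verb", "eats": "verb", "eat": "verb",
    "where": "wh", "when": "wh", "what": "wh", "who": "wh",
}
# Assumed stand-in for combining auxiliaries with head verbs ("did go" ~ "went").
VERB_ROOT = {"went": "go", "go": "go", "eats": "eat", "eat": "eat"}
# Which slot each question word leaves open.
WH_SLOT = {"where": "place", "when": "time", "what": "object", "who": "subject"}


def analyze(text):
    """Reduce a simple sentence or question to the five-slot canonical form."""
    frame = dict.fromkeys(SLOTS)
    for word in re.findall(r"[a-z']+", text.lower()):
        word_class = LEXICON.get(word)
        if word_class is None:
            raise ValueError(f"word not in dictionary: {word}")
        if word_class == "wh":
            frame[WH_SLOT[word]] = "?"          # the slot being asked about
        elif word_class == "verb":
            frame["verb"] = VERB_ROOT[word]
        elif word_class == "noun":
            # A noun before the verb is the subject, after it the object.
            slot = "subject" if frame["verb"] is None else "object"
            frame[slot] = word
        elif word_class in ("time", "place"):
            frame[word_class] = word
    return frame


def matches(question, sentence):
    """Literal slot-by-slot comparison; synonyms are not recognized."""
    for slot in SLOTS:
        wanted = question[slot]
        if wanted is None:
            continue                             # question ignores this slot
        if sentence[slot] is None:
            return False
        if wanted != "?" and wanted != sentence[slot]:
            return False
    return True


def ask(question, corpus):
    q = analyze(question)
    for sentence in corpus:
        s = analyze(sentence)
        if matches(q, s):
            open_slots = [slot for slot in SLOTS if q[slot] == "?"]
            if not open_slots:
                return "Yes."
            return f"{s[open_slots[0]]}: {sentence}"
    return FAILURE


CORPUS = ["The teacher went to school.", "The teacher eats lunch at noon."]
print(ask("Where did the teacher go?", CORPUS))
print(ask("Did the teacher go to school?", CORPUS))
print(ask("Who eats lunch at midday?", CORPUS))
```

The last query fails because "midday" and "noon" are different strings in the time slot, which mirrors the synonym limitation noted for "noon" and "twelve o'clock". In this sketch a yes/no question with no matching sentence also yields the failure message; the extracts do not say how the original program handled that case.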