Intermediate language for the ALA (Automatic Language Analyser) system at Indiana University 1960-63

  • Simmons, R.F., "Answering English Questions by Computer - A Survey" SDC Report SP-1536 Santa Monica, Calif.; April, 1964
  • Simmons, R. F. "Answering English questions by computer: a survey" p53-70 Abstract: Fifteen experimental English language question-answering systems which are programmed and operating are described and reviewed. The systems range from a conversation machine to programs which make sentences about pictures and systems which translate from English into logical calculi. Systems are classified as list-structured data-based, graphic data-based, text-based and inferential. Principles and methods of operations are detailed and discussed.

    It is concluded that the data-base question-answerer has passed from initial research into the early developmental phase. The most difficult and important research questions for the advancement of general-purpose language processors are seen to be concerned with measuring meaning, dealing with ambiguities, translating into formal languages and searching large tree structures. Extract: FLEX and ALA
    The Automatic Language Analyzer (ALA).
    From Indiana University a series of quarterly reports by Householder et al. (1960-62) and a final technical report by Thorne (1962) describe the progress toward completion of a rather complicated automatic language analysis system. This system is designed to handle the breadth and complexity of language found in a book on astronomy. As a question-answering system, it introduces a variation of the principle of translating from English into an intermediate language which bears a strong relationship to dependency structure. When translated to the intermediate language, FLEX, the question or text is also augmented by semantic codes obtained from Roget's Thesaurus--or from a specially constructed thesaurus. The degree of matching between question and text is then computed to select a best answer.
    The primary information store for the ALA is a pre-analyzed set of sentences stored on tape. The preanalysis includes assignment of FLEX codes and of thesaurus references. The thesaurus is a list of clusters each of which indexes the portions of the text in which members of the cluster appear. A dictionary of word-stems and phrases provides cross-references to clusters in which the word appears. The sequence of operations is that the question is first analyzed and assigned FLEX and thesaurus codes, then sentences are selected and matched, and finally the paragraphs that contain supposed answering sentences are printed out with their scores.
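    The store described above (a thesaurus of clusters that index the text, plus a dictionary that cross-references word stems to clusters) can be illustrated with a minimal sketch. This is not the ALA implementation: the cluster ids, the toy sentences and the function names build_dictionary, build_cluster_index and candidate_sentences are all assumptions made for illustration, and the index here points at sentences rather than at tape positions.

```python
from collections import defaultdict

# Hypothetical toy data (not taken from the ALA reports).
# Thesaurus: cluster id -> member word stems.
CLUSTERS = {
    "C1": {"star", "sun"},
    "C2": {"orbit", "revolve"},
}

# Pre-analyzed text: sentence id -> word stems in that sentence.
SENTENCES = {
    0: ["sun", "be", "star"],
    1: ["planet", "orbit", "sun"],
    2: ["moon", "revolve", "earth"],
}


def build_dictionary(clusters):
    """Map each word stem to the clusters in which it appears."""
    stem_to_clusters = defaultdict(set)
    for cid, members in clusters.items():
        for stem in members:
            stem_to_clusters[stem].add(cid)
    return stem_to_clusters


def build_cluster_index(sentences, stem_to_clusters):
    """Map each cluster to the sentences in which any of its members occur."""
    index = defaultdict(set)
    for sid, stems in sentences.items():
        for stem in stems:
            for cid in stem_to_clusters.get(stem, ()):
                index[cid].add(sid)
    return index


def candidate_sentences(question_stems, stem_to_clusters, index):
    """Collect sentences that share at least one cluster with the question."""
    hits = set()
    for stem in question_stems:
        for cid in stem_to_clusters.get(stem, ()):
            hits |= index.get(cid, set())
    return hits
```

    In this sketch a question containing "revolve" retrieves the sentence containing "orbit" as well, because both stems belong to the same cluster. That is the kind of thesaurus-based selection the extract describes, with scoring left to a later step.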
    The transformation of English into the FLEX language is begun by looking up each word in a dictionary to assign ordinary syntactic word classes. At this point a great deal of effort is spent to resolve word-class ambiguity by use of special routines which use additional cues available in the sentence. The next phase is to order the words into clauses and phrases and to check the accuracy of this ordering. The breaking into clauses is accomplished by the use of marker words such as verbs and absolute markers such as because, how, if, what, when, etc. When the sentence has been analyzed into subject, verb and their qualifiers, the translation into FLEX is accomplished as shown below.

    The old man ate stale food reluctantly.

    The notation is to be read, "S1 means subject, S2 is the first qualifier, P1 means the verb, and P2 ... Pn refer to verb modifiers." The importance to the sentence of each FLEX symbol is rated separately for subject and predicate in order of the numbers assigned. Thus an S1 or a P1 is most heavily weighted in the later comparison process. Each word also carries a semantic coding. This code is simply a list of the thesaurus clusters in which it is found.
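    The comparison process is not specified in the extract, so the following is only a sketch of how rank-weighted matching over FLEX symbols and semantic codes might look. The 1/n weighting, the rule that any shared cluster counts as a match, and the names FlexItem, weight and match_score are all assumptions, not the ALA's actual formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlexItem:
    symbol: str          # FLEX symbol, e.g. "S1", "S2", "P1", "P2"
    clusters: frozenset  # semantic code: thesaurus cluster ids for the word


def weight(symbol):
    """Assumed weighting: rank n contributes 1/n, so S1 and P1 weigh most."""
    return 1.0 / int(symbol[1:])


def match_score(question, sentence):
    """Add the weight of every question item that shares at least one cluster
    with a sentence item in the same group (subject "S" or predicate "P")."""
    score = 0.0
    for q in question:
        for t in sentence:
            same_group = q.symbol[0] == t.symbol[0]
            if same_group and q.clusters & t.clusters:
                score += weight(q.symbol)
                break  # count each question item at most once
    return score
```

    Under these assumptions, a sentence that matches the question's S1 and P1 outscores one that matches only lower-ranked qualifiers, which reflects the statement that S1 and P1 are the most heavily weighted symbols.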
    Although programming of this system is apparently not yet completed, and it may be claimed that the FLEX transformation leaves much to be desired as an intermediate language, the ALA is unquestionably one of the more ambitious and sophisticated systems so far described. At this stage of experimentation it is worth wondering how well the semantic correlations will in general correspond to meaning matches between statements. In any case it is a clearly formulated realization of what has hitherto been a rather vague idea that a thesaurus may be helpful in question answering. (For other associative scoring techniques see Doyle 1963.)
