SAD SAM(ID:3359/sad001)

Basic English querying system 


for Sentence Appraiser and Diagrammer and Semantic Analyzing Machine

Robert Lindsay's natural-language question-answering system, written in IPL-V at the Carnegie Institute of Technology in 1962/3. It accepted only Basic English (see Ogden) and was chiefly concerned with representing genealogy.



Related languages
IPL-V => SAD SAM   Written using

References:
  • Lindsay, Robert K. "Inferential memory as the basis of machines which understand natural language"
          in Feigenbaum, E. and Feldman, J. (eds.) "Computers and Thought" MIT Press, Cambridge, MA, 1963
  • Bobrow, D.G. "Natural Language Input for a Computer Problem Solving System", Report MAC-TR-1, Project MAC, M.I.T., Cambridge, Mass., June 1964
    Abstract: The STUDENT problem solving system, programmed in LISP, accepts as input a comfortable but restricted subset of English which can express a wide variety of algebra story problems. STUDENT finds the solution to a large class of these problems. STUDENT can utilize a store of global information not specific to any one problem, and may make assumptions about the interpretation of ambiguities in the wording of the problem being solved. If it uses such information, or makes any assumptions, STUDENT communicates this fact to the user. The thesis includes a summary of other English language question-answering systems. All these systems, and STUDENT, are evaluated according to four standard criteria. The linguistic analysis in STUDENT is a first approximation to the analytic portion of a semantic theory of discourse outlined in the thesis. STUDENT finds the set of kernel sentences which are the base of the input discourse, and transforms this sequence of kernel sentences into a set of simultaneous equations which form the semantic base of the STUDENT system. STUDENT then tries to solve this set of equations for the values of requested unknowns. If it is successful it gives the answers in English. If not, STUDENT asks the user for more information, and indicates the nature of the desired information. The STUDENT system is a first step toward natural language communication with computers. Further work on the semantic theory proposed should result in much more sophisticated systems.
    Extract: Introduction
    Introduction
    The aim of the research reported here was to discover how one could build a computer program which could communicate with people in a natural language within some restricted problem domain. In the course of this investigation, I wrote a set of computer programs, the STUDENT system, which accepts as input a comfortable but restricted subset of English which can be used to express a wide variety of algebra story problems. The problems shown in Figure 1 illustrate some of the communication and problem solving capabilities of this system.
    In the following discussion, I shall use phrases such as "the computer understands English". In all such cases, the "English" is just the restricted subset of English which is allowable as input for the computer program under discussion. In addition, for purposes of this report I have adopted the following operational definition of understanding. A computer "understands" a subset of English if it accepts input sentences which are members of this subset, and answers questions based on information contained in the input. The STUDENT system understands English in this sense.
    Extract: SAD SAM

    4) Lindsay. While at the Carnegie Institute of Technology, Robert Lindsay (28) programmed the SAD SAM question-answering system. The input to the system is a set of sentences in Basic English, a subset of English devised by C.K. Ogden (35), which has a vocabulary of about 1500 words and a simple subset of the full English grammar. The SAD part (Syntactic Appraiser and Diagrammer) of SAD SAM parses the sentence using a predictive analysis scheme. The Semantic Analyzing Machine (SAM) extracts from these parsed sentences information about the family relationships of people mentioned;  it stores this information on a computer representation of the family tree, and ignores all other information in the sentence. For example, from the parsing of "Tom, Mary's brother, went to the store." Lindsay's program would extract the sibling relationship of Tom and Mary, place them on the family tree as descendants of the same mother and father, and ignore the information about where Tom went.
    The information storage structure utilized by SAD SAM, namely the family tree, facilitates deductions from information implicit in many sentences. Because a family relationship is defined in terms of the relative position (no pun intended) of two people in their family tree, computation of the relationship is independent of the number of sentences required to place in the tree the path between the individuals.
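
The idea that a relationship depends only on relative position in the tree can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `PARENTS` table, the invented parent names (Ann, Bob, Eve, Sam) and the function names are hypothetical, and SAD SAM itself was written in IPL-V, not Python. The sketch measures how many generations separate two people from their nearest common ancestor, regardless of how many sentences were needed to build the links.

```python
from collections import deque

# Hypothetical parent links. Tom and Mary come from the example sentence;
# all other names are invented for illustration.
PARENTS = {
    "Tom": ("Ann", "Bob"),
    "Mary": ("Ann", "Bob"),
    "Ann": ("Eve", "Sam"),
}

def ancestors(person):
    """Map the person and each known ancestor to a distance in generations."""
    depth = {person: 0}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for parent in PARENTS.get(current, ()):
            if parent not in depth:
                depth[parent] = depth[current] + 1
                queue.append(parent)
    return depth

def relative_position(a, b):
    """Return (generations up from a, generations up from b) to the nearest
    common ancestor, or None if the tree holds no link between them."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = set(up_a) & set(up_b)
    if not common:
        return None
    nearest = min(common, key=lambda p: up_a[p] + up_b[p])
    return (up_a[nearest], up_b[nearest])
```

Under these assumptions, a result of `(1, 1)` corresponds to siblings and `(1, 0)` means the second person is a parent of the first. The result depends only on the stored links, not on the order or number of input sentences that produced them.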
    Extending the abilities of the SAD SAM system would require reprogramming. No provision is made for interaction with the user. No internal knowledge of the program structure is necessary if the user restricts his queries to questions of family relationships, and his language to Basic English.
    Extract: BASEBALL
    Baseball is a question-answering system designed and programmed at Lincoln Laboratories by Green, Wolf, Chomsky and Laughery (19). It is a data base system in which the data is placed in memory in a prestructured tree format. The data consists of the dates, location, opposing teams and scores of some American League baseball games. Only questions to the system can be given in English, not the data.
    Questions must be simple sentences, with no relative clauses, logical or coordinate connectives. With these restrictions, the program will accept any question couched in words contained in a vocabulary list quite adequate for asking questions about baseball statistics. In addition, the parsing routine, based on techniques developed by Harris (21), must find a parsing for the question.
    The questions must pertain to statistics about baseball games found in the information store. One cannot ask questions about extrema, such as "highest" score or "fewest" number of games won. The parsed question is transformed into a standard specification (or spec) list and the question-answering routine utilizes this canonical form for the meaning of the question. For example, the question "Who beat the Yankees on July 4th?" would be transformed into the "spec list":
    Team (losing)  = New York
    Team (winning) = ?
    Date           = July
    Because Baseball does not utilize English for data input, we cannot talk about deductions made from information implicit in several sentences. However, Baseball can perform operations such as counting (the number of games played by Boston, for example) and thus, in the sense that it is utilizing several separate data units in its store, it is performing deductions.
    Baseball's abilities can only be extended by extensive re-programming, though the techniques utilized have some general applicability. Because the parsing program has a very complete grammar, and the vocabulary list is quite comprehensive for the problem domain, the user needs no knowledge of the internal structure of the Baseball program. No provision for interaction with the user was made.
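
To make the spec-list idea concrete, the following is a minimal Python sketch, not a reconstruction of the Baseball program. The `GAMES` records are invented purely for illustration, and the field names and the functions `answer` and `games_played` are hypothetical. A spec is treated as a set of attribute/value pairs in which `"?"` marks the value being asked for.

```python
# Hypothetical toy store; these records are invented for illustration only.
GAMES = [
    {"date": "July 4", "winner": "Boston", "loser": "New York"},
    {"date": "July 5", "winner": "New York", "loser": "Boston"},
    {"date": "July 6", "winner": "Boston", "loser": "Chicago"},
]

def answer(spec):
    """Return the '?' fields of every game that matches the known fields."""
    unknown = [key for key, value in spec.items() if value == "?"]
    known = {key: value for key, value in spec.items() if value != "?"}
    results = []
    for game in GAMES:
        if all(game.get(key) == value for key, value in known.items()):
            results.append(tuple(game[key] for key in unknown))
    return results

def games_played(team):
    """Count games in which the team appears, combining several data units."""
    return sum(1 for game in GAMES if team in (game["winner"], game["loser"]))

# "Who beat New York on July 4?" expressed as a spec.
spec = {"loser": "New York", "winner": "?", "date": "July 4"}
winners = answer(spec)
```

The counting function illustrates the sense in which combining several stored records can be regarded as a simple form of deduction.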
    Extract: Advice-taker
    McCarthy's Advice-taker, though not designed to accept English input, would make an excellent base for a question-answering system. Fischer Black has programmed a system which can do all of McCarthy's Advice-Taker problems, and can be adapted to accept a very limited subset of English. The deductive system in Black's program is equivalent to the propositional calculus.
  • Simmons, R.F., "Answering English Questions by Computer - A Survey" SDC Report SP-1536 Santa Monica, Calif.; April, 1964
  • Simmons, R. F. "Answering English questions by computer: a survey" p53-70
    Abstract: Fifteen experimental English language question-answering systems which are programmed and operating are described and reviewed. The systems range from conversation machines to programs which make sentences about pictures and systems which translate from English into logical calculi. Systems are classified as list-structured data-based, graphic data-based, text-based and inferential. Principles and methods of operations are detailed and discussed.

    It is concluded that the data-base question-answerer has passed from initial research into the early developmental phase. The most difficult and important research questions for the advancement of general-purpose language processors are seen to be concerned with measuring meaning, dealing with ambiguities, translating into formal languages and searching large tree structures.
    Extract: SAD SAM
    SAD SAM.
    This acronym stands for Sentence Appraiser and Diagrammer and Semantic Analyzing Machine. It was programmed in IPL-V by R. Lindsay as part of a dissertation at Carnegie Institute of Technology. SAD SAM is divided into two parts, a parsing section and a section for handling meanings. The system is designed to accept simple sentences limited to a Basic English vocabulary concerning family relationships. The data base is in the form of a family tree represented in the program by a hierarchical set of lists. As a sentence is read, it is parsed and the information that a person bears a relationship of brother, mother, father, etc., to someone else is extracted, and the name so represented is appended to the appropriate lists or branches of the family tree.

    The parsing system is an independent program which uses a form of the predictive-analysis techniques which have been described in detail by Oettinger and Kuno (1963). Although it was designed for relatively simple structures, Lindsay reports that it can handle relative clauses and at least some appositional strings. As a result of the parsing, the input to the semantic analysis program is (1) a sentence whose parts are labelled noun, verb, noun phrase, etc., and (2) a tree structure showing the relationships among these grammatical features.

    The semantic analyzer searches for subject-complement combinations which are connected by the verb "to be" and cross-references these to indicate that each is equivalent to the other. Words which modify such equivalent words are then grouped together. The vocabulary of Basic English provides only eight words to characterize kinship relations so these are then sought in the sentence. Thus, for the sentence,
    John's father, Bill, is Mary's father.
    The term "John's father" would be set equivalent to the complement "Mary's father." The two kinship terms would be recognized and the proper names which modify them would then be discovered. The word "Bill" modifies the subject and, since subject and complement are equivalent, it also modifies the complement. Triplets are constructed to show the relationship between each pair of names as follows:
    Bill(father)John
    Bill(father)Mary

    These relationships are added to the family tree which then has the following structure:
    Family unit (name)
    (Attribute)          (Value)
    Husband              Bill
    Wife                 Unknown
    Offspring            John, Mary
    Husband's parents    Unknown
    Wife's parents       Unknown
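
As a rough illustration of how the two triplets could be folded into such a family unit, here is a minimal Python sketch. The dictionary layout and the names `new_family_unit` and `add_father_triplet` are assumptions made for this example; Lindsay's program used IPL-V lists, and this sketch handles only the "father" relation.

```python
def new_family_unit():
    """Create an empty unit; None stands for 'Unknown' in the table."""
    return {
        "husband": None,
        "wife": None,
        "offspring": [],
        "husband_parents": None,
        "wife_parents": None,
    }

def add_father_triplet(units, father, child):
    """Record the triplet father(father)child in the unit headed by the father."""
    if father not in units:
        units[father] = new_family_unit()
    unit = units[father]
    unit["husband"] = father
    if child not in unit["offspring"]:
        unit["offspring"].append(child)
    return unit

# The two triplets extracted from "John's father, Bill, is Mary's father."
units = {}
for father, child in [("Bill", "John"), ("Bill", "Mary")]:
    add_father_triplet(units, father, child)
```

Keying the units by the father's name is a simplification chosen for the sketch; it is enough to show that both triplets land in the same unit, so John and Mary end up listed together as offspring.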

    Since this data structure is in the form of IPL lists, instead of actual names of family members, pointers may be used to indicate the location of a different family list which contains the names. The result is an interlocking data structure which allows a fairly significant level of inference.

    In the example above, since John and Mary are the offspring of a common parent, it is known that they are siblings. If a following sentence states that Jane is Bill's wife, it will be immediately known that Jane is the mother of John and Mary (since no multiple marriages are permitted).
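
The two inferences just described can be sketched in a few lines of Python. This is an illustrative assumption, not Lindsay's code: the `family` dictionary and the functions `siblings`, `mother_of` and `set_wife` are hypothetical, and the mother inference is valid only under the stated restriction that no multiple marriages are permitted.

```python
# Hypothetical family unit for Bill, with the wife not yet known.
family = {"husband": "Bill", "wife": None, "offspring": ["John", "Mary"]}

def siblings(unit, person):
    """Everyone else listed as offspring of the same family unit."""
    if person not in unit["offspring"]:
        return []
    return [name for name in unit["offspring"] if name != person]

def mother_of(unit, person):
    """The unit's wife, if known; relies on the one-marriage assumption."""
    if person not in unit["offspring"]:
        return None
    return unit["wife"]

def set_wife(unit, husband, wife):
    """Record 'wife is husband's wife' if the unit belongs to that husband."""
    if unit["husband"] == husband:
        unit["wife"] = wife

johns_siblings = siblings(family, "John")      # known before Jane is mentioned
mother_before = mother_of(family, "John")      # still unknown at this point
set_wife(family, "Bill", "Jane")               # "Jane is Bill's wife."
mother_after = mother_of(family, "Mary")
```

No new sentence about John or Mary is needed for the second inference: filling in one attribute of the shared unit changes the answer for every offspring at once.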

    Lindsay's primary interest was in machine comprehension of English and he attempted to show that an important component of understanding lay in building large coordinated data structures from the text which was read. He found it necessary to use a syntactic analysis to discover relationships between the words which his program was able to understand and then to transform the portions of the sentence which were understood into a form which could map onto his data structure.
          in [ACM] CACM 8(01) Jan 1965
  • Sammet, Jean E. "Computer Languages - Principles and History" Englewood Cliffs, N.J. Prentice-Hall 1969. p.669