STUDENT(ID:6493/stu002)


D.G. Bobrow 1964. Early query system. Sammet 1969, p.664.


Related languages
STUDENT => CARPS   Improvement

References:
  • Bobrow, D.G. "Natural Language Input for a Computer Problem Solving System", Report MAC-TR-1, Project MAC, M.I.T., Cambridge, Mass., June 1964 view details External link: Online copy pdf ps Abstract: The STUDENT problem solving system, programmed in LISP, accepts as input a comfortable but restricted subset of English which can express a wide variety of algebra story problems. STUDENT finds the solution to a large class of these problems. STUDENT can utilize a store of global information not specific to any one problem, and may make assumptions about the interpretation of ambiguities in the wording of the problem being solved. If it uses such information, or makes any assumptions, STUDENT communicates this fact to the user. The thesis includes a summary of other English language question-answering systems. All these systems, and STUDENT are evaluated according to four standard criteria. The linguistic analysis in STUDENT is a first approximation to the analytic portion of a semantic theory of discourse outlined in the thesis. STUDENT finds the set of kernel sentences which are the base of the input discourse, and transforms this sequence of kernel sentences into a set of simultaneous equations which form the semantic base of the Student system. STUDENT then tries to solve this set of equations for the values of requested unknowns. If it is successful it gives the answers in English. If not, STUDENT asks the user for more information, and indicates the nature of the desired information. The STUDENT system is a first step toward natural language communication with computers. Further work on the semantic theory proposed should result in much more sophisticated systems. Extract: Introduction
    Introduction
    The aim of the research reported here was to discover how one could build a computer program which could communicate with people in a natural language within some restricted problem domain. In the course of this investigation, I wrote a set of computer programs, the STUDENT system, which accepts as input a comfortable but restricted subset of English which can be used to express a wide variety of algebra story problems. The problems shown in Figure 1 illustrate some of the communication and problem solving capabilities of this system.
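    A minimal sketch (in Python rather than Bobrow's LISP) of the kind of pipeline the abstract above describes: match input sentences against templates, emit a set of simultaneous linear equations, and solve for the unknowns. The templates, variable names, and solver below are illustrative assumptions, not STUDENT's actual formats or code.

        # Illustrative sketch (not Bobrow's program): turn a toy two-sentence
        # algebra problem into simultaneous linear equations and solve them.
        import re
        from fractions import Fraction

        # Invented sentence templates; each maps matched groups to one equation,
        # written as (coefficient dictionary, right-hand-side constant).
        TEMPLATES = [
            (re.compile(r"the sum of (\w+) and (\w+) is (\d+)"),
             lambda a, b, n: ({a: 1, b: 1}, int(n))),      # a + b = n
            (re.compile(r"(\w+) is (\d+) more than (\w+)"),
             lambda a, n, b: ({a: 1, b: -1}, int(n))),     # a - b = n
        ]

        def to_equations(sentences):
            equations = []
            for s in sentences:
                s = s.lower().rstrip(" .?")
                for pattern, build in TEMPLATES:
                    m = pattern.fullmatch(s)
                    if m:
                        equations.append(build(*m.groups()))
                        break
                else:
                    raise ValueError("no template matches: " + s)
            return equations

        def solve_two_unknowns(equations):
            # Solve a 2 x 2 linear system by Cramer's rule.
            (c1, k1), (c2, k2) = equations
            x, y = sorted(set(c1) | set(c2))
            a, b = Fraction(c1.get(x, 0)), Fraction(c1.get(y, 0))
            c, d = Fraction(c2.get(x, 0)), Fraction(c2.get(y, 0))
            det = a * d - b * c
            return {x: (k1 * d - b * k2) / det, y: (a * k2 - k1 * c) / det}

        problem = ["The sum of x and y is 30.", "x is 4 more than y."]
        answer = solve_two_unknowns(to_equations(problem))
        print({name: str(value) for name, value in answer.items()})   # {'x': '17', 'y': '13'}

    Where no template applies, the real STUDENT asked the user for more information and indicated what was missing; the sketch simply raises an error at that point.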
    In the following discussion, I shall use phrases such as "the computer understands English". In all such cases, the "English" is just the restricted subset of English which is allowable as input for the computer program under discussion. In addition, for purposes of this report I have adopted the following operational definition of understanding. A computer "understands" a subset of English if it accepts input sentences which are members of this subset, and answers questions based on information contained in the input. The STUDENT system understands English in this sense. Extract: BASEBALL
    Baseball is a question-answering system designed and programmed at Lincoln Laboratories by Green, Wolf, Chomsky and Laughery (19). It is a data base system in which the data is placed in memory in a prestructured tree format. The data consists of the dates, location, opposing teams and scores of some American League baseball games. Only questions to the system can be given in English, not the data.
    Questions must be simple sentences, with no relative clauses, logical or coordinate connectives. With these restrictions, the program will accept any question couched in words contained in a vocabulary list quite adequate for asking questions about baseball statistics. In addition, the parsing routine, based on techniques developed by Harris (21), must find a parsing for the question.
    The questions must pertain to statistics about baseball games found in the information store. One cannot ask questions about extrema, such as "highest" score or "fewest" number of games won. The parsed question is transformed into a standard specification (or spec) list and the question-answering routine utilizes this canonical form for the meaning of the question. For example, the question "Who beat the Yankees on July 4th?" would be transformed into the "spec list":
    Team (losing) = New York
    Team (winning) = ?
    Date = July
    Because Baseball does not utilize English for data input, we cannot talk about deductions made from information implicit in several sentences. However, Baseball can perform operations such as counting (the number of games played by Boston, for example) and thus, in the sense that it is utilizing several separate data units in its store, it is performing deductions.
    Baseball's abilities can only be extended by extensive re-programming, though the techniques utilized have some general applicability. Because the parsing program has a very complete grammar, and the vocabulary list is quite comprehensive for the problem domain, the user needs no knowledge of the internal structure of the Baseball program. No provision for interaction with the user was made.
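    The spec-list idea can be made concrete with a toy structured store. The sketch below (Python, with invented records and field names, not the BASEBALL program or its actual tree format) answers the spec list shown above and also illustrates the counting kind of deduction mentioned in the extract.

        # Toy illustration of answering a spec list ("?" marks the requested field)
        # against a small structured store; not the BASEBALL program itself.
        GAMES = [
            {"winning_team": "Boston",   "losing_team": "New York", "date": "July 4",
             "place": "Boston",   "score": "5-3"},
            {"winning_team": "New York", "losing_team": "Detroit",  "date": "July 5",
             "place": "New York", "score": "2-1"},
        ]

        def answer(spec, store):
            """Return the values of the '?' fields of every record matching the rest."""
            wanted = [k for k, v in spec.items() if v == "?"]
            fixed = {k: v for k, v in spec.items() if v != "?"}
            hits = [rec for rec in store
                    if all(rec.get(k) == v for k, v in fixed.items())]
            return [{k: rec[k] for k in wanted} for rec in hits]

        # "Who beat the Yankees on July 4th?"
        print(answer({"losing_team": "New York", "winning_team": "?", "date": "July 4"},
                     GAMES))                      # [{'winning_team': 'Boston'}]

        # A counting question ("How many games did Boston play?") combines
        # several data units from the store, the limited sense of "deduction" above.
        boston = [g for g in GAMES if "Boston" in (g["winning_team"], g["losing_team"])]
        print(len(boston))                        # 1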
    Extract: Advice-taker
    McCarthy's Advice-Taker, though not designed to accept English input, would make an excellent base for a question-answering system. Fischer Black has programmed a system which can do all of McCarthy's Advice-Taker problems, and can be adapted to accept a very limited subset of English. The deductive system in Black's program is equivalent to the propositional calculus.
  • Simmons, R. F. "Storage and retrieval of aspects of meaning in directed graph structures" view details Extract: Introduction
    Introduction
    Behind the development of every new computer language there lies a set of problems and a set of programming structures with whose aid the problems can be managed. With Fortran the problem was to solve algebraic equations without the need for a great deal of I/O bookkeeping and without concern for detailed computer-word-packing statements. Behind Jovial lay the command-control problem, which customarily dealt with complex data structures and the need to use every bit of computer memory efficiently. IPL grew in response to a need to use associative list structures of nonnumeric symbolic data in a computer. Lisp answered the need for a high-level functional language to handle recursive algebraic and symbolic structures. Comit was the machine translator's approach to handling natural language strings.
    In developing a special concept dictionary for storing and retrieving the meanings of words and phrases of English, the authors have found it desirable to use a complex network whose nodes are interrelated in several ways. Basically, the idea is that of a dictionary of English words in which each word has associated with it word-class information and lists of attribute-value pairs that define various aspects of its meaning and usage in terms of pointers to other words in the dictionary. In a data structure language such as Jovial, in addition to ordinary table structures several additional levels of associative coding are required to give easy access to the data without excessive costs in either space or processing time.
    Because of the many levels of associative linking required, the authors decided to use Lisp, at least for early experimental work with the system. Advantages of Lisp extended beyond the ease of producing complex data structures; they also included the simplicity of producing complex, often recursive functions for maintaining and querying the dictionary. An additional advantage is gained in the fact that although Lisp is primarily an interpretive system it does allow for the compiling of a fast-running completed program. The most serious disadvantage of Lisp for our system is that in present versions it is limited to core memory for most uses. This limitation means that a dictionary of the type we are studying could not exceed two or three hundred words.
    Since we are aiming for an eventual vocabulary of from five to fifty thousand words, the limitation to core memory is intolerable. Either an expansion of Lisp will be required or the writing of a special language using auxiliary memory for handling cyclical structures in large complex networks will grow out of our experiments with the conceptual dictionary.
    Extract: The Problem
    The Problem
    The major shortcoming of all existing retrieval systems is their inability to handle anything vaguely resembling the meaning of words taken singly, let alone the meaning of the language strings that they comprise. Synonym dictionaries and thesauri have often been added but have proved but feeble makeshifts offering little improvement over the use of root forms of words alone. To the extent that automatic syntactic analysis has been available it has only emphasized the need for word and phrase meanings.
    Five years of Synthex research toward the development of question-answering systems based on natural language text have confirmed this inability to deal with meanings. In these five years many approaches have been attempted toward representing some aspects related to the meaning of words. Most have been unsuccessful. It was learned early that the use of a synonym dictionary did not greatly improve our understanding of text in response to questions. More recently it was realized that even a well-coded thesaurus was not an answer. At various times attempts were made to save syntactic contexts associated with words as possible representations of meanings; these too, although promising, did not appear to be a reasonable answer. With more recent research, particularly that of Bobrow [1964], Raphael [1964], Quillian [1965], and Thompson [1964], it has become apparent that, in addition to dictionary-type meanings, there is a need for something that can best be characterized as a knowledge of the world (e.g., Cows eat grass. Walls are vertical. Grass doesn't eat., etc.). Without something representing knowledge of the world it can hardly be hoped that a word or sentence can be understood.
    The consequence of this line of thought is the realization that the problem requires the development of a conceptual dictionary that would contain definitional material, associative material, and some representation of knowledge of the world. These three aspects of a word's meaning seem to be the minimum that will allow for enough understanding of English to make question answering even a reasonable probability.
    Extract: The Conceptual Dictionary
    The Conceptual Dictionary
    In a conceptual dictionary each word would be characterized by (a) a set of class memberships, (b) a set of attributes, (c) a set of associations, and (d) a set of active and passive actions.
    The set of class memberships includes statements of the form the/an X (noun) is a/an Y (noun). Thus, "an aardvark is an animal" or "an aardvark is a mammal" are both examples of statements giving rise to class membership characteristics. For many nouns the class membership set is one of the basic definitional aspects of the word.
    Attributes characterizing a word are such that if y is an attribute of x, then "x is y" is a true statement and the string "the y x" is grammatical. Thus if "scaly" is an attribute of "aardvark," then "an aardvark is scaly" is true and "the scaly aardvark" is grammatical. Associates of a word are in a loose part-whole relationship. If x has a y, then y is an associate of x; thus "John has a wallet" and "John has a nose" provide the two associates, "nose" and "wallet," for John.
    The set of actions characterizing a word is derived from the verbs and their complements or objects that appear in context with the word. The two sentences "Natives eat aardvarks" and "Aardvarks eat ants" provide "eaten by natives" and "eat ants" as passive and active actions related to aardvarks.
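    As a rough illustration of the four sets just described, a single entry might be modeled as below. The Python record and the transitive class-membership lookup are editorial assumptions for illustration only; the actual system stored such entries as Lisp attribute-value lists whose values point at other words in the dictionary, forming a network.

        # Illustrative model of one conceptual-dictionary entry (field names invented).
        aardvark = {
            "class_memberships": ["animal", "mammal"],      # "an aardvark is an animal"
            "attributes":        ["scaly"],                  # "an aardvark is scaly"
            "associates":        ["snout"],                  # part-whole (invented example)
            "active_actions":    [("eat", "ants")],          # "aardvarks eat ants"
            "passive_actions":   [("eaten by", "natives")],  # "natives eat aardvarks"
        }

        def is_a(dictionary, word, cls):
            """Answer 'Is an X a Y?' by following class-membership links transitively."""
            entry = dictionary.get(word)
            if entry is None:
                return False
            parents = entry["class_memberships"]
            return cls in parents or any(is_a(dictionary, p, cls) for p in parents)

        dictionary = {
            "aardvark": aardvark,
            "mammal": {"class_memberships": ["animal"], "attributes": [],
                       "associates": [], "active_actions": [], "passive_actions": []},
        }
        print(is_a(dictionary, "aardvark", "animal"))
        # True ("animal" is listed directly and is also reachable via "mammal")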
    The idea underlying this schema is that the meaning of a word can be conceptualized in bands of closeness of relation. The class membership of an object is so closely related as to be an integral part of it or its perception. Attributes and things closely associated with a word are seen as important but less essential, while the actions associating one word and another may range from important to irrelevant. Extract: Toward an Operational System
    Toward an Operational System
    The conceptual dictionary briefly described above exists now as an experimental Lisp program in 47K of core memory in the ARPA-SDC Q-32 Time-Sharing system. It is currently limited to a dictionary of 200-300 words and a relatively small set of program functions. What is required of even an early operational system is the ability to handle from 20 to 50 thousand words and a rather large set of functions for dealing with questions of increasing difficulty. Our expectations are that Lisp will be expanded into a system that uses up to four million words of disk to augment core and that we will be able to pay an increased cost of response time in favor of continuing to use Lisp for a large operational system. If that cost is prohibitive, it will be necessary to produce a system tailored to the needs of the conceptual dictionary, one that will be able to use auxiliary memory efficiently to deal with a very large network of complexly linked words.
    Although it is our belief that new problems create the need for new languages, it is apparent that existing languages are largely sufficient for our language processing problems, but in many cases, especially among the list-oriented languages, they simply have not geared themselves to the large amounts of data and data processing required in this special field.
    Extract: Acknowledgments
    Acknowledgments. I wish to acknowledge my debt to the twenty or so people who have studied question-answering systems over the past decade. Their work is reviewed elsewhere [Simmons 1965]. Readers knowing the Quillian system will recognize that the result of the author's three years of acquaintance with Quillian was the wholehearted appropriation of his ideas insofar as the author was able to understand them. A special debt is also expressed to Fred Thompson for leading the author to an understanding of parsing directly into a data structure and for acquainting him with TEMPO's forthcoming language system of associative cycles. Programming and detail design of the conceptual dictionary described in this paper were accomplished by John Burger.
    Extract: Discussion
    Discussion
    Salton opened the discussion with the comment that a system such as this is inherently not extendable. The system will operate nicely with various kinds of "fish," but will run into trouble when "whales" appear, since the system will not know how to deal with aquatic mammals. He compared the system to the "Baseball" system, in which one cannot go beyond a limited range of questions, e.g., one has trouble if a new team appears. Young objected to this view, saying he believed the two systems were quite different, and that the present system was in principle infinitely extendable and very general. He said he could conceive of a system one level above this one which could deal with generalizations and which would make handling propositions easier than with the present special programming.
    Burger said that work with higher level relationships was planned, but that the immediate extensions would be more trivial, in the way of inserting a great deal of knowledge about the world, of the kind any child has (e.g., "Walls are vertical.").
    Gorn observed that the present system is based upon two forms of the verb "is" and the verb "has," and that more relationships were needed. Burger said the system was not in principle so limited. Gorn then asked how they would deal with the statement "'Word' is a word." Burger had no immediate answer, though he thought they would eventually be able to deal with such cases.
    Responding to a question from Cheydleur, Burger said they had about 50 different functions.
    Mooers asked if there were any inherent features of Lisp which limited their work. Burger said that a fundamental limitation was the limitation to use of core storage alone in the SDC version of Lisp. They could not use disks. The second limitation was the inability to break out individual characters from the "atoms" of Lisp. This prevents an easy and direct way of treating the similarity between "dog" and "dogs." At present this relationship has to be put in as a separate piece of information.
    Mitchell then mentioned that for a very much larger data base a limitation will be the time required to pass the dictionaries through the machine. He said one of the big improvements in speed due to syntax-directed compilers was that they had less need to refer to a very large dictionary. Burger admitted that this was a very important problem, and one which is being considered. What can be done outside of Lisp is being studied, since a problem in using an auxiliary memory with a list-structure system like Lisp is that interrelationships between lists are broken up if just one section is brought from the auxiliary memory. So far, a good solution to the problem has not been found.
    Abstract: An experimental system that uses LISP to make a conceptual dictionary is described. The dictionary associates with each English word the syntactic information, definitional material, and references to the contexts in which it has been used to define other words. Such relations as class inclusion, possession, and active or passive actions are used as definitional material. The resulting structure serves as a powerful vehicle for research on the logic of question answering. Examples of methods of inputting information and answering simple English questions are given. An important conclusion is that, although LISP and other list processing languages are ideally suited for producing complex associative structures, they are inadequate vehicles for language processing on any large scale—at least until they can use auxiliary memory as a continuous extension of core memory.

          in [ACM] CACM 9(03) March 1966 includes proceedings of the ACM Programming Languages and Pragmatics Conference, San Dimas, California, August 1965 view details
  • Charniak, E., "CARPS, A Program Which Solves Calculus Word Problems", Report MAC-TR-51, Project MAC, M.I.T., Cambridge, Mass., July 1968 view details External link: Online copy pdf ps Abstract: A program was written to solve calculus word problems. The program, CARPS (Calculus Rate Problem Solver), is restricted to rate problems. The overall plan of the program is similar to Bobrow's STUDENT, the primary difference being the introduction ...
  • Charniak, Eugene "Computer Solution of Calculus Word Problems" view details Extract: Introduction
    Introduction
    The research described in this paper had as its goal the creation of a program which solves freshman calculus word problems. The program, CARPS (CAlculus Rate Problem Solver), is restricted to rate problems. It is described in greater detail in MAC-TR-51 (Thesis). CARPS was primarily motivated by Bobrow's work on STUDENT, a program which solves high school algebra word problems. An understanding of STUDENT is sufficiently important to our work that we shall analyze Bobrow's program in the second section of the paper.
    CARPS is written in two languages. The bulk of the coding is in LISP. There are, however, large sections which require a great deal of pattern matching, something in which LISP is not particularly powerful. These sections were written in CONVERT, a language especially designed for pattern matching. Since CONVERT is embedded in LISP, it was an especially convenient choice because we could easily switch back and forth between the two languages. Both of these languages were available on the Project MAC PDP-6 time sharing system which was used in this research. Also available were J. Moses' algebraic simplification and differentiation routines which are described in [Moses]. The PDP-6 system, which has a quarter million words of core storage, gave us a decided advantage over Bobrow, whose program had to fit into a 32K 7094 LISP system, whereas ours wallows in the comparative luxury of 45K of memory.
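    CONVERT itself is a pattern-matching language embedded in LISP; the sketch below only imitates the general style of its use here, namely a list of rules tried in order, where the first pattern that matches a clause builds a piece of the internal representation. The rules and field names are invented for illustration and are not CARPS's own.

        # Rough imitation of rule-ordered pattern matching of the kind CONVERT
        # provides; the rules and "structure" fields below are invented.
        import re

        RULES = [
            # (pattern, builder) pairs tried in order; the first match wins.
            (re.compile(r"(?P<obj>\w+) is (?P<len>[\d.]+) ft long"),
             lambda m: {"object": m["obj"], "length": float(m["len"])}),
            (re.compile(r"(?P<obj>\w+) is moving at (?P<rate>[\d.]+) ft per sec"),
             lambda m: {"object": m["obj"], "rate": float(m["rate"])}),
        ]

        def parse_clause(clause):
            clause = clause.lower().strip(" .")
            for pattern, build in RULES:
                m = pattern.search(clause)
                if m:
                    return build(m)
            return None   # no rule applies: the sentence is not understood

        print(parse_clause("The ladder is 20.0 ft long."))
        # {'object': 'ladder', 'length': 20.0}
        print(parse_clause("The foot is moving at 2.0 ft per sec."))
        # {'object': 'foot', 'rate': 2.0}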
    Extract: STUDENT
    STUDENT
    As we mentioned earlier Bobrow's program, STUDENT, solves algebra word problems.  To really understand how STUDENT works we should go through a problem and see how the program solves it.
    Extract: Conclusions
    Conclusions
    In section 2 we noted several aspects of STUDENT which we felt needed further work.  Let us compare CARPS and STUDENT in these areas.
    1)  By storing information in terms of structures, CARPS is better able to recognize that two phrases describe the same object.
    2)  Again because of the use of structures CARPS can gather information about an object in piecemeal fashion.  STUDENT was essentially required to generate one equation for each sentence in the problem description.  In calculus word problems it is not uncommon to have two or three sentences providing information for one equation.
    3)  CARPS to a limited degree is able to use its knowledge to parse its input sentences.  For example we saw how ALTITUDE was interpreted as ALTITUDE OF THE FILTER because CARPS knew that since the filter was a cone and cones have altitudes, the filter had an altitude.
    4)  Whereas STUDENT has only one solution method (i.e., solution of linear equations), CARPS has several and can decide which is appropriate for a given problem. CARPS' machinery for solving its equations (e.g., differentiation, simplification) is also more complex than STUDENT's.
    5)  CARPS utilizes a more sophisticated grammatical analysis of the sentences than STUDENT. This is used both in breaking up sentences and in generating the internal structures. Up to now 14 problems have been solved by CARPS. Most were taken from standard texts, sometimes with slight modifications. (Variations in the statements of problems would, of course, increase the number of problems solvable by CARPS.) The other problems solved by CARPS are in Appendix A.
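    Point 2) above, gathering information about one object across several sentences rather than forcing one equation per sentence, can be illustrated with a toy structure that merges each new fact into the record already built for the object. The record format and field names below are invented for illustration, not CARPS's internal structures.

        # Toy illustration of accumulating facts about one object across sentences
        # instead of emitting one equation per sentence (record format invented).
        objects = {}

        def add_fact(name, **facts):
            """Merge newly parsed facts into the structure already built for `name`."""
            objects.setdefault(name, {"name": name}).update(facts)

        # "The filter is a cone."        -> the first sentence contributes the shape
        add_fact("filter", shape="cone")
        # "Its altitude is 10 inches."   -> a later sentence adds the altitude
        add_fact("filter", altitude_in=10)
        # "Its radius is 4 inches."
        add_fact("filter", radius_in=4)

        print(objects["filter"])
        # {'name': 'filter', 'shape': 'cone', 'altitude_in': 10, 'radius_in': 4}

    The ALTITUDE OF THE FILTER example in point 3) rests on the same structure: because the record already says the filter is a cone, and cones have altitudes, a bare "altitude" can be attached to the filter.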
    Many of the improvements that we claim for CARPS were necessitated by the increased complexity of the problems that we expected it to solve. However, many weaknesses of STUDENT are still present in some form in our design.
    1)  An important weakness in the program is due to its dependence on key words to signify the type of problem (i.e., distance or volume) and the method of solution to be used. What one would like to have in a calculus problem solver is a program which would use the information presented in the problem to figure out relationships among the elements (e.g., similar triangles) and actually propose the method of solution. The answers which could be provided by such a program are currently built into CARPS' data base for several cases, but this scheme severely limits the power of the program.
    2)  Another weakness of CARPS is its limited knowledge of English syntax. It would not be too difficult for CARPS to learn new syntactic rules by adding these rules to its CONVERT subroutines. Actually, what would be more satisfying would be a different method of parsing the sentences into components of structures. Currently the CONVERT rules are attempted one at a time until one matches the sentence. A better approach would be an incremental left-to-right parse which, when finished with the sentence, would have translated it into the internal model.
    3)  A very powerful calculus word problem solver will require a good deal of "common sense" knowledge. Consider this problem which we gave to CARPS:
    (A LADDER 20.0 FT LONG LEANS AGAINST A HOUSE /. FIND THE RATE AT WHICH THE TOP OF THE LADDER IS MOVING DOWNWARD IF ITS FOOT IS 12.0 FT FROM THE HOUSE AND MOVING AT THE RATE 2.0 FT PER SEC /.)
    Much to our surprise CARPS was not able to solve it. A closer look at the problem shows why. The last phrase mentions that the ladder is moving at the rate 2 ft per second. CARPS has, as an internal check, the requirement that associated with each velocity must be the direction of the velocity. The point is that the problem never gave this direction. Most people, however, would assume that it was moving horizontally away from the house. The reason, of course, is that a familiarity with ladders or the law of gravity tells us that this is the most likely way for it to be moving.
    This is not an isolated incident.  Consider the problem
    (A BARGE WHOSE DECK IS 10 FT BELOW THE LEVEL OF A DOCK IS BEING DRAWN IN BY MEANS OF A CABLE ATTACHED TO THE DECK AND PASSING THROUGH A RING ON THE DOCK. WHEN THE BARGE IS 24 FT FROM AND APPROACHING THE DOCK AT 3/4 FT / SEC, HOW FAST IS THE CABLE BEING PULLED IN ?)
    The ship is moving horizontally towards the dock. The problem does not mention this.
    Nor is the need for a body of "real world" information unique to calculus problems. Consider an example taken from a children's story. "Mike's costume had big ruffles. He refused to wear it." The question is "Why?". A program which will answer correctly must know that boys usually will not wear girls' clothes, and anything with ruffles on it must be for girls.
    It is our belief that a program which is to understand second grade stories must have at its disposal a body of facts comparable to that of an average seven year old.  More generally, computer understanding of any body of literature will require a data base similar to that of a human reader.  The major problems facing computer understanding of natural language are, in our opinion, collecting, storing, and utilizing such large bodies of information.

          in Donald E. Walker, Lewis M. Norton (Eds.): Proceedings of the 1st International Joint Conference on Artificial Intelligence IJCAI-69, Washington, DC, May 1969. William Kaufmann, 1969 view details
  • Sammet, Jean E. "Computer Languages - Principles and History" Englewood Cliffs, N.J. Prentice-Hall 1969. p.664. view details
  • Schwarcz, R., J.F. Burger & R.F. Simmons (1970). A deductive question-answerer for natural language inference. CACM, 3. view details Abstract: The question-answering aspects of the Protosynthex III prototype language processing system are described and exemplified in detail. The system is written in LISP 1.5 and operates on the Q-32 time-sharing system. The system's data structures and their semantic organization, the deductive question-answering formalism of relational properties and complex-relation-forming operators, and the question-answering procedures which employ these features in their operation are all described and illustrated. Examples of the system's performance and of the limitations of its question-answering capability are presented and discussed. It is shown that the use of semantic information in deductive question answering greatly facilitates the process, and that a top-down procedure which works from question to answer enables effective use to be made of this information. It is concluded that the development of Protosynthex III into a practically useful system to work with large data bases is possible but will require changes in both the data structures and the algorithms used for question answering DOI Extract: PILOT, SIR, STUDENT
    However, Teitelman [27] has implemented question-answering routines in his PILOT system that return their output in a subset of English. He limits himself, however, to simple format insertion methods and has not solved the problem of English input. In this respect he follows Raphael's SIR system [21] and Bobrow's STUDENT system [1], both of which exhibit, along with good deductive capability in narrowly circumscribed domains, a capability for input and output in a limited subset of English that is achieved through format matching and insertion rather than through linguistically motivated semantic analysis and generation procedures (though Bobrow's use of formats in a recursive manner does derive somewhat from early versions of transformational theory).