Protosynthex(ID:459/pro085)

Querying system for English text

Query system for English text, based on protosynthesis (a form of inference combined with tree-searching)

From Simmons 1965: "The approach of Protosynthex is to successively filter out more and more irrelevant information, leaving ultimately only statements which have a high probability of being answers to the question."

According to Simmons (1961), reporting in November 1960: "The initial vehicle, protosynthex, will be an elementary language-processing device which reads simple printed material, and answers simple questions phrased in elementary English."

Called Synthex by analogy to Memex.


Related languages
Protosynthex => Protosynthex II   Evolution of

References:
  • Simmons, R. F. "Synthex" pp140-141. Abstract: The objective of this project is to develop a research methodology and a vehicle for the design and construction of a general-purpose computerized system for synthesizing complex human cognitive functions. The initial vehicle, protosynthex, will be an elementary language-processing device which reads simple printed material, and answers simple questions phrased in elementary English.
    We believe that a logic established for cognitive aspects of language will readily generalize to the synthesis of other cognitive functions such as motor skills and multi-modal inputs.
    Current efforts are concerned with language analysis from a structural and a syntactic viewpoint, and with the development of organizational principles for the storage of textual content. We have conducted several minor experiments to test the usefulness of various approaches to these problems. These studies include the development of methods for quantifying synonymic meanings, the application of scaling techniques to text to determine concept clusters, and a statistical analysis of interrelated frequencies of certain grammatical forms.
    These experiments have led to the formulation of an immediate task: developing a system which can read and answer questions about a basic reading primer. The primer selected has a limited vocabulary of some 150 elementary English words, is about 7500 words in total length, and is made up of simple sentence structures. If a system can be devised to handle this material, then our concepts about a general-purpose synthex will be greatly clarified.
          in [ACM] CACM 4(03) (March 1961)
  • Simmons, R. F. "Synthetic language behavior" Data Process. Management, 5, 12 (1963), 11-18.
  • Simmons, R. F. and McConlogue, K. L. "Maximum-depth indexing for computer retrieval of English language data" Amer. Documentation, 14, 1, (1963), 68-73.
  • Simmons, R. F. and McConlogue, K. L. "Maximum-depth indexing for computer retrieval of English language data" Doc. SP-775, System Development Corp., Santa Monica, Calif., 1963
  • Simmons, Robert F. "SYNTHEX"
          in Orr, William (ed) "Conversational Computing", 1968
  • Bobrow, D.G. "Natural Language Input for a Computer Problem Solving System", Report MAC-TR-1, Project MAC, M.I.T., Cambridge, Mass., June 1964. Abstract: The STUDENT problem solving system, programmed in LISP, accepts as input a comfortable but restricted subset of English which can express a wide variety of algebra story problems. STUDENT finds the solution to a large class of these problems. STUDENT can utilize a store of global information not specific to any one problem, and may make assumptions about the interpretation of ambiguities in the wording of the problem being solved. If it uses such information, or makes any assumptions, STUDENT communicates this fact to the user. The thesis includes a summary of other English language question-answering systems. All these systems, and STUDENT, are evaluated according to four standard criteria. The linguistic analysis in STUDENT is a first approximation to the analytic portion of a semantic theory of discourse outlined in the thesis. STUDENT finds the set of kernel sentences which are the base of the input discourse, and transforms this sequence of kernel sentences into a set of simultaneous equations which form the semantic base of the STUDENT system. STUDENT then tries to solve this set of equations for the values of requested unknowns. If it is successful it gives the answers in English. If not, STUDENT asks the user for more information, and indicates the nature of the desired information. The STUDENT system is a first step toward natural language communication with computers. Further work on the semantic theory proposed should result in much more sophisticated systems. Extract: Introduction
    Introduction
    The aim of the research reported here was to discover how one could build a computer program which could communicate with people in a natural language within some restricted problem domain. In the course of this investigation, I wrote a set of computer programs, the STUDENT system, which accepts as input a comfortable but restricted subset of English which can be used to express a wide variety of algebra story problems. The problems shown in Figure 1 illustrate some of the communication and problem solving capabilities of this system.
    In the following discussion, I shall use phrases such as "the computer understands English". In all such cases, the "English" is just the restricted subset of English which is allowable as input for the computer program under discussion. In addition, for purposes of this report I have adopted the following operational definition of understanding. A computer "understands" a subset of English if it accepts input sentences which are members of this subset, and answers questions based on information contained in the input. The STUDENT system understands English in this sense. Extract: SYNTHEX
    The SYNTHEX system is a text-based question-answering system designed and programmed at SDC by Simmons, Klein and McConlogue. The entire contents of a children's encyclopedia has been transcribed to magnetic tape for use as the information store. An index has been prepared listing the location of all the content words in the text, i.e. including words like "worm," "eat," and "birds," while excluding function words like "and," "the," and "of." All the content words of a question are extracted, and information-rich sections of the text are retrieved, i.e. sections that are locally dense in content words contained in the question. For example, if the question were "What do worms eat?", with content words "worms" and "eat", the two sentences "Birds eat worms on the grass." and "Most worms usually eat grass." might be retrieved. At this time, the program performs a syntactic analysis of the question and of the sentences that may contain the answer. A comparison of the dependency trees of the question and various sentences may eliminate some irrelevant sentences. In the example, "Birds eat worms on the grass" is eliminated because "worms" is the object of the verb "eat" instead of the subject as in the question. In the general case, the remaining sentences are given in some ranked order as possibly answering the question.
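    A minimal sketch (in Python, not the original SDC code; the stopword list and function names are illustrative) of the retrieval step just described: extract the content words of a question, then rank stored sentences by how many of those words they contain.

        STOPWORDS = {"and", "the", "of", "a", "on", "do", "what"}  # illustrative function words

        def content_words(sentence):
            return {w.strip("?.,").lower() for w in sentence.split()} - STOPWORDS

        def rank_sentences(question, corpus):
            q = content_words(question)
            # Sentences locally dense in the question's content words score highest.
            scored = [(len(q & content_words(s)), s) for s in corpus]
            return [s for n, s in sorted(scored, reverse=True) if n > 0]

        corpus = ["Birds eat worms on the grass.", "Most worms usually eat grass."]
        print(rank_sentences("What do worms eat?", corpus))
        # Both sentences share the content words "worms" and "eat"; the dependency-tree
        # comparison (not sketched here) is what would eliminate the first one.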
    SYNTHEX is limited syntactically by its grammar to the extent that the syntactic analysis eliminates irrelevant statements. It makes no use of the meaning of any statements or words, and cannot deduce answers from information implicit in two or more sentences. Because the grammar is independent of the program, the syntactic ability of SYNTHEX can be extended relatively easily. However, before it can become a good question-answering system, some semantic abilities will have to be added.
    SYNTHEX does not explicitly provide for interaction with the user, but because it is implemented in the SDC time-sharing system (9), a user may modify a previous question if the sentences retrieved were not suitable. The mechanism for selection of sentences must be kept in mind to get best results. Extract: BASEBALL
    Baseball is a question-answering system designed and programmed at Lincoln Laboratories by Green, Wolf, Chomsky and Laughery (19). It is a data base system in which the data is placed in memory in a prestructured tree format. The data consists of the dates, location, opposing teams and scores of some American League baseball games. Only the questions to the system are given in English, not the data.
    Questions must be simple sentences, with no relative clauses, logical or coordinate connectives. With these restrictions, the program will accept any question couched in words contained in a vocabulary list quite adequate for asking questions about baseball statistics. In addition, the parsing routine, based on techniques developed by Harris (21), must find a parsing for the question.
    The questions must pertain to statistics about baseball games found in the information store. One cannot ask questions about extrema, such as "highest" score or "fewest" number of games won. The parsed question is transformed into a standard specification (or spec) list and the question-answering routine utilizes this canonical form for the meaning of the question. For example, the question "Who beat the Yankees on July 4th?" would be transformed into the "spec list":
    Team (losing)  = New York
    Team (winning) = ?
    Date           = July 4
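    A minimal, hypothetical sketch (in Python; the record fields and helper are invented for illustration, not taken from the Lincoln Laboratories program) of how such a spec list can be matched against the prestructured game records: known attribute values act as filters, and "?" marks the field to be reported.

        games = [  # illustrative records
            {"winning": "Boston", "losing": "New York", "date": "July 4"},
            {"winning": "New York", "losing": "Detroit", "date": "July 5"},
        ]

        def answer(spec, records):
            filters = {k: v for k, v in spec.items() if v != "?"}
            wanted = [k for k, v in spec.items() if v == "?"]
            return [{k: r[k] for k in wanted}
                    for r in records
                    if all(r[k] == v for k, v in filters.items())]

        # "Who beat the Yankees on July 4th?"
        print(answer({"losing": "New York", "winning": "?", "date": "July 4"}, games))
        # -> [{'winning': 'Boston'}]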
    Because Baseball does not utilize English for data input, we cannot talk about deductions made from information implicit in several sentences. However, Baseball can perform operations such as counting (the number of games played by Boston, for example) and thus, in the sense that it is utilizing several separate data units in its store, it is performing deductions.
    Baseball's abilities can only be extended by extensive re-programming, though the techniques utilized have some general applicability. Because the parsing program has a very complete grammar, and the vocabulary list is quite comprehensive for the problem domain, the user needs no knowledge of the internal structure of the Baseball program. No provision for interaction with the user was made.
    Extract: Advice-taker
    McCarthy's Advice-taker, though not designed to accept English input, would make an excellent base for a question-answering system. Fischer Black has programmed a system which can do all of McCarthy's Advice-Taker problems, and can be adapted to accept a very limited subset of English. The deductive system in Black's program is equivalent to the propositional calculus.
  • Simmons, R. F.; Klein, S., and McConlogue, K. L. "Indexing and dependency logic for answering English questions" Amer. Documentation 15, 3, (1964), pp196-204.
  • Simmons, R.F. "Answering English Questions by Computer - A Survey" SDC Report SP-1536, Santa Monica, Calif., April 1964
  • Simmons, R. F. "Answering English questions by computer: a survey" p53-70 view details Abstract: Fifteen experimental English language question-answering systems which are programmed and operating are described and reviewed. The systems range from a conversation machines to programs which make sentences about pictures and systems which translate from English into logical calculi. Systems are classified as list-structured data-based, graphic data-based, text-based and inferential. Principles and methods of operations are detailed and discussed.

    It is concluded that the data-base question-answerer has passed from initial research into the early developmental phase. The most difficult and important research questions for the advancement of general-purpose language processors are seen to be concerned with measuring meaning, dealing with ambiguities, translating into formal languages and searching large tree structures. Extract: Protosynthex
    Protosynthex
    At SDC, Simmons and McConlogue with linguistic support from Klein (Simmons, Klein, McConlogue, 1963) have built a system which attempts to answer questions from an encyclopedia. The problem in this system was to accept natural English questions and search a large text to discover the most acceptable sentence, paragraph or article as an answer. Beginning at the level of ordinary text, Protosynthex makes an index, then uses a synonym dictionary, a complex intersection logic, and a simple information scoring function to select those sentences and paragraphs which most resemble the question. At this point, both the question and the retrieved text are parsed and compared. Retrieved statements whose structure or whose content words do not match those of the question are rejected. A final phase of analysis checks the semantic correspondence of words in the answer with words in the question.
    Beginning with natural text that has been keypunched, an indexing pass is made and an index entry is constructed for each content word in the text. A root-form logic is used to combine entries for words with similar forms; for example, only one index entry exists for govern, governor, government, governing, etc. The contents of the entry are a set of VAPS numbers which indicate the Volume, Article, Paragraph and Sentence address of each occurrence of the indexed word.
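    A minimal sketch (in Python) of the indexing pass described above. The root_form reduction here is a crude suffix-stripper standing in for the paper's root-form logic, and VAPS addresses are modeled as (volume, article, paragraph, sentence) tuples; both are assumptions for illustration, not SDC's implementation.

        from collections import defaultdict

        def root_form(word):
            w = word.lower()
            for suffix in ("ing", "ment", "or", "s"):  # illustrative rules only
                if w.endswith(suffix) and len(w) - len(suffix) >= 4:
                    return w[: len(w) - len(suffix)]
            return w

        index = defaultdict(set)  # root form -> set of VAPS addresses

        def index_sentence(vaps, sentence, stopwords=frozenset({"the", "of"})):
            for word in sentence.split():
                w = word.strip(".,").lower()
                if w not in stopwords:
                    index[root_form(w)].add(vaps)

        index_sentence((1, 4, 2, 1), "The governor governs the government")
        print(index["govern"])  # one entry collects govern/governor/governing/government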
    The first step in answering the question is to look up all of its content words in the index and so retrieve all of the appropriate VAPS numbers. At this stage a dictionary of words of related meaning is used to expand the meaning of the question's words to any desired level. [...]
    The intersection test finds the smallest unit of text, preferably a sentence, in which the greatest number of words intersect. A simple information score based on the inverse of the frequency of occurrence of the word in the large sample of text is used to weight some words more heavily than others in selecting potential answers. All of this computation is done with the VAPS numbers that were obtained from the index. The highest scoring five or ten potential answers are then retrieved from the tape on which the original text was stored. These comprise an information-rich set of text which roughly corresponds to the set of alternatives proposed by the question (within limits of the available text).
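    A minimal sketch (in Python) of this intersection-and-scoring step: candidate locations are pooled from the VAPS sets for the question's word roots, and each hit is weighted by the inverse of the word's frequency in the full text so that rare words count more. The index and frequency table below are illustrative literals assumed to come from the indexing pass.

        from collections import defaultdict

        index = {"worm": {(1, 1, 1, 1), (1, 1, 1, 2)},
                 "eat":  {(1, 1, 1, 2), (1, 2, 3, 1)}}
        freq = {"worm": 12, "eat": 450}  # occurrences in the large text sample

        def score_candidates(question_roots):
            scores = defaultdict(float)
            for root in question_roots:
                for vaps in index.get(root, ()):
                    scores[vaps] += 1.0 / freq[root]  # simple information score
            # the highest-scoring sentences form the "information-rich" candidate set
            return sorted(scores.items(), key=lambda kv: -kv[1])

        print(score_candidates(["worm", "eat"]))
        # (1, 1, 1, 2) ranks first: it contains both roots, and the rare "worm" weighs heavily.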
    The question and the text are then parsed using a modification of the dependency logic developed by D. Hays (1962). [...] In passing it should be noted that the parsing system learns its own word classes as a result of being given correctly analyzed text. The human operator interacts frequently with this parser to help it avoid errors and ambiguities. [...]
    The actual matching is accomplished by a fairly complex set of programs which build a matrix containing all the words from the question and from its possible answers. The matrix is examined to discover the structural matches and the part of the statement which corresponds to the question word. [...] A semantic evaluation system is now required to score each of the words in phrases corresponding to "what."
    This system is essentially a dictionary lookup whose entries can grow as a function of use. If certain words are found to be answers to "where" questions they will be so coded in the dictionary. [...] Those which corresponded most closely would have been scored as best answers. The semantic evaluation system is still at early stages of experimentation but is expected to resemble the parsing system in that its dictionary will be developed and modified as a function of experience with text under the control of an online operator.
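    A minimal, hypothetical sketch (in Python; the codes and scoring are invented for illustration) of such a growing semantic dictionary: words observed to answer a given question type are coded for it, and later candidate answers are scored by how many of their words carry the question's code.

        semantic_codes = {"park": {"where"}, "yesterday": {"when"}}

        def record_answer(word, question_type):
            # entries "grow as a function of use"
            semantic_codes.setdefault(word, set()).add(question_type)

        def score_answer(candidate_words, question_type):
            return sum(1 for w in candidate_words
                       if question_type in semantic_codes.get(w, ()))

        record_answer("garden", "where")
        print(score_answer(["in", "the", "garden"], "where"))  # -> 1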
    The approach of Protosynthex is to successively filter out more and more irrelevant information, leaving ultimately only statements which have a high probability of being answers to the question. This system is an attempt to deal with a large and syntactically complex sample of natural text. It is a symbiotic system in which a man works with the computer to help resolve both syntactic and semantic ambiguities. Only in this fashion is the program able to overcome the problems associated with multiple, apparently valid interpretations of the same sentence or question.
          in [ACM] CACM 8(01) Jan 1965
  • Simmons, R. F. "Storage and retrieval of aspects of meaning in directed graph structures" view details Extract: Introduction
    Introduction
    Behind the development of every new computer language there lies a set of problems and a set of programming structures with whose aid the problems can be managed. With Fortran the problem was to solve algebraic equations without the need for a great deal of I/O bookkeeping and without concern for detailed computer-word-packing statements. Behind Jovial lay the command-control problem, which customarily dealt with complex data structures and the need to use every bit of computer memory efficiently. IPL grew in response to a need to use associative list structures of nonnumeric symbolic data in a computer. Lisp answered the need for a high-level functional language to handle recursive algebraic and symbolic structures. Comit was the machine translator's approach to handling natural language strings.
    In developing a special concept dictionary for storing and retrieving the meanings of words and phrases of English, the authors have found it desirable to use a complex network whose nodes are interrelated in several ways. Basically, the idea is that of a dictionary of English words in which each word has associated with it word-class information and lists of attribute-value pairs that define various aspects of its meaning and usage in terms of pointers to other words in the dictionary. In a data structure language such as Jovial, in addition to ordinary table structures several additional levels of associative coding are required to give easy access to the data without excessive costs in either space or processing time.
    Because of the many levels of associative linking required the authors decided to use Lisp, at least for early experimental work with the system. Advantages of Lisp extended beyond the ease of producing complex data structures; they also included the simplicity of producing complex, often recursive functions for maintaining and querying the dictionary. An additional advantage is gained in the fact that although Lisp is primarily an interpretive system it does allow for the compiling of a fast-running completed program. The most serious disadvantage of Lisp for our system is that in present versions it is limited to core memory for most uses. This limitation means that a dictionary of the type we are studying could not exceed two or three hundred words.
    Since we are aiming for an eventual vocabulary of from five to fifty thousand words, the limitation to core memory is intolerable. Either an expansion of Lisp will be required or the writing of a special language using auxiliary memory for handling cyclical structures in large complex networks will grow out of our experiments with the conceptual dictionary.
    Extract: The Problem
    The Problem
    The major shortcoming of all existing retrieval systems is their inability to handle anything vaguely resembling the meaning of words taken singly, let alone the meaning of the language strings that they comprise. Synonym dictionaries and thesauri have often been added but have proved but feeble makeshifts offering little improvement over the use of root forms of words alone. To the extent that automatic syntactic analysis has been available it has only emphasized the need for word and phrase meanings.
    Five years of Synthex research toward the development of question-answering systems based on natural language text have confirmed this inability to deal with meanings. In these five years many approaches have been attempted toward representing some aspects related to the meaning of words. Most have been unsuccessful. It was learned early that the use of a synonym dictionary did not greatly improve our understanding of text in response to questions. More recently it was realized that even a well-coded thesaurus was not an answer. At various times attempts were made to save syntactic contexts associated with words as possible representations of meanings; these too, although promising, did not appear to be a reasonable answer. With more recent research, particularly that of Bobrow [1964], Raphael [1964], Quillian [1965], and Thompson [1964], it has become apparent that, in addition to dictionary-type meanings, there is a need for something that can best be characterized as a knowledge of the world (e.g., Cows eat grass. Walls are vertical. Grass doesn't eat., etc.). Without something representing knowledge of the world it can hardly be hoped that a word or sentence can be understood.
    The consequence of this line of thought is the realization that the problem requires the development of a conceptual dictionary that would contain definitional material, associative material, and some representation of knowledge of the world. These three aspects of a word's meaning seem to be the minimum that will allow for enough understanding of English to make question answering even a reasonable probability.
    Extract: The Conceptual Dictionary
    The Conceptual Dictionary
    In a conceptual dictionary each word would be characterized by (a) a set of class memberships, (b) a set of attributes, (c) a set of associations, and (d) a set of active and passive actions.
    The set of class memberships includes statements of the form the/an X(noun) is a/an Y(noun). Thus, "an aardvark is an animal" or "an aardvark is a mammal" are both examples of statements giving rise to class membership characteristics. For many nouns the class membership set is one of the basic definitional aspects of the word.
    Attributes characterizing a word are such that if y is an attribute of x, then "x is y" is a true statement and the string "the y x" is grammatical. Thus if "scaly" is an attribute of "aardvark," then "an aardvark is scaly" is true and "the scaly aardvark" is grammatical. Associates of a word are in a loose part-whole relationship. If x has a y, then y is an associate of x; thus "John has a wallet" and "John has a nose" provide the two associates, "nose" and "wallet," for John.
    The set of actions characterizing a word are derived from the verbs and their complements or objects that appear in context with the word. The two sentences "Natives eat aardvarks" and "Aardvarks eat ants" provide "eaten by natives" and "eat ants" as passive and active actions related to aardvarks.
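    A minimal sketch (in Python; the field names are invented for illustration, and the original stored these as attribute-value pairs pointing at other dictionary entries) of one conceptual-dictionary entry holding the four kinds of definitional material just listed.

        aardvark = {
            "classes":    ["animal", "mammal"],        # "an aardvark is an animal"
            "attributes": ["scaly"],                   # "an aardvark is scaly"
            "associates": [],                          # part-whole "has a" relations
            "actions": {
                "active":  [("eat", "ants")],          # "Aardvarks eat ants"
                "passive": [("eaten by", "natives")],  # "Natives eat aardvarks"
            },
        }

        def is_a(entry, cls):
            return cls in entry["classes"]

        print(is_a(aardvark, "mammal"))  # -> True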
    The idea underlying this schema is that the meaning of a word can be conceptualized in bands of closeness of relation. The class membership of an object is so closely related as to be an integral part of it or its perception. Attributes and things closely associated with a word are seen as important but less essential, while the actions associating one word and another may range from important to irrelevant. Extract: Toward an Operational System
    Toward an Operational System
    The conceptual dictionary briefly described above exists now as an experimental Lisp program in 47k of core memory in the ARPA-SDC Q-32 Time-Sharing system. It is currently limited to a dictionary of 200-300 words and a relatively small set of program functions. What is required of even an early operational system is the ability to handle from 20 to 50 thousand words and a rather large set of functions for dealing with questions of increasing difficulty. Our expectations are that Lisp will be expanded into a system that uses up to four million words of disk to augment core and that we will be able to pay an increased cost of response time in favor of continuing to use Lisp for a large operational system. If that cost is prohibitive it will be necessary to produce a system tailored to the needs of the conceptual dictionary, and that will be able to use auxiliary memory efficiently to deal with a very large network of complexly linked words.
    Although it is our belief that new problems create the need for new languages, it is apparent that existing languages are largely sufficient for our language processing problems, but in many cases, especially among the list-oriented languages, they simply have not geared themselves to the large amounts of data and data processing required in this special field.
    Extract: Acknowledgments
    Acknowledgments. I wish to acknowledge my debt to the twenty or so people who have studied question-answering systems over the past decade. Their work is reviewed elsewhere [Simmons 1965]. Readers knowing the Quillian system will recognize that the result of the author's three years of acquaintance with Quillian was the wholehearted appropriation of his ideas insofar as the author was able to understand them. A special debt is also expressed to Fred Thompson for leading the author to an understanding of parsing directly into a data structure and for acquainting him with TEMPO's forthcoming language system of associative cycles. Programming and detail design of the conceptual dictionary described in this paper were accomplished by John Burger.
    Extract: Discussion
    Discussion
    Salton opened the discussion with the comment that a system such as this is inherently not extendable. The system will operate nicely with various kinds of "fish," but will run into trouble when "whales" appear, since the system will not know how to deal with aquatic mammals. He compared the system to the "Baseball" system, in which one cannot go beyond a limited range of questions, e.g., one has trouble if a new team appears. Young objected to this view, saying he believed the two systems were quite different, and that the present system was in principle infinitely extendable and very general. He said he could conceive of a system one level above this one which could deal with generalizations and which would make handling propositions easier than with the present special programming.
    Burger said that work with higher level relationships was planned, but that the immediate extensions would be more trivial, in the way of inserting a great deal of knowledge about the world, of the kind any child has (e.g., "Walls are vertical.").
    Gorn observed that the present system is based upon two forms of the verb "is" and the verb "has," and that more relationships were needed. Burger said the system was not in principle so limited. Gorn then asked how they would deal with the statement "'Word' is a word." Burger had no immediate answer, though he thought they would eventually be able to deal with such cases.
    Responding to a question from Cheydleur, Burger said they had about 50 different functions.
    Mooers asked if there were any inherent features of Lisp which limited their work. Burger said that a fundamental limitation was the limitation to use of core storage alone in the SDC version of Lisp. They could not use disks. The second limitation was the inability to break out individual characters from the "atoms" of Lisp. This prevents an easy and direct way of treating the similarity between "dog" and "dogs." At present this relationship has to be put in as a separate piece of information.
    Mitchell then mentioned that for a very much larger data base a limitation will be the time required to pass the dictionaries through the machine. He said one of the big improvements in speed due to syntax-directed compilers was that they had less need to refer to a very large dictionary. Burger admitted that this was a very important problem, and one which is being considered. What can be done outside of Lisp is being studied, since a problem in using an auxiliary memory with a list-structure system like Lisp is that interrelationships between lists are broken up if just one section is brought from the auxiliary memory. So far, a good solution to the problem has not been found.
    Abstract: An experimental system that uses LISP to make a conceptual dictionary is described. The dictionary associates with each English word the syntactic information, definitional material, and references to the contexts in which it has been used to define other words. Such relations as class inclusion, possession, and active or passive actions are used as definitional material. The resulting structure serves as a powerful vehicle for research on the logic of question answering. Examples of methods of inputting information and answering simple English questions are given. An important conclusion is that, although LISP and other list processing languages are ideally suited for producing complex associative structures, they are inadequate vehicles for language processing on any large scale—at least until they can use auxiliary memory as a continuous extension of core memory.

          in [ACM] CACM 9(03) March 1966 includes proceedings of the ACM Programming Languages and Pragmatics Conference, San Dimas, California, August 1965
  • Simmons, R.F., J.F. Burger & R. Schwarcz "A computational model of verbal understanding"
          in [AFIPS] Proceedings of the 1968 Fall Joint Computer Conference FJCC 33
  • Sammet, Jean E. "Computer Languages - Principles and History" Englewood Cliffs, N.J. Prentice-Hall 1969. p.669. Extract: Protosynthex
    The Protosynthex system is based on natural English text since it is an attempt to answer questions (phrased in natural English) from an encyclopedia. As such, it is the only system described in this section which does not have a structured data base. The basic principle is to use synonyms, intersection logic, and a simple scoring function to find the sentences and paragraphs which most closely resemble the question. Statements thus retrieved which do not match the structure or content words of the question are rejected. For example, the question What animals live longer than men? might cause the following list of content words to be used in searching the index:

    Word        Words of Related Meaning
    animals     mammals, reptiles, fish
    live        age
    longer      older, ancient
    men         person, people, women

    The smallest unit with the greatest intersection with these words is then found, where some weighting is applied to certain words in the search. The highest scoring units are selected and then both the question and the proposed answer are parsed to make sure that the structures are the same.
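    A minimal sketch (in Python) of the synonym-expansion step behind the table above: each content word of the question is widened to its words of related meaning before the index is searched. The dictionary below simply transcribes the table's illustrative entries.

        related = {
            "animals": ["mammals", "reptiles", "fish"],
            "live":    ["age"],
            "longer":  ["older", "ancient"],
            "men":     ["person", "people", "women"],
        }

        def expand(content_words):
            terms = set(content_words)
            for w in content_words:
                terms.update(related.get(w, ()))
            return terms

        print(sorted(expand(["animals", "live", "longer", "men"])))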

  • Schwarcz, R., J.F. Burger & R.F. Simmons "A deductive question-answerer for natural language inference" CACM 13(3) (March 1970). Abstract: The question-answering aspects of the Protosynthex III prototype language processing system are described and exemplified in detail. The system is written in LISP 1.5 and operates on the Q-32 time-sharing system. The system's data structures and their semantic organization, the deductive question-answering formalism of relational properties and complex-relation-forming operators, and the question-answering procedures which employ these features in their operation are all described and illustrated. Examples of the system's performance and of the limitations of its question-answering capability are presented and discussed. It is shown that the use of semantic information in deductive question answering greatly facilitates the process, and that a top-down procedure which works from question to answer enables effective use to be made of this information. It is concluded that the development of Protosynthex III into a practically useful system to work with large data bases is possible but will require changes in both the data structures and the algorithms used for question answering. Extract: PILOT, SIR, STUDENT
    However, Teitelman [27] has implemented question-answering routines in his PILOT system that return their output in a subset of English. He limits himself, however, to simple format insertion methods and has not solved the problem of English input. In this respect he follows Raphael's SIR system [21] and Bobrow's STUDENT system [1], both of which exhibit, along with good deductive capability in narrowly circumscribed domains, a capability for input and output in a limited subset of English that is achieved through format matching and insertion rather than through linguistically motivated semantic analysis and generation procedures (though Bobrow's use of formats in a recursive manner does derive somewhat from early versions of transformational theory).
  • Stock, Marylene and Stock, Karl F. "Bibliography of Programming Languages: Books, User Manuals and Articles from PLANKALKUL to PL/I" Verlag Dokumentation, Pullach/München 1973. p.487. Abstract: PREFACE AND INTRODUCTION
    The exact number of all the programming languages still in use, and those which are no longer used, is unknown. Zemanek calls the abundance of programming languages and their many dialects a "language Babel". When a new programming language is developed, only its name is known at first and it takes a while before publications about it appear. For some languages, the only relevant literature stays inside the individual companies; some are reported on in papers and magazines; and only a few, such as ALGOL, BASIC, COBOL, FORTRAN, and PL/1, become known to a wider public through various text- and handbooks. The situation surrounding the application of these languages in many computer centers is a similar one.

    There are differing opinions on the concept "programming languages". What is called a programming language by some may be termed a program, a processor, or a generator by others. Since there are no sharp borderlines in the field of programming languages, works were considered here which deal with machine languages, assemblers, autocoders, syntax and compilers, processors and generators, as well as with general higher programming languages.

    The bibliography contains some 2,700 titles of books, magazines and essays for around 300 programming languages. However, as shown by the "Overview of Existing Programming Languages", there are more than 300 such languages. The "Overview" lists a total of 676 programming languages, but this is certainly incomplete. One author has already announced the "next 700 programming languages"; it is to be hoped the many users may be spared such a great variety for reasons of compatibility. The graphic representations (illustrations 1 & 2) show the development and proportion of the most widely-used programming languages, as measured by the number of publications listed here and by the number of computer manufacturers and software firms who have implemented the language in question. The illustrations show FORTRAN to be in the lead at the present time. PL/1 is advancing rapidly, although PL/1 compilers are not yet seen very often outside of IBM.

    Some experts believe PL/1 will replace even the widely-used languages such as FORTRAN, COBOL, and ALGOL.4) If this does occur, it will surely take some time - as shown by the chronological diagram (illustration 2) .

    It would be desirable from the user's point of view to reduce this language confusion down to the most advantageous languages. Those languages still maintained should incorporate the special facets and advantages of the otherwise superfluous languages. Obviously such demands are not in the interests of computer production firms, especially when one considers that a FORTRAN program can be executed on nearly all third-generation computers.

    The titles in this bibliography are organized alphabetically according to programming language, and within a language chronologically and again alphabetically within a given year. Preceding the first programming language in the alphabet, literature is listed on several languages, as are general papers on programming languages and on the theory of formal languages (AAA).
    As far as possible, most of the titles are based on autopsy. However, the bibliographical description of some titles will not satisfy bibliography-documentation demands, since they are based on inaccurate information in various sources. Translation titles whose original titles could not be found through bibliographical research were not included. In view of the fact that many libraries do not have the quoted papers, all magazine essays should have been listed with the volume, the year, issue number and the complete number of pages (e.g. pp. 721-783), so that interlibrary loans could take place with fast reader service. Unfortunately, these data were not always found.

    It is hoped that this bibliography will help the electronic data processing expert, and those who wish to select the appropriate programming language from the many available, to find a way through the language Babel.

    We wish to offer special thanks to Mr. Klaus G. Saur and the staff of Verlag Dokumentation for their publishing work.

    Graz / Austria, May, 1973
  • Cuadra, Carlos A. "SDC Experiences with Large Data Bases" Journal of Chemical Information and Computer Sciences 15(1) 1975. Abstract: SDC operates a large-data-base system that permits users all over the United States and in several foreign countries to search very large bibliographic files interactively, by means of a terminal and telephone connection. Developing extensive use of such systems requires not only technical considerations—such as proper selection and handling of data base elements— but also a massive educational effort, to help provide the large user community necessary to share the sizable costs of data base acquisition, file development, and storage. The growing acceptance of on-line retrieval services attests to the success of that effort, as well as to their inherent cost-effectiveness.
    Extract: Protosynthex
    In 1960, SDC developed its first interactive retrieval system, known as "Protosynthex." The system used what is now referred to as a full-text approach, the text being the contents of the Golden Book Encyclopedia.
    Resources
    • Memorial page for Simmons at UTA
      He began work in 1955 at RAND Corporation and continued in 1957 at its offshoot, the System Development Corporation, Santa Monica, where he was Head of the Language Processing Research Program until 1968. The research center that he started at SDC was one of the first in the world to investigate computer processing of natural language. He pioneered work there on question-answering systems and on natural language access to both databases and text files that has had a lasting effect on the field. His research was directed at the construction of Synthex, a computerized system to synthesize human language behavior. He wrote in 1962,


      "The synthex project is an outgrowth of a longstanding interest in the conscious processes of humans. After taking the Ph.D. I had an opportunity to read freely among my interests in psychology. Many nineteenth-century explorations of conscious processes of thinking, believing, etc., caught my fancy at that time. William James' Principles of Psychology seemed to me to be a high-water mark for psychologists who were interested in the various problems of conscious psychology. The fact that the whole current of psychology has turned to the more rewarding (but to me less inspiring) study of more easily observed behavior seemed to leave a great gap in the study of human behavior. The problem then and now associated with consciousness appears to be the impossibility of formulating experimentally answerable questions. Studying cognitive processes by synthesizing them on computers seems to offer some hope that eventually we may come to understand enough about the difference between organisms and machines that a question about consciousness may be asked.''



      This computational approach and metaphor for human cognition, which Robert F. Simmons did so much to originate, has had a revolutionary effect on psychology, linguistics, and philosophy, and is now emerging as a new discipline called Cognitive Science.