COMIT (ID:19/com007)

String-handling and pattern-matching language 


The first string-handling and pattern-matching language, designed for applications in natural-language translation. The user has a workspace organized into shelves. Strings are made of constituents (words), accessed by subscript. A program is a set of rules, each of which has a pattern, a replacement, and a goto naming the next rule to apply.
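
To make the rule model concrete, here is a minimal sketch in Python (not COMIT notation) of the control structure this implies. The rule names, sample data, and the naive constituent-matching scheme are invented for illustration; real COMIT patterns and routing are far richer.

    # Sketch of COMIT's control model: a program is a set of named rules,
    # each with a pattern, a replacement, and a goto naming the next rule.
    # The workspace is modeled as a list of constituents (words).
    def run(rules, workspace, start):
        name = start
        while name != "STOP":
            pattern, replacement, goto = rules[name]
            for i in range(len(workspace) - len(pattern) + 1):
                if workspace[i:i + len(pattern)] == pattern:
                    # Rewrite the matched constituents, then follow the goto.
                    workspace[i:i + len(pattern)] = replacement
                    break
            name = goto
        return workspace

    # Invented example: rewrite "OLD MAN" to "ELDER", then stop.
    rules = {"R1": (["OLD", "MAN"], ["ELDER"], "STOP")}
    print(run(rules, ["THE", "OLD", "MAN"], "R1"))  # ['THE', 'ELDER']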

Related languages
COMIT => COMIT II   Evolution of
COMIT => CONVERT   Extension of
COMIT => DYSTAL   Influence
COMIT => EOL   Influence
COMIT => LECOM   Port
COMIT => METEOR   Derivation of
COMIT => SNOBOL   Influence
COMIT => The New Language   Influence
COMIT => Wegstein string Algol   Influence

References:
  • Yngve, Victor "A programming language for mechanical translation" Mechanical Translation 4, 3 (Dec. 1957), 59-65. view details
  • Yngve, Victor "The COMIT system for mechanical translation" pp183-187 view details
          in 1959 IFIP Congress Paris, France view details
  • Tonge, Fred M. review of Yngve 1960 Abstract: This article concerns the production of natural language sentences from a phrase-structure hierarchy or immediate-constituent organization. The proposed model uses a push-down temporary storage in expanding sentence constituents, much in the manner common to current algebraic compilers. The addition of expansions involving discontinuous constituents (that is, introducing a constituent "one down" in the temporary push-down store rather than on top) permits generation of many of the more complicated structures found in English. The hypothesis introduces an actual limit on the depth of temporary memory, and postulates that much of the grammatical complexity of a language derives from the need to insure that the depth is not exceeded. Then follows a discussion of the types of grammatical rules which would tend toward safe constructions, and of the occurrence of such rules in English. This scholarly and stimulating paper has much to offer those interested in communicating with computers. As has been suggested in other contexts (see, for example, Morris Halle's review article in Information and Control 4(1) pp88-90), a similar but reverse mechanism suggests a process for ingesting natural language. And for those interested in designing more formal computer languages, the analysis of some of the "peculiarities" of English, relating them to possible symbolic mechanisms of the human language user, holds some valuable lessons.
          in ACM Computing Reviews 2(05) September-October 1961
  • Yngve, V. H. "A model and an hypothesis for language structure" Proc. Amer. Phil. Soc. 104(5) (Oct. 1960) pp444-466
  • Sammet, Jean E "1960 Tower of Babel" diagram on the front of CACM January 1961 view details

          in [ACM] CACM 4(01) (Jan 1961) view details
  • Yngve, Victor "COMIT Programmer's Reference Manual", MIT Press 1961. view details
  • Yngve, Victor "The COMIT operators' manual" in SHARE distribution 1198, IBM, N. Y., Sept. 1961 view details
  • Grems, Mandalay "A survey of languages and systems for information retrieval" pp43-46
          in [ACM] CACM 5(01) January 1962 "Design, Implementation and Application of IR-Oriented Languages," ACM Computer Language Committee on Information Retrieval, 20-21 October 1961, Princeton, N.J.
  • Yngve, Victor "An Introduction to COMIT Programming." MIT Cambridge, Mass., 1962 view details
  • Yngve, Victor "COMIT Programmers' Reference Manual." MIT Cambridge, Mass, 1962 view details
  • Yngve, Victor H. "Toward better programming languages" view details Abstract: THE STYLE of a programming language often has a great influence on the approach that a programmer takes to a particular problem. This is the Whorf hypothesis applied to programming languages. Most programming languages are general purpose in that anything that is programmable can be programmed in them. But to the extent that they differ, some programs are more easily expressed in one language than in another. We do not consider here the important considerations of the efficient use of computer memory and running time, since these are more related to the particular machine and compiler configuration used than to the structure of the programming language itself. We are more interested in factors of programming convenience, brevity, fluency, legibility, ease of learning, ease of checkout, etc., that are of more direct concern to the programmer. Extract: Introduction
    One of the biggest advantages of list processing languages is the opportunity they afford to refer with one name to a whole list, possibly also including many sublists. The elements comprising the lists usually are single registers; some may contain data and some may contain information specifying the organization of or interrelations between the data registers. The reason why methods of data reference are so important for the style of a programming language is that a programmer tends to think in terms of the items that he can name in the language and manipulate. It is in the area of data reference that the COMIT language is much different from other programming languages. COMIT was originally designed for convenient handling of alphabetic data - English and foreign words and sentences together with any temporary data needed for the operations of mechanical translation programs. COMIT has turned out, however, to be convenient for a number of other types of programs where the data are essentially non-numerical in character.

    In COMIT, the items of reference are not computer registers, but problem-oriented items of any convenient size called constituents. A constituent may consist of any number of characters; thus it could be used for a letter, an English word of any length, or any other item of non-numerical data. The programmer may refer to these constituents directly, and so he does not have to take into consideration the particular computer word size. A constituent may also carry, in addition to the data part, any number of classificatory subscripts, and a numerical subscript useful for counting and simple arithmetic computations.

    COMIT is rich in its methods of data reference. It includes several types of direct referencing, as well as direct address referencing and indirect address referencing. By direct referencing we mean the mention in the program of some or all of the data characters or classificatory subscripts. The desired data item or constituent is found by a linear search through a defined portion of the data. A fast alphabetic search is also available for table or dictionary look-up.

    A convenient and much-used method of reference that is perhaps unique to COMIT is through context. It is possible, for example, to refer to the data on the right or on the left of a given item or to the data between given items. This is possible because data in COMIT can be conceived of as being arrayed along a one-dimensional problem space. The programmer thinks in terms of this problem space and not in terms of computer registers. He is free to insert and delete items from the problem space without having to keep track of storage.
          in Invited papers
  • Yngve, Victor H. "COMIT as an IR language" pp19-28 Abstract: Many of the features that make COMIT a good all-around symbol manipulation language also render it well-suited to various types of information retrieval programs. Presented here is a general discussion of this unique and different programming language and an examination of some of its applications. Extract: What COMIT is
    What COMIT IS
    COMIT is a user-oriented general-purpose symbol-manipulation programming language. It is user-oriented in that it is a high-level language that is easy to learn and to use. The use of COMIT should minimize the user's time for programming a problem, the computer time needed for checking it out, and thus the elapsed time required to obtain a running program. It is a general-purpose symbol-manipulation language in that it is especially convenient for problems ranging from mechanical translation to information retrieval, and from theorem proving to the maintenance of predominantly non-numerical files.

    COMIT was originally conceived as a special-purpose problem-oriented programming language for research on the mechanical translation into English of languages such as German and Russian. It was based on a notation in daily use by some of the linguists working on this problem at M.I.T. Some of the individual features of the notation were dropped as inappropriate, and many features were added to convert it to a programming language. It was soon realized that the requirements of mechanical translation could not be used for the complete definition of a programming language, because the problem of mechanical translation was partially unsolved and it was not known where some of the solutions might lie or what types of programs might be needed. For this reason the notation was consciously broadened so as to be potentially useful for as wide a class of symbol manipulation problems as possible. At this point the emphasis shifted from problem orientation to user orientation, and COMIT changed from a special-purpose language to a general-purpose language. The breadth of problem areas falling within its capabilities is only now being realized as COMIT goes into general use.

    From the very beginning there was an effort to make COMIT easy and natural to use, and easy to learn. It was desired to provide a language which would convert the computer into a tool that the individual research worker could use himself with the facility with which he would use his slide rule. To this end an effort was made to provide a language that would foster fluency and freedom of expression. The high-level notation adopted led to brevity; considerations of naturalness were taken into account; arbitrary and needless restrictions were kept to a minimum; and many checkout aids were incorporated in the system. It has been estimated that COMIT may offer at least a factor-of-20 increase in programming and checkout speed over machine language, and a sizeable margin over older symbol-manipulation languages.

    [...] A list of some of the problem areas in which COMIT programs have been written or are being written is as follows: mechanical translation routines, information retrieval research, vocabulary analysis, text processing and editing, random generation of English sentences, automatic milling machine programming, sociological data reduction, simulation of human problem solving, simulation of games, theorem-proving and mathematical logic, logico-semantic investigations, electrical network analysis.

    COMIT promises to be especially useful for information retrieval problems for two reasons. Two of its most central built-in features are a simple scheme for dictionary search and a simple scheme of search using criteria such as class inclusion and context. The dictionary search scheme offers automatic alphabetization of the dictionary entries and a high-speed binary search at run time. The other search scheme, using criteria such as class inclusion and context, is a linear scan through a defined portion of the data looking for a condition of exact match or of inclusion. This "workspace search" can be easily used for searches based on descriptors or other more complicated schemes using local or distant context.
          in [ACM] CACM 5(02) February 1962
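
    The two built-in search schemes described in the extract above can be sketched briefly. The following Python fragment is illustrative only, with invented names and data: the dictionary scheme is modeled as a binary search over alphabetized entries, and the "workspace search" as a linear scan testing inclusion of classificatory subscripts.

        from bisect import bisect_left

        def dictionary_lookup(entries, key):
            # Binary search over alphabetized (key, value) pairs.
            keys = [k for k, _ in entries]
            i = bisect_left(keys, key)
            return entries[i][1] if i < len(keys) and keys[i] == key else None

        def workspace_search(workspace, required):
            # Linear scan for the first constituent whose classificatory
            # subscripts include all required ones (inclusion, not exact match).
            for i, (word, subscripts) in enumerate(workspace):
                if required <= subscripts:
                    return i
            return -1

        entries = sorted([("CHAT", "CAT"), ("CHIEN", "DOG")])
        print(dictionary_lookup(entries, "CHIEN"))          # DOG
        workspace = [("MAN", {"NOUN"}), ("RUNS", {"VERB"})]
        print(workspace_search(workspace, {"VERB"}))        # 1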
  • Bobrow, Daniel G.; and Raphael, Bertram "A comparison of list-processing computer languages" RAND Corp., Santa Monica, Calif., RM-3842-PR, Oct. 1963
  • Bobrow, Daniel G. "The COMIT Feature in LISP II" Technical Report, MIT Artificial Intelligence Laboratory, Number AIM-76, February 19 1965. view details Extract: Purpose
    I Purpose
    The purpose of the COMIT feature is to facilitate certain types of list manipulations in LISP II. This feature is a syntactic convenience rather than an extension of the semantics of LISP. It permits the programmer to test directly whether a piece of list structure matches a certain pattern, and if so, to construct another structure utilizing subsegments of the original structure which matched parts of the given pattern.
    Extract: The Match and Construct Interpreters
    The Match and Construct Interpreters
    The COMIT feature can be implemented by programming the interpretive functions in LISP,
                   match           [pattern; workspace]
    and         construct      [format; array]
    The match interpreter has two arguments, a pattern of the type described below, and a workspace which is the list which is to be matched against the pattern. If the match fails, the value of the function is NIL. If it succeeds, the value is a symbolic array which gives the segmentation of the workspace which allowed a match.
    The construct interpreter uses a format, to be described below, and an array which is an output of a successful match. It constructs a new structure according to this format.
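
    As a rough illustration of the match/construct pair described in the extract above, here is a Python sketch (not LISP II; the "$" wildcard convention and all data are invented): match segments a workspace against a pattern, and construct builds a new structure from a format referring to matched segments by index.

        # match: a literal matches one equal item; "$" matches any run of
        # items. Returns the matched segments, or None if the match fails.
        def match(pattern, workspace):
            if not pattern:
                return [] if not workspace else None
            head, rest = pattern[0], pattern[1:]
            if head == "$":
                for i in range(len(workspace) + 1):
                    tail = match(rest, workspace[i:])
                    if tail is not None:
                        return [workspace[:i]] + tail
                return None
            if workspace and workspace[0] == head:
                tail = match(rest, workspace[1:])
                if tail is not None:
                    return [[workspace[0]]] + tail
            return None

        # construct: integers select matched segments (1-based); other
        # items are copied literally into the output.
        def construct(fmt, segments):
            out = []
            for item in fmt:
                if isinstance(item, int):
                    out.extend(segments[item - 1])
                else:
                    out.append(item)
            return out

        segments = match(["A", "$", "B"], ["A", "X", "Y", "B"])
        print(segments)                        # [['A'], ['X', 'Y'], ['B']]
        print(construct([3, 2, 1], segments))  # ['B', 'X', 'Y', 'A']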
  • Crisman, P. A. (Ed.) "The time-sharing system: a programmer's guide" (2nd ed.) M.I.T. Press, Cambridge, Mass., 1966 Extract: Languages and subsystems available
    This edition is a major revision and extension of the first, to incorporate the changes to the (CTSS) Compatible Time-Sharing System during the past two years. The manual itself is organized into sections for easy referencing and to facilitate future expansion and modification. The manual is maintained on-line within the system in an attempt to keep all system documentation continuously up to date. A system user can keep his manual updated by periodically inspecting a table of contents and requesting an on-line printout of those sections which have been revised since he last updated his copy.

    Some features of the CTSS which are detailed in this edition are: password logic, introduction of more elaborate accounting features, inter-console messages, public files, and macrocommands. A new file system was implemented to remove some previous weaknesses and to test a prototype of the file system which is proposed for the next time-sharing system.

    Available languages and subsystems include AED (modified ALGOL), BEFAP, COGO-90, COMIT, DYNAMO, ESL Display System, LAPC, GPSS, MAD, MADTRN, SNOBOL, STRESS, and BLOW (BLODI Diagram Compiler). The manual presents a brief description of general subroutines and a description of commands for the creation, editing, compression, printing, and housekeeping of files, program execution and debugging.

  • Darlington, J. L. "Machine Methods for Proving Logical Arguments Expressed in English" Mechanical Translation 8(3-4) (June, Oct. 1965) pp41-67
  • Goldstein, M. "Computer Languages" The American Mathematical Monthly, Vol. 72, No. 2, Part 2: Computers and Computing Feb., 1965 pp141-146 view details Extract: Languages scene
    An important step in artificial language development centered around the idea that it is desirable to be able to exchange computer programs between different computer labs, or at least between programmers, on a universal level. In 1958, after much work, a committee representing an active European computer organization, GAMM, and a United States computer organization, ACM, published a report (updated two years later) on an algebraic language called ALGOL. The language was designed to be a vehicle for expressing the processes of scientific and engineering calculations of numerical analysis. Equal stress was placed on man-to-man and man-to-machine communication. It attempted to specify a language which included those features of algebraic languages on which it was reasonable to expect a wide range of agreement, and to obtain a language that is technically sound. In this respect, ALGOL set an important precedent in language definition by presenting a rigorous definition of its syntax. ALGOL compilers have also been written for many different computers. It is very popular among university and mathematically oriented computer people, especially in Western Europe. For some time in the United States, it will remain second to FORTRAN, with FORTRAN becoming more and more like ALGOL.

    The largest user of data-processing equipment is the United States Government. Prodded in part by a recognition of the tremendous programming investment and in part by the suggestion that a common language would result only if an active sponsor supported it, the Defense Department brought together representatives of the major manufacturers and users of data-processing equipment to discuss the problems associated with the lack of standard programming languages in the data-processing area. This was the start of the Conference on Data Systems Languages that went on to produce COBOL, the common business-oriented language. COBOL is a subset of normal English suitable for expressing the solution to business data-processing problems. The language is now implemented in various forms on every commercial computer.

    In addition to popular languages like FORTRAN and ALGOL, we have some languages used perhaps by only one computing group such as FLOCO, IVY, MADCAP and COLASL; languages intended for student problems, a sophisticated one like MAD, others like BALGOL, CORC, PUFFT and various versions of university-implemented ALGOL compilers; business languages in addition to COBOL like FACT, COMTRAN and UNICODE; assembly (machine) languages for every computer such as FAP, TAC, USE, COMPASS; languages to simplify problem solving in "artificial intelligence," such as the so-called list-processing languages IPL V, LISP 1.5, SLIP and a more recent one, NU SPEAK; string manipulation languages to simplify the manipulation of symbols rather than numeric data like COMIT, SHADOW and SNOBOL; languages for command and control problems like JOVIAL and NELIAC; languages to simplify doing symbolic algebra by computer such as ALPAK and FORMAC; a proposed new programming language tentatively titled NPL; and many, many more. A veritable Tower of Babel!
  • Bachman, C. W. review of Bobrow and Raphael 1963 Abstract: The title of the article clearly states the authors' purpose which is carried out with considerable success. Unfortunately, the October 1963 publication date indicates that the article is now three years behind the current state of the art and any new developments in the field, such as CORAL and IDS, are not included.

    COMIT, IPL-V, LISP 1.5, and SLIP are described and referenced. The four languages and their processors are compared under the topics: data structure, storage allocation, programming formalism, useability, and execution time. Certain selectivity criteria are presented to aid a user in selecting a list-processing language for his own use.

    The paper is directed toward the reader who has some acquaintance with list-processing fundamentals and would assist him in becoming more familiar with the languages discussed. The references could then lead the reader to greater depth if desired.

    The article is not a list-processing primer. It does not attempt to explain why list-processing languages exist or the problems they attempt to solve for their users.
          in ACM Computing Reviews 7(03) May-June 1966
  • Gladun, V. P. "Memory organization for list processing" pp26-29
          in Cybernetics 2 (March-April 1966), New York
  • Guzmán, A. and McIntosh, H. V. "CONVERT" Abstract: A programming language is described which is applicable to problems conveniently described by transformation rules. By this is meant that patterns may be prescribed, each being associated with a skeleton, so that a series of such pairs may be searched until a pattern is found which matches an expression to be transformed. The conditions for a match are governed by a code which also allows subexpressions to be identified and eventually substituted into the corresponding skeleton. The primitive patterns and primitive skeletons are described, as well as the principles which allow their elaboration into more complicated patterns and skeletons. The advantages of the language are that it allows one to apply transformation rules to lists and arrays as easily as strings, that both patterns and skeletons may be defined recursively, and that as a consequence programs may be stated quite concisely.

          in [ACM] CACM 9(08) August 1966
  • Sammet, Jean E., "Roster of Programming Languages 1967" view details
          in Computers & Automation 16(6) June 1967
  • Yngve, Victor H. "MT at MIT 1965" pp452-523 view details
          in Booth, A.D. ed. "Machine Translation" Amsterdam: North-Holland 1967
  • Abrahams, Paul W. "Symbol manipulation languages" New York: New York University, Courant Inst. of Mathematical Sciences 1968
  • Knuth, Donald E. "The Art of Computer Programming" Addison-Wesley Publishing Company, Reading, MA 1968 Extract: History and Bibliography
    History And Bibliography
    Linear lists and rectangular arrays of information kept in consecutive memory locations were widely used from the earliest days of stored-program computers, and the earliest treatises on programming gave the basic algorithms for traversing these structures. [ For example, see J. von Neumann, Collected Works, vol. V, 113-116 (written 1947); M. V. Wilkes, D. J. Wheeler, S. Gill, The Preparation of Programs for an Electronic Digital Computer (Reading, Mass.: Addison-Wesley, 1951), subroutine V-1. ] Before the days of index registers, operations on sequential linear lists were done by performing arithmetic on the machine language instructions themselves, and this type of operation was one of the early motivations for having a computer whose programs share memory space with the data they manipulate.

    Techniques which permit variable-length linear lists to share sequential locations, in such a way that they shift back and forth when necessary, as described in Section 2.2.2, were apparently a much later invention. J. Dunlap of Digitek Corporation developed these techniques in 1963 in connection with the design of a series of compiler programs; about the same time the idea independently appeared in the design of a COBOL compiler at IBM Corporation, and a collection of related subroutines called CITRUS was subsequently used at, various installations. The techniques remained unpublished until after they had been independently developed by Jan Garwick of Norway; see BIT 4 (1964), 137-140.

    The idea of having linear lists in non-sequential locations seems to have originated in connection with the design of computers with drum memories; the first such computer was developed by General Electric Corporation during 1946-1948, and the IBM 650 was the most notable later machine of this type. After executing the instruction in location n, such a computer is usually not ready to get its next instruction from location n + 1 because the drum has already rotated past this point. Depending on the instruction being performed, the most favorable position for the next instruction might be n + 7 or n + 18, etc., and the machine can operate up to six or seven times faster if its instructions are optimally located rather than consecutive. [ For a discussion of the interesting problems concerning best placement of these instructions, see the author's article in JACM 8 (1961), 119-150. ] Therefore the machine design provides an extra address field in each machine language instruction, to serve as a link to the next instruction. Such a machine is called a "one-plus-one-address computer," as distinguished from MIX which is a one-address computer." The design of one-plus-one-address computers is apparently the first appearance of the linked-list idea within computer programs, although the dynamic insertion and deletion operations which we have used so frequently in this chapter were still unknown.

    Linked memory techniques were really born when A. Newell, C. Shaw, and H. Simon began their investigations of heuristic problem-solving by machine. As an aid to writing programs that searched for proofs in mathematical logic, they designed the first "list-processing" language, IPL-II, in the spring of 1956. (IPL stands for Information Processing Language.) This was a system which made use of links and included important concepts like the list of available space, but the concept of stacks was not yet well developed; IPL-III was designed a year later, and it included "push down" and "pop up" for stacks as important basic operations. (For references to IPL-II see "The Logic Theory Machine," IRE Transactions on Information Theory IT-2 (Sept. 1956), 61-70; see also Proc. Western Joint Comp. Conf. (Feb. 1957), 218-240. Material on IPL-III first appeared in course notes given at the University of Michigan in the summer of 1957.)

    The work of Newell, Shaw, and Simon inspired many other people to use linked memory (which was often at the time referred to as NSS memory), mostly for problems dealing with simulation of human thought processes. Gradually, the techniques became recognized as basic computer programming tools; the first article describing the usefulness of linked memory for "down-to-earth" problems was published by J. W. Carr, III, in CACM 2 (Feb. 1959), 4-6. Carr pointed out in this article that linked lists can readily be manipulated in ordinary programming languages, without requiring sophisticated sub-routines or interpretive systems. See also G. A. Blaauw, IBM J. Res. and Dev. 3 (1959), 288-301.

    At first, one-word nodes were used for linked tables, but about 1959 the usefulness of several consecutive words per node and "multilinked" lists was gradually being discovered by several different groups of people. The first article dealing specifically with this idea was published by D. T. Ross, CACM 4 (1961), 147-150; at that time he used the term "plex" for what has been called a "node" in this chapter, but he subsequently has used the word "plex" in a different sense to denote a class of nodes combined with associated algorithms for their traversal.

    Notations for referring to fields within nodes are generally of two kinds: the name of the field either precedes or follows the pointer designation. Thus, while we have written "INFO(P)" in this chapter, some other authors write, for example, "P.INFO". At the time this chapter was prepared, the two notations seemed to be equally prominent. The notation adopted here has been used in the author's lectures since 1962 and it has also been independently devised by numerous other people, primarily because it is natural to use mathematical functional notation to describe attributes of a node. (For example, see the article by N. Wirth and C. A. R. Hoare, CACM 9 (1966), 413-432.) Note that "INFO(P)" is pronounced "info of P" in conventional mathematical verbalization, just as f(x) is rendered "f of x." The alternative notation P.INFO has less of a natural flavor, since it tends to put the emphasis on P, although it can be read "P's info"; the reason INFO(P) seems to be more natural is apparently the fact that P is variable, but INFO has a fixed significance when the notation is employed. By analogy, we could consider a vector A = (A[1], A[2], ..., A[100]) to be a node having 100 fields named 1, 2, ..., 100. Now the second field would be referred to as "2(P)" in our notation, where P points to the vector A; but if we are referring to the jth element of the vector, we find it more natural to write A[j], putting the variable quantity "j" second. Similarly it seems most appropriate to put the variable quantity "P" second in the notation INFO(P).

    Perhaps the first people to recognize that the concepts "stack" (last-in-first-out) and "queue" (first-in-first-out) are important objects of study were cost accountants interested in reducing income tax assessments; for a discussion of the "LIFO" and "FIFO" methods of pricing inventories, see any intermediate accounting textbook, e.g., C. F. and W. J. Schlatter, Cost Accounting (New York: Wiley, 1957), Chapter 7. In 1947 A. M. Turing developed a stack, called Reversion Storage, for use in subroutine linkage (see Section 1.4.5). No doubt simple uses of stacks kept in sequential memory locations were common in computer programming from the earliest days, since a stack is such a simple and natural concept. The programming of stacks in linked form appeared first in IPL, as stated above; the name "stack" stems from IPL terminology (although "pushdown list" was the more official IPL wording), and it was also independently introduced in Europe by E. W. Dijkstra. "Deque" is a term introduced by E. J. Schweppe.

    The origin of circular and doubly linked lists is obscure; presumably these ideas occurred naturally to many people. A strong factor in the popularization of these techniques was the existence of general List-processing systems based on them [principally the Knotted List Structures, CACM 5 (1962), 161-165, and Symmetric List Processor, CACM 6 (1963), 524-544, of J. Weizenbaum ] .

    Various methods for addressing and traversing multidimensional arrays of information were developed independently by clever programmers since the earliest days of computers, and thus another part of the unpublished computer folklore came into existence. This subject was first surveyed in print by H. Hellerman, CACM 5 (1962), 205-207. See also J. C. Gower, Comp. J. 4 (1962), 280-286.

    Tree structures represented explicitly in computer memory were described for the first time in applications to algebraic formula manipulation. The A-1 compiler language, developed by G. M. Hopper in 1951, used arithmetic expressions written in a three-address code; the latter is equivalent to the INFO, LLINK, and RLINK of a binary tree representation. In 1952, H. G. Kahrimanian developed algorithms for differentiating algebraic formulas represented in the A-1 compiler language; see Symposium on Automatic Programming (Washington, D.C.: Office of Naval Research, May 1954), 6-14.

    Since then, tree structures in various guises have been studied independently by many people in connection with numerous computer applications, but the basic techniques for tree manipulation (not general List manipulation) have seldom appeared in print except in detailed description of particular algorithms. The first general survey was made in connection with a more general study of all data structures by K. E. Iverson and L. R. Johnson [ IBM Corp. research reports RC-390, RC-603, 1961; see Iverson, A Programming Language (New York: Wiley, 1962), Chapter 3 ] . See also G. Salton, CACM 5 (1962), 103-114.

    The concept of threaded trees is due to A. J. Perlis and C. Thornton, CACM 3 (1960), 195-204. Their paper also introduced the important idea of traversing trees in various orders, and gave numerous examples of algebraic manipulation algorithms. Unfortunately, this important paper was hastily prepared and it contains many misprints. The threaded lists of Perlis and Thornton actually were only "right-threaded trees" in our terminology; binary trees which are threaded in both directions were independently discovered by A. W. Holt, A Mathematical and Applied Investigation of Tree Structures (Thesis, U. of Pennsylvania, 1963). Postorder and preorder for the nodes of trees were called "normal along order" and "dual along order" by Z. Pawlak, Colloquium on the Foundation of Mathematics, etc. (Tihany, 1962, published by Akademiai Kiado, Budapest, 1965), 227-238. Preorder was called "subtree order" by Iverson and Johnson in the references cited above. Graphical ways to represent the connection between tree structures and corresponding linear notations were described by A. G. Oettinger, Proc. Harvard Symp. on Digital Computers and their Applications (April, 1961), 203-224.

    The history of tree structures as mathematical entities, together with a bibliography of the subject, is reviewed in Section 2.3.4.6.

    At the time this section was written, the most widespread knowledge about information structures was due to programmers' exposure to List processing systems, which have a very important part in this history. The first widely used system was IPL-V (a descendant of IPL-III, developed late in 1959); IPL-V is an interpretive system in which a programmer learns a machine-like language for List operations. At about the same time, FLPL (a set of FORTRAN sub-routines for List manipulation, also inspired by IPL but using subroutine calls instead of interpretive language) was developed by H. Gelernter and others. A third system, LISP, was designed by J. McCarthy, also in 1959. LISP is quite different from its predecessors: programs for it are expressed in mathematical functional notation combined with "conditional expressions" (see Chapter 8), then converted into a List representation. Many List processing systems have come into existence since then, of which the most prominent historically is J. Weizenbaum's SLIP; this is a set of subroutines for use in FORTRAN programs, operating on doubly linked Lists.

    An article by Bobrow and Raphael, CACM 7 (1964), 231-240, may be read as a brief introduction to IPL-V, LISP, and SLIP, and it gives a comparison of these systems. An excellent introduction to LISP has been given by P. M. Woodward and D. P. Jenkins, Comp. J. 4 (1961), 47-53. See also the authors' discussions of their own systems, which are each articles of considerable historical importance: "An introduction to IPL-V" by A. Newell and F. M. Tonge, CACM 3 (1960), 205-211; "A FORTRAN-compiled List Processing Language" by H. Gelernter, J. R. Hansen, and C. L. Gerberich, JACM 7 (1960), 87-101; "Recursive functions of symbolic expressions and their computation by machine, I" by John McCarthy, CACM 3 (1960), 184-195; "Symmetric List Processor" by J. Weizenbaum, CACM 6 (1963), 524-544. The latter article includes a complete description of all of the algorithms used in SLIP. In recent years a number of books about these systems have also been written.

    Several string manipulation systems have also appeared; these are primarily concerned with operations on variable-length strings of alphabetic information (looking for occurrences of certain substrings, etc.). Historically, the most important of these have been COMIT (V. H. Yngve, CACM 6 (1963), 83-84) and SNOBOL (D. J. Farber, R. E. Griswold, and I. P. Polonsky, JACM 11 (1964), 21-30). Although string manipulation systems have seen wide use and although they are primarily composed of algorithms such as we have seen in this chapter, they play a comparatively small role in the history of the techniques of information structure representation; users of these systems have largely been unconcerned about the details of the actual internal processes carried on by the computer. For a survey of string manipulation techniques, see S. E. Madnick, CACM 10 (1967), 420-424.

    The IPL-V and FLPL systems for List-processing did not use either a garbage collection or a reference count technique for the problem of shared Lists; instead, each List was "owned" by one List and "borrowed" by all other Lists which referred to it, and a List was erased when its "owner" allowed it to be. Hence, the programmer was enjoined to make sure no List was still borrowing any Lists that were being erased. The reference counter technique for Lists was introduced by G. E. Collins, CACM 3 (1960), 655-657; see also the important sequel to this paper, CACM 9 (1966), 578-588. Garbage collection was first described in McCarthy's article cited above; see also CACM 7 (1964), 38, and an article by Cohen and Trilling, BIT 7 (1967), 22-30.

    Unfortunately, too few people have realized that manipulation of links is not a magic, complex process that must be done by fancy List manipulation routines. There is still a widespread tendency to feel that the advantages of link manipulation can be obtained only by using a large List-processing system; actually, as we have seen in many examples throughout this chapter, it is quite easy to design linking algorithms which operate more efficiently than a general List-processing subroutine could possibly do. The author hopes that these detailed examples will help to convey a better-balanced attitude towards the use of structural links in data.

    Dynamic storage allocation algorithms were in use several years before published information about them appeared. A very readable discussion is given by W. T. Comfort, CACM 7 (1964), 357-362 (an article written in 1961). The "boundary-tag" method, introduced in Section 2.5, was designed by the author in 1962 for use in a control program for the B5000 computer. The "buddy system" was first used by H. Markowitz in connection with the SIMSCRIPT programming system in 1963, and it was independently discovered and published by K. Knowlton, CACM 8 (1965), 623-625; see also CACM 9 (1966), 616-625. For further discussion of dynamic storage allocation, see the articles by Iliffe and Jodeit, Comp. J. 5 (1962), 200-209; Bailey, Barnett, and Burleson, CACM 7 (1964), 339-346; Berztiss, CACM 8 (1965), 512-513; and D. T. Ross, CACM 10 (1967), 481-492.

    A general discussion of information structures and their relation to programming has been given by Mary d'Imperio, "Data Structures and their Representation in Storage," Annual Review of Automatic Programming 5 (Oxford: Pergamon Press), in press. This paper is also a valuable guide to the history of the topic, since it includes a detailed analysis of the structures used in connection with twelve List processing and string manipulation systems. See also the proceedings of two symposia, CACM 3 (1960), 183-234 and CACM 9 (1966), 567-643, for further historical details. (Several of the individual papers from these proceedings have already been cited above.)

    An excellent annotated bibliography, which is primarily oriented towards applications to symbol manipulation and algebraic formula manipulation but which has numerous connections with the material of this chapter, has been compiled by Jean E. Sammet, Comput. Rev. 7 (July-August 1966), B1-B31.


  • Sammet, Jean E. "Revised Annotated Descriptor Based Bibliography for the Use of Computers for Non-Numerical Mathematics" view details
          in Bobrow, D. G. (ed) "Symbol Manipulation Languages and Techniques", Proceedings of the IFIP Working Conference on Symbol Manipulation Languages. North-Holland Publishing Co., Amsterdam, 1968
  • Sammet, Jean E. "Computer Languages - Principles and History" Englewood Cliffs, N.J. Prentice-Hall 1969. pp.416-436 view details
  • Stock, Karl F. "A listing of some programming languages and their users" in RZ-Informationen. Graz: Rechenzentrum Graz 1971 56 Abstract: [321 programming languages, with an indication of the computer manufacturers on whose machines each language can be used. Register of the 74 computer companies; the programming languages ranked by the number of manufacturers on whose machines each is implemented; the manufacturers ranked by the number of programming languages in use.]
  • Sammet, Jean E., "Programming languages: history and future" view details
          in [ACM] CACM 15(06) (June 1972)
  • Sammet, Jean E., "Roster of Programming Languages 1972" 58 view details
          in Computers & Automation 21(6B), 30 Aug 1972
  • Stock, Marylene and Stock, Karl F. "Bibliography of Programming Languages: Books, User Manuals and Articles from PLANKALKUL to PL/I" Verlag Dokumentation, Pullach/Munchen 1973 132 Abstract: PREFACE AND INTRODUCTION
    The exact number of all the programming languages still in use, and those which are no longer used, is unknown. Zemanek calls the abundance of programming languages and their many dialects a "language Babel". When a new programming language is developed, only its name is known at first and it takes a while before publications about it appear. For some languages, the only relevant literature stays inside the individual companies; some are reported on in papers and magazines; and only a few, such as ALGOL, BASIC, COBOL, FORTRAN, and PL/1, become known to a wider public through various textbooks and handbooks. The situation surrounding the application of these languages in many computer centers is a similar one.

    There are differing opinions on the concept "programming languages". What is called a programming language by some may be termed a program, a processor, or a generator by others. Since there are no sharp borderlines in the field of programming languages, works were considered here which deal with machine languages, assemblers, autocoders, syntax and compilers, processors and generators, as well as with general higher programming languages.

    The bibliography contains some 2,700 titles of books, magazines and essays for around 300 programming languages. However, as shown by the "Overview of Existing Programming Languages", there are more than 300 such languages. The "Overview" lists a total of 676 programming languages, but this is certainly incomplete. One author has already announced the "next 700 programming languages"; it is to be hoped the many users may be spared such a great variety for reasons of compatibility. The graphic representations (illustrations 1 & 2) show the development and proportion of the most widely-used programming languages, as measured by the number of publications listed here and by the number of computer manufacturers and software firms who have implemented the language in question. The illustrations show FORTRAN to be in the lead at the present time. PL/1 is advancing rapidly, although PL/1 compilers are not yet seen very often outside of IBM.

    Some experts believe PL/1 will replace even the widely-used languages such as FORTRAN, COBOL, and ALGOL.4) If this does occur, it will surely take some time - as shown by the chronological diagram (illustration 2) .

    It would be desirable from the user's point of view to reduce this language confusion down to the most advantageous languages. Those languages still maintained should incorporate the special facets and advantages of the otherwise superfluous languages. Obviously such demands are not in the interests of computer production firms, especially when one considers that a FORTRAN program can be executed on nearly all third-generation computers.

    The titles in this bibliography are organized alphabetically according to programming language, and within a language chronologically and again alphabetically within a given year. Preceding the first programming language in the alphabet, literature is listed on several languages, as are general papers on programming languages and on the theory of formal languages (AAA).
    As far as possible, most of the titles are based on autopsy. However, the bibliographical description of some titles will not satisfy bibliography-documentation demands, since they are based on inaccurate information in various sources. Translation titles whose original titles could not be found through bibliographical research were not included. In view of the fact that many libraries do not have the quoted papers, all magazine essays should have been listed with the volume, the year, issue number and the complete number of pages (e.g. pp. 721-783), so that interlibrary loans could take place with fast reader service. Unfortunately, these data were not always found.

    It is hoped that this bibliography will help the electronic data processing expert, and those who wish to select the appropriate programming language from the many available, to find a way through the language Babel.

    We wish to offer special thanks to Mr. Klaus G. Saur and the staff of Verlag Dokumentation for their publishing work.

    Graz / Austria, May, 1973
  • Hutchins, John "Machine translation: past, present, future" Chichester: Ellis Horwood, 1986 Extract:
    The Massachusetts Institute of Technology made the first appointment of a research worker in the MT field. As we have seen (ch.2.4), this was the appointment in May 1951 of Yehoshua Bar-Hillel, a mathematician at the Hebrew University of Jerusalem. For two years Bar-Hillel investigated the possibilities of MT, instigated meetings, published reviews, and made some important theoretical contributions. However, he did not himself do any practical research. In July 1953 he returned to Israel. Victor H. Yngve took over and set up the research project in the Research Laboratory of Electronics at MIT. It was funded from 1954 until 1965 primarily by the National Science Foundation, although support was also given by the U.S. Army Signal Corps, the U.S. Air Force Office of Scientific Research and the U.S. Navy Office of Naval Research.

    From the beginning, MT research at MIT had a strongly theoretical bias. No attempt was made to construct interim systems which though serving some practical needs were producing output of low quality. The goal of research at MIT was throughout "to work toward the achievement of language translation completely by machine and of a quality that rivals translations made by hand" (Yngve 1967). The researchers at MIT were "not looking for short-cut methods that might yield partially adequate translations at an early date" but for "definitive solutions that will constitute permanent advances in the field." (Yngve 1961). At MIT research was focussed on advances in linguistic theory, particularly the theory of transformational grammar, and on the development of programming tools for linguistic research. In both areas, researchers connected with the MIT project made important contributions of significance going well beyond the specific domain of MT research. (A complete bibliography of the MIT group is given in Yngve 1967.)

    The need for good programming tools became apparent very early at MIT, as it did elsewhere (cf. the development of SLC at Georgetown, ch.4.3). At this time there were no high-level languages such as Fortran and Algol, which in any case were designed primarily for mathematical work and were not suitable for non-numerical applications. Such were the complexities of early computers that linguists engaged in MT relied on expert programmers to implement their procedures in assembler code; they did not attempt to program themselves. As a result neither linguist nor programmer was fully effective. At MIT it was concluded that the solution was a programming system which allowed the linguist to write his procedures in a notation specially devised to fill his needs. The answer was COMIT, the first programming language devised for string-handling and pattern-matching. It was the product of collaborative work with the Computation Center at MIT. Research began in 1957 and a version was ready the same year, thus antedating by two years or more the first reasonably full implementation of the programming language LISP, the list-processing and symbol-manipulation language also devised for linguistic research and later adopted particularly in Artificial Intelligence (cf. ch.15 and 18.2 below).

    Yngve claimed (1967) that COMIT was learnt very easily by linguists ("about six one-hour periods are sufficient to train a novice"), and enabled the team to formulate ideas clearly and concisely. ("The availability of the COMIT notation greatly increased the productivity of the group even before we could run a single program in COMIT.") Sammet (1969: 435) corroborates this opinion, comparing COMIT favourably to LISP: "One of the outstanding things about COMIT is the discrepancy between its apparent and surface difficult notation and the actual ease of writing and using the language. This contrasts sharply... to LISP, whose notation is inherently simpler than COMIT but which seems to be much harder to learn and use." Sammet (1969: 416-436) gives a good general description of COMIT, its development, technical characteristics and theoretical foundations. Descriptions of COMIT with specific reference to its MT applications have been given by Yngve (1958, 1960a, 1967: 454-465). Extract:
    There remained the ‘sentence-by-sentence’ method of A.F.R.Brown. This was designed for French-English translation. In a lecture given to the Association for Computing Machinery in June 1957, Brown (1958) reported that by January of that year he had devised rules for dealing with 220 sentences in chemistry. He described his method thus: "I opened a recent French chemical journal at random, went to the beginning of the article, and set out to formulate verbal rules that would translate the first sentence. It had about forty words, and it took ten hours to work out the rules. Turning to the second sentence, I added new items to the dictionary, invented new rules, and modified existing rules until the system would handle both sentences. The third sentence was attacked in the same way, and so on up to 220." (There could be no better description of the ‘pure’ cyclic approach; cf. 4.4 and 8.2) Brown was confident that in this way "most of the major difficulties have been met and solved" for French, and that "further progress... should be very rapid." By June 1957 the program had been coded and tested on an ILLIAC computer. (However, dictionary lookup had not yet at this stage been mechanized.) In the programming for moving, substituting and rearranging elements much use was made of sub-routines which in Brown’s view were "so general as to be almost independent of what languages are concerned", a feature which he emphasised in later developments.


    Two years later, in June 1959, the system was ready for testing (at the same time as GAT). On a prepared French text of 200,000 words and a random text of 10,000 words the results were considered to be nearly as acceptable as those for GAT. Later the same month, at the Paris Unesco conference, Brown gave a demonstration of his French system; this was the first public demonstration of a MT system with an unprepared text. By this time, the method was developing definitely into a general programming system designed to provide facilities for various linguistic and MT operations under full control of the linguist, who was able to alter and expand data and rules whenever desirable. In recognition of this development, Brown’s system was renamed the Simulated Linguistic Computer (SLC).


    The computer implementation of the GAT method, the SERNA system, was largely the work of Peter Toma (1959), initially alone. Toma had joined the Georgetown project in June 1958 to work on dictionary searching and syntactic analysis in Zarechnak’s group.6 (Toma had worked previously at the California Institute of Technology and for the International Telemeter Corporation on the Mark I system under Gilbert King.). Toma and his colleagues obtained access to the Pentagon’s IBM 705 computer during its ‘servicing time’, and between November 1958 and June 1959 worked continuously throughout every weekend (Toma 1984). According to Toma, the test of GAT in June 1959 was run on the Pentagon computer.


    There is some controversy over the significance of Toma’s contribution to the Georgetown system. Toma claims that SERNA, acronym of the Russian ‘S Russkogo Na Angliskij’ (from Russian to English), was entirely his own creation, but Zarechnak (1979: 31-32) contends that Toma’s responsibility was limited to coordination of the programming efforts while Zarechnak had overall responsibility for the linguistic formulations. While this may be true, there is no denying that Toma’s programming skills made possible the "first significant continuous outputs for Russian to English", as Dostert readily acknowledged (in the preface to Macdonald 1963).


    On 25th January 1960 a demonstration of GAT (SERNA) was staged at the Pentagon before representatives of government agencies, rerunning some of the earlier tests of the Russian-English translations of organic chemistry. Early in 1961 the programming system for GAT was converted for use on the IBM 709. The opportunity was taken to introduce certain improvements in the efficiency and accuracy of the operations. As a result, so many alterations of the SERNA programs were necessary that in effect there was a new system; it was now called the Direct Conversion programming system, and placed under the direction of John Moyne (1962).


    Apart from Russian and French, research teams at Georgetown also examined other languages. Chinese was investigated by a team advised by John de Francis, producing in 1962 a Chinese-English MT dictionary using telegraphic code for Chinese characters, and starting work on a MT system for mathematics texts. There was some work on the comparative syntax of English and Turkish, and during 1961 some discussion about setting up a pilot project for English-Turkish translation (Macdonald 1963). Brown did a tentative study of Arabic-English MT on the SLC basis (Brown 1966). Much more substantial was the work of the Comparative Slavic Research Group set up in October 1961 under Milos Pacak. This group investigated Czech, Polish, Russian and Serbo-Croatian with the objective of establishing a common intermediary language, for use in MT systems for these languages into and from English.


    By late 1961 the SLC French-English system had been adapted for Russian-English, and it could also now be run on the IBM 709. SLC was now no longer restricted to one specific language pair but it had become a generalized programming system (Brown 1966). As a MT system for French-English translation, the SLC method remained largely the special and sole concern of Dr. Brown (Zarechnak & Brown 1961); but as a programming system it was often used to support the GAT Russian-English system.7 At the Teddington conference in September 1961, the demonstration of GAT was run on SLC only, since conversion of the SERNA programs to the IBM 709 was not yet complete. As a result of this demonstration, EURATOM (at Ispra, Italy) decided to install the Georgetown system using SLC programming, both for producing translations for their personnel and as a basis for further research (ch.11.1 below).


    Another demonstration of GAT was conducted in October 1962 at the Oak Ridge National Laboratory, under the auspices of the U.S. Atomic Energy Commission. This time the texts were in the field of cybernetics, using both prepared and unprepared texts. [...]


    By by one from the first practical applications of logical capabilities of machines was their utilization for the translation of texts from an one tongue on other. Linguistic differences represent the serious hindrance on a way for the development of cultural, social, political and scientific connections between nations. Automation of the process of a translation, the application of machines, with a help which possible to effect a translation without a knowledge of the corresponding foreign tongue, would be by an important step forward in the decision of this problem.


    It was admitted that the system, developed primarily for the field of organic chemistry, had problems with the new vocabulary and style of cybernetics literature, but clearly there was confidence in the Georgetown team’s ability to improve the programs and dictionaries, and the Oak Ridge authorities decided to install GAT for producing internal translations.


    In the event, the GAT Russian-English systems were installed at Ispra in 1963 and at Oak Ridge in 1964 at or just after the termination of the Georgetown project in March 1963. So came to an end the largest MT project in the United States. Some MT research on a Russian-English system continued at Georgetown under the direction of R.Ross Macdonald after 1965 (Josselson 1971), but it was to be on a much smaller scale and without CIA sponsorship. The reasons for the unexpected withdrawal of support in 1963 are unclear. Zarechnak (1979) believes the official version citing unsatisfactory quality was not completely honest, while Toma (1984) alludes to internal conflicts between linguists and programmers leading to wholesale resignations. Whatever the cause, there could be very little further experimental development of the Georgetown systems after their installation at Ispra and Oak Ridge. Indeed, they remained virtually unchanged until their replacements by Systran (ch.12.1) at Ispra in 1970 and at Oak Ridge in 1980.


  • Library of Congress Subject Headings C27