Compiler for UNIVAC I
Hopper's second compiler, January 1953
Hopper (1955) type 2 - Compiling routines - system
in Symposium on Automatic Programming For Digital Computers, Office of Naval Research, Dept. of the Navy, Washington, D.C. PB 111 607 May 13-14 1954 view details
Automatic coding is a means for reducing problem costs and is one of the answers to a programmer's prayer. Since every problem must be reduced to a series of elementary steps and transformed into computer instructions, any method which will speed up and reduce the cost of this process is of importance.
Each and every problem must go through the same stages:
The process of analysis cannot be assisted by the computer itself. For scientific problems, mathematical or engineering, the analysis includes selecting the method of approximation, setting up specifications for accuracy of sub-routines, determining the influence of roundoff errors, and finally presenting a list of equations supplemented by definition of tolerances and a diagram of the operations. For the commercial problem, again a detailed statement describing the procedure and covering every eventuality is required. This will usually be presented in English words and accompanied again by a flow diagram.
The analysis is the responsibility of the mathematician or engineer, the methods or systems man. It defines the problem and no attempt should be made to use a computer until such an analysis is complete.
The job of the programmer is that of adapting the problem definition to the abilities and idiosyncrasies of the particular computer. He will be vitally concerned with input and output and with the flow of operations through the computer. He must have a thorough knowledge of the computer components and their relative speeds and virtues.
Receiving diagrams and equations or statements from the analysis he will produce detailed flow charts for transmission to the coders. These will differ from the charts produced by the analysts in that they will be suited to a particular computer and will contain more detail. In some cases, the analyst and programmer will be the same person.
It is then the job of the coder to reduce the flow charts to the detailed list of computer instructions. At this point, an exact and comprehensive knowledge of the computer, its code, coding tricks, details of sentinels and of pulse code are required. The computer is an extremely fast moron. It will, at the speed of light, do exactly what it is told to do no more, no less.
After the coder has completed the instructions, it must be "debugged". Few and far between and very rare are those coders, human beings, who can write programs, perhaps consisting of several hundred instructions, perfectly, the first time. The analyzers, automonitors, and other mistake hunting routines that have been developed and reported on bear witness to the need of assistance in this area. When the program has been finally debugged, it is ready for production running and thereafter for evaluation or for use of the results.
Automatic coding enters these operations at four points. First, it supplies to the analysts, information about existing chunks of program, subroutines already tested and debugged, which he may choose to use in his problem. Second, it supplies the programmer with similar facilities not only with respect to the mathematics or processing used, but also with respect to using the equipment. For example, a generator may be provided to make editing routines to prepare data for printing, or a generator may be supplied to produce sorting routines.
It is in the third phase that automatic coding comes into its own, for here it can release the coder from most of the routine and drudgery of producing the instruction code. It may, someday, replace the coder or release him to become a programmer. Master or executive routines can be designed which will withdraw subroutines and generators from a library of such routines and link them together to form a running program.
If a routine is produced by a master routine from library components, it does not require the fourth phase - debugging - from the point of view of the coding. Since the library routines will all have been checked and the compiler checked, no errors in coding can be introduced into the program (all of which presupposes a completely checked computer). The only bugs that can remain to be detected and exposed are those in the logic of the original statement of the problem.
Thus, one advantage of automatic coding appears, the reduction of the computer time required for debugging. A still greater advantage, however, is the replacement of the coder by the computer. It is here that the significant time reduction appears. The computer processes the units of coding as it does any other units of data --accurately and rapidly. The elapsed time from a programmer's flow chart to a running routine may be reduced from a matter of weeks to a matter of minutes. Thus, the need for some type of automatic coding is clear.
Actually, it has been evident ever since the first digital computers first ran. Anyone who has been coding for more than a month has found himself wanting to use pieces of one problem in another. Every programmer has detected like sequences of operations. There is a ten year history of attempts to meet these needs.
The subroutine, the piece of coding, required to calculate a particular function can be wired into the computer and an instruction added to the computer code. However, this construction in hardware is costly and only the most frequently used routines can be treated in this manner. Mark I at Harvard included several such routines — sin x, log10x, 10X- However, they had one fault, they were inflexible. Always, they delivered complete accuracy to twenty-two digits. Always, they treated the most general case. Other computers, Mark II and SEAC have included square roots and other subroutines partially or wholly built in. But such subroutines are costly and invariant and have come to be used only when speed without regard to cost is the primary consideration.
It was in the ENIAC that the first use of programmed subroutines appeared. When a certain series of operations was completed, a test could be made to see whether or not it was necessary to repeat them and sequencing control could be transferred on the basis of this test, either to repeat the operations or go on to another set.
At Harvard, Dr. Aiken had envisioned libraries of subroutines. At Pennsylvania, Dr. Mauchly had discussed the techniques of instructing the computer to program itself. At Princeton, Dr. von Neumman had pointed out that if the instructions were stored in the same fashion as the data, the computer could then operate on these instructions. However, it was not until 1951 that Wheeler, Wilkes, and Gill in England, preparing to run the EDSAC, first set up standards, created a library, and the required satellite routines and wrote a book about it, "The Preparation of Programs for Electronic Digital Computers". In this country, comprehensive automatic techniques first appeared at MIT where routines to facilitate the use of Whirlwind I by students of computers and programming were developed.
Many different automatic coding systems have been developed - Seesaw, Dual, Speed-Code, the Boeing Assembly, and others for the 701, the A—series of compilers for the UNIVAC, the Summer Session Computer for Whirlwind, MAGIC for the MIDAC and Transcode for the Ferranti Computer at Toronto. The list is long and rapidly growing longer. In the process of development are Fortran for the 704, BIOR and GP for the UNIVAC, a system for the 705, and many more. In fact, all manufacturers now seem to be including an announcement of the form, "a library of subroutines for standard mathematical analysis operations is available to users", "interpretive subroutines, easy program debugging - ... - automatic program assembly techniques can be used."
The automatic routines fall into three major classes. Though some may exhibit characteristics of one or more, the classes may be so defined as to distinguish them.
1) Interpretive routines which translate a machine-like pseudocode into machine code, refer to stored subroutines and execute them as the computation proceeds — the MIT Summer Session Computer, 701 Speed-Code, UNIVAC Short-Code are examples.
2) Compiling routines, which also read a pseudo-code, but which withdraw subroutines from a library and operate upon them, finally linking the pieces together to deliver, as output, a complete specific program for future running — UNIVAC A — compilers, BIOR, and the NYU Compiler System.
3) Generative routines may be called for by compilers, or may be independent routines. Thus, a compiler may call upon a generator to produce a specific input routine. Or, as in the sort-generator, the submission of the specifications such as item-size, position of key-to produce a routine to perform the desired operation. The UNIVAC sort-generator, the work of Betty Holberton, was the first major automatic routine to be completed. It was finished in 1951 and has been in constant use ever since. At the University of California Radiation Laboratory, Livermore, an editing generator was developed by Merrit Ellmore — later a routine was added to even generate the pseudo-code.
The type of automatic coding used for a particular computer is to some extent dependent upon the facilities of the computer itself. The early computers usually had but a single input-output device, sometimes even manually operated. It was customary to load the computer with program and data, permit it to "cook" on them, and when it signalled completion, the results were unloaded. This procedure led to the development of the interpretive type of routine. Subroutines were stored in closed form and a main program referred to them as they were required. Such a procedure conserved valuable internal storage space and speeded the problem solution.
With the production of computer systems, like the UNIVAC, having, for all practical purposes, infinite storage under the computers own direction, new techniques became possible. A library of subroutines could be stored on tape, readily available to the computer. Instead of looking up a subroutine every time its operation was needed, it was possible to assemble the required subroutines into a program for a specific problem. Since most problems contain some repetitive elements, this was desirable in order to make the interpretive process a one-time operation.
Among the earliest such routines were the A—series of compilers of which A-0 first ran in May 1952. The A-2 compiler, as it stands at the moment, commands a library of mathematical and logical subroutines of floating decimal operations. It has been successfully applied to many different mathematical problems. In some cases, it has produced finished, checked and debugged programs in three minutes. Some problems have taken as long as eighteen minutes to code. It is, however, limited by its library which is not as complete as it should be and by the fact that since it produces a program entirely in floating decimal, it is sometimes wasteful of computer time. However, mathematicians have been able rapidly to learn to use it. The elapsed time for problems— the programming time plus the running time - has been materially reduced. Improvements and techniques now known, derived from experience with the A—series, will make it possible to produce better compiling systems. Currently, under the direction of Dr. Herbert F. Mitchell, Jr., the BIOR compiler is being checked out. This is the pioneer - the first of the true data-processing compilers.
At present, the interpretive and compiling systems are as many and as different as were the computers five years ago. This is natural in the early stages of a development. It will be some time before anyone can say this is the way to produce automatic coding.
Even the pseudo-codes vary widely. For mathematical problems, Laning and Zeirler at MIT have modified a Flexowriter and the pseudo-code in which they state problems clings very closely to the usual mathematical notation. Faced with the problem of coding for ENIAC, EDVAC and/or ORDVAC, Dr. Gorn at Aberdeen has been developing a "universal code". A problem stated in this universal pseudo-code can then be presented to an executive routine pertaining to the particular computer to be used to produce coding for that computer. Of the Bureau of Standards, Dr. Wegstein in Washington and Dr. Huskey on the West Coast have developed techniques and codes for describing a flow chart to a compiler.
In each case, the effort has been three-fold:
1) to expand the computer's vocabulary in the direction required by its users.
2) to simplify the preparation of programs both in order to reduce the amount of information about a computer a user needed to learn, and to reduce the amount he needed to write.
3) to make it easy, to avoid mistakes, to check for them, and to detect them.
The ultimate pseudo-code is not yet in sight. There probably will be at least two in common use; one for the scientific, mathematical and engineering problems using a pseudo-code closely approximating mathematical symbolism; and a second, for the data-processing, commercial, business and accounting problems. In all likelihood, the latter will approximate plain English.
The standardization of pseudo-code and corresponding subroutine is simple for mathematical problems. As a pseudo-code "sin x" is practical and suitable for "compute the sine of x", "PWT" is equally obvious for "compute Philadelphia Wage Tax", but very few commercial subroutines can be standardized in such a fashion. It seems likely that a pseudocode "gross-pay" will call for a different subroutine in every installation. In some cases, not even the vocabulary will be common since one computer will be producing pay checks and another maintaining an inventory.
Thus, future compiling routines must be independent of the type of program to be produced. Just as there are now general-purpose computers, there will have to be general-purpose compilers. Auxiliary to the compilers will be vocabularies of pseudo-codes and corresponding volumes of subroutines. These volumes may differ from one installation to another and even within an installation. Thus, a compiler of the future will have a volume of floating-decimal mathematical subroutines, a volume of inventory routines, and a volume of payroll routines. While gross-pay may appear in the payroll volume both at installation A and at installation B, the corresponding subroutine or standard input item may be completely different in the two volumes. Certain more general routines, such as input-output, editing, and sorting generators will remain common and therefore are the first that are being developed.
There is little doubt that the development of automatic coding will influence the design of computers. In fact, it is already happening. Instructions will be added to facilitate such coding. Instructions added only for the convenience of the programmer will be omitted since the computer, rather than the programmer, will write the detailed coding. However, all this will not be completed tomorrow. There is much to be learned. So far as each group has completed an interpreter or compiler, they have discovered in using it "what they really wanted to do". Each executive routine produced has lead to the writing of specifications for a better routine.
1955 will mark the completion of several ambitious executive routines. It will also see the specifications prepared by each group for much better and more efficient routines since testing in use is necessary to discover these specifications. However, the routines now being completed will materially reduce the time required for problem preparation; that is, the programming, coding, and debugging time. One warning must be sounded, these routines cannot define a problem nor adapt it to a computer. They only eliminate the clerical part of the job.
Analysis, programming, definition of a problem required 85%, coding and debugging 15$, of the preparation time. Automatic coding materially reduces the latter time. It assists the programmer by defining standard procedures which can be frequently used. Please remember, however, that automatic programming does not imply that it is now possible to walk up to a computer, say "write my payroll checks", and push a button. Such efficiency is still in the science-fiction future.
in the High Speed Computer Conference, Louisiana State University, 16 Feb. 1955, Remington Rand, Inc. 1955 view details
Let me elaborate these points with examples. UNICODE is expected to require about fifteen man-years. Most modern assembly systems must take from six to ten man-years. SCAT expects to absorb twelve people for most of a year. The initial writing of the 704 FORTRAN required about twenty-five man-years. Split among many different machines, IBM's Applied Programming Department has over a hundred and twenty programmers. Sperry Rand probably has more than this, and for utility and automatic coding systems only! Add to these the number of customer programmers also engaged in writing similar systems, and you will see that the total is overwhelming.
Perhaps five to six man-years are being expended to write the Alodel 2 FORTRAN for the 704, trimming bugs and getting better documentation for incorporation into the even larger supervisory systems of various installations. If available, more could undoubtedly be expended to bring the original system up to the limit of what we can now conceive. Maintenance is a very sizable portion of the entire effort going into a system.
Certainly, all of us have a few skeletons in the closet when it comes to adapting old systems to new machines. Hardly anything more than the flow charts is reusable in writing 709 FORTRAN; changes in the characteristics of instructions, and tricky coding, have done for the rest. This is true of every effort I am familiar with, not just IBM's.
What am I leading up to? Simply that the day of diverse development of automatic coding systems is either out or, if not, should be. The list of systems collected here illustrates a vast amount of duplication and incomplete conception. A computer manufacturer should produce both the product and the means to use the product, but this should be done with the full co-operation of responsible users. There is a gratifying trend toward such unification in such organizations as SHARE, USE, GUIDE, DUO, etc. The PACT group was a shining example in its day. Many other coding systems, such as FLAIR, PRINT, FORTRAN, and USE, have been done as the result of partial co-operation. FORTRAN for the 705 seems to me to be an ideally balanced project, the burden being carried equally by IBM and its customers.
Finally, let me make a recommendation to all computer installations. There seems to be a reasonably sharp distinction between people who program and use computers as a tool and those who are programmers and live to make things easy for the other people. If you have the latter at your installation, do not waste them on production and do not waste them on a private effort in automatic coding in a day when that type of project is so complex. Offer them in a cooperative venture with your manufacturer (they still remain your employees) and give him the benefit of the practical experience in your problems. You will get your investment back many times over in ease of programming and the guarantee that your problems have been considered.
Extract: IT, FORTRANSIT, SAP, SOAP, SOHIO
The IT language is also showing up in future plans for many different computers. Case Institute, having just completed an intermediate symbolic assembly to accept IT output, is starting to write an IT processor for UNIVAC. This is expected to be working by late summer of 1958. One of the original programmers at Carnegie Tech spent the last summer at Ramo-Wooldridge to write IT for the 1103A. This project is complete except for input-output and may be expected to be operational by December, 1957. IT is also being done for the IBM 705-1, 2 by Standard Oil of Ohio, with no expected completion date known yet. It is interesting to note that Sohio is also participating in the 705 FORTRAN effort and will undoubtedly serve as the basic source of FORTRAN-to- IT-to-FORTRAN translational information. A graduate student at the University of Michigan is producing SAP output for IT (rather than SOAP) so that IT will run on the 704; this, however, is only for experience; it would be much more profitable to write a pre-processor from IT to FORTRAN (the reverse of FOR TRANSIT) and utilize the power of FORTRAN for free.
in "Proceedings of the Fourth Annual Computer Applications Symposium" , Armour Research Foundation, Illinois Institute of Technology, Chicago, Illinois 1957 view details
in [ACM] CACM 1(06) (June 1958) view details
in [ACM] CACM 1(06) (June 1958) view details
in E. M. Crabbe, S. Ramo, and D. E. Wooldridge (eds.) "Handbook of Automation, Computation, and Control," John Wiley & Sons, Inc., New York, 1959. view details
in [ACM] CACM 6(03) (Mar 1963) view details
The first large scale electronic computer available commercially was the Univac I (1951). The first Automatic Programming group associated with a commercial computer effort was the group set up by Dr. Grace Hopper in what was then the Eckert-Mauchly Computer Corp., and which later became the Univac Division of Sperry Rand. The Univac had been designed so as to be relatively easy to program in its own code. It was a decimal, alphanumeric machine, with mnemonic instructions that were easy to remember and use. The 12 character word made scaling of many fixed-point calculations fairly easy. It was not always easy to see the advantage of assembly systems and compilers that were often slow and clumsy on a machine with only 12,000 characters of high speed storage (200 microseconds average access time per 12 character word). In spite of occasional setbacks, Dr. Hopper persevered in her belief that programming should and would be done in problem-oriented languages. Her group embarked on the development of a whole series of languages, of which the most used was probably A2, a compiler that provided a three address floating point system by compiling calls on floating point subroutines stored in main memory. The Algebraic Translator AT3 (Math-Matic) contributed a number of ideas to Algol and other compiler efforts, but its own usefulness was very much limited by the fact that Univac had become obsolete as a scientific computer before AT3 was finished. The B0 (Flow-Matic) compiler was one of the major influences on the COBOL language development which will be discussed at greater length later. The first sort generators were produced by the Univac programming group in 1951. They also produced what was probably the first large scale symbol manipulation program, a program that performed differentiation of formulas submitted to it in symbolic form.
in [AFIPS JCC 25] Proceedings of the 1964 Spring Joint Computer Conference SJCC 1964 view details
History And Bibliography
Linear lists and rectangular arrays of information kept in consecutive memory locations were widely used from the earliest days of stored-program computers, and the earliest treatises on programming gave the basic algorithms for traversing these structures. [ For example, see J. von Neumann, Collected Works, vol. V, 113-116 (written 1947); M. V. Wilkes, D. J. Wheeler, S. Gill, The Preparation of Programs for an Electronic Digital Computer (Reading, Mass.: Addison-Wesley, 1951), subroutine V-1. ] Before the days of index registers, operations on sequential linear lists were done by performing arithmetic on the machine language instructions themselves, and this type of operation was one of the early motivations for having a computer whose programs share memory space with the data they manipulate.
Techniques which permit variable-length linear lists to share sequential locations, in such a way that they shift back and forth when necessary, as described in Section 2.2.2, were apparently a much later invention. J. Dunlap of Digitek Corporation developed these techniques in 1963 in connection with the design of a series of compiler programs; about the same time the idea independently appeared in the design of a COBOL compiler at IBM Corporation, and a collection of related subroutines called CITRUS was subsequently used at, various installations. The techniques remained unpublished until after they had been independently developed by Jan Garwick of Norway; see BIT 4 (1964), 137-140.
The idea of having linear lists in non-sequential locations seems to have originated in connection with the design of computers with drum memories; the first such computer was developed by General Electric Corporation during 1946-1948, and the IBM 650 was the most notable later machine of this type. After executing the instruction in location n, such a computer is usually not ready to get its next instruction from location n + 1 because the drum has already rotated past this point. Depending on the instruction being performed, the most favorable position for the next instruction might be n + 7 or n + 18, etc., and the machine can operate up to six or seven times faster if its instructions are optimally located rather than consecutive. [ For a discussion of the interesting problems concerning best placement of these instructions, see the author's article in JACM 8 (1961), 119-150. ] Therefore the machine design provides an extra address field in each machine language instruction, to serve as a link to the next instruction. Such a machine is called a "one-plus-one-address computer," as distinguished from MIX which is a one-address computer." The design of one-plus-one-address computers is apparently the first appearance of the linked-list idea within computer programs, although the dynamic insertion and deletion operations which we have used so frequently in this chapter were still unknown.
Linked memory techniques were really born when A. Newell, C. Shaw, and H. Simon began their investigations of heuristic problem-solving by machine. As an aid to writing programs that searched for proofs in mathematical logic, they designed the first " listprocessing" language IPL-II in the spring of 1956. (IPL stands for Information Processing Language.) This was a system which made use of links and included important concepts like the list of available space, but the concept of stacks was not yet well developed; IPL-III was designed a year later, and it included "push down" and "pop up" for stacks as important basic operations. (For references to IPL-II see "The Logic Theory Machine," IRE Transactions on Information Theory IT-2 (Sept. 1956), 61-70; see also Proc. Western Joint Comp. Conf. (Feb. 1957), 218-240. Material on IPL- III first appeared in course notes given at the University of Michigan in the summer of 1957.)
The work of Newell, Shaw, and Simon inspired many other people to use linked memory (which was often at the time referred to as NSS memory), mostly for problems dealing with simulation of human thought processes. Gradually, the techniques became recognized as basic computer programming tools; the first article describing the usefulness of linked memory for "down-to-earth" problems was published by J. W. Carr, III, in CACM 2 (Feb. 1959), 4-6. Carr pointed out in this article that linked lists can readily be manipulated in ordinary programming languages, without requiring sophisticated sub-routines or interpretive systems. See also G. A. Blaauw, IBM J. Res. and Dev. 3 (1959), 288-301.
At first, one-word nodes were used for linked tables, but about 1959 the usefulness of several consecutive words per node and "multilinked" lists was gradually being discovered by several different groups of people. The first article dealing specifically with this idea was published by D. T. Ross, CACM 4 (1961), 147-150; at that time he used the term "plex" for what has been called a "node" in this chapter, but he subsequently has used the word "plex" in a different sense to denote a class of nodes combined with associated algorithms for their traversal.
Notations for referring to fields within nodes are generally of two kinds: the name of the field either precedes or follows the pointer designation. Thus, while we have written "INFO(P)" in this chapter, some other authors write, for example, "P.INFO ". At the time this chapter was prepared, the two notations seemed to be equally prominent. The notation adopted here has been used in the author's lectures since 1962 and it has also been independently devised by numerous other people, primarily because it is natural to use mathematical functional notation to describe attributes of a node. (For example, see the article by N. Wirth and C. A. R. Hoare, CACM 9 (1966), 413-432.) Note that "INFO(P)" is pronounced "info of P" in conventional mathematical verbalization, just as f(x) is rendered "f of x." The alternative notation P.INFO has less of a natural flavor, since it tends to put the emphasis on P, although it can be read "P's info" the reason INFO(P) seems to be more natural is apparently the fact that P is variable, but INFO has a fixed significance when the notation is employed. By analogy, we could consider a vector A = (A [ l ] , A [ 2 ] , ..., A [ l00 ] ) to be a node having 100 fields named 1, 2,..., 100. Now the second field would be referred to as "2(P)" in our notation, where P points to the vector A; but if we are referring to the jth element of the vector, we find it more natural to write A [ j ] , putting the variable quantity "j" second. Similarly it seems most appropriate to put the variable quantity "P' second in the notation INFO(P).
Perhaps the first people to recognize that the concepts "stack" (last-in-first-out) and "queue" ( first-in-first-out) are important objects of study were cost accountants interested in reducing income tax assessments; for a discussion of the "LIFO" and "FIFO" methods of pricing inventories, see any intermediate accounting textbook, e.g., C. F. and W. J. Schlatter, Cost Accounting (New York: Wiley, 1957), Chapter 7. In 1947 A. M. Turing developed a stack, called Reversion Storage, for use in subroutine linkage (see Section 1.4.5). No doubt simple uses of stacks kept in sequential memory locations were common in computer programming from the earliest days, since a stack is such a simple and natural concept. The programming of stacks in linked form appeared first in IPL, as stated above; the name "stack" stems from IPL terminology (although "pushdown list" was the more official IPL wording), and it was also independently introduced in Europe by E. W. Dijkstra. "Deque" is a term introduced by E. J. Schweppe.
The origin of circular and doubly linked lists is obscure; presumably these ideas occurred naturally to many people. A strong factor in the popularization of these techniques was the existence of general List-processing systems based on them [principally the Knotted List Structures, CACM 5 (1962), 161-165, and Symmetric List Processor, CACM 6 (1963), 524-544, of J. Weizenbaum ] .
Various methods for addressing and traversing multidimensional arrays of information were developed independently by clever programmers since the earliest days of computers, and thus another part of the unpublished computer folklore came into existence. This subject was first surveyed in print by H. Hellerman, CACM 5 (1962), 205-207. See also J. C. Gower, Comp. J. 4 (1962), 280-286.
Tree structures represented explicitly in computer memory were described for the first time in applications to algebraic formula manipulation. The A-1 compiler language, developed by G. M. Hopper in 1951, used arithmetic expressions written in a three-address code; the latter is equivalent to the INFO, LLINK, and RLINK of a binary tree representation. In 1952, H. G. Kahrimanian developed algorithms for differentiating algebraic formulas represented in the A-1 compiler language; see Symposium on automatic Programming (Washington, D.C.: Office of Naval Research, May 1954), 6-14.
Since then, tree structures in various guises have been studied independently by many people in connection with numerous computer applications, but the basic techniques for tree manipulation (not general List manipulation) have seldom appeared in print except in detailed description of particular algorithms. The first general survey was made in connection with a more general study of all data structures by K. E. Iverson and L. R. Johnson [ IBM Corp. research reports RC-390, RC-603, 1961; see Iverson, A Programming Language (New York: Wiley, 1962), Chapter 3 ] . See also G. Salton, CACM 5 (1962), 103-114.
The concept of threaded trees is due to A. J. Perlis and C. Thornton, CACM 3 (1960), 195-204. Their paper also introduced the important idea of traversing trees in various orders, and gave numerous examples of algebraic manipulation algorithms. Unfortunately, this important paper was hastily prepared and it contains many misprints. The threaded lists of Perlis and Thornton actually were only "right-threaded trees" in our terminology; binary trees which are threaded in both directions were independently discovered by A. W. Holt, A Mathematical and Applied Investigation of Tree Structures (Thesis, U. of Pennsylvania, 1963). Postorder and preorder for the nodes of trees were called "normal along order" and "dual along order" by Z. Pawlak, Colloquium on the Foundation of Mathematics, etc. (Tihany, 1962, published by Akademiai Kiado, Budapest, 1965), 227-238. Preorder was called "subtree order" by Iverson and Johnson in the references cited above. Graphical ways to represent the connection between tree structures and corresponding linear notations were described by A. G. Oettinger, Proc. Harvard Symp. on Digital Computers and their Applications (April, 1961), 203-224.
The history of tree structures as mathematical entities, together with a bibliography of the subject, is reviewed in Section 220.127.116.11.
At the time this section was written, the most widespread knowledge about information structures was due to programmers' exposure to List processing systems, which have a very important part in this history. The first widely used system was IPL-V (a descendant of IPL-III, developed late in 1959); IPL-V is an interpretive system in which a programmer learns a machine-like language for List operations. At about the same time, FLPL (a set of FORTRAN sub-routines for List manipulation, also inspired by IPL but using subroutine calls instead of interpretive language) was developed by H. Gelernter and others. A third system, LISP, was designed by J. McCarthy, also in 1959. LISP is quite different from its predecessors: programs for it are expressed in mathematical functional notation combined with "conditional expressions" (see Chapter 8), then converted into a List representation. Many List processing systems have come into existence since then, of which the most prominent historically is J. Weizenbaum's SLIP; this is a set of subroutines for use in FORTRAN programs, operating on doubly linked Lists.
An article by Bobrow and Raphael, CACM 7 (1964), 231-240, may be read as a brief introduction to IPL-V, LISP, and SLIP, and it gives a comparison of these systems. An excellent introduction to LISP has been given by P. M. Woodward and D. P. Jenkins, Comp. J. 4 (1961), 47-53. See also the authors' discussions of their own systems, which are each articles of considerable historical importance: "An introduction to IPL-V" by A. Newell and F. M. Tonge, CACM 3 (1960), 205-211; "A FORTRAN-compiled List Processing Language" by H. Gelernter, J. R. Hansen, and C. L. Gerberich, JACM 7 (1960), 87-101; "Recursive functions of symbolic expressions and their computation by machine, I" by John McCarthy, CACM 3 (1960), 184-195; "Symmetric List Processor" by J. Weizenbaum, CACM 6 (1963), 524-544. The latter article includes a complete description of all of the algorithms used in SLIP. In recent years a number of books about these systems have also been written.
Several string manipulation systems have also appeared; these are primarily concerned with operations on variable-length strings of alphabetic information (looking for occurrences of certain substrings, etc.). Historically, the most important of these have been COMIT (V. H. Yngve, CACM 6 (1963), 83-84) and SNOBOL (D. J. Farber, R. E. Griswold, and I. P. Polonsky, JACM 11 (1964), 21-30). Although string manipulation systems have seen wide use and although they are primarily composed of algorithms such as we have seen in this chapter, they play a comparatively small role in the history of the techniques of information structure representation; users of these systems have largely been unconcerned about the details of the actual internal processes carried on by the computer. For a survey of string manipulation techniques, see S. E. Madnick, CACM 10 (1967), 420-424.
The IPL-V and FLPL systems for List-processing did not use either a garbage collection or a reference count technique for the problem of shared Lists; instead, each List was "owned" by one List and "borrowed" by all other Lists which referred to it, and a List was erased when its "owner" allowed it to be. Hence, the programmer was enjoined to make sure no List was still borrowing any Lists that were being erased. The reference counter technique for Lists was introduced by G. E. Collins, CACM 3 (1960), 655-657; see also the important sequel to this paper, CACM 9 (1966), 578-588. Garbage collection was first described in McCarthy's article cited above; see also CACM 7 (1964), 38, and an article by Cohen and Trilling, BIT 7 (1967), 22-30.
Unfortunately, too few people have realized that manipulation of links is not a magic, complex process that must be done by fancy List manipulation routines. There is still a widespread tendency to feel that the advantages of link manipulation can be obtained only by using a large List-processing system; actually, as we have seen in many examples throughout this chapter, it is quite easy to design linking algorithms which operate more efficiently than a general List-processing subroutine could possibly do. The author hopes that these detailed examples will help to convey a better-balanced attitude towards the use of structural links in data.
Dynamic storage allocation algorithms were in use several years before published information about them appeared. A very readable discussion is given by W. T. Comfort, CACM 7 (1964), 357-362 (an article written in 1961). The "boundary-tag' method, introduced in Section 2.5, was designed by the author in 1962 for use in a control program for the B5000 computer. The "buddy system" was first used by H. Markowitz in connection with the SIMSCRIPT programming system in 1963, and it was independently discovered and published by K. Knowlton, CACM 8 (1965), 623-625; see also CACM 9 (1966), 616-625. For further discussion of dynamic storage allocation, see the articles by Iliffe and Jodeit, Comp. J. 5 (1962), 200-209; Bailey, Barnett, and Burleson, CACM 7 (1964), 339-346; Berztiss, CACM 8 (1965), 512-513; and D. T. Ross, CACM 10 (1967), 481-492.
A general discussion of information structures and their relation to programming has been given by Mary d'Imperio, "Data Structures and their Representation in Storage," Annual Review of Automatic Programming 5 (Oxford: Pergamon Press), in press. This paper is also a valuable guide to the history of the topic, since it includes a detailed analysis of the structures used in connection with twelve List processing and string manipulation systems. See also the proceedings of two symposia, CACM 3 (1960), 183-234 and CACM 9 (1966), 567-643, for further historical details. (Several of the individual papers from these proceedings have already been cited above.)
An excellent annotated bibliography, which is primarily oriented towards applications to symbol manipulation and algebraic formula manipulation but which has numerous connections with the material of this chapter, has been compiled by Jean E. Sammet, Comput. Rev. 7 (July-August 1966), B1-B31.
in [AFIPS JCC 25] Proceedings of the 1964 Spring Joint Computer Conference SJCC 1964 view details
in Belzer, J. ; A. G. Holzman, A. Kent, (eds) Encyclopedia of Computer Science and Technology, Marcel Dekker, Inc., New York. 1979 view details