AED-1(ID:4603/aed003)

Portable AED Compiler  


Portable AED Compiler

"December 1964-May 1965: With new Integrated Packages for Valued Blocks (in-line, one-call functions); arrays of values as single call argument; GENCAL generated calls (allowing compile-and-go compile-time use of packages in PRESET phase [but never exploited]) -- all added to language -- AED-0 Compiler is considered "complete." Portable AED-1 Compiler for AED-0 language now is focus. Integrated Packages will extend semantics without requiring syntax changes. New packages for BSS Plex Dump; Marked Stacks; Recursive RWORD ("read a word" lexical input); Generalized String Package with typed pointers. Delayed Merge, with Atomic and Molecular Stepping Functions will allow complete symbolic structures to be reordered and repackaged."


Structures:
Related languages
AED-0 => AED-1   Extension of

References:
  • Wolman, Barry Lawrence "Operators for Manipulating Language Structures" pp1601-1628 view details Extract: Introduction
    Introduction
    The Computer-Aided Design Project of the M.I.T. Electronic Systems Laboratory is constructing two major programming systems for man-machine problem-solving. The purpose of the AED-I System (Automated Engineering Design or Algol Extended for Design) is to provide a powerful machine-independent compiler system for generalized computer programming. The purpose of the CADET System (Computer-Aided Design Experimental Translator) is to provide a generalized framework for setting up specialized problem-oriented design systems. Future versions of the two systems will be combined to provide a single scheme to meet both purposes, and CADET may be viewed as the experimental environment for developing the final system. Not only will CADET include provision for processing word and picture (light-pen) language in the same way, but it also will enable generalized "symbolic computation" as well. The latter feature is the subject of this paper.
    The viewpoint we have taken is that all aspects of the engineering design problem can be treated initially as statements in some appropriate pre-defined language(s). The problem description statements are converted by a table-driven language translator based upon Ross' Algorithmic Theory of Language into an intermediate form, called the first-pass structure, which models the syntax and semantics of the original statement. The characteristics of the language(s) used to describe the problem may be specified and altered via the AED Jr. System, which is incorporated into both AED-I and CADET.
    All of the processes desired in the CADET System can be considered to be the result of some operator applied to a first-pass structure so as to transform the meaning of the statement into a suitable form. The result of applying an operator may be any entity capable of representation with in the computer. An operator may yield a number of series of numbers, a truth value, an arbitrary data structure, or another first-pass structure.
    A machine language computer program, for example, results from a compilation operator. The picture on a display console may be generated by a pictorial display operator. Manipulation of algebraic expressions can be performed by the composition of operators for simplification, evaluation, equality testing, and so forth.
    The use of a common data structure for modeling all statements which arise in a design problem, whatever the form or subject matter may be, greatly simplifies the internal structure of CADET. All of the operators which reference or generate first-pass structures can be written in terms of a standard set of manipulation programs. These programs combine in one unit the many different functions that are required in the CADET System. In this paper we describe a preliminary version of the basic operator capability; which we have called CAM (for Computer Aided Manipulation). Extract: Purpose of CAM
    Purpose of CAM
    CAM is a general purpose set of programs for manipulating first-pass structures. CAM allows a programmer, coding in the AED language for instance, to reference, test, and alter the parsed representations of statements. CAM is not restricted to algebraic expressions or any other particular grammar, however. At any time, the characteristics of the strings being processed by CAM are controlled by the grammar specification designated by the user.
    The primary purpose of CAM is to provide a medium in which special purpose manipulation programs may be written. Along with the amount of time and space needed to perform manipulations, a measure of the effectiveness of CAM will be the amount of programming effort needed to program manipulation procedures.
    Extract: Foundations
    Foundations
    The design of CAM is based heavily on the properties of the first- pass structure and the means by which first-pass structures are generated. Since much of what follows in this paper will involve some familiarity with these properties, we shall describe briefly a realization employed for the preliminary version of CAM; a complete description of the theory and construction of first-pass structures may be found in the literature.
    The first-pass structure is a binary tree in which the syntactic structure of a statement is represented by the manner in which nodes are connected. The Parsing Algorithm which constructs the first-pass structure is driven by a set of tables which specify
    1) the "type" of every vocabulary word o r how to compute this "type" from the context in which the word was used, and
    2) the classes or types of phrases the vocabulary word "likes" on its left and on its right. The concepts of "types" and "likes" constitute an alternative to the more commonly used Backus Normal Form description of a language.
    The first-pass structure is constructed from nodes called first- pass beads. Each node is an n-component element which contains the following information:
    type - A meta-property of the vocabulary word. The type may be constant or it may be computed dynamically.
    voc - A pointer to the entry in the vocabulary table which contains the spelling and other properties of the word associated with this node.
    lvar and rvar - Pointers to the left and right context of the vocabulary word. They may point to entries in the symbol table (atoms) or to other nodes in the first-pass structure.
    major and minor - Pointers which specify the semantic content of first-pass structure.
    The schematic representation of first-pass beads that will be used in this paper is illustrated in Figure 1. Figures 2 through 4 show the parsed representations of statements in three different languages.
    The lvar and rvar components of the first-pass bead determine the form of the first-pass structure; the major and minor components, which constitute the "precedence string", specify the order in which each node should be examined to determine the meaning of the first-pass structure step by step, orderly manner. The major component of each node points to the node that should be considered after this node has been completely processed. If the minor component is present, it indicates that some preliminary  evaluation is required before treating this node. Frequently this involves a fresh start to evaluate the meaning of a new branch of the tree.
    The precedence string is formed so that every node will be "evaluated" only after both its lvar and rvar have been similarly treated. This obviates the need for recursive, top to bottom processing of an expression; the meaning of a first-pass structure may be determined iteratively, from bottom to top.
    The major component of the topmost node of a first-pass structure is normally empty, i.e., there is no node which comes after the topmost node. This empty major position is used to contain a pointer to the starting node of the tree, this is called wrapping up to a first-pass structure. Wrapping up a tree essentially transforms the precedence string into a closed ring of pointers; in this case the last node is actually the same as the first, the difference being in how the node is reached. A Boolean component wrapped indicates which of the two possible uses of the major component applies for a given node.
    Two complete first-pass structures for statements from two different languages are shown in Figure 5. In both examples it may be seen that the solid major precedence pointers merely act as reversed lvar or rvar pointers.
    The example on the left illustrates the use of normal precedence.
    The dotted major precedence pointers point only to atom-atom occurrences and are used always to indicate the start of a new branch in the tree. This corresponds to evaluating the statement from "the inner parentheses out".
    The second example illustrates the use of modified precedence.
    Here minor precedence components point to various spots in the tree, often to nodes with an atomic lvar and a non-atomic rvar. Since such nodes will be visited twice, they can have a modifying effect on the words which follow them in the input string and which will constitute their rvar connection.
    The use of modifiers, as such special words are called, greatly increases the richness of the language which can be processed.
    In order to simplify the design of programs which process first-pass structures, it is convenient to have a Precedence Follower program which starts at a specified node of a first-pass structure and follows the precedence string until another specified node is reached. Every time the Precedence Follower reaches a node, it calls a user program giving it the location of the current node and information about (1) how the node was reached, and (2) the surroundings of the current node, Thus the Precedence Follower acts as a "mouse" solving a maze, carrying a user function from node to node.
    Extract: Structure of CAM
    Structure of CAM
    The preliminary version of CAM consists of an integrated group of subroutines that provide a basic foundation for manipulating first-pass structures. The facilities which are available may be divided into several groups, namely: translation of expressions to and from first-pass structure form, general control and sequencing, testing, alteration, and internal housekeeping. Although special functions for dealing with algebraic expressions  are not provided, they could be easily added. This omission is in keeping with our desire for a general-purpose manipulation system.
    All of the expressions processed or used by CAM ultimately begin as a string of characters. These characters may be read from an on-line typewriter console or other input medium, from a disk storage file, or may be ".BCI. constants" incorporated into an AED program which uses CAM, in which case they are compiled as strings of characters. All of the CAM programs that expect a pointer to a first-pass structure as one of their arguments call a subroutine which uses the Parsing Algorithm to translate the string of characters. The first time the expression is referenced, it will be parsed; it will remain in first-pass structure form thereafter.
    Printing of the expression represented by a first-pass structure is accomplished by a subroutine which, in essence, recreates the original input string. Since this program is concerned only with the syntax of an expression, an operator which follows the precedence string would not be appropriate. The most elegant formulation is a recursive procedure.
    Subroutines are also available for printing the information contained in the precedence string or printing the entire first-pass structure in tree format. These latter two functions are quite useful when a manipulation program (or the grammar for a language) is being debugged.


    Mechanics of sequencing through a first-pass structure are controlled by another set of subroutines. Entries are available for taking the first step along the precedence string, taking the next step along the string, and checking to see if the last step has been taken. These sub-routines  are used by several versions of the Precedence Follower "mouse" described earlier, and by almost all of the other programs which comprise the CAM System. Since the sequencing programs have been written as "pure" procedures, several manipulation programs may be sequencing through the same or different first-pass structure s at the same time.
    What a manipulation program does when it reaches a node of a first-pass structure depends on the local context of the node and how the node was reached. All of the sequencing procedures employ a three bit "Octal Code" which contains this information in condensed form. This code and a pointer to some node of the first-pass structure uniquely specify a position along the precedence string. Table 1 lists the seven Octal Codes and their associated meanings.
    If the current node has been reached for the first time and a minor component exists, the sequencing programs will move to the node specified by the minor. The major component will be used if the node has been reached for the second time or the minor is empty the first time the node is encountered.
    Testing of first-pass structures is performed by an operator which compares two first-pass structures for equality, and by a pattern matching operator. The matching operator examines a portion, or possibly all, of an expression and attempts to create a match with a specified pattern which may contain "dummy" variables. A successful match creates values for the dummy variables; these may be used to control the alteration of the original or some other expression.
    The use of a matching operator makes it relatively easy to code other system functions in AED, using the matching operator as a base. The matching operator approach has an additional advantage in its generality; the matching operator depends only on the general characteristics of first-pass  structures and is essentially independent of the particular language being employed. Thus the CAM matching operator can be easily used for non-algebraic problems, a capability very important for the wide class of facilities to be implemented in CADET.
    Existing symbol manipulation systems such as COMIT or SNOBOL deal with unstructured streams of characters. Any structure associated with the characters being processed is entirely due to the particular manipulation program which is doing the processing. Since the CAM matching operator deals with the parsed representation of the character string, it can take advantage of the full syntactic and semantic content of the statement.

    Procedure EQ tests two items for equality, either or both may be expressions. If both items are non-atomic, EQ will use the sequencing subroutines to step through both first-pass structures. At each step EO compares the precedence codes and voc components of each tree, the lvar and rvar pointers are checked when either specify atomic symbols. If any of these quantities differ, EQ reports that the two expressions are not equal. By following the precedence string EO checks both the syntax and semantics of its two arguments, are cursive procedure would only check the syntax.

    Procedure MATCH determines if the expression specified by its first argument matches the pattern contained in the expression pointed to by its second argument. MATCH is concerned with establishing values for dummy variables; currently, any atomic symbol beginning with " " is considered to be a dummy variable. Before performing the actual matching operation, MATCH uses the sequencing programs to scan through the pattern so that the values currently assigned to dummy variables may be erased.

    Procedure M performs the actual match, operating recursively so that main connectors are considered first. The first time that a dummy variable is encountered in the pattern, a copy of the corresponding node in the other tree becomes the value of the dummy symbol. The next time this dummy is encountered, its value will be used for the comparison.
    If procedure M is called directly by the user, any dummy variable in the pattern will have retained the values set by previous calls of M or MATCH. This means that the matching operator may be used to control the pattern used in a subsequent match.
    The replacement or alteration of a portion or possibly all of a first- pass structure is controlled by a set of substitution subroutines. The subroutines in the substitution group update the precedence string to conform with the new meaning of the expression that was altered and interact with the sequencing procedures to insure that the correct position within the first-pass structure is maintained. There also is an entry which expands a first-pass structure by substituting the values of dummy variables wherever the dummy symbols occur. The expression resulting from an application of this expansion operation may be substituted for some specified node in another first-pass structure.
    The replacement of a sub-section of a first-pass structure by another binary tree involves connecting the new structure and disposing of the part that was removed. As a result of this insertion, there will usually be a minor component coming from some node higher in the tree that has to be changed. Since pointers are "one-way", major components are used to climb up the tree until the desired minor is found. When the topmost node is reached without the minor being found, the topmost major is changed since it really specifies the starting position of the entire tree.
    The matching and substitution subroutines need to copy and erase sections of first-pass structures. These capabilities are provided by the housekeeping package. First-pass structures which are no longer needed are returned to a list of free storage registers.

    Subroutine COPY creates an exact copy of the first-pass structure pointed to by its argument. If the procedure string was not present, copying a tree would be a trivial recursive operation. The need for an exact copy, including the precedence string, makes the problem much more complex. The easiest way to reproduce the precedence string is to use it, so COPY operates from the bottom of the tree towards the top.

    Procedure ERASE also uses the precedence string to sequence through a first-pass structure. A node will be returned to the free storage pool when the node has been reached for the last time. Thus far we have described a system in which the programmer need not be aware of the internal representation of first-pass structures. It would be too restrictive to prevent the more knowledgeable programmer from dealing directly with the various components that constitute a first- pass structure. Hence, users may also examine and construct portions of first-pass structure s in whatever manner t h e y wish, provided only that any structures they create be complete and correct.
    Extract: Results
    Results
    Experience gained with the preliminary version of CAM has indicated several areas that will require modification during the transition from experimental to production system. These areas include the use of common sub-expressions and a more general matching operator.
    Much time is spent with the preliminary version in copying first-pass structure s - - in one example, over 50 percent of the total execution time. This copying is initiated primarily as a result of the matching operator. Whenever a match succeeds, the values of the dummy variables are established by copying sections of the tree being checked. Similarly, substituting the values of dummy variables involves copying the replacement expression and the values of the dummy symbols.
    The match and replacement operators must copy expressions because, in the preliminary system, a first-pass structure can only be a sub-tree of one tree. The solution to this problem is the adoption of "common sub-expressions" as proposed for the AED-1 compiler. When common sub-expressions are available, copying a first-pass structure will only involve incrementing a reference counter associated with the expression.
    The matching operator of the preliminary version of CAM handles only dummy variables which can represent either atoms or sections of first-pass structure. Currently, the user has no way of specifying that he wishes a particular dummy variable to be matched only by atoms. Similarly, there is no way in which the user can specify a pattern which will be satisfied by an expression containing any one of several vocabulary words. These limitations are easily overcome by introducing attributes for atoms and expressions, and permitting the use of dummy vocabulary words {made possible by a newer version of AED Jr.).
    In the final version of CAM, symbols, vocabulary words, and first-pass structures will have a standard set of attributes and provision will be made for the user to add his own. Logically, these attributes will be an extension of the "type" currently used by the Parsing Algorithm. The user will be able to assign attributes to dummy symbols with matches succeeding only if the attributes of the item being checked are included on a list associated with the dummy, This means, for example, that a user could specify that "OP" is only to be matched by "+" or "-" and then use the pattern "A OP B".
    Extract: Summary
    Summary
    The real importance of the CAM system is the ability it provides for treating so many aspects of the man-machine problem-solving process in a uniform manner. Statements describing some aspect of a problem in a suitable language a reconverted into a common internal form. Generalized operators act on the internal form to yield an appropriate result. The only parts of the system which concern the semantics of an expression are the operators which process the expression. As is indicated in Ref. 1, such operators may be defined not only for algebraic manipulation and symbolic differentiation, but for such diverse problems as simplification of Boolean expressions, compiling computer programs, generating graphical displays, controlling simulations calculations, translating from one artificial language to another, and many other purposes.
    The use of a common data structure and a core of routines for manipulating this structure can significantly reduce the complexity of the CADET system in two ways: (a) the same algorithm may become applicable to several different parts of the problem after a common data structure is adopted and (b) different parts of the overall problem may be handled smoothly and uniformly. CAM is the first system to exhibit this generality, and will be followed by extensions and elaborations as the full AED system for computer-aided design takes shape.

          in [ACM] Proceedings of the ACM symposium on Symbolic and algebraic manipulation, 1966 view details
  • Feldman, Jerome and Gries, David "Translator writing systems" p77-113 view details Abstract: A critical review of recent efforts to automate the writing of translators of programming languages is presented. The formal study of syntax and its application to translator writing are discussed in Section II. Various approaches to automating the postsyntactic (semantic) aspects of translator writing are discussed in Section III, and several related topics in Section IV. Extract: Extendible Compilers -- Basic Concepts
    Extendible Compilers -- Basic Concepts
    Many attempts (starting with McIlroy [McI1 60]) have been made to embed macro features in compiler systems. One approach was to retain the macro syntax form but add a number of built-in features which are compiler-like. The SET system [Ben 64a] included a skeleton compiler with input-output, symbol manipulation, table handling, and list processing features. These built-in routines were combined with translation time operations (action operators) in the attempt to build a TWS. A more successful approach has been to use the structured syntax of high level languages as a basis for extension.
    Many existing compilers (including PL/I [IBM 66]) incorporate simple forms of macro expansion, the first probably being JOVIAL [Shaw 63]. The most primitive form is pure text replacement without parameter substitution.
    For example, in B5500 ALGOL one could define a macro with the statement:
    DEFINE LOOP 1 = FOR I ~ 1 STEP 1 UNTIL N
    and later write statements like
    LOOP 1 N DO A[I] ~ 0
    which would be expanded into
    FOR I ~-- 1 STEP 1 UNTIL N DO A [I] ~-- 0.
    The next step is to allow a macro definition with parameters.
    This facility has been included in the AED-0 compiler [Ross 66], among others. In AED-0 one might define a macro with the statement:
    DEFINE MACRO LOOP (P1, P2) TOBE
    FOR P1 ~-- 1 STEP 1 UNTIL P2 DO ENDMACRO.
    In this case, one could get the same result as above with the shorter statement
    LOOP(I, N) A[I] ~-- O.
    These two simple macro forms would form a useful addition to any high level language, and one might imagine developing mechanisms which parallel more sophisticated macro techniques. Although AED-0 does permit arbitrary strings as parameters, and nested definitions, features like conditional assembly do not seem to have been widely used in high level languages. One reason for this is that compilers normally depend heavily on the structure of the text; the next two sections describe the complexities that arise in trying to extend compilers with macro techniques.
          in [ACM] CACM 11(02) (February 1968) view details
  • Ross, Douglas "CAD Timeline at MIT LCS" Online resource view details
          in [ACM] CACM 11(02) (February 1968) view details