BEFAP style macro
for Self-Extending Translator
Bennet, Bell Labs 1962-4
Based on BE-FAP, leading to Build
Extendible Compilers -- Basic Concepts
Many attempts (starting with McIlroy [McI1 60]) have been made to embed macro features in compiler systems. One approach was to retain the macro syntax form but add a number of built-in features which are compiler-like. The SET system [Ben 64a] included a skeleton compiler with input-output, symbol manipulation, table handling, and list processing features. These built-in routines were combined with translation time operations (action operators) in the attempt to build a TWS. A more successful approach has been to use the structured syntax of high level languages as a basis for extension.
Many existing compilers (including PL/I [IBM 66]) incorporate simple forms of macro expansion, the first probably being JOVIAL [Shaw 63]. The most primitive form is pure text replacement without parameter substitution.
For example, in B5500 ALGOL one could define a macro with the statement:
DEFINE LOOP 1 = FOR I ~ 1 STEP 1 UNTIL N
and later write statements like
LOOP 1 N DO A[I] ~ 0
which would be expanded into
FOR I ~-- 1 STEP 1 UNTIL N DO A [I] ~-- 0.
The next step is to allow a macro definition with parameters.
This facility has been included in the AED-0 compiler [Ross 66], among others. In AED-0 one might define a macro with the statement:
DEFINE MACRO LOOP (P1, P2) TOBE
FOR P1 ~-- 1 STEP 1 UNTIL P2 DO ENDMACRO.
In this case, one could get the same result as above with the shorter statement
LOOP(I, N) A[I] ~-- O.
These two simple macro forms would form a useful addition to any high level language, and one might imagine developing mechanisms which parallel more sophisticated macro techniques. Although AED-0 does permit arbitrary strings as parameters, and nested definitions, features like conditional assembly do not seem to have been widely used in high level languages. One reason for this is that compilers normally depend heavily on the structure of the text; the next two sections describe the complexities that arise in trying to extend compilers with macro techniques.
in [ACM] CACM 11(02) (February 1968) view details
The ideas and philosophy of the BUILD approach have been developed over a period of years, starting with discussions in early 1960 between the author and Edward Fredkin, now Professor at M.I.T., about how an assembler might be built with a minimum effort, which could be extended from its input. Later that year, Fredkin wrote FRAP (Fredkin's Assembly Program) for the DEC PDP-1, and the author, DECAL (DEC Compiler, Assembler, and Linking Loader for the same machine.
In 1962, the author employed the macro facility in BE-FAP to implement a partial-word facility, a list-processing facility, and many special purpose tools, in the process of writing a large-scale communications network simulator for Bell Telephone Laboratories. Although this method of language extension was awkward and inefficient, it nevertheless proved the great value of the ability to create general- and special-purpose tools, as they are needed in a complex programming project.
Between 1962 and 1964, the author directed a project which produced SET (Self-Extending Translator) for the IBM 7094. In 1967, the author received support from the Air Force Office of Scientific Research for a year, to pull together the ideas and results of previous work. This effort culminated in a report in 1968. in which the acronym "BUILD" was first used.
In 1969, the author implemented a small BGP on a DEC PDP-9 at Berkeley Enterprises, Inc.. Newtonville, Massachusetts. This BGP was used principally to experiment with phrase-delimiting and phrase-processing algorithms. During the past two years (1972-1974) the author has implemented the current BGP (about 75% complete) as an adjunct to his work producing an assembler and other subsystems on a DEC PDP-10. Extract: Basic Language Constructs
Basic Language Constructs
A word in the BUILD system is like a word in a natural language. It is a concatenation of characters, which forms an entity whose meaning is independent of the meaning of its individual characters or sub-parts. In the present version of BGP, a word may contain up to 28 characters. Where the term "word" may be confused with a machine word, the term lexical word (LW) IS used instead to avoid confusion.
A lexical word is the same as "token". "symbol", "identifier", "operator", etc. in other systems. In the BUILD approach, the special characters form words, In the same way as alphanumeric characters do,
In the BUILD system, it is thought important to have a word-delimiting algorithm (lexical scanner) which is independent of the meaning of the words delimited. The algorithm used in the present BCP has this property, is simple, and has proved to be fully adequate for all purposes thus far encountered. The algorithm is more precisely stated in Appendix I. Briefly, each character is assigned a character class, solely for the purpose of word delimitation. (The character class does not prejudice the meaning of resulting words.) Characters of the same, or related, classes combine to form words. Characters of different and unrelated classes delimit words. Blanks, tabs, and end-of-line characters delimit words formed of characters of any class.
Alphabetic and numeric characters combine, but a pure numeric is recognized as an integer, Also, most special characters combine to form words.
The definition or declaration of a word Is specified prior to its use. The definition of a word consists of its type and an attribute list constituting its meaning. The structure and interpretation of the attribute list of a word is a function of its type.
The BGP contains many predefined words. The user can define new words of any existing type. (Eventually, the user will be able to add new word types.)
A phrase is a group of words which is interpreted in a uniform manner. Examples of phrases include a statement, a compound statement, an argument list, and an individual argument.
In the BUILD system, phrases are delimited by words of type "Phrase Delimiter" (PD). Phrase Delimiters come in 4 subtypes: Open, Close, Alternating, and Separator. The first two are matched. The third, Alternating, alternates between opening and closing phrases. The last, Separator, is used to separate elements of a list, although precisely speaking it terminates phrases.
The rules of phrase delimitation, together with "the predefined PD's in BOP, are given in Appendix IL Briefly, the user has the option of the method to be used for phrase delimitation, for style .and readability. The matched PD's, in the order of increasing closure strength:
are availably in the current BOP. (Closure strength is defined in Appendix II.) The user may select among these, or add new ones, as he chooses. He may also omit PD's, where the context permits.
In the text of this paper, parentheses ( ) are used to delimit simple phrases and brackets [ ], compound phrases. (In practice, the user has the choice between the methods of phrase delimitation, as described above.)
A prefix operator is one which precedes its operands (or arguments), if it takes any. Examples of prefix operators include the word types: Macro (ME), Procedure (PR), Machine Instruction (IN), Action Operator (AS). (The latter, AS, is defined in Section 5.)
A prefix operation is a prefix operator, together with its arguments.[...]
An infix operator is one that appears between its operands. A unary operator is considered to he an infix operator with a null left operand.
An infix operation is an infix operator, together with its operand(s). The syntax of a simple infix operation is a word of type Variable (VR), followed by a word of type Infix Operator (OP), followed by a word of type VR. The first word of type VR may be missing -- the unary operation case.
An infix operation may be nested, in the sense that one or both of its operands may be operations -- either infix operations or certain prefix operations (e.g., function calls, subscripted variables).
A prefix operation may also be nested, in the sense that, in certain cases, one or more of its arguments may be prefix or infix operations. The nesting syntax and conventions are essentially the same as for FORTRAN, ALGOL, and PL/I.
in SIGPLAN Notices 11(07) July 1976 view details