GSL(ID:4398/gsl003)

Generation Strategy Language 


for Generation Strategy Language

table-driven compiler

Places
References:
  • Warshall, Stephen; Shapiro, Robert M.: "A general-purpose table-driven compiler" view details
          in [AFIPS JCC 26] Proceedings of the 1964 Fall Joint Computer Conference FJCC 1964 view details
  • Feldman, Jerome and Gries, David "Translator writing systems" p77-113 view details Abstract: A critical review of recent efforts to automate the writing of translators of programming languages is presented. The formal study of syntax and its application to translator writing are discussed in Section II. Various approaches to automating the postsyntactic (semantic) aspects of translator writing are discussed in Section III, and several related topics in Section IV. Extract: TGS
    TGS (Plaskow and Schuman [Plas 66], Cheatham [Che 65])
    One of the most productive groups in TWS research has been the small consulting company, Massachusetts Computer Associates (COMPASS), now part of Applied Data Research. Although their TWSs have undergone many changes, the basic world view and goals of their effort have remained rather constant. They define compiling as a sixstep process: lexical analysis, syntactic analysis, interpretation of the parse, optimization, code selection, and output.
    The principal driving force behind their work has been run time efficiency, although other considerations have played an important role from time to time. The current TWS efforts of Computer Associates use a single language TRANDIR for all the steps of compilation. TRANDIR consists essentially of an algebraic section, a pattern matching section (cf. Section II.B.5), and a number of built-in functions. Other aspects of their efforts are discussed in Section II.C.5 which deals with an extendible compiler scheme within TGS.
    The first attack on the TWS problem at COMPASS was called CGS [War 64] and was quite different from their current work. Although they have abandoned this approach, we will discuss it briefly here because it seems to be rediscovered periodically. The CGS system was based on a top-down recognizer which produced a syntax tree to be used in further analysis. The input to this phase was essentially BNF. The second phase was the generation of intermediate code using a tree-matching language called GSL. The actual code selection process was written in a third language, MDL. This effort was abandoned because trees were found to be slow to build and difficult to do pattern recognition upon.
    The TGS systems differ from CGS, as well as the other systems described in this section, in the use of a single language for describing all phases of the compiler. This language, TRANDIR, is compiled into an interpretive code which is processed by the TRANGEN interpreter. If one combines the syntax and semantic loaders of Figure 12, the FSL model applies quite well to TGS. In fact, there has been good communication between these two efforts, and they have converged to a marked degree. The communication has not, however, been perfect; two concurrent implementations of TGS and FSL took place within a few hundred yards of each other without making contact.
    The TRANDIR language contains a pattern-matching subset which is essentially the same as the syntax language used in FSL (cf. Section II.B.5). The TGS version is more flexible in that it can be used on a variety of stacks and can match on properties other than identity of symbols. The pattern matching features can be used in various code optimization techniques as well as in syntax analysis.
    The remaining features in TRANDIR language are quite similar to the semantic language in FSL. There is a "symbol description" (SD) connected with each syntactic construct which is the analog of the "semantic word" in FSL.
    There are fairly elaborate facilities for declaring tables, masks, etc., for use by the translator. These various storage methods with the associated operators provide a very flexible means of recording and accessing the information needed for compiling efficient code. The FSL notion of code brackets is replaced in TGS by a series of symbol manipulation primitives to help the compiler writer produce output code. The operation of a TGS compiler can be best described by working through an example fairly completely.
    The example will be taken from a compiler for a miniature algebraic language ~t0 described in [Plas 66]. The basic compilation technique chosen is to use a tabular intermediate code as is common in COMPASS compilers [Che 66]. A typical intermediate code translation of would be
    Z,---X.Y
    (~) TIMES X Y
    (~) STORE Z (~)
    The intermediate code will be processed by a code selection phase which will produce the final output for later assembly.
    Consider the first TGS statement:
    ...VAR AE H EMIT(STORE, COMP(1), COMP(O));
    EXCISE; TRY(ENDST).
    The left part (up to the//) of this statement is a pattern of type (variable) (expression) which is compared with main stack (SYMLIST). If a match is attained the remainder (action part) of the statement is executed. The action EMIT produces a STORE intermediate instruction with the operands being the first and zeroth elements of the stack as matched. Since there is no resulting semantic description (SD), the action EXCISE is used to erase the two matched elements from the stack. Finally, the action TRY(ENDST) directs TRANGEN to try to match the pattern labelled ENDST.
    A somewhat more complicated routine would be used for recognizing a multiplication:
    ...VAL $. VAL // PHRASE(SYMRES(TIMES,COMP(2),
    COMP (0)));
    AESET; SYNTYP (COMP(O)) ~- AE; TRY(AE1)
    When one understands that "$." denotes the terminal symbol ".", the left part of this statement should be clear.
    The action SYMRES is a call on a routine which performs an EMIT of the same parameters and also returns an SD as its value. This SD becomes a parameter to PHRASE which uses it to replace the matched portion of the stack.
    The action labeled AESET causes the syntactic type of the new top element to be assigned the value AE. Finally, the statement TRY(AE1) leads to further expression processing.
    These two TGS statements, if executed in reverse order, would compile Z ~-- X • Y into intermediate language. In the real world, typical statements would involve table operations, string commands, conditionals, and other more complicated Tat~NDm constructs. There are also fairly sophisticated In any event, the intermediate code will itself be processed by another set of TRANSGEN routines called the code selectors. These are written in the same form as the syntax routines considered above. For example:
    // TIMES INMEM INMEM ...
    LOADMQ (XM+ 1).
    This statement has a pattern involving a predicate INMEM (meaning in memory) on stack entries rather than symbols to match. (The delimiters "//" and "..." indicate that the pattern is to be matched against the intermediate code stack.) The subroutine LOADMQ is called with a pointer to the second stack operand as parameter.
    This user-written routine will assemble a LOADMQ command if necessary and will adjust the SD in the stack to reflect the fact that one operand is now in the MQ register.
    A similar routine will be used to compile the appropriate multiply sequence. The result will be in the accumulator, and TRANGEN will eventually match the statement:
    // STORE ** .INAC ...
    IF SIGN (S YMBOL (A CHOLDS ) ) THEN
    EMIT (CHS) ;
    EMIT (STO, ARG(1));
    LINE(TEMPS) = 0;
    ACHOLDS = 0; MQHOLDS = 0;
    TO (STEP).
    The pattern here contains a "**" which is always matched and a "*", meaning indirect reference. If the operand in the accumulator, which is described by ACHOLDS, is negative, a "complement" (CHS) instruction must be emitted. The store command is emitted in any case without any tests on the variable to be replaced. The succeeding actions affect the state of the translator, reclaiming the temporaries and freeing the AC and MQ registers. Finally there is a transfer to the action labeled STEP which sequences through the intermediate code.
    The TGS system has been implemented on several computers and has been used in the construction of a variety of compilers. The compiler writers have been professionals and have not been constrained to stay within the formal system. The use of TGS has been sufficiently valuable to COMPASS that they continue to use it. on commercial compilers. More recently [Che 66], Cheatham has suggested using a declarative metalanguage £D which is meant to be translated into TRANDIR procedures, presumably by a (meta-meta) processor. The translation of the language £D is based on a mechanical constructor combining notions of Wirth and Early (el. Section II.B). To allow for more powerful languages, one can append predicates (e.g. type checking) and even arbitrary computations to the declarative syntax. Finally, there are rules for outputting intermediate code attached to the syntax rules.
    The declarative language has not been implemented, but Cheatham claims that it has proved useful for the initial formulation of Ti~AN~n~ compilers. While this is probably true, one would expect that the translation to procedurM form is not, at present, a mechanical process. Further, the sophistication required of an £D user does not seem appreciably less than that required by TRANDIR.
    The main differences between TGS and FSL accurately reflect the difference in design goals: TGS Mlows more flexibility by requiring more detailed information from the compiler writer. The efforts of Gries [Grie 67b], at Stanford, and Fierst [Fie 66], at Carnegie, are attempts to have the best of both by allowing simple code bracket statements as well as multiphase processing. Both VITAL [Mond 67] and the most recent TGS [Plas 66] are interaerive and have sophisticated trace, edit, and debug features.

          in [ACM] CACM 11(02) (February 1968) view details