Compiler writing language 

for TransMoGrifier

Early language for writing recursive descent compilers.
Moved by macro translation from the CDC 1604 to the IBM 709, then the 7094, and then the GE 635, where McIlroy and Morris used it to write the EPL compiler for Multics. Influential for the style it lent to C and to C's portability (via B/NB).

Related languages
SAL => TMG   Sibling
TMG => B   Written using
TMG => EPL   Written using
TMG => YACC   Influence

  • McClure, R.M. "TMG: A Syntax-Directed Compiler" Abstract: This paper reports on a compiler writing system called TMG, basically a syntax directed system. It has, however, some interesting differences which make it easier to handle errors and declarative information. The original objective of this system was to make it as easy as possible to construct a simple one-pass translator for some specialized language. In TMG, emphasis is placed on scanning and analysis of input text and efficient production of straightforward translation of the input. For this reason, there are substantially no facilities for handling program topology, efficient register assignment, common subexpression removal, or complicated storage assignment. It was felt that, if required, these should be handled by a specially written post-processor. To explain how one writes a translator in TMGL, we will describe in some detail a compiler which translates a simple algebraic language (SAL) into symbolic machine code for the IBM 7040.
          in [ACM] Proceedings of the 1965 20th National Conference, Cleveland, Ohio, United States
  • Feldman, Jerome and Gries, David "Translator writing systems" p77-113 Extract: TMG
    TMG (McClure [McC1 65])
    The TMG system was developed at Texas Instruments as a tool for writing simple one-pass compilers that produce symbolic output. The syntax technique is a simple top-down scan with backup. However, the embedding of semantic rules allows the recognizer to be more efficient by eliminating some syntactically possible goals on semantic grounds.
    The basic TMG statement form is a sequence of actions separated by spaces. There is a character-based symbol table which is built from input strings using the primitives MARKS and INSTALL. Consider the following example: INTEGER: ZERO* MARKS DIGIT DIGIT* INSTALL.
    The action ZERO* scans all leading zeros; then MARKS notes the current value of the input-string pointer. The action DIGIT DIGIT* scans at least one digit and then all further characters in the class DIGIT; INSTALL then enters the marked string in the symbol table. The built-in routines include conditional arithmetic expressions, number conversions, and a few input-output functions. There are also some system cells, such as J, the input pointer, and SYMNRM, the length of the last string entered. Output is also character-oriented, as the following example will show: LABELFIELD: LABEL = $($P1 / BSS / 0//$).
    This statement would be used to process the label in some language. The "=" symbol signals an output routine which is bounded by "$(" and "$)". The body of the output statement will form one line of assembly code: the value of $P1, then BSS, then 0.
    The symbol $P1 is a command to evaluate the first construct to the left of the =, presumably the symbolic name of the label. The / says insert a tab, and BSS and 0 represent themselves. Finally, the // places a carriage return in the output. The output routines operate from top to bottom on the intermediate tree representation of a program. Thus a $Pn in an output routine may refer to a subtree, and the evaluation of $Pn will then involve a recursive call on another output routine. It is also possible to pass parameters by value to the inner routine. The paper gives several examples of these functions and includes a brief discussion of the error recovery capabilities of TMG.
    The TMG effort was a pilot project and its clumsy syntax would be easy to fix. It has been used to write a number of compilers, and a related system, TROL, has been used by Knuth for teaching compiler writing. The EPL (Early PL/I) used in MULTICS was written as a two-pass system, using two sets of TMG definitions, to get better code. The TMG system does not seem to be as coherent as some of the systems considered below and would benefit from another iteration.
    Abstract: A critical review of recent efforts to automate the writing of translators of programming languages is presented. The formal study of syntax and its application to translator writing are discussed in Section II. Various approaches to automating the postsyntactic (semantic) aspects of translator writing are discussed in Section III, and several related topics in Section IV.
          in [ACM] CACM 11(02) (February 1968)
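    The INTEGER rule discussed in the extract above behaves like a small hand-written scanner. The following is a rough illustrative sketch in Python, not McClure's implementation; the function name and the dict standing in for TMG's symbol table are invented for the example.

```python
# Rough sketch (invented names; not McClure's code) of what the TMG rule
#   INTEGER: ZERO* MARKS DIGIT DIGIT* INSTALL
# does: skip leading zeros, mark the input pointer, require at least one
# digit, scan the remaining digits, and hand the marked substring to
# INSTALL -- modelled here as entry into an ordinary dict symbol table.

def scan_integer(text, pos=0, symtab=None):
    """Return (new_pos, symbol) on success, or (pos, None) on failure;
    TMG would back the input pointer J up to pos on failure."""
    if symtab is None:
        symtab = {}
    j = pos
    while j < len(text) and text[j] == '0':       # ZERO*
        j += 1
    mark = j                                      # MARKS
    while j < len(text) and text[j].isdigit():    # DIGIT DIGIT*
        j += 1
    if j == mark:                                 # no digit found: fail
        return pos, None
    symbol = text[mark:j]                         # length j - mark (SYMNRM)
    symtab[symbol] = True                         # INSTALL (stand-in)
    return j, symbol

print(scan_integer("00042;"))   # -> (5, '42')
```

    Note that, read literally, the rule fails on the single string "0" (ZERO* consumes the zero and DIGIT then finds nothing), which is the backup behavior the extract describes.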
  • Balzer, R.W. et al, "APAREL: A Parse Request Language" Abstract: APAREL is described: this language is an extension to an algorithmic language (PL/I) that provides the pattern-matching capabilities normally found only in special purpose languages such as SNOBOL4 and TMG. This capability is provided through parse-requests stated in a BNF-like format. These parse-requests form their own programming language with special sequencing rules. Upon successfully completing a parse-request, an associated piece of PL/I code is executed. This code has available for use, as normal PL/I strings, the various pieces (at all levels) of the parse. It also has available, as normal PL/I variables, the information concerning which of the various alternatives were successful. Convenient facilities for multiple input-output streams, the initiation of sequences of parse-requests as a subroutine, and parse-time semantic checks are also included. APAREL has proven convenient in building a powerful SYNTAX and FUNCTION macro system, an algebraic language preprocessor debugging system, an on-line command parser, a translator for Dataless Programming, and as a general string manipulator.
          in [ACM] CACM 12(11) (Nov 1969).
  • Corbató, F. J. "PL/I as a Tool for System Programming: Five Years with a Temporary Compiler" Abstract: My vantage point is that of a system designer-implementer concerned with the over-all system performance and the degree that the system reaches the goals that it was designed for. This gives me a little more detachment from the issue of whether the language is just right or not. For that reason some of my remarks will not be completely unequivocal but rather will be shaded.
    The basis of the PL/I experience that I wish to talk about is mostly on the Multics system, which is being done as a cooperative project by the Bell Laboratories, the General Electric Company and Project MAC of MIT using the GE 645 computer which is derived from the GE 635. However, I am not giving an official Multics view but rather only my own opinion as a member of the design team. In fact, it's a preliminary view because it is still early to be certain that we have analyzed exactly what is happening. Further, one has to be cautious in forming final judgments on a language, even though it is already a de facto standard, since there still is a need for a great deal of diversity in the computing field so that different techniques can be evaluated.

    Extract: which compiler?
    So the question was: What compiler to use when developing Multics? We chose PL/I. The reasons go somewhat like this. One of the key reasons that we picked the language was the fact that the object code is modular, that is, one can compile each subsection of the final program separately, clean up the syntax, and test it on an individual basis. This latter point seems obvious, perhaps, because object code modularity is in several languages, like JOVIAL, FORTRAN, or MAD, but it wasn't in the ALGOL implementations available, and that blocked us from considering ALGOL.

    The second reason for picking PL/I was the richness of the constructs, especially the data structures and data types which we considered to be very powerful and important features. We had a task on our hands with fairly strong requirements and with unknown difficulty. We viewed the richness as a mixed blessing, however, because we certainly were a little wary of the possible consequences. But it certainly seemed the right direction to start in, and maybe to err on the side of richness and cut back later. As I'll get to later, it was a little too rich.

    Another reason for choosing PL/I was that it was roughly machine independent. Our object in doing the system has not been to compete with normal manufacturing. Instead, our object has been to explore the frontier and see how to put together effectively a system that reaches and satisfies the goals that were set out. We are trying to find out the key design ideas and communicate these to others, regardless of what system they are familiar with. Hence, a language that gets above the specific details of the hardware is certainly desirable, and PL/I does a very effective job of that. In other words, it forces one to design, not to fiddle with code. And this has turned out to be one of its strong points.

    Another reason that we considered PL/I was that we thought the language would have wide support. To date it has had the support of one major manufacturer. And the final key reason for PL/I was that two persons associated with the project, especially Doug McIlroy and Robert Morris at Bell Labs, offered to make a subset of it work. In addition, a follow-on contract with a vendor was arranged for a more polished version of the compiler. This is basically why we chose PL/I. We have certainly debated, somewhat casually, other choices, but these were the essential reasons why we picked the language.
    Extract: TMG and EPL
    The language that was used to implement EPL was TMG, short for "transmogrifier," which is a language system developed by Bob McClure. It's a clever, interpretive system specifically designed for experimental language writing or syntax analysis. However, it is not easy to learn and use and, therefore, it is hard to pick up somebody else's work written in the language.

    The EPL translator was initially designed as two passes, the first one being principally a syntax analyzer and the second one basically a macro expander.

    The output of the second pass in turn led into an assembler which handled the specific formatting for the machine. Later a third pass was added intermediate between the first two in an attempt to optimize the object code.
    The quick-and-dirtiness came through when the original language subset specs had only a single diagnostic, namely, ERROR. That has been expanded so that maybe now there are half a dozen, but the only help you get is that the message appears in the neighborhood of the statement that caused the trouble. The compile rate, which was never a major issue, turned out to be a few statements per second. It has been improved a little with time, but more critically the object code that is generated has improved to a respectable 10 instructions per executable statement. (There's obviously a large variance attached to these figures.)

    The environment that the EPL compiler had to fit into is significant. First of all, we had adopted as a machine standard the full ASCII character set of 95 graphics plus control characters, so one of our first projects was trying to map a relationship with EBCDIC—the IBM standard.

    We also intended to use the language in a machine with program segmentation hardware in which programs can refer to other sections of programs by name. Fortunately, we could use the $ sign as a delimiter to allow us to have two-component names. We also expected the compiler to generate pure procedure code which was capable of being shared by several users each with their own data section who might be simultaneously trying to execute the same procedure. We also wanted to establish as a normal standard, although not a required one, the use of recursive procedures by means of a stack for the call, save, and return sequence, linkage information, and automatic temporary storage. We also wanted to allow the machine to have a feature which we've called "dynamic loading" in the sense that an entire program isn't loaded per se; the first procedure is started and, as it calls on other procedures, these procedures in turn are automatically fetched by the supervisor on an as-needed basis rather than on a pre-request basis. This, of course, is in conflict with any language which allows storage to be pre-declared by the INITIAL specification within any possible module that is ever used by the program. (This problem also comes up in FORTRAN.)

    We also had a feature in the machine, which we call segment addressing that allows one to talk about a data segment without having to read it in through input/output; rather, one merely references it and the supervisor gets it for one through the file system. In other words, we were trying to design a host system capable of supporting software constructs which make it easier for people to write software subsystems.

    External link: Online at Multics
          in Datamation 15(5) May 1969
  • Sammet, Jean E. "Programming Languages: History and Fundamentals" Englewood Cliffs, N.J.: Prentice-Hall, 1969. p.636. Extract: TMG
    One of the apparently successful attempts at a compiler-writing system is TMG. According to its developer, McClure, "The original objective of this system was to make it as easy as possible to construct a simple one pass translator for some specialized language."{5} An example of a statement in TMG is the following: INTEGER: ZERO* MARKS DIGIT DIGIT* INSTALL.
    This statement says that the scanning mechanism should skip over an arbitrary number of leading zeros, mark the start of the string, find at least one digit and then space over all additional consecutive digits, and put the symbol in the symbol table and the output tree. Arithmetic can be done during compilation using a function COMPUTE which has an assignment statement as its argument, e.g., COMPUTE (NEXT-VALUE = LAST-ONE + 2). A conditional statement can also be written, e.g., IF (INTVAL .LE. LAST-LABEL). There are a number of built-in functions in TMG; e.g., ARBNO ( ) looks for an arbitrary number of occurrences of the syntactic unit which is its argument, CLOT spaces over the input string to the next card boundary, etc. The reader interested in pursuing this system in detail will find a completely worked out example in [MZ65a]. A small section of this is shown in Figure IX-21.
    TMG was used to write ALTRAN (see Section VII.5). It has been implemented on a few machines.

  • Sammet, Jean E. "Roster of Programming Languages for 1973" p147
          in ACM Computing Reviews 15(04) April 1974
  • Stock, Marylene and Stock, Karl F. "Bibliography of Programming Languages: Books, User Manuals and Articles from PLANKALKUL to PL/I" Verlag Dokumentation, Pullach/München 1973, 623pp. Abstract: PREFACE AND INTRODUCTION
    The exact number of all the programming languages still in use, and those which are no longer used, is unknown. Zemanek calls the abundance of programming languages and their many dialects a "language Babel". When a new programming language is developed, only its name is known at first and it takes a while before publications about it appear. For some languages, the only relevant literature stays inside the individual companies; some are reported on in papers and magazines; and only a few, such as ALGOL, BASIC, COBOL, FORTRAN, and PL/1, become known to a wider public through various text- and handbooks. The situation surrounding the application of these languages in many computer centers is a similar one.

    There are differing opinions on the concept "programming languages". What is called a programming language by some may be termed a program, a processor, or a generator by others. Since there are no sharp borderlines in the field of programming languages, works were considered here which deal with machine languages, assemblers, autocoders, syntax and compilers, processors and generators, as well as with general higher programming languages.

    The bibliography contains some 2,700 titles of books, magazines and essays for around 300 programming languages. However, as shown by the "Overview of Existing Programming Languages", there are more than 300 such languages. The "Overview" lists a total of 676 programming languages, but this is certainly incomplete. One author has already announced the "next 700 programming languages"; it is to be hoped that the many users may be spared such a great variety for reasons of compatibility. The graphic representations (illustrations 1 & 2) show the development and proportion of the most widely-used programming languages, as measured by the number of publications listed here and by the number of computer manufacturers and software firms who have implemented the language in question. The illustrations show FORTRAN to be in the lead at the present time. PL/1 is advancing rapidly, although PL/1 compilers are not yet seen very often outside of IBM.

    Some experts believe PL/1 will replace even the widely-used languages such as FORTRAN, COBOL, and ALGOL. If this does occur, it will surely take some time - as shown by the chronological diagram (illustration 2).

    It would be desirable from the user's point of view to reduce this language confusion down to the most advantageous languages. Those languages still maintained should incorporate the special facets and advantages of the otherwise superfluous languages. Obviously such demands are not in the interests of computer production firms, especially when one considers that a FORTRAN program can be executed on nearly all third-generation computers.

    The titles in this bibliography are organized alphabetically according to programming language, and within a language chronologically and again alphabetically within a given year. Preceding the first programming language in the alphabet, literature is listed on several languages, as are general papers on programming languages and on the theory of formal languages (AAA).
    As far as possible, most of the titles are based on autopsy. However, the bibliographical description of some titles will not satisfy bibliography-documentation demands, since they are based on inaccurate information in various sources. Translation titles whose original titles could not be found through bibliographical research were not included. In view of the fact that many libraries do not have the quoted papers, all magazine essays should have been listed with the volume, the year, issue number and the complete number of pages (e.g. pp. 721-783), so that interlibrary loans could take place with fast reader service. Unfortunately, these data were not always found.

    It is hoped that this bibliography will help the electronic data processing expert, and those who wish to select the appropriate programming language from the many available, to find a way through the language Babel.

    We wish to offer special thanks to Mr. Klaus G. Saur and the staff of Verlag Dokumentation for their publishing work.

    Graz / Austria, May, 1973
  • Sammet, Jean E. "Roster of programming languages for 1976-77" pp56-85
          in SIGPLAN Notices 13(11) Nov 1978
    • Oral History interview by Mahoney with McIlroy
      MSM: Was B ever used in Multics?

      McIlroy: No.

      MSM: So, that history of BCPL to B to C, is it all here?

      McIlroy: All here.

      McIlroy: And TMG fed into that too. Some of the things like the two-address assignment operators were in TMG here first and then were adopted by B and by C. I can't say I invented them because they also came from... they were also in Algol 68 at the same time. B is where the unusual express... uh, declaration syntax of C came from. That was Ken's invention, that the declaration should look like any... should have the same syntax as an expression.

      MSM: Since we're on C. One of the... I'll ask Dennis this when I get to it, but one of the features that struck me about C, when I was writing a LISP interpreter in it, was this property of C, of any statement bringing back a value, in the type, so that all operators have values. And so I found that at a certain point my core C LISP was beginning to look like LISP expressions, and at a certain point it just seemed automatic to go over to a LISP library, because I was just stacking parentheses in C. Where did that come from? Is that part of B? Or... the notion that all operators have values?

      McIlroy: Yeah. It was also in Algol 68. In BCPL, which came out of CPL, they had the very, very strong distinction between functions and commands. An assignment was a command. So, they did not have... I do not think the assignment was an operator in an expression. But it was in Algol 68. So, that was in... So that happened sort of everywhere at the same time. In fact, the first place I saw it was McClure's proposal called Linear C, which was way before the language C. Just liked it because it sounded nice. Like after Linear A and Linear B.

      MSM: Oh I see, I see

      McIlroy: It was an obscure looking language and it was linear, because you wrote tremendous long expressions.

      MSM: Must have placed you on the borderline between a procedural and functional language?

      McIlroy: Yes. It did. And roughly speaking what it had was a "break" and a "continue" statement. "Continue" simply went back to the last parenthesis, and "break" simply jumped over to the next one, and those were the major controls in the language, plus an "if" of course. So, there wasn't an actual key word "for". Just the fact that you said "continue" meant you would jump back.

    • RISM's page on TMG