Compiler Language for Information Processing
Compiler Language for Information Processing. 1958-1959. Based on IAL, led to JOVIAL. One of the first languages used to write its own compiler.
CLIP: A Compiler Language for Information Processing,
Santa Monica, CA
CLIP contains four data declarations: (Fig. 1)
1. A type declaration specifies the type and size of unsubscripted variables. Each I is an identifier of a variable and each ì defines the type and size of the I's immediately following it.
In the example shown, variables A and B would be signed, two-digit integers, as indicated by the presence of the "+" sign, the number "2" and the letter "D", respectively. Similarly, C and D would be characters, E, F, and G would be Boolean, and H would be an unsigned, three-digit integer.
2. A table declaration specifies the subscripted variables or items which make up a table. The declaration may also specify initial values which the items are to possess. J is the identifier of the table, and C is an unsigned integer whose value specifies the number of entries in the table. The entries are numbered consecutively starting from one. The I's are the identifiers of the items which make up the table, and the ì's, as in a type declaration, specify the type and size of the items. The D's, which may be omitted, specify the initial values which the items are to contain. If present, the D's must be listed for each entry in the same sequence in which the I's are listed.
The example shown defines a table called TAB comprising three entries, each entry containing values for items W, X, and Y. W [ 1 ] would contain the characters FLEAS, X [ 1 ] would contain the number 39, Y [ 1 ] would contain the number 29, W [ 2 ] would contain the characters G blank HAS blank, and so on. Getting ahead of myself, I would like to point out how this declaration would be translated for the IBM 709. Notice from the diagram beneath the declaration that more than one item may occupy a machine word. This has been done in an effort to economize on storage space, and to save input-output transfer time. Notice also that the table is arranged in parallel fashion; that is, all the values for a particular item are contiguously located. This has been done for greater ease in indexing.
3. A string declaration specifies up to 120 character positions. J is the identifier of the string, and C is an unsigned integer whose value specifies the number of characters in the string. ì is an unsigned integer followed by the letter "H". The integer portion of ì specifies the number of D's where each D is a character. An expression of the form J [ C':C" ] is called a string variable and refers to the C'th through C"th character positions of the string J.
In the example shown, CHAR is a string of ten characters composed of the first ten letters of the alphabet. CHAR [ 7:9 ] is an expression referring to the substring containing the letters GHI.
The string and string variable were devised in order to avoid the necessity for rigid input and output format definition. Since any number of contiguous positions in a string may be collectively referenced, maximum flexibility of format should be obtainable.
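The string variable described above can be illustrated with a minimal Python sketch. It is not CLIP syntax; the function name `string_variable` is hypothetical, and the only assumption taken from the text is that character positions are numbered from one and that both ends of the range are included, which is what the CHAR [ 7:9 ] example implies.

```python
def string_variable(s: str, first: int, last: int) -> str:
    """Return character positions first..last of s (1-based, inclusive).

    Hypothetical helper mimicking the CLIP form J [ C':C" ].
    """
    if not (1 <= first <= last <= len(s)):
        raise ValueError("character positions out of range")
    # Python slices are 0-based and exclude the end, so shift only the start.
    return s[first - 1:last]


CHAR = "ABCDEFGHIJ"  # the first ten letters of the alphabet
print(string_variable(CHAR, 7, 9))  # GHI
```

Because any contiguous range can be addressed this way, a record layout can be described by ranges alone rather than by a fixed format definition.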
4. An origin declaration specifies the arrangement of tables, strings, and/or unsubscripted variables in machine storage. The X's are identifiers of variables, table identifiers followed by empty subscript brackets, or string identifiers followed by brackets enclosing only a colon. The colons shown in the form description are optional, and separate the X's into ordered lists. All variables, tables, or strings referenced in a list will be assigned contiguous storage in the order specified in the declaration. Furthermore, all lists occurring in the same origin declaration will be originated at the same storage location.
In the example shown variable A and string B will be assigned
contiguous storage in the order specified; similarly for table
TAB and variables C and D. In addition, A and TAB will be
assigned the same point of origin.
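The effect of an origin declaration can be sketched in Python as follows. This is only an illustration of the rule stated above (contiguous storage within a list, a common origin for all lists); the function `assign_origins`, the starting address 0, and the sizes given for A, B, TAB, C and D are all assumptions made for the example, not values from the paper.

```python
def assign_origins(lists, origin=0):
    """Map each identifier to its starting address.

    lists  -- one list per colon-separated group; each element is (name, size)
    origin -- the common starting address shared by every list (assumed 0)
    """
    addresses = {}
    for items in lists:
        address = origin  # every list starts at the same point of origin
        for name, size in items:
            addresses[name] = address
            address += size  # the next item follows contiguously
    return addresses


# Hypothetical sizes in machine words: A=1, B=2, TAB=3, C=1, D=1.
layout = assign_origins([[("A", 1), ("B", 2)],
                         [("TAB", 3), ("C", 1), ("D", 1)]])
print(layout)
```

With these assumed sizes, A and TAB share the origin, B follows A, and C and D follow TAB, so the two lists overlay each other in storage.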
INPUT AND OUTPUT
Input and output of data are handled by a group of procedures which are called by the object program without their being declared. The following two calls, dealing with tapes, are representative of the group: (Fig. 2)
1. READ specifies that one record is to be read. U is an integer which symbolically defines the input unit, and each X is either an identifier of a variable or a parameter triplet. A triplet specifies an entire table or string or a portion thereof. The first parameter of the triplet is either a table identifier followed by empty subscript brackets, or a string identifier followed by brackets enclosing only a colon. The second two parameters are expressions which define the first and last entries or character positions to be transferred. ERROR: is an optional statement label which defines the next statement to be executed in the event that the data transfer cannot be successfully performed. If ERROR: is omitted from the call, a standard routine will be executed if necessary.
In the example shown, one record would be read from the unit designated symbolically by the number one. This record would be placed in the storage areas assigned to the variable C, the Ath through (B+4)th entries of table TAB1, and the first through nth character positions of string S.
2. The WRITE procedure call is completely analogous.
A COMPILER FOR CLIP
CLIP is transformed into a particular absolute machine language by a compiler which may be divided into four processes or phases (Fig. 3). These are, in order of operation, conversion, analysis, translation, and assembly. Phase I produces from CLIP the source language analysis tables, which are the data analysis table, containing all declarative information, and the statement analysis table, containing all imperative information. Phase II produces from these the statement label table and the intermediate language imperatives, which, together with the data analysis table, make up the intermediate language. The imperatives are expressed in a two-address, parenthesis-free code, which is the lowest level of language produced which is machine independent. That is to say, it is the lowest level of language which may be translated into more than one particular machine code.
The generation of the intermediate language imperatives from the statement analysis table is one of the important parts of the compilation process. This task is accomplished by the use of an algorithm which has been termed the "Anchor Point Method". It can handle expressions containing a mixture of arithmetic, relational, and logical operators, and the amount of working storage generated in the process is nearly minimal. The optimization of working storage is believed complete, except that identical sub-expressions are not recognized and the original expression is not re-formed.
The intermediate language is transformed by the third phase into a symbolic code, which is finally assembled into absolute machine language by a standard assembly program which is assumed available.
A compiler for CLIP, based on the design outlined, is presently being coded for the IBM 709. Phase IV will be the SCAT assembly program, and this compiler will be compatible with the SHARE 709 system.
As an exercise and as a test for the completeness and efficacy of CLIP, as well as for an aid in coding, phases I and II are being written in the source language. However, there is, perhaps, a more important reason why the machine independent phases are being coded in CLIP, and that is because once a CLIP compiler has been coded and checked out for a particular computer, phases I and II theoretically never need be recoded for another machine. Only the translator, and conceivably the assembly program, which are direct functions of hardware, must be "handmade", so to speak. Hence, all future modifications to phases I and II may be made in CLIP. The rationale for this is as follows:
Let us assume that we have phases I and II coded and checked out on computer A. Furthermore, let us assume an operative translator and assembly program for computer B. Finally, assume that the output from computer A is, or can be made, compatible with the input requirements for computer B. The following procedure might then be used:
1. Phases I and II in CLIP are fed as input to themselves on computer A. The output would then be phases I and II in the intermediate language.
2. This intermediate language is then translated and assembled on computer B, the result being a complete compiler operative on machine B.
A process bearing certain similarities to the one just described is mentioned in the discussions of UNCOL in the August and September 1958 issues of the ACM Communications.
In conclusion, I would like to mention some problems which are of concern but for which no satisfactory solutions have yet been found.
1. Input-output leaves much to be desired. There are such problems as the use of buffer areas, the optimum use of input-output equipment, and automatic report generation.
2. How is the most efficient use of secondary storage obtained?
3. The problem of program segmentalization; that is, if a program exceeds the machine capacity, how should it be segmentalized so as to ensure minimal total operating time?
4. Assuming a machine with index registers, how may they be used most efficiently?
5. How is fixed point arithmetic to be performed?
6. In what ways may the code of the object program be optimized?
7. Finally, there is the entire area of debugging.
In view of these problems, it should be emphasized that the version of CLIP and its compiler design herein described is by no means the final objective of the project.
This paper was prepared on the basis of the joint research efforts of Erwin Book, Harvey Bratman, Ellen Clark, Donald Englund, Howard Manelowitz, Wills Myer and the author, all of the System Development Corporation.
[Figure containing following caption omitted: Fig.1]
[Figure containing following caption omitted: Fig.2]
[Figure containing following caption omitted: Fig.3]
Our experience with CLIP taught us a number of lessons which
we are applying in the development of Jovial. The first was that
it is quicker to code and check out a program written in a higher
level language than one coded in machine language. Modifications
involving techniques of processing or analysis can be put into
the compiler after it is working with comparative ease.
ISBITZ, HAROLD. CLIP, a compiler language for information processing. System Development Corp., Santa Monica, Calif., 1959, 9 pp.
This short book is a brief description of a data processing compiler system that is currently being implemented on an IBM 709. The nature of treatment is such that the basic compiler, once defined in its own language, will be capable of adapting itself for another machine. The book is of interest primarily for two reasons: (1) the interesting discussion on table construction; (2) the description of the various steps taken to make CLIP as versatile as possible.
E. D. P. Gross, Jr., West Hartford, Conn.
in ACM Computing Reviews, January-December 1960
The objectives of the research are to develop a language to express problems that are of interest to the System Development Corporation and to investigate various designs for compiler processors.
Programming problems at SDC are typically information-processing problems. Pertinent properties of a large number of objects are maintained in tabular form; transfer functions are evaluated by examining and making complex logical decisions on the status of certain properties, and evaluation of functions frequently change the status of certain properties.
It was recognized, for instance, that the compiler itself was an application of information processing. Therefore, an experiment was undertaken to design a language suitable for describing a compiler process. This language is a first step toward the development of an ideal information-processing language.
The approach has been to specify and establish both a suitable source language and a compiler to translate this language. This formal source language is CLIP (Compiler and Language for Information Processing) and is, in many ways, similar to the Algorithmic Language, ALGOL, with the addition of declarations for data description and operations for string manipulation.
The CLIP language has been divided into two parts: the statements and the data description. The first part uses algebraic and logical expressions and seven sequential operators to describe the flow of a problem. The second part describes the type, size, and composition of data. The characteristics of this language posed many problems for the compiler. One of the chief problems was the generation of instructions for manipulating data packed into part of a machine word.
The compiler structure was greatly influenced by requirements for keeping the language independent of any one computer yet translatable to several. Therefore, the first half of our compiler processes the source language and produces an intermediate language which is still machine independent yet has been subject to a great deal of analysis. The second half processes the intermediate language and produces instructions for the IBM 709. If the compiler has to produce instructions for a different computer, only the second half needs changing. The first handwritten operational version of the CLIP compiler has been checked out. The program was used to compile statements which define the CLIP compiler. This generated version of the CLIP compiler has been used successfully to compile itself again. As a byproduct of this work, a technique has been evolved for writing other compilers in CLIP language and a method has been developed for using a compiler on several computers.
Future plans include redesign of the compiler structure to facilitate recompilation, automatic debugging features, and improvement in the language.
in [ACM] CACM 4(03) (March 1961)
Extract: CLIP Language
The CLIP language is based on ALGOL but has the following additional data declarations which were found to be needed in information processing:
1. TABLE declarations specify the subscripted variables or items which make up the table. The size and form (Boolean, alpha-numeric, integer, signed integer) of each item are declared, and provision is made for packing items into parts of a machine word. Initial data may be supplied if desired.
2. STRING declarations define a contiguous set of alphanumeric characters. Initial data may be given. Operations are permitted on any contiguous subset of a string.
3. ORIGIN declarations permit the programmer to specify the sequencing and overlapping of tables, strings, and simple variables.
4. LOCAL declarations in a procedure permit the programmer to limit the scope of identifiers to the range of the procedure. Identifiers declared to be local are not synonymous with identifiers of the same name outside the procedure. Or, if more convenient, an XPRES declaration can list those identifiers which are to have the same meaning outside the procedure; identifiers not declared to be expressed are local.
Extract: CLIP Table Packing
CLIP Table Packing
CLIP, as a language for information processing, was designed to handle information packed into part-words. Most information processing problems operate on large masses of data. These data are stored as table entries which in turn are comprised of several items. For example, each CLIP dictionary entry describes an identifier of the object program by means of such items as name, class, form, size, etc. These items can often be contained in a few bits. To conserve space, it is desirable to pack as many items as feasible into one machine word or block.
This packing creates problems for the translator since "getting" an item into the accumulator involves extracting and positioning, and "putting" an item involves depositing without disturbing other items in the same word. The programmer has several options for specifying table packing. He may give detailed packing information for each item in a table. This method permits him to define one item as the concatenation of two or more other items. In addition, two modes of automatic packing by the translator are available. The dense mode of automatic packing ensures that a minimum number of blocks will be used for a table. The medium mode of automatic packing takes advantage of the faster access time of natural word fragments of the 709.
The algorithm for automatic table packing is a simple one. All items of the table are sorted on required number of bits as determined by the declared form, size and mode of packing. One block is completely packed before starting a new block. The largest unassigned item is assigned first when a new block is started. The largest item which will fit into the remaining bits is assigned next. This process is repeated until the block is completely assigned or the number of bits remaining in the block is less than the smallest item to be assigned. The last item assigned to the block is positioned so as to minimize the time required to manipulate that item.
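The packing steps described above can be sketched in Python roughly as follows. This is a reading of the prose, not the translator's actual code: the function name `pack_items` is hypothetical, the block is assumed to be one 36-bit 709 word, the item widths are invented for illustration, and the final step (positioning the last item to minimize access time) is left out.

```python
def pack_items(items, block_bits=36):
    """Greedy packing of items into blocks, largest item first.

    items      -- dict mapping item name to required number of bits
    block_bits -- capacity of one block (assumed: one 36-bit word)
    Returns a list of blocks, each a list of (name, bits) in assignment order.
    """
    for name, bits in items.items():
        if not 0 < bits <= block_bits:
            raise ValueError(f"item {name} does not fit in a block")
    # Sort on required number of bits, largest first (stable for ties).
    unassigned = sorted(items.items(), key=lambda kv: kv[1], reverse=True)
    blocks = []
    while unassigned:
        # The largest unassigned item opens each new block.
        block = [unassigned.pop(0)]
        remaining = block_bits - block[0][1]
        while True:
            # Because the list is sorted, the first item that fits is the
            # largest one that fits into the remaining bits.
            fit = next((i for i, (_, bits) in enumerate(unassigned)
                        if bits <= remaining), None)
            if fit is None:
                break  # block full, or nothing left is small enough
            name, bits = unassigned.pop(fit)
            block.append((name, bits))
            remaining -= bits
        blocks.append(block)
    return blocks


# Hypothetical item widths for a dictionary-like table.
widths = {"NAME": 18, "CLASS": 3, "FORM": 2, "SIZE": 15, "FLAG": 1, "ADDR": 15}
for block in pack_items(widths):
    print(block)
```

With these assumed widths the six items fit into two blocks: NAME, SIZE and CLASS fill the first word exactly, and ADDR, FORM and FLAG share the second.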
Regardless of how the packing is specified, if an item falls into a natural word fragment, the translator tailor-makes all "get" and "put" instruction sequences. It may require as little as one instruction to "get" or "put" one of these items, whereas the 709 generally requires three instructions to "get" an item and five to "put" an item. Masking and shifting instructions are eliminated wherever possible.
The data generation portion of the translator computes all information concerning shifts and extraction masks which may be needed in the instruction generation portion. In some cases the information is saved both in binary and in the alphanumeric code used internally in the CLIP compiler. Computations and conversions are done only once for each item although the program may have many instances in which it "gets" or "puts" the same item.
Extract: Current Status of CLIP
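The general "get" and "put" operations on a packed item amount to masking and shifting, which a short Python sketch can make concrete. The names `get_item` and `put_item` are hypothetical, the word is assumed to be 36 bits wide as on the 709, and bit positions are counted from the least significant end purely for convenience; none of this reproduces the actual 709 instruction sequences.

```python
WORD_BITS = 36                      # assumed word length (IBM 709)
WORD_MASK = (1 << WORD_BITS) - 1


def _check(shift, width):
    if width <= 0 or shift < 0 or shift + width > WORD_BITS:
        raise ValueError("item does not lie within the word")


def get_item(word, shift, width):
    """Extract and position: return the width-bit item found at shift."""
    _check(shift, width)
    mask = (1 << width) - 1
    return (word >> shift) & mask


def put_item(word, shift, width, value):
    """Deposit value into the item without disturbing its neighbours."""
    _check(shift, width)
    mask = (1 << width) - 1
    if value & ~mask:
        raise ValueError("value does not fit in the item")
    cleared = word & ~(mask << shift) & WORD_MASK  # zero the item's bits
    return cleared | (value << shift)


word = put_item(0, 3, 15, 1234)     # a 15-bit item above a 3-bit item
word = put_item(word, 0, 3, 5)
print(get_item(word, 3, 15), get_item(word, 0, 3))
```

The shift and mask for each item depend only on its declared position and width, so they can be computed once per item and reused at every reference.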
Current Status of CLIP
CLIP has been written in its own language and has successfully reproduced itself. Currently, CLIP is being used to write JOVIAL compilers. JOVIAL, the compiler language adopted by SDC as its standard programming language, is discussed in the SDC publication SP-176, "Using Compilers to Build Compilers". JOVIAL is being implemented for four computers: IBM 709, Philco 2000, IBM AN/FSQ-7, and IBM Military Computer. A common machine independent generator phase will be used by all four compiler programs. The generator transforms the JOVIAL statements into the intermediate language. A separate translator will be written for each machine to transform this intermediate language into the machine language code of its respective machine.
The generator and translators are being written in CLIP and JOVIAL. The object programs produced will run on the 709. By the procedure outlined in SP-176, "Using Compilers to Build Compilers", JOVIAL compilers will be produced to run on the four different computers without writing them in machine language.
Besides being used to write other compilers, CLIP will be used for experiments with new, automatic coding techniques. The debugging aids to be incorporated into the object program and the problem of partial recompilation of programs to permit rapid modification or correction are two such areas of study being considered.
in [ACM] CACM 4(01) (Jan 1961)
The CLIP work was literally an early attempt to define a language which would be useful for writing compilers; however, the designers rapidly reached the conclusion that this was not an application significantly different from a more general information processing problem, hence the acronym CLIP is for Compiler Language for Information Processing. The designers used ALGOL 58 (nee IAL), but they made the essential and obvious additions to it in the area of data manipulation and declarations and input/output. Specifically, they added a type declaration to specify the type and size of unsubscripted variables. For example, Type(10,A,B: 6H,C.: D,E,F)
declares A and B as integers less than the value 10; C is an alphanumeric symbol of 6 characters; and D, E, and F are Boolean variables. In addition to this, a table declaration is used to specify subscripted variables. A string declaration specifies up to 120 character positions. Finally, there is an origin declaration to specify the arrangement of tables, strings, and/or unsubscripted variables in machine storage. READ and WRITE statements are used for input and output.
CLIP was one of the first illustrations of a compiler used to write itself, since portions of it were written in CLIP and hand-translated to 709 machine code. As discussed earlier, in the description of JOVIAL, the latter was an outgrowth of CLIP, and JOVIAL itself has been used to write many of its own compilers. CLIP has thus served its purpose and faded away.
in [ACM] CACM 15(06) (June 1972)