SQUOZE(ID:5458/squ007)


SHARE Symbolic programming language for the IBM 709

Named for "bastard past participle of the verb to squeeze"

Boem and Steele, SHARE Group


Related languages
SAP => SQUOZE   Augmentation of
SQUOZE => SCAT   Subsumed

References:
  • Boehm, E. M. and T. B. Steel "The Share 709 System: Machine Implementation of Symbolic Programming" view details
          in Preprints of papers presented at the 13th national meeting of the Association for Computing Machinery, June 11-13, 1958, Urbana, Illinois view details
  • Boehm, E. M. and T. B. Steel "The Share 709 System: Machine Implementation of Symbolic Programming" view details Abstract: As emphasized in the preceding paper, a fundamental requirement for effective utilization of high speed computing devices such as the 709 is a man-machine communication link that is at once rapid and unambiguous. Use of symbolic languages with conventional conversion techniques has resulted in reasonable initial specification of computational problems, but excess input handling has made such procedures uneconomical for the tedious corrections incumbent on the best of programmers. Also, checkout is frequently awkward, due to the inability of current systems to permit dynamic requests and responses in a language familiar to the writer. It is an objective of the system design to provide a mechanism for preservation of symbolic language communication between coder and calculator during the entire process of preparing, verifying and executing a program. Decisive in the accomplishment of this aim is recognition of a distinction between the information content of a language and the methods of encoding that language for presentation. Standard routines for translation of symbolic to absolute machine language uniformly result in information loss. This is not the case in the 709 system; only the syntax, not the semantics, are changed. Apart from mnemonic aids, the principal feature of symbolic coding is that referents and their relata maintain the same association when intervening elements are altered in number. A price is paid for this freedom as symbolic languages are uninterpretable by machine circuits without aid. A special program is required—an assembler. The principal disagreeable characteristic of an assembly program is the amount of time it occupies in execution. By attempting to minimize the frequency of reassembly, the programmer is forced to employ two languages in parallel. Logic shows that the only escape from this dilemma is to place some burden of the assembly on that program which loads codes into the machine storage for immediate execution. At once a question arises concerning the propriety of performing even a partial assembly at each machine pass.It is apparent to anyone familiar with historical assembly programs that the major time involved comes not from the internal processing but from the considerable read-in time associated with the usual methods of representing symbolic information. As this information is originally written by a human coder and then reproduced into, say, punched cards, again by a human operator, it is evidently impractical to start the process with binary encoding or some other form of compact expression. It is, however, possible to reduce this original writing to some new form of condensed encoding by computer processing and to use this form exclusively in subsequent operations. With care this can be done and not destroy the desirable features of the original representation.The basic concept in this system is a second encoding schema for the information concerning a program originally described in the symbolic language. This encoding, called the “SQUOZE encoding,” is at variance with the programmers' encoding in several respects. First, it is fundamentally binary, employing only two basic marks, while the programmer enjoys the use of 50. Secondly, the organization of information is quite different in the SQUOZE encoding. All information concerning a particular instruction occurs together in the original expression. This is not the pattern in the SQUOZE encoding. For example, the location symbols associated with the various instructions and storage cells are collected in a single, separate list. This proves to be most desirable for computational purposes when consideration is given to the special role played by such symbols. Finally, and with perhaps the most significance to the casual observer, the SQUOZE encoding is far more conservative of bits than the pseudo-English used by the programmer. It is from this phenomenon that the scheme derives its name, “SQUOZE” being a bastard past participle of the verb “to squeeze.” This condensation, which is on the order of ten to one, does not alter the functions of the system, but it does provide a considerable decrease in the time involved in processing program input, a major criterion for acceptability to those concerned with machine rentals. The critical difference from other systems inherent in the SQUOZE encoding is the retention and reorganization of information concerning symbols. It is this element that makes possible the more desirable modification and checkout procedures. Condensation does nothing more than make this sophistication cheaper and therefore more palatable. While clearly less redundant than the usual mode of expression, the SQUOZE encoding is not based on a minimum redundancy code. There are two reasons for this. In the first place, the construction of a true minimum redundancy code is not possible without precise knowledge of the probabilities of the various elements involved. Such data would be hard to come by for a programming language on an untried machine. Also, in the general case, a minimum redundancy code requires bit by bit examination for decoding. In the 709 this would require more internal processing time than would be saved on input read-in. It is evident that a compromise is necessary.A single example will suffice to illustrate the pattern of exchange between decoding time and input time. When the various macro instructions are adjoined to the actual 709 instructions, the number of operation codes which the loading program must recognize approaches 250. Allowing for later expansion, a principle that has been followed throughout the system, nine bits would be required for an elementary table description of the operation code list. Had this mechanism been adopted, only a single table lookup function would be required in the decoding program for the determination of the operation codes.If the exact frequency ranking of the operation codes were known, assuming a logarithmic distribution as a best example, approximately five and one-half bits would be the mean needed for representation of the operation codes in a minimum redundancy scheme. In this case no table lookup would be performed, but an average of five and one-half bit examinations would be essential. Evaluation of the computing and input times involved shows that the simple scheme enjoys an advantage by a factor approaching two. Additionally, in a machine with a finite high speed storage, attention must be given to the amount of program and table space needed for each procedure. Invariably the logic of present day machines favors the simpler table lookup approach.Certain improvements are possible, however. Although the precise frequencies are unknown, some a priori data are available regarding the relative probabilities for the occurrence of the various operation codes. For example, it is clear that a STORE instruction will occur more often than a REWIND TAPE instruction. By dividing the operation codes into two groups, a gain can be made. This is the procedure actually employed in the SQUOZE encoding. The operation codes representing those 32 instructions expected to be the most frequent are assigned a five bit code, while the remainder are encoded in nine. It is, of course, necessary to adjoin an additional bit to each code in order to provide an indicator for the category into which the code falls. Thus, frequent instructions require six bits for specification of their operation codes while the others demand ten. A mean of seven and one-half bits is estimated for the length of an operation code in the SQUOZE encoding.The decoding procedure for a SQUOZE operation code requires a single bit test and a table lookup. There is a slight advantage in this case over the straight table lookup. Eighteen microseconds per instruction are gained in input time at the expense of sixteen and one-half microseconds of decoding time. Clearly any more elaborate encoding would be less efficient.Talk of fractional microsecond savings may seem like woolgathering, but this time is saved on each occasion of loading an instruction into the machine. With some reasonable assumptions on the ratio of loading time to processing time, this one feature would account for some ten minutes of computer time each year. However, there are perhaps 100 such effects in the entire SQUOZE encoding and a similar number of machines involved. When translated through the rental formula, this represents some 50,000 dollars annually. For ease of discourse, the SQUOZE encoding schema will be discussed in terms of its representation on punched cards, the “SQUOZE deck.” The deck is divided logically into four parts: preface, introduction, dictionary, and text. The preface contains sundry information concerning the layout of the remainder of the deck and is utilized in allocating high speed storage effectively.The introduction contains information about those elements of the symbolic program not bearing directly on the resulting machine language code; such things as comments and data concerning which words were generated by the assembler initially as a result of library call cards and macro instructions. The dictionary is essentially a list of the location symbols encountered in the program, together with a description of the format of the cells these symbols name. The text, finally, contains the specific description of the instructions to be developed into 709 binary words and loaded into the high speed storage of the machine.Consider the functions the deck must perform. The loading process is, in fact, an assembly process. Although there is an option for the assembly program to produce an absolute binary program deck, the system is designed so that the normal modus operandi will be the production of a SQUOZE deck from original symbolic cards, this deck being used in all subsequent operations. A SQUOZE deck is not in a form that can be inserted directly into the machine for program execution. Basically, the squoze deck is the result of the first pass of a traditional assembly. For the derivation of an executable code it is necessary to perform the second pass and this is done by the loader just before execution time. As a mechanism is provided to permit symbolic modifications to the SQUOZE deck, both passes of assembly must be made on the disturbed material. In view of the possibility of altering the code through the medium of the loader, it is essential that the loader be able to produce, on demand, a new deck and a listing of the deck in symbolic language. In addition to all this, the loader must, surprisingly, be capable of loading the assembled code into storage and executing a control transfer to the object program.Making modifications to a program in terms of its symbolic referents calls for a knowledge of the number of machine words occurring between those words having symbolic names. This information is contained in the dictionary and is employed in updating the absolute assignments to location symbols just prior to actual decoding of the text.The dictionary is the primary source of information concerning the values to be assigned to symbols in the SQUOZE deck. It corresponds in many respects to the symbol tables found in assembly programs, having two parts, a symbol list and a reference list with a one-one mapping between the two. Two types of entries occur in the dictionary. First, every symbol which is to be defined during the loading process must appear in the list. Associated with each symbol is its relative placement in the original code. Secondly, to permit the loader to manage the stepping of the machine location counter during the assignment of the absolute locations, it is imperative that information be available on the details of those pseudoinstructions which have an effect on the sequence of location values. These include such things as origin specifications and block reservations. In view of the potential complexity of the requisite data, it has been found necessary to have a second table, called the “footnotes,” which contain this detail information in encoded form. There is an interesting feature related to the encoding of symbols for inclusion in the dictionary. In the usual mode of expression, symbols may be constructed from a set of 50 characters. If encoding were character by character, six bits would be required for the representation of each such character. As a symbol may contain as many as six characters, a total of 36 bits would be required for the representation of each symbol. This might seem convenient, as the length of a 709 word is exactly 36 bits, but a moment's consideration shows that it is unfortunate as it would be desirable to have a bit or two available in the same word as the symbol representation, giving a clue to the nature of the symbol. These flagging bits can be obtained. Let each character possible represent a digit in a number system having a base of fifty. Now six character symbols may be read as natural numbers in a base fifty system. If these numbers are converted to the usual base two system, only 34 bits are required for the maximum number and a gain of two flag bits has been made. This has the incidental feature of decreasing the requisite number of bits for representing the entire code, but conversion time would outweigh the saving by a significant margin were it not for the peculiar length of the 709 word. Here is a clear illustration of the critical effect the precise specifications of the machine concerned hold over the details of an encoding schema.Relative to the 709 internal location sequence, the dictionary is arranged in the numerical order of the symbols. It is obvious that this is not the order in which the symbols would occur in the original code. It is thus necessary to carry information which permits a determination of the order of the original deck. This, together with the word separation of successive symbols and the format data, comprises the basic dictionary. From it, recalling that origin specifications are included, it is possible to make an absolute determination of the value of each symbol, prior to examination of the text. Further, adjustment of these values in the dictionary is possible whenever it is necessary to incorporate insertions, deletions, or substitutions in the original text. The text is a linearly ordered string of bits representing the rest of the information required in the loading and listing processes. It consists of an encoded version of those parts of each symbolic instruction relevant to the loader's function. The singular exception is the location symbol which may be associated with an instruction. Information concerning these symbols has already been encountered in the dictionary, and, following a principle of nonrepetition, nothing covered in the dictionary occurs in the text. Although the SQUOZE encoding is emphatically not a minimum redundancy coding, it should be evident that the usual redundancy available for checking purposes is absent. Loss of a single bit will turn the entire loading operation into chaos. As a validity check, every 256th instruction possesses a peculiar, nonrecurring bit pattern which is recognized by the loader as a clue that decoding has been proceeding according to plan. Failure to meet this check is considered a catastrophic failure and loading is terminated at once. The following remarks concern the general flow of the loading procedure, although it is not possible in the space available to do justice to the complexities involved. The loading program will first accept any modification pseudoinstructions, such as CHANGE and ALTER, together with any symbolic instruction cards to be inserted, and convert them to a useful form for processing. It will then read the preface of the SQUOZE program to be loaded, from which it can allocate storage. It should be observed that the only data concerning storage allocation which may occur in this preface is information about the original state of the deck. All other necessary data must be computed from the modification decks which have already been read into the storage. Storage for these elements must have been allocated without reference to the SQUOZE deck preface and the structure of the deck to be modified. Proper allocation of storage to the various deck components and computed tables is, without question, the most difficult task faced by the system coders.Following this storage allocation exercise, the introduction is read and all ALTER modification pseudoinstructions have their card numbers reduced to a count of the type appearing in the dictionary. Indeed, this reduction is the sole function of the introduction. Were there no difference between instruction and card counts in the original deck, there would be no requirement for the data in the introduction.After the introduction is processed, the dictionary is read into storage and CHANGE cards are processed, giving the symbols encountered on them the same type of count assignment. It is the dictonary's separation count which makes this possible. Given the count for each modification, it is then possible to find the place of each such item in the dictionary and to update the count accordingly, adding new entries where required. Then the dictionary is processed to replace the count with the absolute value of each symbol.It is appropriate to note at this point that the dictionary processing is different from usual techniques. The dictionary is subjected to an iterative reprocessing until either all symbols have been defined or no further progress is possible. This implies, contrary to the situation in typical assembly programs, that definitions need not be made in a particular order in the original code. Thus, in the modification procedure, it is unnecessary to take special precautions in the order of symbol redefinition. Finally, the text will be read into storage and decoded, the resultant being placed in core storage until such time as the assigned locations overflow areas that are occupied by essential information. Should this occur, the entire code will be transcribed onto magnetic tape and the decoding continue. At the conclusion of the decoding, the code will be restored to the core and the loader will execute a control transfer to the point specified by the object code. In the event that a listing or a new deck is desired, the machine code will be written on magnetic tape and the printing and punching operations established before loading the code back into core. The emphasis of the discussion to this point has been on the loader and the modification techniques. This is as it should be for the remainder of the symbolic communication link is more or less traditional. The original translation of the symbolic information to SQUOZE deck form is in the nature of a conventional assembly, or, at least, the first pass of such a routine. Instead of absolute binary output, this assembler produces a SQUOZE deck. Hereafter the procedure is as outlined. Two other features are worth mention. For purposes of input and output, the dictionary may be preserved and used to permit the definition of input and output areas in terms of the original code symbols. Also, perhaps critical to the overall value of the system, the dictionary and its symbolic data are provided as input to the checkout routines, thus permitting a reproduction of the original symbolic language format for debugging. In summary it should be observed that the principles evident in the SQUOZE encoding technique permit the retention of information throughout the whole involvement of the machine with a computational problem, thus yielding a practical answer to the need for a man-machine communication line that is easily understandable to the human element. Careful construction has made this less expensive than one might expect from cursory examination of the usual symbolic language techniques. It remains to be seen how well all this works out in practice. Without direct experience, the constructors of the system can only proceed in the hope that their intuition has not been misleading and that, indeed, the approach taken by the 709 system is a sound departure in modern computing technology.
          in [ACM] JACM 6(2) April 1959 view details
  • McCarthy, John; Fernando J. Corbató, and Marjorie M. Daggett "The linking segment subprogram language and linking loader" view details Extract: Details of SQUOZE
    The SQUOZE output retains aIl the symbolic information of the original program, making it particularly useful for debugging programs written in machine language. When debugging, this symbolic information makes postmortems legible and changes relatively easy.
    [...]
    SQUOZE decks retain more information than is necessary for linkage and for specifying at load time those pa- rameters not set during compilation. In theory multiple SQUOZE  decks could be loaded, but this has proved difficult to realize.
    Segments of a program could not be treated as independent entities. In general, a symbol name, if used in one segment, could not be used in another segment with a different meaning. The heading feature provided some relief for this problem of name conflict by allowing a limited naming scope facility, but heading program segments presented other complexities of detail. Both in the coding and in the changing of large programs, manipulation was cumbersome and error-ridden.
          in [ACM] CACM 6(07) July 1963 view details