OCAL(ID:269/oca001)Pattern matching alnguage for cryptographic analysisfor On-Line Cryptanalytic Aid Language Edwards, MIT, 1966 Pattern matching language with array-matching capabilities and statistical calculations "OCAL is a problem-oriented computer programming language with the general area of cryptanalysis as the problem domain. OCAL is basically a synthesis of the MAD and SNOBOL computer programming languages, combined with ideas taken from SLIP and PL/I. " Language design followed experimentation with SNOBOL (but not SNOBOL 4) and METEOR (which had pattern ideas from COMIT), and their subsequent rejection. PL/I supplied compount structures and event-driven programming (ON...) Related languages
Samples: References: The kind of aid a computer would provide can be seen in Yardley's discussion of breaking the Japanese diplomatic code preceding the Washington Armament Conference of 1921-22. The clerical work in this instance required preparing 60,000 index cards with fragments of Japanese messages in both plain and code text. This preparation was done by a "corps of typists" working many hours. After the cards were prepared, they were sorted into various categories and summarized by hand onto large summary sheets. Tasks like this could easily be accomplished by a digital computer. Solution of ciphers also requires a certain amount of routine bookkeeping, such as counting letter frequencies and looking for repeated digraphs. Also, Colonel Friedman's advice about using a soft pencil with a big eraser is well taken, for in solving cryptograms by hand the eraser is used almost as frequently as the pencil. Let us again examine the idea of using a computer, this time with a CRT display. Why not have the computer allow an operator to make a guess and watch the computer work out the consequences? If the guess does not "prove out", the operator can erase the guess and its consequences with a single key stroke. The advent of modern time-shared computer systems, complete with CRT displays, places all of the above-conjectured uses of a computer within the realm of practicability, because an expensive computer need not be tied up while the analyst is trying to figure out what to do next. The problem then resolves to: what language can a cryptanalyst use to program an on-line computer to perform the various tasks pertaining to solving a cryptogram? Let us list some of the requirements for such an On-line Cryptanalysis Aid Language (OCAL) and then examine some existing languages in light of these requirements. First, the OCAL must handle strings of alphabetic characters and manipulate these strings easily. Second, the OCAL must handle algebra with ease, including matrix operations. Third, the OCAL should be embedded in a machine environment which allows the cryptanalyst to write and check out programs on-line. Finally, the OCAL must be reasonably efficient in its use of computer time and storage, if reasonable response times are desired in a time-shared computer environment. Available languages for programming computers include basic machine language, LISP and its derivatives, the ALGOL family of languages, and string-processing languages such as METEOR and SNOBOL. Machine language, even with macros, is rejected because it is much too hard to program and quickly check ideas. The OCAL should be a tool which a cryptanalyst can use easily, while machine language, even in the hands of a skilled programmer, is a blunt instrument at best. LISP on the other hand, while not easy to learn, is a powerful language for many types of complex data manipulation tasks. LISP handles algebraic tasks with moderate ease, matrix manipulations with some difficulty, and strings with still more difficulty. Finally, storage efficiency leaves much to be desired, and this objection is especially critical when the problem of using large dictionaries in the OCAL is considered. Therefore, LISP is rejected as the OCAL. The other LISP-like languages, such as SLIP, threaded lists, and IPL (the machine language of list processing) suffer similar deficiencies. Next, the ALGOL family of languages, such as ALGOL, MAD, AED, PL/I, and even FORTRAN is considered. These languages handle algebra with ease, but their string-handling abilities are almost non-existent. Furthermore, none of these languages is particularly well adapted to on-line use. This, coupled with the difficulty of adding good string-processing features to any current time-sharing version, leads us to look elsewhere for the OCAL, Finally, let us examine the rather interesting string-processing language SNOBOL. The heart of SNOBOL is an elegant pattern-matching algorithm which allows a string to be tested for a complicated pattern in one statement. In order to test the suitability of SNOBOL for cryptanalysis, a program to solve the simple railfence cipher was written and debugged in about 15 man-hours using the Compatible Time-Sharing System at Project MAC. (See Appendix B for a discussion of the railfence cipher and a resulting SNOBOL program.) Writing the railfence program revealed several weaknesses in SNOBOL. First, the arithmetic was workable but somewhat awkward. Second, there was no provision for arrays, which made the solution scoring by digraphs rather difficult. This problem was solved in the railfence program by making the digraph scoring array into a series of fixed strings which were accessed by the pattern-matching statement. The most serious deficiency of SNOBOL was the lack of a functional argument provision in the pattern-matching statement. That is, pattern elements could be fixed strings, arbitrary strings, arbitrary strings of fixed length, or repeats of previously-matched pattern elements. Missing was provision for making a pattern element into an arbitrary string, subject to a predicate procedure which could examine the state of the pattern match to that point. (This deficiency is not present in the string-processing language METEOR which is an improved LISP implementation of the string-processing language COMIT. However, METEOR still suffers from the same problems as LISP, regarding efficient use of time and storage.) These deficiencies ruled out SNOBOL as the OCAL, but the pattern-matching concept was considered important and was extended along the lines of allowing a pattern element to specify a predicate procedure. This extended SNOBOL statement was then incorporated in the final design of OCAL. With no single language suitable for the OCALf two courses of action were open. Either take an existing language and extend it to overcome deficiences, or design a new language aimed specifically at the field of cryptanalysis. The first alternative was rejected, because extending an existing language does not usually allow one to insert new ideas without redesigning the entire language. The author was interested in what could be done from scratch, and therefore he chose the second alternative, design of a new language. Hence, the specific goal of this thesis is to specify and demonstrate an On-line Cryptanalytic Aid System (OCAS) which will permit a computer programmer, who is already familiar with Cryptanalytic procedures, to easily program and test an attack on any of the 30 different cipher systems that regularly appear in The Cryptogram (Again, see Appendix A). Extract: ON-LINE CRYPTANALYTIC AID LANGUAGE 3.1 ON-LINE CRYPTANALYTIC AID LANGUAGE (OCAL) OCAL is a problem-oriented computer programming language with the general area of cryptanalysis as the problem domain. OCAL is basically a synthesis of the MAD and SNOBOL computer programming languages, combined with ideas taken from SLIP and PL/I. This section describes the basic features of OCAL. (A complete description of OCAL syntax can be found in Appendix C.) 3.1.1 Basic Pata Types The following quantities comprise the OCAL basic data types: a) Logic - a two-bit quantity standing for True, False, Neither, or Undefined. The reason for introducing a basic four-value logic is to make the results of certain logical comparisons more obvious to the programmer. For instance, the question "Is ten greater than an orange?* could be answered "Undefined* because the quantities involved in the comparison are not comparable. An example of use for logic value "Neither" might be in response to the question "Given that cipher A stands for plaintext Q in a simple substitution cipher, does cipher text MKF stand for plaintext THE?" The answer "Neither" in this case means undecided, for the information given is insufficient. Situations requiring a simple Boolean decision can be made on a "True" or "Not True" (e.g., "False", "Neither", or "Undefined") basis. b) Integer - the standard computer quantity used for integer arithmetic and subscripting expressions for compound data structures. c) Real - floating-point numbers used primarily in arithmetic calculations. d) Character - a two- to eight-bit representation of a member of the ASCII character set. Each character is associated with an alphabet (defined next) which gives the mapping from a particular ASCII character subset into the full ASCII character set. The Character is the basic constituent of the string (defined later) and may also be used in subscripting expressions for compound data structures. e) String - an ordered set of characters all taken from the same alphabet. A string may be arbitrarily long and is associated with an alphabet that gives the mapping of character representations into ASCII characters. Also associated with a string is an integer giving its current length in characters. f) Reader - an object which may be associated with a string. A reader may be thought of as the reading head of a Turing machine, with the associated string being the Turing-machine tape. A reader can move up and down a string, read characters out, or write characters into a string. In addition, a reader can be positioned at the head of a string, at a preset place on the string, or at an arbitrary place on the7 string. g) Alphabet - defines a mapping function from the ASCII character set {the standard OCAL alphabet) into a subset of ASCII. The alphabet concept is used to gain storage and subscripting efficiency when dealing with characters and strings. An alphabet may map any number of characters in the domain (ASCII) into a single character in the range. Characters appearing in the domain, but not in the range, are mapped into the null character (i.e., ignored). In addition, each alphabet provides two extra characters in the range corresponding to logic values "Neither" and "Undefined". This feature allows OCAL to indicate certain logical decisions or conditions within a string. Also associated with each alphabet is an integer equal to the cardinality of the mapping range, excluding the logical characters "Neither" and "Undefined". This permits character and string arithmetic to be done modulo the size of the alphabet. h) Statement Label - a special data type referring to a part of an OCAL procedure. Statement labels are data types to permit assigned GO TO statements and functional arguments in OCAL. 3.1.2 Compound Data Structures The OCAL compound data structure is taken from the PL/I language. Compound data structures can consist of any of the previously-mentioned basic data types and other compound data structures. Various parts of a compound data structure can be accessed either by name or by subscripting expressions. Thus, a real array in OCAL is simply an n-dimensional compound data structure consisting of real numbers. 3.1.3 Declarations Declarations are used in OCAL to associate data types with the local variables used in a procedure. All variables must be declared at the head of the procedure or block in which they appear. Variables may be either local or global in scope: local variables are defined only within the block or procedure containing the declaration, and global variables are defined in all blocks and procedures. Declarations are also used to define compound data structures; in which case all the elements of the declaration must be basic data types or already-declared compound data structures. That is, recursive definition of a compound data structure is not permitted. 3.1.4 Statements Statements in OCAL may be either simple or compound. Simple statements are terminated by a semi-colon, or the end of the line on which they appear, unless the continuation character "." (period) appears as the first character on the following line. Executable statements may be symbolically labeled with one or more labels. Compound statements are groups of statements enclosed within the statement parentheses, BEGIN and END. A compound statement is called a block, and blocks may be nested to any depth. OCAL statements fall into the following categories: a) Declarations - type identification, data structure, and procedure structure; b) Control - GO TO, conditional, and iteration) c) String pattern matching - similar to the basic SNOBOL string pattern-matching statement! d) Assignment - assigns values to symbolic quantities; e) Execute - calls a specific procedure, but ignores any values returned; f) Error control - allows an OCAL procedure to retain control even though a called procedure has taken an error return. (A detailed list of statements with their syntax is in Appendix C.) 3.1.5 Procedures Procedures may have a fixed or variable number of arguments or parameters. If the procedure has a variable number of parameters, the global integer variable "UUMBEROFPSETS" gives the number of parameter sets for any particular procedure call. Parameters are referenced by the local name which is given procedure declaration. Procedures may be defined recursively and keep their working storage on push-down lists. Procedure calls are made in the form fn.(al,a2, ...,an) where "fn" is the procedure name and the period [.] distinguishes a procedure call from a subscripted variable. The "ai"s are the parameters for the called procedure. A procedure with no arguments is called by the procedure name followed by a period. A procedure may be given a value by the statement VALUE e where "e" is any expression. There are two procedure returns in OCAL; first, the normal return is specified by the statement RETURN e or by executing the last statement of a procedure, and the second return is given by the statement ERROR s where "s" is a string. On executing an error return, control is returned to the last procedure which executed the statement ON ERROR, s where "s" is any simple or compound statement (usually a GO TO statement, or a block ending with a GO TO or DISMISS statement). 3.1.6 Relations These are logical operators that compare integer, real, character, and logical quantities. The value of a comparison is the logical quantity "True" if the relation holds, "False" if the relation does not hold, and "Undefined" if the quantities are incomparable (e.g., is "blue" equal to 3.147). 3.1.7 Arithmetic Normal infix operators may be used in arithmetic expressions in OCAL. Each operator takes operands whose type is character, integer, or real and produces a result which is the same type as the highest type of any operand; the ranking between types is character lowest, integer next, and real highest. Furthermore, if characters appear in any arithmetic expression, the result is taken modulo the alphabet size associated with the first-mentioned character. This feature may be suppressed if desired. 3.1.8 Logical Expressions Standard logical infix operators are available in OCAL. Each operator takes two arguments whose type is logic, character, or integer. The logical operators produce a result which is the same type as the highest types of any operands; the types being ranked with logic lowest, character next, and integer highest. The value of a logical operator is the bit-wise combination of the operands after type transfers (if any) have been performed. Extract: OCAL Manual This appendix describes current specifications of the on-line cryptanalytic aid language. OCAL is intended to be a problem- oriented computer programming language, designed to make the solution of cryptograms easier by providing a cryptanalyst with computer aid. The ideas incorporated in OCAL have been taken from many languages, such as MAD, PL/I, SNOBOL, LISP, and SLIP. However, OCAL was not intended to have the full generality of a language such as PL/I. Instead, OCAL was originally specified for easy implemen- tation on a computer such as the Digital Equipment Corporation PDP-6. As the design continued, some compromises were made to provide more features in the language, so that some of the specifications may change when the language is finally implemented on a computer. C.1 SYNTAX NOTATION In this appendix, meta-variables will be typed in small letters without intervening blanks, as the following: identifier label boolean-expr Capital letters indicate words that are part of the language, such as: PROCEDURE DO STRING BEGIN The meta-symbol ... is used to indicate that an arbitrary number of the preceding meta-symbol can follow. All other punctuation marks such as . , : ; must appear as indicated. Optional portions of definitions will be set off using pairs of slashes [/]. For example, LABEL namel/,name2,. ../ means that the declaration LABEL is followed by at least one name and optionally, an arbitrary number of names separated by commas. C.2 BASIC PROGRAM ELEMENTS C.2.1 Character Set The basic character set for OCAL is the revised ASCII character set. This character set is used for both language and data. C.2.2 Identifiers An identifier is a string of 29 or fewer alphanumeric characters; the initial character must be alphabetic. Identifiers are used for variable names, array names, statement labels, procedure names, and keywords. C.2.3 Use of Blanks Identifiers and constants (except string constants) may not contain blanks. Identifiers and/or constants may not be immediately adjacent. They must be separated by an operator, equal sign, paren, colon, semi-colon, period, or blank. All format effecters, such as horizontal tab, vertical tab, and line feed are treated as blanks, and multiple blanks are treated as one blank. C.2.4 Comments If the first character at the beginning of a line (i.e., after a Carriage-Return Line-Feed [CRLF] combination) is a star [*] then the entire line up to the next statement terminator (i.e., semi- colon or CRLF) is treated as a comment and is ignored in OCAL. C.2.5 Statements A statement is any single statement found in the language and is terminated by a semi-colon or a CRLF. Sometimes a statement can contain another statement as a sub-piece. (For example, see the IF statement). If a complete statement does not fit on one line, it may be continued on the next line by making the first character on the next line a period [.]. In this case, both the CRLF and the period are ignored by OCAL. This is true even within string constants. C.2.6 Blocks A block is a group of statements enclosed between the statements BEGIN and END. BEGIN and END act as statement paren- theses and define a block. Blocks may be nested to any depth. A block may appear anywhere in the language a statement can appear, except that a block cannot appear in place of a declaration or PROCEDURE statement. C.2.7 Statement Labels Statements may be labeled to permit reference to them. A statement label has the form, id:/id:.../ statement where "id"s are identifiers. In this case, the identifiers are called statement labels and may be used interchangeably to refer to the labeled statement. Labels before procedures are special cases and are called procedure names (see Section C.7.1, PROCEDURE Statement). Only one label may appear before a PROCEDURE statement. Statement labels appearing before declarations in OCA1 are ignored. C.3 BASIC DATA TYPES C.3.1 Logic A four-value logic is used in OCAL. The values and their meanings are: T! - true Fi - false Nl - neutral or neither Ul - undefined The logic values are ranked from lowest to highest, with N! lowest/ then F!, TI, and Ul highest. The result of logic constants combined under the operation .A. [AND] produces the lowest of the operands. Similarly, the operator .V. [inclusive OR] produces the highest of the operands. The operator .N. [NOT] inverts T! with F!, and Nl with u!. The operator .X. [exclusive OR] behaves like .V. [inclusive OR] unless both operands are the same, in which case the result is the .N. [NOT] of the first operand. c.3.2 Integer An integer is an optionally-signed string of decimal digits, or an optionally-signed string of octal digits, followed by the letter K. For an octal integer, the K may be followed by an octal exponent given as a one- or two-digit decimal integer. The maximum size of an integer depends upon the particular OCAL implementation. On the PDP-6, up to ten decimal digits or twelve octal digits are permitted. C.3.3 Real A real number is an optionally-signed string of decimal digits including a decimal point [period]. In addition, a real number may have an exponent, indicated by the letter E, followed by an optionally signed one- or two-digit, decimal-integer exponent. The maximum precision of real numbers is dependent on the particular implementation of OCAL. On the PDP-6, the exponent magnitude must be less than 10-to-the-38th power and the precision is limited to eight decimal digits. C.3.4 Character A character is a two- to eight-bit quantity representing an element of the ASCII character set mapped by an associated alphabet (see Section C.3.7). Characters are indicated in the language by a double quote mark ["] followed by one ASCII character or by a number sign It] followed by exactly three octal digits. Characters may be mapped by alphabets from the ASCII character set to a subset of ASCII and back again. For example, the ASCII character A may be represented by either of the following: "A #201 C.3.5 String A string is an arbitrarily long sequence of ASCII characters delimited by single quote marks [']. A string may contain any combination of ASCII characters. The characters single quote [*]» double quote ["J, and number sign [#] have special meaning when denoting a string in OCAL. Single quotes delimit the string, which means that one double quote mark is ignored and the character immediately following it is inserted in the string, no matter what that character may be. The double quote mark is used as a "quote" character; so that a single quote may be inserted in the string using the double quote mark. Since not all eight-bit ASCII characters can be generated from a normal teletypewriter keyboard, a special quote character, the number sign [#] , is used to insert untypable characters in a string. A number sign must be followed by three octal digits, from 000 to 377, inclusive. This octal number represents the desired ASCII character. Note that the carriage return and line feed characters may appear in a string. If a desired string will not fit on one line, the statement continuation convention may be used, in which case neither the CRLF nor the following period will appear in the string. For example, the following all represent the same ASCII string in OCAL: 'ABC1 'A"B"C' ?#201B#203' C.3.6 Reader A reader is a special data type which may be associated with a given string. Using special reader functions, a reader may be moved up and down the string. A reader can also read characters from a string and write characters into a string (see Section C.9, Reader Functions). The reader was introduced into OCAL as a flexible way of transforming character strings into characters, and vice versa. C.3.7 Alphabet An alphabet specifies a mapping from the ASCII character set into ASCII. The idea was introduced into OCAL to add efficiency when dealing with characters as subscripts for compound data structures and arrays. Alphabets also allow core storage to be used more efficiently when storing character strings. In addition, alphabets can be used to exploit certain mathematical relationships often found between the characters of a particular cryptogram or cryptographic system. The alphabet declaration has two parts: the name, and the defining string given in OCAL string notation. In addition to the characters in the defining string, each alphabet includes two extra characters in the domain, standing for the logic values N! and U!. These are included to give OCAL the ability to indicate certain logical decisions within a string. However, the character corresponding to NJ and U! are not included in the cardinality of the alphabet. The declaration of an alphabet defines two objects within OCAL. First, a mapping function is called like an OCAL procedure which converts an ASCII string or character into a string or character in the given alphabet. Under this mapping, any character appearing in the domain (ASCII), but not in the range, is mapped into the null character (i.e., ignored). Second, the declaration permits the alphabet name to be used as a global integer variable whose magnitude is equal to the cardinality of the the defined alphabet. An alphabet can also specify the mapping of many characters in the domain into one character in the range. This is accomplished by observing the following conventions in the defining string. All characters enclosed within parentheses in the defining string are mapped into the same character as the first character after the open parenthesis. If either of the literal characters open parenthesis "(" or close parenthesis ")" is desired in the range, it must be preceded by a double quote mark in the defining string. (NOTE: a double quote mark is introduced into an OCAL string using the form "".) For example, the following will declare a five-letter alphabet called A5, consisting of the characters A B C ( and ). In addition, the ASCII characters D and E will be mapped into the character C. ALPHABET A5('AB(CDE)""("")') Using the alphabet AS, the ASCII string 'ABCDEF(ABZ)' will be mapped into the string 'ABCCC(AB)', C.3.8 Type Transfer Procedures The following procedures are available to transform quantities from one basic type to another. They are: CHARACTER.(q) where "q" is a logic, integer, or real quantity and the result is a character in the ASCII alphabet; STRING.(q) where "q" is a logic, character, integer, or real quantity and the result is a string in the ASCII alphabet; LOGIC.(q) where "q" is a character, integer or real quantity; INTEGER.(q) where "q" is a logic, character, ASCII string of digits, or real quantity; and REAL.(q) where "q" is a logic, character, ASCII string of digits in REAL form, or integer quantity. The procedure ASCII.(s) will transform the string "s" in any alphabet to an ASCII string. C.4 BASIC DECLARATIONS In an OCAL procedure, each variable must be declared before it is used. The following forms are used to declare variables in an OCAL procedures LOGIC id/,id,id.../ INTEGER id/,id,id.../ REAL id/,id,id.../ CHARACTER id/,id,id.../ STRING id/,id,id.../ READER id/,id,id.../ ALPHABET id(st) LABEL id/,id,id.../ EXTERNAL id/,id,id.../ GLOBAL id/,id,id.../ where "id" is an identifier and "st" is an OCAL string. The LABEL declaration means that the variable stands for a statement label. The GLOBAL declaration means that the variable is to be made available to all OCAL procedures and is always defined. The EXTERNAL declaration means that the variable is a GLOBAL variable defined by some other OCAL procedure. The variables mentioned in a GLOBAL or EXTERNAL declaration must also appear within one of the type declarations. Variables not mentioned in a GLOBAL or EXTERNAL declaration are defined only within the procedure or block which contains the declaration. C.5 COMPOUND DATA STRUCTURES The compound data structures in OCAL are taken from the data structures found in the programming language PL/I. To avoid repetition of material, the following sections in Chapter 2 of the PL/I manual (IBM Form C28-6571-0) should be implemented in OCAL: DATA AGGREGATES - page 43 ARRAYS - page 44 STRUCTURES - page 44 ARRAYS OF STRUCTURES - page 44 NAMING - page 45 SIMPLE NAMES - page 45 SUBSCRIPTED NAMES - page 45 QUALIFIED NAMES - page 46 SUBSCRIPTED QUALIFIED NAMES - page 46 The only restriction on the data structures in OCAL is that blanks are not permitted within qualified names. In implementing these data structures in OCAL, it should be noted that each element of a compound data structure must be previously declared to be one of the basic data types, or must be a previously declared compound data structure. The recursive definition of a compound data structure is expressly prohibited in OCAL. C.6 EXPRESSIONS C.6.1 Arithmetic Expressions The following infix operators are available for arithmetic expressions in OCAL: + addition - subtraction * multiplication / division Arithmetic is performed on character, integer, and real data; the data types being ranked with character lowest, integer next, and real highest. The operands of any operator are converted to the type of the highest operand, and the result is of that type unless one of the operands was a character. In that case, the result of the arithmetic expression is of character type and is taken modulo the size of the alphabet corresponding to the first character encountered. If this action is not desired, the following "dotted" operator set may be used! .+. addition .-. subtraction .*. multiplication ./. division .R. remainder The "dotted" operators perform only the necessary type matching and indicated arithmetic. C.6.2 Relational Expressions Relational expressions return logic values and are used in making comparisons between various quantities. The relational operators are: .G. greater than .GE. greater than or equal .L. less than .LE. less than or equal .E. equal .NE. not equal The operands may be of logic, character, integer, or real type. As in arithmetic expressions, type conversion takes place between character, integer, and real data types. However, if one operand is of logic type, then they both must be of logic type or the result will be U! [undefined]. Normally, the result of a relational expression is T! [true] if the relation holds and F! [false] if it does not. C.6.3 Logic Expressions The logical operators available in OCAL are: .A. and .V. inclusive or .X. exclusive or .N. not The operands of a logical operator may be of logic type (ranked lowest), character, or integer (ranked highest). The result is of the same type as the highest operand and is the bit-wise combination of the operands according to the operator, unless both operands are of logic type. In this case, the truth tables indicated in Section C.3.1 are used. C.7 STATEMENTS C.7.1 PROCEDURE The PROCEDURE statement marks the beginning of an OCAL function or procedure. It gives both the procedure name and the list of parameters the procedure is to receive. OCAL procedures may be recursively defined without any special declaration. The parameter list for a procedure may specify either a fixed or variable number of parameters. The form of the PROCEDURE statement for a fixed number of parameters is id: PROCEDURE/(name1,name2,...)/ where "id" is the identifier giving the procedure name and the optional parameter list is enclosed in parentheses. Names in the parameter list give dummy names for arguments used by the procedure. Each dummy name must appear in a type declaration statement in the procedure. For a variable number of parameters, the PROCEDURE statement has the form id: PROCEDURE (/f,f,.../(v,v,...)/,f,f.../) where "id" is the procedure name and the "f"s indicate optional parameters that are always present in the procedure call. The "v"s in parentheses indicate a set of parameters which may be repeated zero or more times in any procedure call. Again, all the dummy parameter names must appear in type declaration statements for the procedure. At each activation of the procedure, the global integer variable NUMBEROFPSETS will contain the number of parameter sets in this procedure call. Individual members of a parameter set may be referenced by the convention parn [n]/(subs)/ where "parn" is the dummy name in the procedure parameter list, [n] is an integer or integer variable referring to a particular parameter set, and the optional (subs) is any subscripting expression associated with the parameter. Note that it does not make sense for the value of n to exceed the value of the integer variable NUMBEROFPSETS. An OCAL procedure is terminated by an END statement (see next section). If control reaches an END statement for a procedure, it is equivalent to executing a RETURN statement with no return expression specified. C.7.2 BEGIN AND END The BEGIN statement or block marks the beginning of a compound statement which may appear any place a single statement can appear (except for a PROCEDURE statement or declaration). In addition, a compound statement may start with type declaration statements, declaring local variables defined only within that compound statement or block. Variables used but not declared within a block are assumed to be declared in the procedure or in a block which encloses this one. The statement END /statement-label/ is used to terminate both a block and a procedure. The optional statement label, if present, must match the label on the corresponding BEGIN or PROCEDURE statement. C.7.3 Assignment The = sign is used to denote assignment in OCAL. This form gives vl/,v2,v3.../ = el/,e2,e3.../ where the "v"s are either variables which may be subscripted, or certain reader functions, and the "e"s are any OCAL expressions. If more than one variable or expression occurs, the assignments are made in pairs, el assigned to vl, e2 assigned to v2, etc. If there are more expressions than variables, the excess expressions are evaluated but the values are ignored. If there are more variables than expressions, the last expression value is assigned to the remaining variables. Automatic type conversion is done within the following groups of data types: character-integer-real logic-character-integer Assignments made to a character variable are made as stated, if the expression is of character type. Otherwise, the expression is taken modulo the size of the alphabet (if any) associated with the character. C.7.4 PROCEDURE calls Procedures are called with procedurename./(pl,p2,...)/ This uses the MAD convention of following the procedure name with a period to differentiate it from a subscripted variable. The "p"s are optional parameters which, if present, are enclosed in parentheses. However, a statement may consist of only a procedure call, in which case any value returned by the procedure is ignored. C.7.5 Iteration The iteration statement DO allows a statement or block to be repeated zero or more times until some logical condition is met. The DO statement takes the following formss DO UNTIL loqicexpr, statement DO WHILE logicexpr, statement DO HEITHER Icxricexpr, statement The UNTIL form repeats the statement until the logical expression logicexpr is not F! [false]. The WHILE form rewats while logicexpr is T! (true]. The NEITHER form repeats statement while logicexpr is N! [neither or neutral]. C.7.6 Conditional The conditional statement takes the form IF logicexpr, statement If the logical expression "logicexpr" is T! [true], the statement is executed. Otherwise, the statement is skipped. C. 7.7 GO TO The GO TO statement has the form GO TO label where label is a statement label or variable of LABEL type. C.7.8 VALUE The value returned by on OCAL procedure may be indicated by the statement VALUE expr where expr is any expression. C.7.9 RETURN A particular activation of an OCAL procedure, is terminated by executing the END statement associated with the procedure or by executing the statement RETURN /expr/ The value returned by the procedure is the value of the optional expression "expr". If expr is not present, the value is taken from the last VALUE statement executed in the procedure. If expr is not present and no VALUE statement has been executed, the procedure returns a null value. C. 7.10 ERROR A particular OCAL procedure may be terminated by the statement ERROR /string/ Executing this statement causes control to return to the last ON ERROR statement executed (see ON statement). The value of the optional string associated with the last error statement is found as the value of the global string variable ERRORSTRING. C.7.11 ON The ON statement (an idea taken from PL/I) allows a programmer to retain control in spite of certain interrupts which might cause the OCAS job to terminate. The form of the ON statement is ON condition, statement where the "statement" (usually compound) is executed when the interrupt corresponding to "condition" is found. The interrupt conditions which the programmer can intercept with the ON statement are: ERROR - error return from an OCAL procedure CLOCKTICK - every time the system clock ticks PDLOVERFLOW - overflow of push-dovn list STORAGEFULL - no free storage left DISBUFFERFULL - overflow of display buffer DISPLAYSTOP - the display has executed a stop instruction STORAGEUSED - allotted storage has been used (see Storage Allocation, Section C.10) KEYSTROKE - one character has entered the on-line teletype buffer If appropriate, the programmer can return control from the interrupt to the statement OCAL was executing when the interrupt occured by executing the statement DISMISS This permits OCAL to resume processing the previous calculation after some interrupt processing has been done. The effect of an ON statement may be canceled by leaving the procedure in which the ON was executed, or by the statement REVERT condition which causes any interrupts corresponding to condition to be handled by an ON statement executed in a higher procedure. The system may be requested to handle interrupts by the statement SYSTEM condition This instructs the system to do normal processing (if any) of any interrupt corresponding to this condition. The effect of a SYSTEM statement is canceled by leaving the procedure in which it was executed, or by executing a REVERT or ON statement specifying the same condition. An interrupt on a particular condition may be simulated by the program by executing the statement INTERRUPT condition This has the same effect on OCAL as if the interrupt correspnding to condition had happened when the INTERRUPT statement was executed. Once an interrupt corresponding to a certain condition has happened, further interrupts for the same condition are inhibited until a DISMISS statement has been executed or until an ON, REVERT, or SYSTEM statement specifing the same condition is executed. C.7.12 SNOBOL Pattern Matching The pattern-matching statement in OCAL is taken directly from the SNOBOL string-processing language. The basic forms of the SNOBOL statement are: input /pe pe .../ input = st st /st .../ input pe pe ... = /st st .../ where "input" is a string or string variable, "pe"s are pattern elements (defined later), and "st"s are strings or string variables. The SNOBOL statement works in this manner: the input string is scanned from left to right for a match against the pattern elements in the given order. If a match is found and the = sign is present, matched pattern elements are replaced by the concatenation of strings "st" (if any). Pattern elements may be string constants, string variables or arbitrary strings found in the input string itself. Arbitrary strings are denoted by string variables bracketed by stars. For example: *A1* *HOLD* Arbitrary strings match any substring in the string input, including the null string. Arbitrary strings may be subject to a number of conditions. An arbitrary string designated *AA/3* will match a substring exactly three characters long. The general form of a fixed-length arbitrary string is *name/n* where "name" is a string variable and "n" is an integer or integer variable. An arbitrary string may be subject to the condition of containing a matching number of left and right parentheses. This condition is designated by *(name)* where "name" is a string variable. An arbitrary string may be subject to a condition specified by a. general logical procedure by using the form *name/proc.(argl,arg2,...)* where "name" again is a string variable, "proc" is a logical procedure, and the "args" are any procedure arguments. The "args" may specifically contain string variables which are substrings matched earlier in the SNOBOL pattern-matching statement. The logic procedure should return the value T! [true] if the proposed contents of name are satisfactory, N! [neither] if the proposed contents of name are not satisfactory because the string is too short, and the value F! [false] if the proposed contents of name are unsatisfactory for any other reason. If the logic procedure returns the value U! [undefined], the SNOBOL pattern scanner will take an ERROR return with the input string as the ERROR string. After the pattern match is complete, the arbitrary string-variable names contain copies of the strings they matched in the input. These names may be mentioned in the concatenation section of the SNOBOL statement or in any other statement following the pattern-matching statement. Note also that string-variable pattern elements may have the same name as arbitrary pattern elements matched earlier in the pattern-matching statement. This makes it possible to search the input string for non-overlapping repeats of an arbitrary pattern element. If the SNOBOL pattern match succeeds, the global logic variable SCANFLAG is set to T! (true]. Failure to find a match causes SCANFLAG to be set F! [false]. This condition can be tested by the IF or DO statements. C.8 INPUT/OUTPUT PROCEDURES Input/output procedures in OCAL will initially be limited to handling strings. Since the OCAL character set (ASCII) is quite general, strings can be converted to any other data type in OCAL. Conversely, output material can be converted to ASCII strings in OCAL. Two basic procedures are furnished with OCAL. They are: READ, (file/,terrain/) WRITE.(file,string) The argument file is either 'PTR', "PNCH1, 'TTY1 or 'namel name2' specifing photoelectric tape reader, paper tape punch, on-line teletype, or file names on backup storage (DECtape on the PDP-6). Only one file from backup storage may be open for reading and one file ophn for writing at a time. If the optional second argument "termin" is present in the READ call, the READ procedure returns as value the ASCII string of all characters up to and including the firs; match of the string termin. If termin is not present, the value of the READ procedure is all characters then in the input buffer. An end-of-file on backup storage is signaled by having the last character be ASCII character EOT. The second argument of the WRITE procedure is the output string. A file on backup storage may be closed by using the call CLOSE.(file) where "file" is a string 'namel name2' as described above. Examples: INP = READ. ('TTY1,'1215*212') will read one line from the on-line teletype, up to and including the Carriage-Return (#215) Line-Feed (#212). The resulting string will be placed in the string variable INP; WRITE. ('PNCH' ,OUT) will punch the contents of the string OUT on the paper tape punch; IN - READ. ('ALPHA DICT',' ') will read from backup storage file ALPHA DICT the first string up to and including a space. C.9 READER FUNCTIONS Special functions for using the READER data type are available in OCAL. The general form of these functions is $fn/fnfn.../.(readv) where the "fn"s are elementary reader functions and "readv" is a variable of reader type. The elementary reader functions are: C - Write one Character into a string if this appears on the left side of an assignment statement, otherwise read one character out of a string. V - Set the reader position to the integer Value if this appears on the left side of an assignment statement. Otherwise return the integer value, of the current reader position in characters from the head of the string. I - Increment the reader position which moves the reader one character position forward on the string. If an attempt is made to pass the end of the string, the global logic variable ENDSTRING is to T! [true]. Otherwise, the ENDSTRING is set to PJ [false]. If the "I" is on the left side of an assignment statement and an attempt is made to pass the end of the string, the string is extended one character position and the global logic variable EXTENDSTRING is set to T! [true]. In any other case EXTENDSTRING is set to F! (false], and attempts to pass the end of the string leave the reader position unchanged and set the ENDSTRING variable. D - Decrement the reader position which moves the reader one character position towards the beginning of the string. Any attempt to pass the beginning of the string will leave the reader position unchanged and the global logic variable BEGINSTRING set to T! [true]. If no attempt is made to pass the beginning of the string, BEGINSTRING is set to F! [false]. RI - Rotary Increment. This behaves like I [increment], except that passing the end of a string will position the reader at the beginning of the string. RD - Rotary Decrement. This behaves like D [decrement], except that attempts to pass the beginning of a string will position the reader at the end of the string. No global variables are altered by RI and RD. M - Mark. This notes the Current position of the reader on the string for future reference. P - Position. Return the reader to the position set by the last M [mark]. N - Initialize. Return the reader to the beginning of the string. A reader may be attached to a given string by calling the ATTACH procedure with ATTACH.(rdr,st) where "rdr" is a variable of the READER type and "st" is any non-null string. Example: (The following declarations hold throughout this example: R is a READER variable, S is a STRING variable, C and D are CHARACTER variables, and I is an INTEGER variable. The initial contents of S are 'LMNOPQ'.) ATTACH.(R,S) [attach reader R to string S] C = SC.(R) [set C equal to the character L] D = $IC.(R) [set D equal to the character M] I = $VM.(R) [set I equal to 2 and remember the value as a mark] $V.(R) = 4 [position the reader over the character O] $IC.(R) = D [replace the character P with the character M] $II.(R) [this will produce no value but will set the global logic variable ENDSTRING to Tl [true]. The reader will be left positioned over the character R] $IC.(R) = C [set the global variable EXTENDSTRING to T! [true] and will append the character L to the end of the string] $P.(R) [return the reader to the mark. The reader will be positioned over the first M on the string] $N.(R) [return the reader to the head of the string] As a result of previous reader functions, the string S will now contain 'LMNOMQL'. C.10 RESOURCE ALLOCATION Two resource allocation statements are available in OCAL. The statement ALLOT PUSHDOWNLIST n will allot "n" registers to the system push-down list where n is an integer or integer variable. The push-down list space allotment may be changed at any time, but an insufficient push-down list will cause a system interrupt. The statement LIMIT STORAGE n will cause a system interrupt after n words have been used from free storage. The number of words of storage used since the beginning of the current OCAS job is found in the global integer variable STORAGEUSED. Push-down overflow or storage-limit interrupt may be handled in OCAL by using the ON statement. These features allow the OCAL program to limit large searches or catch certain procedures that are in an infinite loop. The OCAS(On-Line Cryptanalytic Aid System) of Edwards is an on-line system designed to ease the work of a cryptanalyst. It contains a display generator, control program, and also a language OCAL (On-Line Cryptanalytic Aid Language). The language requirements include the need for manipulating strings of alphabetic characters and also for doing algebraic computations, including matrix operations. OCAL is primarily a synthesis of MAD and SNOBOL, together with some ideas from SLIP and PL/I, OCAL includes declarations and statements for sequence control, string pattern-matching, assignment, and error control. The exact number of all the programming languages still in use, and those which are no longer used, is unknown. Zemanek calls the abundance of programming languages and their many dialects a "language Babel". When a new programming language is developed, only its name is known at first and it takes a while before publications about it appear. For some languages, the only relevant literature stays inside the individual companies; some are reported on in papers and magazines; and only a few, such as ALGOL, BASIC, COBOL, FORTRAN, and PL/1, become known to a wider public through various text- and handbooks. The situation surrounding the application of these languages in many computer centers is a similar one. There are differing opinions on the concept "programming languages". What is called a programming language by some may be termed a program, a processor, or a generator by others. Since there are no sharp borderlines in the field of programming languages, works were considered here which deal with machine languages, assemblers, autocoders, syntax and compilers, processors and generators, as well as with general higher programming languages. The bibliography contains some 2,700 titles of books, magazines and essays for around 300 programming languages. However, as shown by the "Overview of Existing Programming Languages", there are more than 300 such languages. The "Overview" lists a total of 676 programming languages, but this is certainly incomplete. One author ' has already announced the "next 700 programming languages"; it is to be hoped the many users may be spared such a great variety for reasons of compatibility. The graphic representations (illustrations 1 & 2) show the development and proportion of the most widely-used programming languages, as measured by the number of publications listed here and by the number of computer manufacturers and software firms who have implemented the language in question. The illustrations show FORTRAN to be in the lead at the present time. PL/1 is advancing rapidly, although PL/1 compilers are not yet seen very often outside of IBM. Some experts believe PL/1 will replace even the widely-used languages such as FORTRAN, COBOL, and ALGOL.4) If this does occur, it will surely take some time - as shown by the chronological diagram (illustration 2) . It would be desirable from the user's point of view to reduce this language confusion down to the most advantageous languages. Those languages still maintained should incorporate the special facets and advantages of the otherwise superfluous languages. Obviously such demands are not in the interests of computer production firms, especially when one considers that a FORTRAN program can be executed on nearly all third-generation computers. The titles in this bibliography are organized alphabetically according to programming language, and within a language chronologically and again alphabetically within a given year. Preceding the first programming language in the alphabet, literature is listed on several languages, as are general papers on programming languages and on the theory of formal languages (AAA). As far as possible, the most of titles are based on autopsy. However, the bibliographical description of sone titles will not satisfy bibliography-documentation demands, since they are based on inaccurate information in various sources. Translation titles whose original titles could not be found through bibliographical research were not included. ' In view of the fact that nany libraries do not have the quoted papers, all magazine essays should have been listed with the volume, the year, issue number and the complete number of pages (e.g. pp. 721-783), so that interlibrary loans could take place with fast reader service. Unfortunately, these data were not always found. It is hoped that this bibliography will help the electronic data processing expert, and those who wish to select the appropriate programming language from the many available, to find a way through the language Babel. We wish to offer special thanks to Mr. Klaus G. Saur and the staff of Verlag Dokumentation for their publishing work. Graz / Austria, May, 1973 |