Polish notation algebraic language 

Language used at Johns Hopkins Applied Physics Laboratory Feature prefixed operator (Polish) notation.

Designed to be a natural way of coding, using PN and an algebraic subset.

Used at the Hopkins APL for the Tabledex system

  • Rich, R P, "APT, a common computer language" pp141-159 view details
          in Goodman, Richard (ed) "Annual Review in Automatic Programming" (2) 1961 Pergamon Press, Oxford view details
  • Barron, D. W. review of Goodman, Richard (ed) "Annual Review in Automatic Programming", Vol. 2 view details Abstract: This is the second volume in the series produced under the auspices of the Automatic Programming Information Centre at Brighton. It contains a series of independent papers in two groups, one concerned with scientific programming languages, the other with commercial programming languages. The Editor's aim "to exhibit current trends by a sample collection of original reports," is only partially achieved by a disjointed series of papers of widely varying standards. Some important trends are not mentioned, but there is a promise that the omissions will be rectified in a later volume. Extract: APT
    Also of great interest is "APT, a common computer language," by R. P. Rich of the Johns Hopkins University. The interest here lies in a complete departure from the general trend towards algebraic languages, in favour of the Polish notation. This makes for easy compiling, of course, but this is not the primary object: Polish notation has been chosen because it is thought that a programming language should be "a compromise between the machine and the programmer, not between the machine and the problem sponsor." The author puts forward the heretical view that "the departure from mathematical ?  notation is of small consequence if the language is easy for a trained programmer to learn and use," which seems rather unrealistic in view of the current shortage of programmers.
          in The Computer Bulletin June 1962 view details
  • Bemer, R "ISO TC97/SC5/WGA(1) Survey of Programming Languages and Processors" December 1962 view details
          in [ACM] CACM 6(03) (Mar 1963) view details
  • Olmer, Jane and Rich, Robert "A Flexible Direct File Approach to Information Retrieval on a Small to Medium Size Computer" view details Abstract: Much has been published on the theories of information retrieval and various means of handling and expediting the ever-growing accumulations of information which are swamping our civilization. Not much has been said, however, about any attempt to make use of a small
    to medium scale computer as a tool. This is an eighteen months' progress report of such an attempt undertaken at the Johns Hopkins University Applied Physics Laboratory by a team consisting of computer people and librarians.
    Dr. Robert P. Rich, head of the University Computing Center, and the speaker represented the computer viewpoint. Mr. Fenton L. Kennedy,
    head of the APL Document Library, and Mrs. Mary E. Brown are the librarians on the team. Mr. Robert A. Lambert, formerly of the
    APL Document Library, took part in the early stage of development. Extract: The Search Program
    The Search Program
    The heart of the retrieval of the information on the computer is search Program PARSE (Paragraph Search Extended). The earlier version PARS, written for the 4K computer, has been incorporated into this program with a few vital features such as an input tape label check and tape read and write error subroutines as well as some new chimes and gongs.
    This program searches a master file on tape in a manner specified by one or a series of search records. The latter are read into the computer on punched cards, one or more cards for each search.
    A combination of descriptors and logical operators appear in Polish notation in the search record. (Examples of this notation are given below.) They also may be preceded and/ or followed by free text. A logical operator consists of a single character symbol preceded by a dollar sign ($). The $ preceding an operator replaces the / following a descriptor when an operator happens to follow a descriptor in a search record.
    In addition to the logical operators "not" ($N), "and" ($A) and "or" ($O), three magnitude operators may be used, e.g., "equal" ($E), "less than or equal" ($L), "greater than or equal" ($G), as shown in Figure 3. All operators except the "not" are binary operators. The first two operators shown are each applied to the next two rightmost descriptors or results of operations. The magnitude operators are applied to pairs of descriptors in the search record. First, an identical descriptor is sought in the master file for the right hand member of the pair. If it is found, the magnitude test is applied to the next left descriptor in the master file, comparing it with the left hand member of the pair in the search record. For example, all descriptors written immediately to'the left of the descriptor ;YEAR/ in the master file may be tested to find those which are less than or equal to 1960 by writing
    in the search record.
    Magnitude for descriptors affected by operators $L and $G is based on the IBM collating sequence and may include both numerics and alphabetics as well as special printing characters except, of course, the / and $. All numerical descriptors written as the left hand member of such a pair in the master file or search record must consist of a set number of digits specified for the particular right hand member. Any pair of words in the master record regardless of length may be sought separately or linked by the operator $E. A string of several words may be linked by using an operator $E for each successive pair, repeating words as necessary.
    The advantage of the Polish notation in applying logical operators to combinations of descriptors in the search question is the avoidance of nested parentheses and of ambiguity.
    For example, in Figure 4, the formula
    (A + B) x C
    may be written in Polish notation as
    where, since all operators such as x and + are binary operators, examtning the expression from right to left clearly indicates that the pair of terms A and B are linked by the operator +, the resulting term and C are linked by the operator x.
    Using the set of operators described above for PARSE, referring to sucessive lines of Figure 5,
    would constitute a search for all records containing both terms "subsonic" and "test."
    describes a search for all records containing either one term or the other or both.
    would produce a listing of all records containing "test" but not "subsonic." If it were desired to obtain all records containing /SUBSONIC/ TEST/ in that sequence and to avoid such false drops as /SUBSONIC/FLOW/ or /SUPERSONIC/ TEST/, the searcher would specify $E/SUBSONIC/TEST/.
    On the other hand, to find /SUBSONIC/ and /FLOW/ in juxtaposition we write $E/SUBSONIC/FLOW/.
    The operators $L and $G permit making more
    specific searches for ranges of values such as
    given years, temperatures, accession numbers,
    ete. For example,
    in a search card would create an output file of all records on the input tape which contained the terms /SUBSONIC/FLOW in that order or else a mach number of 00.55 or less, provided each report was dated 1960 or later. The English language description of the question may be included as free text in the search cards.
    Our experience shows that a user, a librarian, for instance, easily acquires mastery of this techniqoe of writing questions, At a demonstration of the program at APL, about 30 visitors from local special libraries and other installations were taught the method and produced 10 questions to be processed against the unclassified file of the APL dociunents library.
    All of these questions, some of which mere fairly complex, were found by the computer to be grammatically correct and were correctly processed. A simple means of checking grammatical accuracy included in the half-hour workshop was as follows-counting from the right, add one for each term and subtract one for each operator. If the sum ever falls below one or the final total is not exactly one, the expression is erroneous.
    The program provides also for a vacuous search. That is, a search record consisting entirely of free text will seek out all master file records consisting entirely of free text (no descriptors). By error or design, records not yet processed completely could be added to the file, and it might be worthwhile to print these out in their entirety, or to provide a listing of, say, accession numbers only. This could be accomplished by introducing into the PARSE program any search question not containing a slash-any expression from a single character to a paragraph of explanation. The appropriate parameter card would enable the PRINT or another printing program to produce the entire record or a selected part of it.
    A practical method for cutting down search time consists of batching a number of searches by use of the logical operator $0. A single pass over the entire master file yields, usually, a considerably smaller tape file against which the individual search questions may then be processed. A special, much faster, single term search program SCAN has also been written to subdivide a master file and/or to reduce multi-reel to single reel files.
    The search record is identified and printed on line as well as being written on an output tape. (See Figure 6 for an example of an online printout.) It is followed on the output tape by each master record which constitutes a hit. The number of such hits is printed on line and the master tape is rewound.
    The next search is then undertaken, or if there are no more searches, the output tape is rewound. An option permits writing of successive searches on separate tapes if desired. An input label check and creation of an output label are included, but options permit bypassing this subroutine or overriding it. (Figure 7 shows a portion of such an output tape printed, including as the first hit record the familiar one created from the 3x5 card of Figure I and the preprinted sheet of Figure 2.)
    Extract: Computer Processing of the Search
    Computer Processing of the Search
    The computer program PARSE starts out by measuring the length of the program (in case new features have been added; it is designed to permit adding subroutines for additional operators, for example) and the size of the computer memory being used. The storage space available is computed including the area in which this part of the program lies since it will wipe itself out before proceeding to the main part of the program. Half of this storage area is assigned for containing the master file record as it is read in, half is assigned for the search record to be assembled from one or more successive punch cards, unless this is not enough space for the longest file record as specified by a parameter card. In this case the file record is assigned the needed space and the remainder is assigned to the search record.
    The parameter card also may designate specific areas to be used if desired or may, once the program becomes a production run, specify that a preset constant be used as the maximum file size. Any attempt by the human to do something irrational such as to process a record longer than the number of characters available on the computer, leaving no room for the search record, will be rejected by the program with an angry message and a wind-up of tape.
    It is at this time that the label of the input tape may be checked if memory space permits reading the first record of the tape. If there is not enough space this is bypassed. Then all this preliminary coding erases itself and the search proper begins.
    The search record is assembled from successive cards, squeezing out marginal spaces, inserting a space between the ending word of one card and the first word of the next card unless the first card ended with a / (no space inserted in order not to affect descriptors). A character by character scan records the location of the left-most and right-most slashes so that subsequent search scans need not waste time on text other than the descriptors. Word marks are placed under slashes to permit picking up an entire term at a time from right to left, permitting efficient processing on the IBM 1401.
    An operations table is created of a string of symbols consisting of a zero for each descriptor and a copy of each logical operator symbol from the search record. Here again one proceeds from right to left. This table permits use of the conditional branch instruction on a single designated character to select the most efficient path through the appropriate subroutines during the actual search of each file record. The editing of the search record and creation of the operations table are carried out only once for searching an entire file.
    Next, a dummy file record consisting of a single slash is created and a pseudosearch is made. This, together with a couple of other tests, automatically checks the grammar of the search question. Specific error codes will be printed on line if applicable.
    Each record on the master file is edited in a manner similar to that described for the search record. Guided by the operations table, each term in the search record is checked in the master file and the results recorded in a funclion table, a zero if the term is not found, a one if it is. Each time a logical operator is encountered, the last two digits in the function table are combined. Depending on the operator the resnlt of the combination is recorded as a single zero or one. Since each term increases the size of the function table by one character and each operator decreases it by one, the table can never contain less than one character and must, at the end of the search, contain exactly one. This corresponds to the grammatical check mentioned earlier.
    If the final character in the function table is a one, a hit is recorded and the record copied onto the output tape. If the character is a zero, the next file record is processed. Any other character would lead to an alarm printout and discontinuance of that search. Efficiency is increased by looking ahead two characters in the operations tahle after each new entry in the function table. Thus the program can skip over one term for the "and" ($A) operator when a previous no-hit has been recorded. Similarly, the search is shortened when an "or" ($0) operator follows a hit. When a magnitude operator is encountered it is processed appropriately.
    At the end of each complete search of a file the program winds up the input tape. If another search card follows, the program begins the next search, starting once more by assembling all cards of the next question as previously described. If an option has been exercised by depressing a panel switch, the search results are written on separate tapes. Otherwise the output files follow each other on the same tape, each being identified by the search record at its head.
          in [AFIPS JCC 24] Proceedings of the 1963 Fall Joint Computer Conference FJCC 1963, Las Vegas, Nev., Nov. 1963, view details
  • Ashby, L. D. Review of Olmer and Rich view details Abstract: This article constitutes a description of a set of programs to be used in conjunction with an IBM 1401 computer to create, maintain and update a file on a single reel of magnetic tape and to retrieve information from the file as effectively as pos~ible.

    The characteristics built into the system include: simplicity for user without computer orientation; output to be directly usable without reference to other documents; outputs to be acceptable as input files for further increasingly restrictive searches and corollary editing and printing routines able to accept all files.

    The problem is, in general, divided into two parts: a semantic part (choice of material and choice of terminology for inclusion in the file); and a computer syntactic part. It is concluded that the lion's share of the development expense is attributable to the semantic problems left to the librarians.

    The direct file approach was used. That is, in the master file there is a single record for each document, including search terms and any text such as abstracts and bibliographic information. This stands in contrast to the more complex alternative use of a record per term and cross reference to a master file with the desired information. The authors feel that the simpler direct file approach with its simpler programming has justified itself for small or medium-sized computer use. Though relatively simple the programs incorporate automatic grammar check on search questions and compatibility with Tabledex, a new IBM 7094 indexing system.

    There has been about 18 months experience with the system at Johns Hopkins' Applied Physics Laboratory, the University of Texas M.D. Anderson Hospital, and Western Electric Company Engineering Research Center. Among the important findings have been: 1) a user (for example a librarian) easily acquires mastery of the technique of putting questions to the file; 2) timing runs have been made which show each of the runs to be highly dependent on the length of the master file (3 to 10 records were searched per second for single term search and about 2 records per second for more complex questions); and 3) the cost of processing one scientific report up to the point of search is estimated at $4.00.

    In conclusion the authors claim that a contribution has been made to the solution of the syntactic part of the information retrieval problem. A flexible, efficient and economical program exists whereby a medium-size computer can accommodate files of limited size. At the same time it is designed to permit further expansion and development.
          in ACM Computing Reviews 5(04) July-August 1964 view details