INFOL(ID:3088/inf003)

Querying language 


for INFOrmation Language

Data querying language, T. William Olle, CDC 1965


Related languages
INFOL => UL/1   Evolution of

References:
  • 3600/3800 INFOL. Reference Manual, Control Data Corporation No. 60170300, published 1966.
  • Olle, T. William "INFOL: A generalized language for information storage and retrieval applications", Proc. 3rd Annual National Colloquium on Information Retrieval, May 1966, pp. 177-190.
  • Olle, T. William "UL/1: a non-procedural language for retrieving information from data bases" Abstract: The differences between procedural and non-procedural languages are discussed and a case made for a continued trend towards the non-procedural. Three levels of user interface are defined for the information retrieval language and the role of each user discussed. The main part of the paper describes a user-oriented language developed for data base problems.  It has four divisions: establishment, interrogation, update and revision. An important component of all divisions, called the criterion language, is used for retrieval, validation and update. The criterion language permits a user to express conditions on several properties of a data item, including existence, value, picture, length and content.
    Extract: INTRODUCTION
    1. INTRODUCTION
    The diversity of terminology in current use causes earlier work in the area discussed in this paper to be classified under a number of headings. Generalized file processing, data management, data base management and information retrieval are phrases all of which are used as descriptive terms for software products which have contributed to the concepts to be discussed.
    Data management is perhaps the most ambiguous term since it is used by IBM in the 360 operating system concepts [1] to refer to that part of the operating system which handles the transfer of data files and data records without specific regard for their information content. On the other hand, the Systems Development Corporation's time-shared data management system [2-4] is a system which is designed to permit a non-programmer to extract any information contained in a data base without giving him any control over the way in which data are stored or transferred in the data base. To avoid confusion, the term data base management is preferred to describe the SDC system, reserving data management for the IBM usage.
    Extensive work has been done in the area of data base management and the most useful evaluations are contained in the proceedings [5,6] of the two Systems Development Corporation symposia on data base management. This author, in an earlier paper [7], classified systems which could be called generalized first in terms of the logical level of enquiry possible and secondly, for systems which are on the most complex level of generality, in terms of the five main design features: language, data structure, implementation mode, data base storage medium and internal data organization.  The latter feature is referred to as storage structure throughout the paper.
    Extract: DATA DESCRIPTION
    2. DATA DESCRIPTION
    Although the present paper is primarily concerned with the language features, data structure and storage structure require an explanation in order to clarify the language discussion. Historically, using procedural languages, there has been an evolvement along functional lines. A program to carry out a specific function has one or more files associated with it. It is necessary that the program contains a description of the data as it is organized in each file. Hence, the description of the data resides completely with the program.  Each time the file is accessed, the data description must be fed in; whether or not it is compiled each time is immaterial.
    The procedural language programmer is completely aware of the storage structure, since he is required to specify it in his program.  Therefore, it has never been necessary to identify or separate the meanings of internal storage structure and external data structure.  The pioneering work of Bachman and Williams with IDS [8] over the past five years has set the trend in the direction of a separation. In IDS/COBOL [9], as implemented for the General Electric 600 series, part of the data description is stored with the file and part with the COBOL program. Although some IDS users have developed schemes for storing the whole data description with the file, the COBOL programmer who wishes to optimize his use of the file needs to understand how the IDS scheme is implemented. IDS, which was conceived to permit more effective use of disc storage, establishes the trend away from the purely functional. There may be several programs, and hence several programmers, using the same data base. The COBOL language is enhanced with certain non-procedural commands to permit the programmer to access records through pre-stored chains.
    Other systems such as TDMS [2-4], GIS [10, 11], INFOL [12, 13], MANAGE [14], and the Bolt, Beranek and Newman system [15] go further away from functionalism by having the complete data description stored with the file, so that it is no longer necessary for each person accessing the file to specify its description. Data description data are best thought of as all data in the file which are not part of the actual file data but which help to organize the actual file data. Data description data would include pointers, links, separators, primary and secondary indexes and all the representations of which a user need not be aware in order to access a file using a non-procedural language.
    Data structure and storage structure are separate concepts. The first is something of which the user must be aware. The fact that a given data item is part of a repeating group, which may occur several times for each occurrence of a higher level group, is a data structure concept. The sophisticated user may find it useful to be aware of certain facets of storage structure; for instance, the fact that a secondary index has been formed and is maintained with respect to a given data item facilitates a rapid answer to questions based on that data item.
    Different systems give the user different levels of control over the storage structure.  IDS/COBOL, being an essentially procedural language, gives the user almost complete control. INFOL, TDMS, and the Bolt, Beranek and Newman system, which are completely non-procedural, give the user no control. In many applications, this is not a serious restriction. Responsibility for an efficient storage structure rests with the implementors of the above systems and the non-procedural programmer/user does not necessarily have to concern himself with it.
    Extract: LANGUAGE COMPONENTS
    5.  LANGUAGE COMPONENTS
    A language, called UL/1, embodying the principles discussed above has been designed and is the main topic of this paper. UL/1 stands for User Language/1 where the name is intended to emphasize that the language is not designed for programmers but for non-programmers, more conveniently referred to as users.
    Although the principal components of the language have already been mentioned in the preceding discussion, they are described in more detail in this section. It is convenient to follow COBOL in regarding the complete language as consisting of a number of divisions, with each division having several sections.
    The divisions are establishment, interrogation, update and revision, where establishment and revision can be regarded as privileged and for the possible exclusive use of the data base administrator.  These division names follow those of INFOL [12] where they are called major phases.
    Unlike in COBOL, certain sections can be used in two or more divisions. The concept of a program comprising all of the four divisions, as in COBOL, does not hold either. A run may consist of the use of one of the four divisions although it is reasonable to implement in such a way that interrogation and update can be carried out in one access to a file.  This was, in fact, implemented in INFOL for the Control Data 3600 and 3800.
    5.1.  Establishment
    As described in section 4, establishment is a formal file-oriented process which results in a file being added to a data base in a form standard to the system. Standard in this sense means that the data description data are stored with the file - or at least in the data base - so that it can be used in an interrogation or update to the file. The establishment division consists of sections to specify an identification for each data item, a type such as alphanumeric or numeric for each data item, a validation criterion to edit data entering the file, lists of the expanded forms for certain data items which may be abbreviated in the file in a coded form and a specification of the data structure relationships between data items. Establishment also requires the provision of the set of data records which comprise the first edition of the file.
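    Purely as an illustration, the sections of an establishment division can be pictured as a small data description kept alongside the records. The sketch below is Python, not UL/1 or INFOL syntax; ItemDescription, EstablishedFile and the sample items are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ItemDescription:
    """Hypothetical description of one data item (not UL/1 syntax)."""
    name: str                            # identification of the data item
    kind: str                            # type, e.g. "alphanumeric" or "numeric"
    is_valid: Callable[[object], bool]   # validation criterion for entering values
    expansions: dict = field(default_factory=dict)  # coded form -> expanded form
    parent: Optional[str] = None         # enclosing group in the data structure


@dataclass
class EstablishedFile:
    """Records plus the data description stored with them."""
    description: dict = field(default_factory=dict)
    records: list = field(default_factory=list)

    def describe(self, item: ItemDescription) -> None:
        self.description[item.name] = item

    def establish(self, first_edition: list) -> list:
        """Load the first edition of the file; return the rejected records."""
        rejected = []
        for record in first_edition:
            ok = all(
                name in self.description and self.description[name].is_valid(value)
                for name, value in record.items()
            )
            (self.records if ok else rejected).append(record)
        return rejected


personnel = EstablishedFile()
personnel.describe(ItemDescription("employee_no", "numeric", lambda v: isinstance(v, int)))
personnel.describe(ItemDescription(
    "dept", "alphanumeric", lambda v: v in {"EN", "SA"},
    expansions={"EN": "Engineering", "SA": "Sales"},
))
rejected = personnel.establish([
    {"employee_no": 1, "dept": "EN"},
    {"employee_no": 2, "dept": "XX"},   # fails the validation criterion on dept
])
```

    Because the description travels with the file, a later interrogation or update in this sketch would only need to name an item, never to redeclare it.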
    5.2.  Interrogation
    The process of interrogation is that most frequently described in papers dealing with generalized data base systems, and is hence the easiest to discuss. In a truly generalized system, the user should be able to ask any questions of the data base.  Simply expressed, this involves placing criteria on any data item in the record and extracting any set of data items for the records which satisfy the overall criterion specified. The extraction process may result in a report containing values extracted from the data base, a frequency count of the number of item values in certain classes, or, on the simplest level, a count of the number of records which satisfy the criterion placed on the record.
    When values are extracted, as opposed to counts of values, they may be included in a printed report. As indicated in section 4, the format of the report may be standard for the system or it may be user-specified.  Furthermore, the extracted values may be included in a mechanized sub-file which is to be used as input to a program written in a procedural language such as COBOL or FORTRAN.
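    The three kinds of interrogation result described above (extracted values, a frequency count and a record count) can be sketched as follows. This is a Python analogy under assumed names (interrogate, is_engineer, the employees sample), not the syntax of UL/1 or INFOL.

```python
from collections import Counter

# Hypothetical sample file; records are plain dictionaries.
employees = [
    {"name": "Adams", "dept": "EN", "salary": 9000},
    {"name": "Baker", "dept": "SA", "salary": 7000},
    {"name": "Clark", "dept": "EN", "salary": 8000},
]


def interrogate(records, criterion, extract):
    """Return the requested items for every record satisfying the criterion."""
    return [
        {item: record.get(item) for item in extract}
        for record in records
        if criterion(record)
    ]


def is_engineer(record):
    """Retrieval criterion placed on the dept item."""
    return record.get("dept") == "EN"


# Values extracted for a report (or for a sub-file passed to another program).
report = interrogate(employees, is_engineer, ["name", "salary"])

# Simplest level: a count of the records satisfying the criterion.
record_count = sum(1 for record in employees if is_engineer(record))

# Frequency count of item values in each class.
dept_frequency = Counter(record["dept"] for record in employees)
```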
    5.3.  Update
    The updating of a data file is a process which depends considerably on implementation mode, although the update language need not. Updating may take place on the record level, data item level, or on the level of a character string contained in a data item. The facility to modify on the character string level is one of the links between data base systems and text editing systems. If the data base system can handle long character strings, such as whole documents, and the character level updating facilities are powerful enough, then two systems which are usually separately conceived, such as IBM's Datatext and GIS, could well be merged.
    In updating, which involves deleting or modifying existing records, a record is identified by specifying the value or values of one or more special data items. These data items, usually only one, are such that the value set in each record is unique in the file. Examples of such data items are social security number, employee number and part number. It is a less frequent requirement to be able to update a file selectively by specifying a criterion which a record must satisfy in order for an update to take place. The facilities for specifying an update criterion are identical to those for specifying a retrieval criterion.
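    The two ways of selecting records for update, by a unique identifying item or by a criterion, might look like this in a Python sketch. The function names and the parts sample are hypothetical, and the reuse of an ordinary predicate as the update criterion is the point being illustrated.

```python
def update_by_key(records, key_item, key_value, changes):
    """Modify the single record whose identifying item has the given value."""
    matches = [r for r in records if r.get(key_item) == key_value]
    if len(matches) != 1:
        raise LookupError(f"{key_item}={key_value!r} identifies {len(matches)} records")
    matches[0].update(changes)


def update_where(records, criterion, changes):
    """Selective update: modify every record satisfying the criterion."""
    hits = [r for r in records if criterion(r)]
    for record in hits:
        record.update(changes)
    return len(hits)


parts = [
    {"part_no": 101, "bin": "A", "stock": 0},
    {"part_no": 102, "bin": "A", "stock": 5},
]

# Usual case: the part number is unique in the file.
update_by_key(parts, "part_no", 102, {"stock": 4})

# Less frequent case: the same kind of criterion used for retrieval selects the records.
changed = update_where(parts, lambda r: r["stock"] == 0, {"bin": "Z"})
```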
    5.4.  Revision
    The concept of revision is the most novel of the four divisions, but it is nevertheless significant and must not be confused with update. It should be regarded as privileged and for possible exclusive use by the data base administrator. It is closely allied to establishment in the sense that many of the sections are the same. In revision, the user may add or subtract data items for each record.  This is not the same as changing the values of already defined data items, which is an update function. Another revision function may be to alter the structure as was previously defined in the establishment division or in a previous revision. It is also possible to redefine the validation criterion which was previously specified. Revision does not require data to be specified to the file as is done in establishment and update.  The revision division may result in a change to the data description data stored with the file. It may also result in the removal of actual data from the file if certain data items are removed. A change to the validation criterion which makes the criterion more stringent can have the result that data previously put in the file no longer satisfies the validation criterion.  This is a problem for the data base administrator. Any data which is considered invalid can be changed or removed in an update division. Conceptually, however, it is desirable that a revision division, which changes the validation criterion, should result in a report listing all data in the file which have become invalid.
    Extract: CRITERION LANGUAGE
    6.  CRITERION LANGUAGE
    An important section which may be used in all divisions is the criterion language. In the above description of the separate divisions, reference was made in the discussion on establishment and revision to a validation criterion, in the interrogation division to a retrieval criterion and in the update division to an update criterion.  The fact that the facilities for specification of update criteria can be the same as those for retrieval criteria was recognized in the design of INFOL, TDMS, and apparently in GIS. However, in each case there are separate facilities for validation, or editing as it is often called, for data entering the file. This is a considerable waste both in terms of language design, implementation and loss of potential power to the user. Most systems validate data items in terms of their form using an editing mask or picture, while retrieval and updating take place depending on the satisfaction of conditions placed on value. Logical complexity of concatenated criteria is usually possible for retrieval, and therefore for update, but not in validation. By incorporating into the criterion language the facility for placing criteria on form as well as value, a powerful language is available for all three purposes.
    To outline in more detail the concepts embodied in this idea of a criterion language, the more important properties of a data item, on which conditions may be placed, are described.
    6.1. Existence
    Before a condition on other properties of a data item can be evaluated for a record, it must be certain that it is present in that record. In validation, an existence criterion would mean that the data item is required to exist for the record to enter the file. Assuming the very desirable language and system property of handling incomplete data, then there is a requirement for specifying explicit conditions on the existence and non-existence of data items. In validation, an existence criterion, logically connected with criteria on values and on other properties, is a powerful tool for ensuring the correctness of the data in the file.
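    A minimal Python sketch of explicit existence and non-existence criteria follows, assuming that an incomplete record simply lacks the item (or holds None). The helper names exists, does_not_exist and all_of are hypothetical and are not UL/1 keywords.

```python
def exists(item):
    """Criterion: the data item is present in the record."""
    return lambda record: record.get(item) is not None


def does_not_exist(item):
    """Criterion: the data item is absent from the record."""
    return lambda record: record.get(item) is None


def all_of(*criteria):
    """Logical AND of criteria, evaluated left to right and stopping at the first failure."""
    return lambda record: all(criterion(record) for criterion in criteria)


# The existence criterion is placed first, so the value condition is
# only evaluated for records in which salary is actually present.
well_paid = all_of(exists("salary"), lambda record: record["salary"] > 8000)

complete = {"name": "Adams", "salary": 9000}
incomplete = {"name": "Baker"}
```

    In this sketch, well_paid(incomplete) is simply false rather than an error, which is one way of handling incomplete data.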
    6.2.  Value
    Procedural languages have been developed largely for the handling of values of data items. The value is indeed the most important property. If a data item has no value, then conceptually it also lacks most other properties. Criteria on the value of a data item are relational, which means that the relationship of the data item to a user-specified reference quantity is tested.  To specify such a criterion, six standard relational operators are permitted:
    equals                      EQ
    does not equal              NE
    greater than                GT
    less than                   LT
    greater than or equal to    GE
    less than or equal to       LE
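    The six operators map directly onto ordinary comparisons. The Python sketch below uses the standard operator module; the RELATIONS table and value_criterion helper are hypothetical names, and treating an item with no value as failing every value criterion is an assumption of this sketch.

```python
import operator

# The six standard relational operators and their Python equivalents.
RELATIONS = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "GT": operator.gt,
    "LT": operator.lt,
    "GE": operator.ge,
    "LE": operator.le,
}


def value_criterion(item, relation, reference):
    """Build a criterion testing a data item against a reference quantity."""
    compare = RELATIONS[relation]

    def test(record):
        value = record.get(item)
        # Assumption: an item with no value satisfies no value criterion.
        return value is not None and compare(value, reference)

    return test


adult = value_criterion("age", "GE", 21)
minor = value_criterion("age", "LT", 21)
```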
    6.3.   Picture
    Criteria on a data item's picture are most useful to the data base administrator in developing a validation criterion which each record must satisfy for admission to the file. However, the facility to give a choice of two or more pictures, one of which a data item must satisfy, is achieved by using the logical connector OR which is required in the criterion language in any case. This effect can be achieved in the Bolt, Beranek and Newman system [15]. A picture criterion must use only the relational operator EQ.
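    A choice between two pictures joined by OR can be sketched in Python by translating each picture into a regular expression. The picture alphabet assumed here (9 for a digit, A for a letter, X for any character, everything else literal) is loosely modelled on COBOL and is not taken from the UL/1 paper; picture_eq and any_of are hypothetical names.

```python
import re

# Assumed picture alphabet: 9 = digit, A = letter, X = any character.
PICTURE_CLASSES = {"9": r"\d", "A": "[A-Za-z]", "X": "."}


def picture_eq(item, picture):
    """Criterion: the item's value has exactly the form given by the picture."""
    pattern = re.compile(
        "".join(PICTURE_CLASSES.get(ch, re.escape(ch)) for ch in picture)
    )

    def test(record):
        value = record.get(item)
        return isinstance(value, str) and pattern.fullmatch(value) is not None

    return test


def any_of(*criteria):
    """Logical OR of criteria."""
    return lambda record: any(criterion(record) for criterion in criteria)


# Either of two pictures is acceptable for the phone item.
valid_phone = any_of(
    picture_eq("phone", "999-9999"),
    picture_eq("phone", "999-999-9999"),
)
```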
    6.4.  Length
    Criteria on the length of a data item are useful in the validation criterion and also to the data base administrator who can obtain information about the distribution of lengths of data items in the file in order to exercise available facilities for organizing the file. Being able to place criteria on the length of a data item implies that the system handles variable length data. The length of a data item may vary within a maximum prespecified by the data base administrator in the validation criterion, otherwise within the overall limits imposed by the language implementation. A length criterion may use any of the six standard relational operators.
    6.5.  Repeats
    Some data items are single-valued, such as a date of birth or an employee number; others are multiple-valued, such as a skills profile or a descriptor set.  The number of values or number of repeats in a multiple-valued item is identified as the repeats.  In INFOL, this was called TOTAL, which caused confusion since it implied some form of summation. In COBOL, PL/1 and GIS the maximum permitted number of repeats of a multiple-valued item or of a repeating group has to be specified. Conceptually there is no reason why specification of a maximum should be required, although it is easier to implement.  The data base administrator may limit the number of repeats for a multiple-valued data item in the validation criterion. A specifier user may place a retrieval criterion on, for example, the number of languages spoken by an employee where this number is not an explicit data item. Again any of the six relational operators may be used.
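    The languages-spoken example can be sketched in Python as a criterion on a count that is never stored as a data item. The names repeats, repeats_criterion and the staff sample are hypothetical; the sketch assumes multiple-valued items are held as lists.

```python
import operator

RELATIONS = {
    "EQ": operator.eq, "NE": operator.ne, "GT": operator.gt,
    "LT": operator.lt, "GE": operator.ge, "LE": operator.le,
}


def repeats(record, item):
    """Number of values held by an item: 0 if absent, 1 if single-valued."""
    value = record.get(item)
    if value is None:
        return 0
    return len(value) if isinstance(value, (list, tuple, set)) else 1


def repeats_criterion(item, relation, reference):
    """Criterion on the number of repeats of a multiple-valued item."""
    compare = RELATIONS[relation]
    return lambda record: compare(repeats(record, item), reference)


staff = [
    {"name": "Adams", "languages": ["French", "German", "Russian"]},
    {"name": "Baker", "languages": ["French"]},
    {"name": "Clark"},  # incomplete record: no languages recorded
]

speaks_two_or_more = repeats_criterion("languages", "GE", 2)
polyglots = [r["name"] for r in staff if speaks_two_or_more(r)]
```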
    6.6.   Content
    Criteria on the content of the value of a data item are not particularly useful in validation criteria. In interrogation, and to a much lesser extent in update, the facility to ask whether a value, such as that of document title, contains some substring is extremely useful.  This facility is available in GIS, but not in INFOL or TDMS. As with existence criteria described earlier, it is important to be able to state the condition both positively and negatively. Relational operators are not used with content criteria but with the special forms CONTAINS and DOES NOT CONTAIN.
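    The positive and negative forms can be sketched in Python as two substring tests. The function names mirror CONTAINS and DOES NOT CONTAIN but are otherwise hypothetical; the choice that a record lacking the item satisfies neither form is an assumption of this sketch.

```python
def contains(item, substring):
    """CONTAINS: the item's value includes the substring."""
    def test(record):
        value = record.get(item)
        return isinstance(value, str) and substring in value
    return test


def does_not_contain(item, substring):
    """DOES NOT CONTAIN: the item has a value and it lacks the substring."""
    def test(record):
        value = record.get(item)
        return isinstance(value, str) and substring not in value
    return test


documents = [
    {"title": "Generalized file processing"},
    {"title": "A survey of text editing systems"},
    {"author": "Anon."},  # no title recorded
]

about_files = [d for d in documents if contains("title", "file")(d)]
not_about_files = [d for d in documents if does_not_contain("title", "file")(d)]
```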
    6.7.  Other properties
    There are other properties on which it is useful to place criteria.  Most of these are properties of multiple-valued items only. One facility is the ANY facility available in INFOL, which reduces the effort required to specify certain otherwise lengthy criteria.  For example, to find a document having any two of a set of four classification descriptors would require the listing of the six different pairings of descriptors possible. It is also desirable to place criteria on quantities derived from the set of values of multiple-valued numeric items. Such quantities may be derived using a system-supplied procedure such as SUM or MEAN exactly as in INFOL.  In addition, a user-supplied procedure may be invoked as the subject of the criterion.
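    The saving offered by an ANY-style facility, and criteria on derived quantities, can be sketched in Python as follows. The helpers any_n_of and derived are hypothetical, "any two" is read here as "at least two", and the built-in sum and statistics.mean stand in for system-supplied SUM and MEAN procedures.

```python
from itertools import combinations
from statistics import mean


def any_n_of(item, n, wanted):
    """True when the multiple-valued item holds at least n of the wanted values."""
    wanted = set(wanted)
    return lambda record: len(wanted & set(record.get(item, ()))) >= n


def derived(item, procedure, relation_test):
    """Criterion on a quantity derived from a multiple-valued numeric item."""
    def test(record):
        values = record.get(item)
        return bool(values) and relation_test(procedure(values))
    return test


descriptors = ["optics", "lasers", "radar", "sonar"]

# Without such a facility, every pairing would have to be listed explicitly.
explicit_pairs = list(combinations(descriptors, 2))

document = {"descriptors": ["radar", "optics", "history"], "scores": [2, 4, 6]}

two_of_four = any_n_of("descriptors", 2, descriptors)
total_over_10 = derived("scores", sum, lambda total: total > 10)
mean_is_4 = derived("scores", mean, lambda m: m == 4)
```

    Any callable could be passed as the procedure argument, which corresponds loosely to invoking a user-supplied procedure as the subject of a criterion.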
          in Morrell, A. J. H. (Ed.): Information Processing 68, Proceedings of IFIP Congress 1968, Edinburgh, UK, 5-10 August 1968
  • Stock, Karl F. "A listing of some programming languages and their users" in RZ-Informationen. Graz: Rechenzentrum Graz 1971 124 Abstract: 321 programming languages, with the computer manufacturers on whose machines each language can be used. Index of the 74 computer firms; ranking of the programming languages by the number of manufacturers on whose machines the language is implemented; ranking of the manufacturers by the number of programming languages used.
  • Sammet, Jean E., "Roster of Programming Languages 1972" 133
          in Computers & Automation 21(6B), 30 Aug 1972
  • Stock, Marylene and Stock, Karl F. "Bibliography of Programming Languages: Books, User Manuals and Articles from PLANKALKUL to PL/I" Verlag Dokumentation, Pullach/Munchen 1973 297 Abstract: PREFACE AND INTRODUCTION
    The exact number of all the programming languages still in use, and those which are no longer used, is unknown. Zemanek calls the abundance of programming languages and their many dialects a "language Babel". When a new programming language is developed, only its name is known at first and it takes a while before publications about it appear. For some languages, the only relevant literature stays inside the individual companies; some are reported on in papers and magazines; and only a few, such as ALGOL, BASIC, COBOL, FORTRAN, and PL/1, become known to a wider public through various text- and handbooks. The situation surrounding the application of these languages in many computer centers is a similar one.

    There are differing opinions on the concept "programming languages". What is called a programming language by some may be termed a program, a processor, or a generator by others. Since there are no sharp borderlines in the field of programming languages, works were considered here which deal with machine languages, assemblers, autocoders, syntax and compilers, processors and generators, as well as with general higher programming languages.

    The bibliography contains some 2,700 titles of books, magazines and essays for around 300 programming languages. However, as shown by the "Overview of Existing Programming Languages", there are more than 300 such languages. The "Overview" lists a total of 676 programming languages, but this is certainly incomplete. One author has already announced the "next 700 programming languages"; it is to be hoped the many users may be spared such a great variety for reasons of compatibility. The graphic representations (illustrations 1 & 2) show the development and proportion of the most widely-used programming languages, as measured by the number of publications listed here and by the number of computer manufacturers and software firms who have implemented the language in question. The illustrations show FORTRAN to be in the lead at the present time. PL/1 is advancing rapidly, although PL/1 compilers are not yet seen very often outside of IBM.

    Some experts believe PL/1 will replace even the widely-used languages such as FORTRAN, COBOL, and ALGOL. If this does occur, it will surely take some time - as shown by the chronological diagram (illustration 2).

    It would be desirable from the user's point of view to reduce this language confusion down to the most advantageous languages. Those languages still maintained should incorporate the special facets and advantages of the otherwise superfluous languages. Obviously such demands are not in the interests of computer production firms, especially when one considers that a FORTRAN program can be executed on nearly all third-generation computers.

    The titles in this bibliography are organized alphabetically according to programming language, and within a language chronologically and again alphabetically within a given year. Preceding the first programming language in the alphabet, literature is listed on several languages, as are general papers on programming languages and on the theory of formal languages (AAA).
    As far as possible, most of the titles are based on autopsy. However, the bibliographical description of some titles will not satisfy bibliography-documentation demands, since they are based on inaccurate information in various sources. Translation titles whose original titles could not be found through bibliographical research were not included. In view of the fact that many libraries do not have the quoted papers, all magazine essays should have been listed with the volume, the year, issue number and the complete number of pages (e.g. pp. 721-783), so that interlibrary loans could take place with fast reader service. Unfortunately, these data were not always found.

    It is hoped that this bibliography will help the electronic data processing expert, and those who wish to select the appropriate programming language from the many available, to find a way through the language Babel.

    We wish to offer special thanks to Mr. Klaus G. Saur and the staff of Verlag Dokumentation for their publishing work.

    Graz / Austria, May, 1973