SPECOL (ID:3229/spe008)

Querying language

for Special Customer Oriented Language

High level declarative language to express sets etc. in natural language

References:

External link: Online copy in The Computer Journal 11(2) August 1968

Extract: The approach to a solution

In the last few years there has been an increasing interest in general programming systems which attempt to solve this problem by providing some special language or program which takes advantage of the basic similarity of all tasks which consist of extracting and presenting data. Within this broad classification we can however define two main divisions. These we may call Technical Information Retrieval (TIR), exemplified by the retrieval of relevant abstracts from scientific journals, and Management Information Retrieval (MIR), exemplified by sales analyses. Table 1 displays the difference between the two types of retrieval situation.

A typical TIR request might be: 'Print the complete abstracts of any article written in 1969 and appearing in the Operational Research Quarterly or the Journal of the O. R. Society of America which deals with the use of simulation or model building in marketing or with the implementation of marketing models, but excluding any article written by J. Smith or any article referring to Dynamic Programming Techniques.'

A typical MIR request: 'Print the customer, product, date, sales reference number and value of all invoices with a value of less than £10 for product group X, area Y, during May. Compute the gross margin as a percentage of invoice value and print this next to the value. Sequence the results by date within product within customer and give totals of values and average of gross margin on change of customer or product.'

Although most information retrieval programming systems are capable of work in both areas, the impetus for their creation has usually come from one in particular.
FIND (I.C.L., 1966) and SPECOL (Smith, 1968), for example, are aimed primarily at Technical Information Retrieval. Systems directed at Management Information Retrieval include NITA and FILETAB (N.C.C., 1969), CRESTS (Craig, 1966), the Sales Management Information Retrieval System of the Metal Box Company (Gearing, Reynolds & Sears, 1968) and all R.P.G. Systems.

in The Computer Journal 13(2) 1970

Basic philosophy

The basic philosophy of SPECOL is that it should be seen as a bridge between natural language and the language of logic and sets. Natural language, although adequate for normal discourse, is too ambiguous for putting questions to a computer; the language of logic and sets, on the other hand, although ideal for the computer and the specialist, is not so acceptable to the ordinary person. In order to bridge the gap between the two languages, SPECOL exploits two features that are common to both: firstly, the almost identical use of the words AND, OR and NOT; and secondly, the way in which data is often classified or addressed in groups.

The first feature is important, since it can be shown that by using AND, OR and NOT, it is possible to specify literally any conceivable combination of data. The second feature, the grouping of data into classes, is also important, since it can be shown that by attaching names or descriptors to the classes, it is possible to address vast amounts of data that otherwise might not be accessible at all. Putting information into classes is also an excellent way of organising one's thoughts. Take, for example, the game of Twenty Questions, in which, by asking a few well-chosen questions on classes, it is possible to range over the entire universe and yet still light on the one required fact thought up by one of the players.
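The remark that AND, OR and NOT suffice to specify any combination of data can be illustrated with ordinary set operations. The following is only a sketch in Python, not SPECOL: the records, descriptors and helper names (RECORDS, having, AND, OR, NOT) are hypothetical and chosen purely for illustration, loosely echoing the TIR request quoted earlier.

```python
# Hypothetical records: each id maps to the descriptors attached to it.
RECORDS = {
    1: {"SIMULATION", "MARKETING"},
    2: {"MARKETING", "DYNAMIC PROGRAMMING"},
    3: {"SIMULATION", "INVENTORY"},
    4: {"MODEL BUILDING", "MARKETING"},
}

# The "universe" against which NOT is evaluated.
UNIVERSE = set(RECORDS)


def having(descriptor):
    """Return the class (set of record ids) addressed by one descriptor."""
    return {rid for rid, tags in RECORDS.items() if descriptor in tags}


def AND(a, b):
    """Records belonging to both classes (set intersection)."""
    return a & b


def OR(a, b):
    """Records belonging to either class or both (inclusive, set union)."""
    return a | b


def NOT(a):
    """Records of the universe that lie outside the class (complement)."""
    return UNIVERSE - a


# (SIMULATION OR MODEL BUILDING) AND MARKETING AND NOT DYNAMIC PROGRAMMING
hits = AND(
    AND(OR(having("SIMULATION"), having("MODEL BUILDING")), having("MARKETING")),
    NOT(having("DYNAMIC PROGRAMMING")),
)
print(sorted(hits))  # [1, 4]
```

Because NOT is taken relative to a chosen universe, changing the universe changes the answer, which is the sense in which classes and their boundaries can be redrawn without touching the data.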
In logic, classes are referred to as sets and sub-sets of data; in SPECOL we think of them in terms of a book, that is, as consisting of paragraphs and sentences and so on. The following is a full list of the terms and their data processing equivalents. [...]

The format is extremely flexible. There may, for example, be any number of sentences in a paragraph and any number of paragraphs in a record. Headers and sentences, too, may be of varying lengths, although as shown in the diagram these are usually mentally padded up to some maximum length. As another instance of flexibility, not all classes have necessarily to appear in all files: some files, for example, consist entirely of headers and here the paragraph concept would not apply. Likewise, the boundaries of the classes may be thought of as quite flexible and may be changed mentally, so to speak, without physically altering a file. Certainly, one of the delights of set theory is that if we do not like the universe we are in we can change it; what we think of today as chapters and paragraphs in a file, we may think of tomorrow as paragraphs and sentences. It is this concept of movable boundaries, and of looking at data as if it were part of a book, i.e. in terms of paragraphs and sentences, that is so important to SPECOL and allows it to be used on almost any file.

Extract: New features

The following are some of the more important features that have been incorporated in the language since the publication of the original paper.
Multiple Questions

The first SPECOL compiler allowed only one question to be asked for each pass of the data. The latest compiler allows several questions to be asked at a time, the precise number depending on the amount of core space available. Each question requires about 6K positions of core. Allowing therefore for a 20K compiler plus a maximum record size of, say, 20K, it should be possible in a core partition of 100K to ask up to 10 questions per run. With a partition of 200K, it would be possible to ask 26 questions. The restriction that multi-SPECOL places on record size arises because, unlike single SPECOL, which deals completely with one line of data at a time, multi-SPECOL requires the lines again for each subsequent question. Maximum record size can be increased by making more core space available or by using drum or disk back-up. There is of course no limitation on file size. There are good reasons for using both single- and multi-SPECOL and it is envisaged that both versions of the program will be maintained.

Variable length field names

Field names may now be any combination of one to eight characters. They must still, however, begin with a letter.

New mode number

Mode 4 has been added to the list of Mode numbers indicating what part of a record is to be selected for output. Mode 4 indicates that only sentences that contain matched data are to be saved. Mode 1 has a similar meaning for headers, and Mode 2 for paragraphs. Mode 3 indicates that a whole record is to be saved.

Extract: The connective ANDX

The connective ANDX has been introduced to allow searches to be made across sentences. The rule states that if data is required to occur in the same sentence, use AND; if it may appear in different sentences, use ANDX, e.g.

GIST (NUCLEAR) ANDX GIST (REACTOR)
FNME (JOHN) AND SNME (SMITH)
SNME (SM.) AND SNME (@ SON)

In the first example, NUCLEAR may appear in one sentence and REACTOR may appear in another.
In the second example JOHN and SMITH must appear in one sentence, and in the third example the data must appear not only in the same sentence but also in the same field, i.e. a name is required that begins with SM and ends in SON. The ANDX feature is similar to the TYPBX feature used for searching across paragraphs.

Counting

There are now three count commands: OVCNT, calling for overall counts on selected records; INCNT, calling for counts within records and overall; and INCNTP, for counts within paragraphs, within records, and overall. Given that a file contains census data in Town and Postal district order, the following three short statements would call for records of male workers in Devon and Somerset, aged 30 to 35; and within this class, counts, by district, by town, and overall, of engineers, builders, salaries over £2,000, and the number of people who are single.

TYPA CNTY (DEV. OR SOM.)
TYPB SEX(M) AND YOBTH(1935 TO 1940)
INCNTP JOBD(ENGR.) (BUILD.) $2 SLRY(> 2000) STAT(S)

In the output, the required counts for districts and towns would be set out at the side of, and slightly to the right of, each record. The overall totals would be shown at the end of the run.

In a similar way the contents of fields may be added using the terms OVSUM, INSUM and INSUMP. The following statements would call for a list of salaries and pensions payable to single women typists, together with department and overall totals. Department totals would appear at the end of each record; overall totals plus average costs per department and per person would appear at the end of the run.
TYPB STAT(S) AND SEX(F) AND JOBD (TYPIST)
INSUM SALARY PENSION
PNTA DEPT
PNTB NAME/SALARY/PENSION
END

Extract: Quantity searching

The number of times that a set of conditions is required to occur in a paragraph or a record may be stipulated by the conventional terms n, <n, >n, NOT n, NOT <n and NOT >n written before relevant field expressions, with n being equal to any 1-, 2-, or 3-digit number. The operation is called Quantity searching. The following statements call for more than three doctors and not less than six women nurses: [...] Since the field name SEX is followed by an AND connective, the effect of the NOT <6 term extends over the whole statement.

Extract: Repeat-field searching

Many files have the same type of field, say personal qualifications, repeated adjacently a number of times in the same sentence. It is now possible to ask for a search of each sub-field in this area, using only one field name. SPECOL recognises the repeat-field name and compiles appropriate instructions for searching at the specified intervals. For instance, if a person's language qualifications are represented in a 12 x 4 area by a string of 4-letter mnemonics, e.g.

SPANITALFRENGERMRUSS

the following single expression would call for a qualification in Spanish or German:

QUAL (SPAN OR GERM)

On output the same field name causes the subfields to be automatically spaced:

SPAN ITAL FREN GERM RUSS

Interrogation of packed fields and bits

Routines have been written which permit interrogation of packed fields and bits using normal character digits as input in the question. On output the data may be produced either in its original packed form or in readable digits. Similar routines may be written for other forms of packing, e.g. octal, hexadecimal. These facilities however are appropriate only to data that has been packed according to System 360 conventions. To deal with data packed by other machines, it would be necessary to have different routines.
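To make the point about System 360 packing concrete, here is a minimal Python sketch of how a packed decimal field might be compared with the character digits typed in a question. It is not SPECOL's routine, which the extract does not reproduce; the function names unpack_decimal and matches_question are hypothetical. The sketch assumes the System/360 packed decimal layout: two 4-bit decimal digits per byte, with the low-order half of the last byte holding the sign (hexadecimal B or D for minus, the other values A to F for plus).

```python
def unpack_decimal(packed: bytes) -> int:
    """Decode a System/360-style packed decimal field into an integer."""
    if not packed:
        raise ValueError("empty field")
    digits = []
    # Every byte except the last carries two decimal digits.
    for byte in packed[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    # The last byte carries one digit and the sign code.
    last = packed[-1]
    digits.append(last >> 4)
    sign = last & 0x0F
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError("not a valid packed decimal field")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0x0B, 0x0D) else value


def matches_question(packed: bytes, question_digits: str) -> bool:
    """Compare a packed field with ordinary character digits from a question."""
    return unpack_decimal(packed) == int(question_digits)


print(unpack_decimal(bytes.fromhex("02000C")))            # 2000
print(unpack_decimal(bytes.fromhex("123D")))              # -123
print(matches_question(bytes.fromhex("02000C"), "2000"))  # True
```

A file packed under a different convention would need a different decoder in place of unpack_decimal, while the comparison against character digits would stay the same.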
This dependency on type of machine emphasises the need to represent data in a file, whenever possible, in the form in which a user envisages it, i.e. in character form. Only in this way may data be easily exchanged between different computers and easily printed on printers of different manufacture.

Hints on writing SPECOL

Specifying an exclusive OR

The OR connective in SPECOL is inclusive, i.e. it indicates 'either or', or 'both'. To obtain the exclusive OR, an expression must be followed by the appropriate negative. For example, if we require cases of malformations in children of cleft-palate or hare-lip but not both, we could write:

MALF (CP OR HL)
NOT MALF (CP) AND MALF (HL)

The MALF field in this instance would be defined as a repeat field as described in the preceding section.

Factorising common data

When a request contains alternatives, it is usually good practice to factor out common data and to specify this first. For example, to specify single women or married men at BRISTOL who are under 23 we might write:

TYPA UNIV (BRISTOL) AND YOBTH (> 1946)
AND STAT(S) AND SEX(F)
OR STAT(M) AND SEX(M)

This device avoids having to write out BRISTOL and 1946 more than once.

Specifying figures or letters

Since the character collating sequence of System 360 is letters followed by figures, it is possible to specify 'any figure' by the term >Z and 'any letter' by the term <0. For example, to retrieve, say, shoe codes that begin with any figure followed by any letter (such as 1A . . ., 3B . . ., or 4X . . ., etc.), we could write:

SHOES (>Z.) AND SHOES (<.0.)

The dot before the 0 in the second expression indicates that at this point we have already dealt with the first character.

Conditional output

Sometimes even when a search is successful, it may still be required to output certain lines only when they contain specified data. Often, this can be done using a combination of Mode No. (2 or 4) and an OR search.
If, for example, we require a list of scientific articles but only want to output a gist line if it contains the word 'atomic', we may write:

MODE 4
TYPB SUBJ (SCIENCE)
AND STAG OR GIST (?ATOMIC)
PNTA JOURN/DATE
PNTB SUBJ/STITLE
AND GIST

In Mode 4, only sentences containing matched data will be saved. In order to obtain sentences containing the word ATOMIC it is necessary to specify this in the request. The second TYPB statement (AND STAG, etc.) shows how this can be done without affecting the main search. In other words, if the subject SCIENCE is present, the field STAG, which is the S line identification field, must also be present. The statement will therefore be satisfied whether the word ATOMIC appears or not.

Current implementation and remote access

SPECOL is now available through IBM for most System 360 (OS and DOS) computers and through ICL for most System 4 computers. It is being written for ICL 1900 series computers and negotiations are also taking place with manufacturers about implementing it on other computers.

To date most SPECOL questions have been put to the computer in punched card form, but the language has also been successfully demonstrated in a time-sharing mode with questions being entered from remote typewriters and graphic display units. In remote access working, the procedure adopted depends in the main on the size of the file being interrogated. On small files (say up to 200,000 lines of data) it is possible to receive fairly rapid replies to a SPECOL question and to display the results at a terminal; for larger files it would seem that the most likely future for remote access SPECOL is in the area known as remote job entry, where questions are entered remotely from a terminal and are immediately checked and compiled. If the question contains an error, a message to this effect, plus the offending statement, is sent back to the user and he can correct it there and then and re-submit it.
He can also save his question in the computer for re-use or modification later on. When the jobs have been run the user may display some or all of his results at his terminal or re-direct them to the printer or to some other device or user. This method is seen as a big advance over conventional batch processing, there being considerable time saving all round, perhaps the most noteworthy being the immediate correction of errors, the ability to share files, and the ability to re-direct results.

The early hopes of SPECOL have now been realised and there is little doubt that fairly large scale interrogation of data is now possible by this means. SPECOL itself has been considerably enhanced by the facility of being able to ask several questions with one pass of the file, and by its use in remote access. It is in these two areas, coupled with the ever present requirement for faster turn-round of jobs, in which greatest interest is now expected to be shown, and where probably the most significant developments are likely to take place.

in The Computer Journal 13(1) January 1970

The task

The objective set in the autumn of 1968 was to create, in as short a time as possible, a file of about 6,500 senior civil servants together with an information retrieval system which would be easy to use and could provide information quickly to personnel managers in Whitehall to assist them in vacancy filling, manpower planning and statistical work. Whatever the system chosen, whether manual or computer, the method of producing results would have been similar, namely the matching of characteristics of posts to be filled against characteristics and job experience of the people on file. By clerical methods, matching can be laborious and error prone; by computer, using a retrieval program, the process of selection becomes more reliable. In this particular project, time and resources were in short supply and it was clear that an existing retrieval program would have to be found.
After some investigation, SPECOL (Smith, 1968 and 1970a and b) was selected as being, for this application, the best and most advanced available software. Use of an existing program influenced to some extent the approach to the scheme and solutions to some of the problems encountered, for example, the type of record structure. Choice of SPECOL meant also that either an IBM 360 or ICL System 4 computer would have to be chosen and it also soon became apparent that the need for quick response meant remote access and in this case use of a teletype. It was decided that results from the teletype would have to be presented in such a way that they could be immediately understood by people who were not familiar with computers. This meant that all output would have to be de-coded and explanatory text words provided. In addition, because the main purpose was to provide information about people to assist in vacancy filling, ways of codifying job experience and qualifications would have to be found. Registering experience in turn meant that the people themselves would have to be involved in data capture, as only the individual himself can adequately describe his job experience. System requirements It was eventually decided that the principal requirements of the system should be: 1. Updating must be simple and error correction straightforward. 2. It must be capable of providing information quickly. 3. The retrieval language must be capable of use by nonprogrammers. 4. The method of communication with the computer during retrieval of data must be direct and uncomplicated. 
These main requirements were met, first by adopting a record amendment system in which the layout of each amendment form corresponds with the equivalent record entry and the whole or part of an entry may be amended as required; secondly, by a teletype to link the Civil Service Department in Whitehall with the computer; thirdly, by the availability of SPECOL; and, finally, by expending not a little effort to devise a simple, conversational remote access method of communicating with SPECOL which would produce output in a comprehensible form.

Extract: SPECOL

As mentioned earlier it was decided to build the information retrieval system around an existing, proved, enquiry program and we were fortunate that SPECOL had been operating successfully for some time on IBM 360 and could be adapted to System 4.

In SPECOL a series of comparisons are made between values specified in the search parameters of a question, and the values of data on records being interrogated. If a comparison is successful it means that the record satisfies the conditions laid down in the question. Even the most complex conditions can be expressed in terms of the three principal operators used in logic: AND, OR and NOT. In questions, these operators are used to link the 'names' which are allocated to the fields of a record. These 'names' are listed in the SPECOL program with their locations in terms of byte positions.

Each SPECOL question is written in two parts; the first contains search parameters to identify records which satisfy the characteristics specified in the question and the second part stipulates what is to be printed out from each record. Information can be output in any form or position required. The SPECOL compiler is called in by the interrogation program. The question is compiled into the SPECOL program and, if valid, control is handed to this program which proceeds to search the file.

In the CMSR version of SPECOL each coded field can be referred to by using up to three 'names'.
For example GN01, GA01 and GX01 each refers, for a different purpose, to the field 'Department'. GN01 (123) could appear as a search parameter of a SPECOL question and in effect means 'look for any record containing a Department with value 123'. GA01, on the other hand, would be used in the print part of a question to output, in plain English, the Department contained in a 'hit' record. And finally, if GX01 were used in the print part of a question it would cause the actual word 'DEPARTMENT:' to appear as an explanatory text. All coded fields on the CMSR are treated in this way. The terminal operator decides whether or not he wishes to use all three facilities in every question. Fig. 4 shows how the item 'Department', a three digit code in character form, is held on a CMSR record. Following Fig. 4 is a simple question to illustrate the use of the three names.

Simple question

Identify all people in either Fisheries Department or Factories Department (codes 123, 456) and print surname, date of birth and Department.

Question written in SPECOL

MODE 1
TYPA GN01 (123 OR 456)    (search parameters)
PNTA GA05 GN07            (data to be output from 'hit' records)
AND GX01 GA01
END

(GA05 is surname, GN07 is date of birth)

Specimen result

JACKSON 12 02 20
DEPARTMENT: FISHERIES
SMITH 19 12 25
DEPARTMENT: FACTORIES

In these results the surname and date of birth are data as held on the record. 'DEPARTMENT:' is a text word (from GX01); 'FISHERIES' and 'FACTORIES' are decoded versions of codes 123 and 456 (GA01).

In the above example the text 'DEPARTMENT:' is probably unnecessary as it is clear that 'FISHERIES' and 'FACTORIES' are Department names. In the case of other items, for example dates, text words are extremely useful. The other words in the question, namely 'MODE 1', 'TYPA', 'PNTA', are SPECOL commands: 'MODE 1' indicates to the SPECOL compiler that output is required from 'header' fields only.
Modes 2, 3 and 4 signify that output is required from combinations of trailer and header fields. 'TYPA' indicates to the SPECOL compiler that the data on which the search is to be made are in the header part of the record. (TYPB is used for trailer fields.) 'PNTA' and 'AND' specify that the fields to be printed out from hit records are in the header part of the record.

Extract: Batch SPECOL

While the ability to interrogate the file and obtain immediate answers over a terminal link is the main feature of the CMSR system there is, in addition, a standard batch method of interrogation, again using SPECOL. This method is used in three situations: first, when the output is expected to be large and the teletype would clearly be too slow; secondly, when the information is not required immediately; and thirdly, when results must be sorted.

Operations

When the system became operational in June 1970 the operating system would not permit simultaneous working of the Communications Control Program with that part of the operating system concerned with storing output on disc for subsequent printing. In practice this restriction has meant that CMSR can run only under an older version of the operating system and therefore only a limited number of other jobs can run simultaneously. For this reason CMSR terminal sessions take place at a fixed time daily to minimise the disruption caused by switching operating systems. This problem will be overcome with the next version of the operating system and the pattern may well be two or more shorter sessions during the day, as questions arise.

The file occupies about half a tape and interrogation takes place within tape passing speed. At the start of a session the current main file is mounted and control is passed to the teletype operator. Each session is usually about 1 hour in length during which time the file is read and re-wound several times.
Rewind time cannot at present be used for submitting the next question and consequently each question takes, on average, 5 minutes to answer. Actual times depend on whether all or part of the file was searched.

Experience of CMSR

The system has been operational for about 8 months. As expected, the updating, being dependent on people remembering to notify the centre when changes occur, is the least reliable part of the system, justifying the extensive use of check prints to encourage individuals and personnel officers to keep the records up to date. Nevertheless a reasonable standard of accuracy was attained when the file was created and we expect this to be maintained and improved upon.

Practical experience of the information retrieval side of the system is encouraging. Potential users are being educated on ways in which they can use CMSR and people are beginning to turn to it for information previously obtained from diverse and less accessible sources. During the first 6 months a considerable number of requests have been met for information and prints of individual records. These requests have come from personnel managers with a wide range of responsibilities (pay, training, statistics, manpower planning, etc.) in the Civil Service Department, indicating that already the need for information held on CMSR is broadly based.

It is worth stressing that although the computer is employed to produce lists of people who possess the required characteristics for vacancies, these computer selections are not final. This is so for two reasons: firstly, because performance and ability are not recorded on the file, and secondly, because a system of this kind must permit human intervention before final judgements are made.

Programs, hardware and telecommunication links have worked well, although the short time the latter are used is probably not a fair test. Operator expertise, both at computer and terminal, has reached an acceptable level.
in The Computer Journal 14(3) May 1971

in Computers & Automation 21(6B), 30 Aug 1972

The exact number of all the programming languages still in use, and those which are no longer used, is unknown. Zemanek calls the abundance of programming languages and their many dialects a "language Babel". When a new programming language is developed, only its name is known at first and it takes a while before publications about it appear. For some languages, the only relevant literature stays inside the individual companies; some are reported on in papers and magazines; and only a few, such as ALGOL, BASIC, COBOL, FORTRAN, and PL/1, become known to a wider public through various text- and handbooks. The situation surrounding the application of these languages in many computer centers is a similar one.

There are differing opinions on the concept "programming languages". What is called a programming language by some may be termed a program, a processor, or a generator by others. Since there are no sharp borderlines in the field of programming languages, works were considered here which deal with machine languages, assemblers, autocoders, syntax and compilers, processors and generators, as well as with general higher programming languages.

The bibliography contains some 2,700 titles of books, magazines and essays for around 300 programming languages. However, as shown by the "Overview of Existing Programming Languages", there are more than 300 such languages. The "Overview" lists a total of 676 programming languages, but this is certainly incomplete. One author has already announced the "next 700 programming languages"; it is to be hoped the many users may be spared such a great variety for reasons of compatibility.
The graphic representations (illustrations 1 & 2) show the development and proportion of the most widely-used programming languages, as measured by the number of publications listed here and by the number of computer manufacturers and software firms who have implemented the language in question. The illustrations show FORTRAN to be in the lead at the present time. PL/1 is advancing rapidly, although PL/1 compilers are not yet seen very often outside of IBM.

Some experts believe PL/1 will replace even the widely-used languages such as FORTRAN, COBOL, and ALGOL. If this does occur, it will surely take some time - as shown by the chronological diagram (illustration 2).

It would be desirable from the user's point of view to reduce this language confusion down to the most advantageous languages. Those languages still maintained should incorporate the special facets and advantages of the otherwise superfluous languages. Obviously such demands are not in the interests of computer production firms, especially when one considers that a FORTRAN program can be executed on nearly all third-generation computers.

The titles in this bibliography are organized alphabetically according to programming language, and within a language chronologically and again alphabetically within a given year. Preceding the first programming language in the alphabet, literature is listed on several languages, as are general papers on programming languages and on the theory of formal languages (AAA).

As far as possible, most of the titles are based on autopsy. However, the bibliographical description of some titles will not satisfy bibliography-documentation demands, since they are based on inaccurate information in various sources. Translation titles whose original titles could not be found through bibliographical research were not included.
In view of the fact that many libraries do not have the quoted papers, all magazine essays should have been listed with the volume, the year, issue number and the complete number of pages (e.g. pp. 721-783), so that interlibrary loans could take place with fast reader service. Unfortunately, these data were not always found.

It is hoped that this bibliography will help the electronic data processing expert, and those who wish to select the appropriate programming language from the many available, to find a way through the language Babel.

We wish to offer special thanks to Mr. Klaus G. Saur and the staff of Verlag Dokumentation for their publishing work.

Graz / Austria, May, 1973