Statistical Programming Language
Statistical programming language
In order to carry out statistical analyses of large amounts of data it is first necessary to summarize the data under suitable headings. It is also essential to ensure the validity and correctness of the data collected.
The latter is achieved by "editing" the raw data, that is, by scanning it to detect omissions, mistakes and inconsistencies before submitting it to a process of analysis. The analysis of data is carried out by the use of a classification scheme, which depends both upon the nature of the data collected and upon the use to which the analysis is to be put. Thus in setting up a questionnaire we may or may not include the question "What is your age?" If we do not include the question, the classification of informants by age cannot be attempted on the basis of answers to that questionnaire. However, the fact that this question is asked and answered does not imply that we actually require to use the answer in the form in which it is given. We may, in fact, only need to consult the answer in order to place the informant in a particular age grouping such as, for example, middle age or teenage. In spite of the fact that it is not initially planned to use the full information as recorded, it may be convenient to collect the additional information in order to enable additional or amended classifications to be introduced later without necessitating the collection of further data.
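The point about keeping the full recorded answer can be illustrated with a short Python sketch. It is not AUTOSTAT notation; the function name `classify`, the band boundaries and the group labels are all assumptions chosen for illustration. Because the exact ages are stored, a second classification scheme can be applied later without collecting further data.

```python
from bisect import bisect_right

def classify(age, boundaries, labels):
    """Return the label of the band that `age` falls into.

    boundaries[i] is the lowest age that belongs to labels[i + 1].
    """
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more label than boundaries")
    return labels[bisect_right(boundaries, age)]

# Ages as recorded on the questionnaires (illustrative values).
recorded_ages = [15, 34, 47, 62]

# Two hypothetical classification schemes applied to the same raw data.
scheme_a = ([20, 40, 60], ["teenage or younger", "young adult", "middle age", "older"])
scheme_b = ([35], ["under 35", "35 and over"])

groups_a = [classify(age, *scheme_a) for age in recorded_ages]
groups_b = [classify(age, *scheme_b) for age in recorded_ages]
```

Had only the group label from the first scheme been recorded, the second grouping could not have been derived from the same answers.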
Analysis normally comprises two processes, firstly the selection of data according to the classification scheme, and secondly, counting the number of items selected, possibly with varying weights attached to each item.
This selection and counting is often now carried out by the use of punched-card machines, called "countersorters," which have limited but useful facilities for sorting through required information and counting the number of items possessing a selected property.
When the desired counting has been completed, the results are normally used to prepare a suitable table of values for inspection. The figures so far obtained can also be used as the basis for statistical calculation, provided that a suitable number of sums have been made, not only of items and weighted items, but also of certain derivatives of these such as the squares of the weights.
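As a rough illustration of selection and counting, the following Python sketch accumulates, for the items that satisfy a selection rule, the number of items, the sum of their weights and the sum of the squares of their weights. The record layout, the field names `age` and `weight`, and the helper names `Tally` and `tally` are assumptions made for this example and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

Record = Mapping[str, float]

@dataclass
class Tally:
    count: int = 0              # number of items selected
    weight_sum: float = 0.0     # sum of the weights of selected items
    weight_sq_sum: float = 0.0  # sum of the squared weights

    def add(self, weight: float) -> None:
        self.count += 1
        self.weight_sum += weight
        self.weight_sq_sum += weight * weight

def tally(records: Iterable[Record],
          selected: Callable[[Record], bool],
          weight_of: Callable[[Record], float]) -> Tally:
    """Count the records that satisfy `selected`, with weights attached."""
    result = Tally()
    for record in records:
        if selected(record):
            result.add(weight_of(record))
    return result

# Illustrative data: three questionnaires, two of them in the 40-59 age band.
records = [
    {"age": 17, "weight": 1.0},
    {"age": 45, "weight": 2.0},
    {"age": 52, "weight": 3.0},
]
middle_aged = tally(records,
                    selected=lambda r: 40 <= r["age"] < 60,
                    weight_of=lambda r: r["weight"])
```

Keeping the three running sums together means that a table entry and any later statistical calculation can be produced from a single pass over the data.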
The whole process can thus be considered as comprising three or four separate stages, namely:
1. Editing
2. Selection and counting
3. Tabulation
4. Statistical analysis.
All of these functions can be carried out with an electronic computer. However, it is desirable that the specification of each stage should be as simple as possible for a user such as a Market Research Officer. Rather than attempt to train these users in computer programming it is obviously desirable to provide them with a suitable language in which to address the machinery to specify the required manipulations. Autostat is such a language.
The Language Processor
Introduction of a controlling language necessitates adding a fifth process to those described above. This comprises a language processing operation, the function of which is to translate the instructions of the user, given in Autostat language, into suitable parameters specifying the classification of the data, how it is to be stored in the computer, what editing procedures are to be applied, what counts are to be carried out, what statistical analyses are to be employed, and in what form the results are to be printed. This we shall designate Stage 0, Translation.
The details of the computer techniques used will be given in a later paper. Here we describe the language to be used to address the computer, and give a simple example of its application. This is not intended as a complete manual for its use, but outlines its main
in The Computer Journal 3(2) July 1960
In the first of the four papers in this session, Dr. Douglas described the work he and A. J. Mitchell have done in developing AUTOSTAT, a language in which it is possible for a user, such as a Market Research Officer, to set up operations for statistical data processing in connection with market research and similar surveys.
The data to be used as input is in some arbitrary layout on a questionnaire arranged for the convenience of an interviewer filling in the answers to a set of questions. These questions are each identified by a number. For the purposes of input one is not concerned directly with the information content of the answers, but only with providing the computer with an adequate specification of the data which will be presented to it, and with the identification of unacceptable answers.
An arbitrary label, called a "Q-label," which identifies the answer concerned, is allotted to each item to be read in from the form. Specification of the method of preparation to be used is given to the machine in a series of statements about the Q-labels.
To specify tabulations of the data, it is desirable both to relate the Q-labels to the information content of the answer recorded and also, sometimes, to regroup the data. To facilitate this, provision is made to relabel data by the use of a combination of letters. It is assumed that tabulations are to be presented on a printed page. In order to present tables involving complicated sub-divisions of the data, it is sometimes desirable to group the rows or columns and repeat a particular sequence within the elements of another group, and one can define such a table in AUTOSTAT language.
The combination of tabulation statements with those necessary for regrouping provides the information essential to the machine for selection, from among the questionnaires, of those which, for a particular tabulation, are to be counted. One requires also to specify the weight which is to be given, in counting, to each questionnaire or group of questionnaires.
It is essential to be able to specify the group to which a weight will apply and the tabulation for which the weighted count is needed. Furthermore, the actual weight may be stated either as a number directly specified, as the answer to a question on the questionnaire, as a derivative of two or more answers, or as a number fixed by reference to an unweighted count of those questionnaires belonging, normally (but not always), to the group being weighted. Provision must thus be made for any of these methods of weighting or any combination of them. This is achieved in AUTOSTAT by a series of weighting statements.
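The four ways of stating a weight can be sketched in Python as follows. This is not AUTOSTAT syntax, which the summary does not reproduce. The function names, the Q-labels `Q7` and `Q8`, the choice of a ratio as the "derivative of two answers", and the use of a product to combine weights are all assumptions made for illustration.

```python
from typing import Callable, Mapping, Sequence

Record = Mapping[str, float]
Weight = Callable[[Record], float]

def constant(value: float) -> Weight:
    """A weight given as a number directly specified."""
    return lambda record: value

def from_answer(label: str) -> Weight:
    """A weight taken from the answer to one question."""
    return lambda record: record[label]

def ratio_of(numerator: str, denominator: str) -> Weight:
    """One possible derivative of two answers: their ratio (assumed)."""
    return lambda record: record[numerator] / record[denominator]

def scaled_to(target: float, group: Sequence[Record]) -> Weight:
    """A weight fixed by reference to the unweighted count of a group."""
    if not group:
        raise ValueError("cannot scale to an empty group")
    per_item = target / len(group)
    return lambda record: per_item

def combined(*weights: Weight) -> Weight:
    """Combine several weights by multiplying them (assumed rule)."""
    def weight(record: Record) -> float:
        total = 1.0
        for w in weights:
            total *= w(record)
        return total
    return weight

# Illustrative group of two questionnaires with hypothetical Q-labels.
group = [{"Q7": 4.0, "Q8": 2.0}, {"Q7": 3.0, "Q8": 1.0}]
grossed_up = combined(scaled_to(100.0, group), from_answer("Q8"))
weights = [grossed_up(record) for record in group]
```

Representing every weight as a function of one questionnaire lets the same counting routine accept any of the four methods, or a combination of them, without special cases.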
The full paper by Dr. Douglas and Mr. Mitchell was published in The Computer Journal (Vol. 3, No. 2, July 1960).
in The Computer Journal 3(2) July 1960
The OPAL program is designed to facilitate the handling of large-scale market-research surveys on an electronic digital computer, the IBM 7090. To do this it is necessary to check the data and note any errors, count the data in their various classes, apply weighting if required, and present the results as tables.
The whole system is run within the FORTRAN/FAP monitor scheme for the 7090, but embodies a compiler of its own, which is addressed in a special language (AUTOSTAT).
in Popplewell, Cicely M. (Ed.) Information Processing 62, Proceedings of the 2nd IFIP Congress, Munich, Aug. 1962. North Holland Publ. Co., 1963.
in Symbolic Languages in Data Processing, in the Proceedings of the Symposium organized and edited by the International Computation Centre, Rome, Italy, March 26-31, 1962, Gordon and Breach Science Publishers, 1962.
in Computers & Automation 21(6B), 30 Aug 1972
in ACM Computing Surveys (CSUR) 4(2) June 1972
The exact number of all the programming languages still in use, and those which are no longer used, is unknown. Zemanek calls the abundance of programming languages and their many dialects a "language Babel". When a new programming language is developed, only its name is known at first and it takes a while before publications about it appear. For some languages, the only relevant literature stays inside the individual companies; some are reported on in papers and magazines; and only a few, such as ALGOL, BASIC, COBOL, FORTRAN, and PL/1, become known to a wider public through various text- and handbooks. The situation surrounding the application of these languages in many computer centers is a similar one.
There are differing opinions on the concept "programming languages". What is called a programming language by some may be termed a program, a processor, or a generator by others. Since there are no sharp borderlines in the field of programming languages, works were considered here which deal with machine languages, assemblers, autocoders, syntax and compilers, processors and generators, as well as with general higher programming languages.
The bibliography contains some 2,700 titles of books, magazines and essays for around 300 programming languages. However, as shown by the "Overview of Existing Programming Languages", there are more than 300 such languages. The "Overview" lists a total of 676 programming languages, but this is certainly incomplete. One author has already announced the "next 700 programming languages"; it is to be hoped the many users may be spared such a great variety for reasons of compatibility. The graphic representations (illustrations 1 & 2) show the development and proportion of the most widely-used programming languages, as measured by the number of publications listed here and by the number of computer manufacturers and software firms who have implemented the language in question. The illustrations show FORTRAN to be in the lead at the present time. PL/1 is advancing rapidly, although PL/1 compilers are not yet seen very often outside of IBM.
Some experts believe PL/1 will replace even the widely-used languages such as FORTRAN, COBOL, and ALGOL. If this does occur, it will surely take some time - as shown by the chronological diagram (illustration 2).
It would be desirable from the user's point of view to reduce this language confusion down to the most advantageous languages. Those languages still maintained should incorporate the special facets and advantages of the otherwise superfluous languages. Obviously such demands are not in the interests of computer production firms, especially when one considers that a FORTRAN program can be executed on nearly all third-generation computers.
The titles in this bibliography are organized alphabetically according to programming language, and within a language chronologically and again alphabetically within a given year. Preceding the first programming language in the alphabet, literature is listed on several languages, as are general papers on programming languages and on the theory of formal languages (AAA).
As far as possible, most of the titles are based on autopsy. However, the bibliographical description of some titles will not satisfy bibliography-documentation demands, since they are based on inaccurate information in various sources. Translation titles whose original titles could not be found through bibliographical research were not included. In view of the fact that many libraries do not have the quoted papers, all magazine essays should have been listed with the volume, the year, issue number and the complete number of pages (e.g. pp. 721-783), so that interlibrary loans could take place with fast reader service. Unfortunately, these data were not always found.
It is hoped that this bibliography will help the electronic data processing expert, and those who wish to select the appropriate programming language from the many available, to find a way through the language Babel.
We wish to offer special thanks to Mr. Klaus G. Saur and the staff of Verlag Dokumentation for their publishing work.
Graz / Austria, May, 1973
in ACM Computing Surveys (CSUR) 4(2) June 1972