ASCOP(ID:6791/asc003)

Atlas/NCC Stats Package

Country: uk
Began: 1967
Type:Statistical
Sammet:SPC

for A Statistical COmputing Procedure

Statistics package developed by B. E. Cooper of the Atlas Computer Laboratory, Science Research Council, and distributed by the Statistics and Simulation Dept, The National Computing Centre, UK.

References:

Cooper, Brian "ASCOP - A Statistical Computing Procedure" Technical Report Atlas Computing Lab 1966
Abstract: STATISTICIANS have long dreamt of being able to perform, quickly and easily, any analysis of data that occurs to them whether this be an accepted analysis or a new analysis suggested by the data during the course of analysis. Until a few years ago statisticians have been restricted to those analyses that can be performed using the strictly limited resources of a desk calculating machine and an imaginative approach to data analysis has not been possible. The increase in computing power during the last few years should make possible a completely new approach to data analysis. So far this has not happened. The needs of the statistician are very varied and none of the programs written so far possess the considerable flexibility that is necessary to satisfy these needs. Within the last two years a number of statistical systems capable of performing a number of standard analyses have been written. These permit certain sequences of analyses to be performed without re-presentation of the data and they have the advantage that the data is prepared in the same way for all analyses. Although these are making the work of the data analyst easier and are encouraging the more thorough analysis of data, they are found wanting in ease of use, both to the statistician and the experimentalist who should also be encouraged to look more closely at his data, in the number and variety of analyses that may be performed, and in the ability to build up new analyses as sequences of instructions known to the system. For example a number of systems can perform a regression analysis but none of them can reference the coefficients of the fitted function in a later analysis. ASCOP is a large system which in its first version suffered the same faults as all the other systems although it was easier to use than most. The second version currently being debugged is a major revision and attempts to allow much greater flexibility to the user.
External link: Online copy at Chilton
Extract: An ASCOP Program
An ASCOP Program
An ASCOP program consists of a sequence of instructions which may be divided into two main types. The first of these are equations specifying arithmetic operations on variables, parameters (single values), and coefficients. New variables, parameters, and coefficients may be created and referred to in subsequent instructions of either type. Instructions of the second type are English-like sentences or phrases specifying particular analyses, or making declarations to the system. Instructions of this type may similarly define new variables, parameters, and coefficients which may be referred to later in instructions of either type. Both types of instruction may be labelled and branching statements making reference to these labels are allowed. This enables the user to specify that the performance of some analyses and arithmetic operations is conditional on the satisfaction of a particular criterion or criteria. A number of data editing operations are available including the amalgamation of several sets of data, the selective inclusion of points in a new set of data, and the inclusion of certain parameters, defined in an analysis, as a point in a new set of data. It is also possible to define subroutines made up of ASCOP instructions and equations and to call these many times over. Their definition and call are very similar to those in FORTRAN. It will be possible when a disc becomes available on ATLAS to have a set of standard subroutines stored on the disc and hence available on call to ASCOP users. It will also be possible for users to add to the standard set or, of course, to their own private set. Instructions are available in ASCOP to allow the user to specify that certain sets of data including data derived during analysis be written onto a private output tape in a form that can be presented again to ASCOP at a later time.

Extract: The Data Matrix
The Data Matrix
The basic organisational unit of data in ASCOP is the data matrix. The rows of a data matrix are referred to as POINTS and the columns as VARIABLES. Each variable may have more than one column in the data matrix and the number of columns for a variable is referred to as its replication. If variable A is replicated twice there will be two values of A in each POINT or in each row of the matrix. Thus a certain completeness in the data is implied, but in fact missing values are allowed for the incomplete situation. The fact that variables may be replicated introduces the possibility of references to point means, variances, standard deviations and numbers of replicates. Such reference is allowed in arithmetic operations and in analyses. Reference is allowed in arithmetic operations to a label associated with each point. The label may be read with the data or generated as the data is read.
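
The idea of replicated variables and point statistics can be illustrated with a small sketch. This is Python, not ASCOP, and the storage layout, the names `points`, `replicates`, `point_mean` and `point_count`, and the use of `None` for a missing value are all assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical layout: one dict per POINT (row). Each VARIABLE maps to a list
# of replicate values; None marks a missing value. "label" is the point label.
points = [
    {"label": "P1", "A": [2.0, 4.0], "B": [1.0]},
    {"label": "P2", "A": [3.0, None], "B": [5.0]},
]

def replicates(point, var):
    """Return the non-missing replicate values of one variable at one point."""
    return [v for v in point[var] if v is not None]

def point_mean(point, var):
    """Mean over the available replicates, or None if every value is missing."""
    vals = replicates(point, var)
    return mean(vals) if vals else None

def point_count(point, var):
    """Number of replicates actually present at this point."""
    return len(replicates(point, var))
```

Here variable A is replicated twice, so each point carries two values of A, and the second point shows how a missing replicate might simply be left out of the point mean and count.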

Data matrices are, most commonly, read from cards but they may also be generated from other data matrices using edit operations, or generated using the random variable generation functions available as parts of the arithmetic operations. Arithmetic operations may be used to define new variables in the reading stage or in the editing stage and the inclusion of points in the data matrix may be made conditional on the values of the variables involved. Thus matrices may be formed containing those points that show specified properties. Data to be analysed in several different arrangements need be presented to the system only once and the reorganisation achieved using edit operations.

Cooper, Brian "ASCOP - A Statistical Computing Procedure" J. Royal Stat. Soc. Series C (Applied Statistics) 16(2) 1967 pp100-110

Cooper, Brian "Basic subroutine for the input of numbers, words, and special characters" Atlas Lab, Chilton September 1967
External link: Online copy at Chilton
Extract: Introduction
The purpose of this paper is to describe a basic all-purpose format-free input subroutine and to show that such subroutines can be written with both efficiency and flexibility. A second purpose is to encourage the use of such subroutines to improve the often arbitrary presentation rules users normally must follow to communicate with a program. The subroutine reads one card at a time and assembles the information as a list of words, numbers, and special characters. The rules of assembly are defined in terms of two arrays of constants and may therefore be changed by program. The various decisions are taken quickly with reference to these arrays, and the programming is not machine dependent.

A great number of programs written, particularly if they pretend any generality, offer a number of different options which the user may select. The rules the user is expected to follow to make his selection often frighten potential users away. For example: if Blogg's analysis is required punch PQR in columns 14, 27 and 38 of card 4. If extra output is required punch 9 in column 54 of card 3, otherwise punch 8. If a fifth card containing a title is to be presented punch ZEBRA in columns 11 to 15 of card 2. Rules of presentation are often more difficult to understand than the numerical method employed in the program, and much time, both users' and computers', is wasted because little thought is given to lightening the users' task. Some programs, of course, do allow flexibility in the way information is presented but I think that it is true to say that there is considerable room for improvement in this respect. Free-field format subroutines have been available for many years but the problem programmer has largely ignored them and has stuck to the standard FORTRAN format-bound instructions. This paper describes a particular subroutine of this kind which has been in use for some time as the basic input subroutine for a large statistical and data filing system called ASCOP (see Cooper, 1967) and argues that the particular organisation of this subroutine has a large number of advantages. Use of such a subroutine enables the programmer to relax the restrictions he might otherwise insist on in the presentation of information to the program. Instead of insisting that the required information be punched in a rigid format, with a coding scheme for the selection of the required options, he can allow the user freedom of preparing information as though he was typing instructions for a subordinate. The selection of Blogg's method can be made conditional on the appearance of the word BLOGGS somewhere in the specification and a title can be introduced by the word TITLE itself. 
The assumption of default settings for parameters the user is not concerned with can be made more easily, and the user introduces only that subset of the specification he is concerned with when presenting his particular problem. In fact the programmer may go so far as to allow alternative means of introducing the same information. Different users often use different names for the same mathematical technique, and it is not hard to allow the use of two or more different words to refer to the same analysis.
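
The style of specification argued for here, with keywords, defaults and synonyms, might be sketched as follows. This is an illustrative Python sketch, not part of CARD or ASCOP. The words BLOGGS and TITLE come from the text; the function `parse_spec`, the `DEFAULTS` and `SYNONYMS` tables, and the synonym BLOGG are hypothetical.

```python
# Hypothetical default settings, used for anything the user does not mention.
DEFAULTS = {"method": "standard", "title": None}

# Hypothetical synonym table: alternative words for the same request.
SYNONYMS = {"BLOGG": "BLOGGS"}

def parse_spec(words):
    """Build settings from a free-form list of words, in any order."""
    settings = dict(DEFAULTS)
    for i, raw in enumerate(words):
        word = SYNONYMS.get(raw.upper(), raw.upper())
        if word == "BLOGGS":
            # The method is selected by the mere appearance of the word.
            settings["method"] = "bloggs"
        elif word == "TITLE":
            # Everything after TITLE is taken as the title text.
            settings["title"] = " ".join(words[i + 1:])
            break
    return settings
```

For example, `parse_spec(["use", "bloggs", "title", "Trial", "one"])` selects the method and the title without any fixed column positions, and an empty specification simply yields the defaults.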

Extract: General description
General description
The name and argument list of the subroutine to be described is as follows:

SUBROUTINE CARD (FLA, IFX, IWS, NIT, IND)

Subroutine CARD reads one card, performs a left to right scan and assembles the information as a list of numbers, words, and special characters in the array FLA. Numbers are in normal floating-point form and the words and special characters are stored in the appropriate FORTRAN alphanumeric form. Integer array IFX contains indicators enabling the type of the items to be identified and integer array IWS the columns on the card on which successive items terminate. The card is read initially with format (80A1) into an array in COMMON and the text of the card is therefore available to the calling routine as well. This array is left intact during the scan and it is therefore possible to re-assemble the information contained on the last card read. The number of items read is supplied in the integer scalar NIT, and IND takes one of a number of values according to the type of card read, for example, error free, with special characters, blank. Additional facilities include the reading of numbers to a base other than ten, and the continuation of information onto further cards by the use of a continuation character punched as the last item on each card to be continued. The continuation character is usually $.

Extract: Organisation

Organisation

The subroutine uses two arrays of integers, known as the character
integers and the decision integers, in the assembly process.
The use of these two sets of integers makes the program itself machine
independent and to a large extent card-code independent. Each possible
character in the computer's vocabulary is allocated to one of ten groups,
and associated with each character is an integer in the character integer
array ICHAR. In the subroutine described here we assume there are 64
different characters but this number can be easily changed if more
characters are possible. The character with internal value I has
ICHAR(I+1) associated with it, and the value of ICHAR(I+1) contains
two pieces of information. The units and tens part is the number of the
group to which the character is allocated, and the remaining part is 100
times the value the character is to have when used in assembling items.

For example, on Atlas the character 2 has internal code 18 so that
if ICHAR(19) is set to 201 the character 2 is allocated to character
group 1 and the numerical value 2 is used in assembling this character
as part of a number.
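
The packing of group and value into one character integer can be sketched in a few lines. This is a Python illustration rather than the original FORTRAN; the helper names `encode_char_integer` and `decode_char_integer` are hypothetical, and the list is indexed from zero, so the internal code is used directly where the text uses ICHAR(I+1).

```python
def encode_char_integer(group, value=0):
    """Pack a character group (1-10) and an assembly value into one integer."""
    return 100 * value + group

def decode_char_integer(n):
    """Split a character integer into (group, value)."""
    value, group = divmod(n, 100)
    return group, value

# 64 possible characters, as assumed in the text. Python lists start at 0,
# so ICHAR[code] here plays the role of FORTRAN's ICHAR(code + 1).
ICHAR = [0] * 64
ICHAR[18] = encode_char_integer(group=1, value=2)  # the character 2 on Atlas
```

With this layout `ICHAR[18]` holds 201, which decodes to group 1 (digits) and numerical value 2.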

The ten groups used in CARD are defined in Table 1.

The card is scanned from left to right and the current state of
assembly plays a vital part in determining the processing of the next
character in the scan. Seven assembly states are defined as follows:

1. No item started.
2. Word started.
3. Number started but before the decimal point.
4. Number started but after the decimal point.
5. Exponent indicator (usually the character E) read after number.
6. Exponent indicator passed after a number.
7. Exponent started.

The scan always begins in state 1 and if, for example, a letter is
read the state moves to state 2. The state remains 2 until a character
capable of changing the state is read, that is until a character which
cannot form part of a word is read.
The scan section of the program is divided into 21 parts and the path
through these parts is decided according to the characters read and the
values of the decision integers. The decision integers are stored in
a 10 × 7 array JDIS. When a new character is encountered its group
is quickly determined from the character integer array. This together with
the assembly state determines which decision integer is appropriate.
As is seen from Table 1 the value of this integer is the number of the
part of the program to be obeyed.

Table 1: The standard values of the decision integers

CHARACTER GROUPS
	Digits	Letters	Exponent Indicator	Decimal Point	Plus Sign	Minus Sign	Special Characters	Separators	Continuation Character	Illegal Character
Assembly States	1	2	3	4	5	6	7	8	9	10
1	3	2	11	4	1	1	18	19	20	21
2	9	6	6	9	9	9	9	9	9	21
3	7	10	10	5	10	10	10	10	10	21
4	8	10	10	21	10	10	10	10	10	21
5	13	12	12	13	13	13	12	13	12	21
6	14	17	17	17	1	1	17	19	17	21
7	15	16	16	21	16	16	16	16	16	21
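
The lookup described above amounts to indexing a small table by assembly state and character group. The following Python sketch copies the standard values from Table 1; the function name `decision` is hypothetical, and storing one row per state is an assumption about layout (the text describes JDIS only as a 10 × 7 array).

```python
# Standard decision integers from Table 1.
# Rows: assembly states 1-7. Columns: character groups 1-10.
JDIS = [
    [3, 2, 11, 4, 1, 1, 18, 19, 20, 21],
    [9, 6, 6, 9, 9, 9, 9, 9, 9, 21],
    [7, 10, 10, 5, 10, 10, 10, 10, 10, 21],
    [8, 10, 10, 21, 10, 10, 10, 10, 10, 21],
    [13, 12, 12, 13, 13, 13, 12, 13, 12, 21],
    [14, 17, 17, 17, 1, 1, 17, 19, 17, 21],
    [15, 16, 16, 21, 16, 16, 16, 16, 16, 21],
]

def decision(state, group):
    """Return the number of the program part to obey (1-21)."""
    return JDIS[state - 1][group - 1]
```

For instance, `decision(1, 1)` gives part 3 (start a number) and `decision(2, 1)` gives part 9 (terminate word), matching the first column of the table.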

Details of the 21 program parts are given below:

Part	State Change	Action
1		Note the reading of a sign (Character type 5 or 6).
2	2	Start a word; treat a previously read sign as a special character; note in IW the present column on which the word begins.
3	3	Start a number before the decimal point; set AIN equal to the value of the digit. (The number will be built up in AIN.)
4	4	Start a number with a decimal point; set AIN equal to zero; set POW = 1.0.
5	4	Decimal point read whilst in the middle of a number; set POW = 1.0.
6		Continue word; do nothing.
7		Continue number before decimal point; multiply AIN by BASE and add the value of the current digit. (This operation should be protected by a test for overflow; this test is machine dependent.)
8		Continue number after decimal point; divide POW by BASE, multiply by value of current digit and add to AIN.
9	1*	Terminate word; word begins on column IW and ends on the current column; call subroutine WORD to pack the letters appropriately.
10	1*	Terminate number; store AIN taking account of any previously read sign.
11	2 or 5	Exponent indicator read; take as exponent and set state to 5 if following a number; take as beginning a word if following a sign or a word or if it is the first item and set state to 2.
12	2	Take exponent indicator as beginning a word.
13	6*	Exponent indicator passed.
14	7	Begin exponent; set IEX to current digit.
15		Continue exponent; multiply IEX by IFIXF(BASE) and add the value of the current digit.
16	1*	Terminate exponent and number. (This operation should be protected by a test for overflow based on the value of the exponent; this test is machine dependent.)
17	1*	Store exponent indicator as a one-character word.
18	1	Assemble special character; set IND to 2.
19		Ignore character; no action.
20		Stop all processing.
21		Illegal character read; output diagnostic; set IND to 4.
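
The arithmetic of parts 3, 4, 5, 7 and 8 can be traced with a short sketch. This is Python, not the original FORTRAN, and it is deliberately limited: the function `assemble_number` is hypothetical, it accepts only the digits 0 to 9 and at most one decimal point, and it ignores signs, exponents and the overflow test mentioned in the table.

```python
def assemble_number(text, base=10.0):
    """Build a number digit by digit, in the manner of AIN and POW."""
    ain = 0.0
    pow_ = None  # None until a decimal point has been read
    for ch in text:
        if ch == ".":
            pow_ = 1.0                    # parts 4 and 5: set POW = 1.0
        elif pow_ is None:
            ain = ain * base + int(ch)    # parts 3 and 7: before the point
        else:
            pow_ /= base                  # part 8: divide POW by BASE,
            ain += pow_ * int(ch)         # multiply by the digit, add to AIN
    return ain
```

Passing `base=8.0` shows how the same steps might read a number to a base other than ten: the digits 17 then assemble to the value fifteen.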

Some program parts (marked *) cause the assembly state to be changed and the decision
integers re-inspected with the new state. This ensures that characters encountered
in certain positions can properly fulfil the two functions of terminating an item
and beginning a new item. For example if we punch CAR4 in consecutive columns on a
card the characters CAR will contribute to a word and the state will be 2 when the
digit 4 is encountered. If the decision integers are set appropriately the action will
be to terminate the word and to reset the state to one.
Re-inspection of the decision integers at this point ensures that a number
is started with the value 4.

The advantages of this fragmentation of the program are:

The structure is clear and easily changed or augmented.
Assembly rules can be changed by program.
The coding is not machine dependent nor does it assume a particular internal character code.
By use of an array of integers and the FORTRAN GO TO statement, decisions are taken quickly and the decisions are clearly defined.
The subroutine can be easily tailored to particular requirements and unwanted facilities discarded.
The same card may be re-processed.

Extract: Changes in the rules
Changes in the rules
Many examples of changes in the assembly rules will have occurred to the reader already. However, a number of examples are described below to illustrate the flexibility of the subroutine. The flexibility is particularly apparent when we remember that such changes can be made during the execution of the calling program.

Changing the decision integer for state 2, character type 1 from 9 to 6 causes a number encountered immediately after a word to be taken as part of the word. That is, the change would cause the sequence CAR4 to be taken as one word rather than as a word followed by a number. Similarly, a special character can be accepted as part of a word if it immediately follows the word by changing the decision integer for state 2 character type 7 from 9 to 6.
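
Both the standard treatment of CAR4 and the effect of this change can be seen in a reduced sketch of the scan. This is Python, not CARD itself: it covers only three character groups (digits, letters, separators) and three assembly states, handles unsigned whole numbers only, and the names `scan` and `group_of` are hypothetical. The decision values are those of Table 1.

```python
DIGIT, LETTER, SEPARATOR = 1, 2, 8

# Reduced decision table: (assembly state, character group) -> program part.
JDIS = {
    (1, DIGIT): 3, (1, LETTER): 2, (1, SEPARATOR): 19,
    (2, DIGIT): 9, (2, LETTER): 6, (2, SEPARATOR): 9,
    (3, DIGIT): 7, (3, LETTER): 10, (3, SEPARATOR): 10,
}

def group_of(ch):
    """Simplified stand-in for the character integer array."""
    if ch.isdigit():
        return DIGIT
    if ch.isalpha():
        return LETTER
    return SEPARATOR

def scan(card, jdis=JDIS):
    """Left-to-right scan returning a list of words and numbers."""
    items, state, start, ain = [], 1, 0, 0.0
    for col, ch in enumerate(card + " "):  # trailing blank ends the last item
        group = group_of(ch)
        while True:
            part = jdis[(state, group)]
            if part == 2:        # start a word, noting its first column
                state, start = 2, col
            elif part == 3:      # start a number
                state, ain = 3, float(ch)
            elif part == 7:      # continue a number
                ain = ain * 10.0 + int(ch)
            elif part == 9:      # terminate word, then re-inspect
                items.append(card[start:col])
                state = 1
                continue
            elif part == 10:     # terminate number, then re-inspect
                items.append(ain)
                state = 1
                continue
            # parts 6 (continue word) and 19 (ignore) need no action
            break
    return items

# Changing state 2, character type 1 from 9 to 6, as described in the text.
MODIFIED = dict(JDIS)
MODIFIED[(2, DIGIT)] = 6
```

With the standard table `scan("CAR4")` yields the word CAR followed by the number 4, because part 9 resets the state to 1 and the digit is looked up again. With `MODIFIED` the digit is absorbed by part 6 and the result is the single word CAR4.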

The second example is particularly interesting. A misunderstanding of the use of continuation cards when presenting data cards to a program using CARD was responsible for the inclusion of the character $ at the end of each of a large number of records in card image form on magnetic tape. The simplest change causing these additional characters to be ignored was to reallocate the character $ to type 8 for the duration of the reading of the data. This change, in fact, necessitated recompilation, but instead access to the decision and character integers could easily be passed on to the user of a program. The resulting gain in program flexibility makes this well worth while.

A final example concerns the use of comment cards presented with the data to a program. To achieve this the comment card must be recognised as such by CARD and the information on the card assembled in alphanumeric form. It was decided that the character * should be punched as the first item on the card to indicate a comment and this character was allocated to group 2. The calling program inspected FLA(1) for every card read by CARD and if this was * the card was printed and otherwise ignored. With these changes a card beginning with * followed by a space would be treated as a comment card and the comment printed.

Extract: Efficiency
Efficiency
The reader's first impression of the efficiency of this approach might be to believe that a lot of work is performed to achieve a relatively simple result. It must be remembered that the work involved in taking the various decisions is quickly performed by a little integer arithmetic and the GO TO statement, and that this is performed instead of the work normally involved in interpreting a Format statement. The integer arrays take up a total of 134 locations but use of these involves a reduction in the size of the program itself which more than compensates. A possible source of inefficiency may be the way in which the particular FORTRAN compiler deals with the initial reading of the card in (80A1) form. Some compilers are more efficient than others in this respect and it might repay efforts to replace the single input statement with a machine-code subroutine which reads a card with this particular format. With this modification considerable improvements in reading speeds over the usual FORTRAN statements have been achieved.

Churchhouse, R F "A Computer for all Purposes" Quest Vol 1, No 3, July 1968
Extract: ASCOP
statistical program package, ASCOP (B. E. Cooper)
ASCOP is a comprehensive statistical system. It has good editing and checking facilities, and data presented for it can be stored on magnetic tape for later use. Instructions are in the form of English sentences or Fortran-like equations, and may be formed into subroutines. The system is being extended continually, for example, to incorporate tabulation and graph plotting facilities, making it useful in the survey analysis field. It is currently being implemented on a number of other computers. ASCOP is useful for anyone who wishes to perform statistical analyses on data.

Cooper, Brian E "The Continuing Development of a Statistical System"
in R. C. Milton and J. A. Nelder (Eds.) "Statistical Computation" Academic, New York, 1969

Schucany, W. R.; Minton, Paul D.; Shannon, Stanley B. "A Survey of Statistical Packages"
Extract: ASCOP
ASCOP is a statistical and data management computing system developed and written by B. E. Cooper of the Atlas Computer Laboratory, Science Research Council, Chilton, Didcot, Berks., England. This is a compiler that gives the user the capability to perform a wide range of data-editing operations, in addition to many of the standard statistical analyses. Supplementary FORTRAN routines can be added to ASCOP quite easily.

in [ACM] ACM Computing Surveys (CSUR) 4(2) June 1972

Atlas Computer Laboratory (guide) 1973
External link: Online copy at Chilton
Extract: ASCOP
in Statistics: The ASCOP system, an integrated system for data management and statistical analysis.
