ASCOP(ID:6791/asc003)

Atlas/NCC Stats Package 


for A Statistical COmputing Procedure

Stats package developed by B. E. Cooper of the Atlas Computer Laboratory, Science Research Council,
and distributed by the Statistics and Simulation Dept, The National Computing Centre, UK.


References:
  • Cooper, Brian "ASCOP - A Statistical Computing Procedure" Technical Report Atlas Computing Lab 1966. Abstract: STATISTICIANS have long dreamt of being able to perform, quickly and easily, any analysis of data that occurs to them whether this be an accepted analysis or a new analysis suggested by the data during the course of analysis. Until a few years ago statisticians have been restricted to those analyses that can be performed using the strictly limited resources of a desk calculating machine and an imaginative approach to data analysis has not been possible. The increase in computing power during the last few years should make possible a completely new approach to data analysis. So far this has not happened. The needs of the statistician are very varied and none of the programs written so far possess the considerable flexibility that is necessary to satisfy these needs. Within the last two years a number of statistical systems capable of performing a number of standard analyses have been written. These permit certain sequences of analyses to be performed without re-presentation of the data and they have the advantage that the data is prepared in the same way for all analyses. Although these are making the work of the data analyst easier and are encouraging the more thorough analysis of data, they are found wanting in ease of use, both to the statistician and the experimentalist who should also be encouraged to look more closely at his data, in the number and variety of analyses that may be performed, and in the ability to build up new analyses as sequences of instructions known to the system. For example a number of systems can perform a regression analysis but none of them can reference the coefficients of the fitted function in a later analysis. ASCOP is a large system which in its first version suffered the same faults as all the other systems, although it was easier to use than most. The second version, currently being debugged, is a major revision and attempts to allow much greater flexibility to the user.
    External link: Online copy at Chilton
    Extract: An ASCOP Program
    An ASCOP program consists of a sequence of instructions which may be divided into two main types. The first of these are equations specifying arithmetic operations on variables, parameters (single values), and coefficients. New variables, parameters, and coefficients may be created and referred to in subsequent instructions of either type. Instructions of the second type are English-like sentences or phrases specifying particular analyses, or making declarations to the system. Instructions of this type may similarly define new variables, parameters, and coefficients which may be referred to later in instructions of either type. Both types of instruction may be labelled, and branching statements making reference to these labels are allowed. This enables the user to specify that the performance of some analyses and arithmetic operations is conditional on the satisfaction of a particular criterion or criteria. A number of data editing operations are available, including the amalgamation of several sets of data, the selective inclusion of points in a new set of data, and the inclusion of certain parameters, defined in an analysis, as a point in a new set of data. It is also possible to define subroutines made up of ASCOP instructions and equations and to call these many times over. Their definition and call are very similar to those in FORTRAN. It will be possible, when a disc becomes available on ATLAS, to have a set of standard subroutines stored on the disc and hence available on call to ASCOP users. It will also be possible for users to add to the standard set or, of course, to their own private set. Instructions are available in ASCOP to allow the user to specify that certain sets of data, including data derived during analysis, be written onto a private output tape in a form that can be presented again to ASCOP at a later time.

    Extract: The Data Matrix
    The basic organisational unit of data in ASCOP is the data matrix. The rows of a data matrix are referred to as POINTS and the columns as VARIABLES. Each variable may have more than one column in the data matrix, and the number of columns for a variable is referred to as its replication. If variable A has a replication of two, there are two values of A in each POINT, that is, in each row of the matrix. This implies a certain completeness in the data, but missing values are in fact allowed for the incomplete situation. Because variables may be replicated, it becomes possible to refer to point means, variances, standard deviations and numbers of replicates, and such references are allowed in both arithmetic operations and analyses. Arithmetic operations may also refer to a label associated with each point; the label may be read with the data or generated as the data is read.
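    The replication idea can be pictured with a small Python sketch. It assumes a row-per-point layout in which each variable owns a fixed number of columns and a missing value is written as `None`. `DataMatrix`, its methods, and the choice of the n - 1 divisor for the point variance are illustrative assumptions, not ASCOP's own data structures.

```python
import math
from typing import Optional


class DataMatrix:
    """Rows are POINTS; each variable owns `replication` columns (its replicates)."""

    def __init__(self, replication: dict[str, int]) -> None:
        self.replication = replication            # variable name -> number of columns
        self.rows: list[dict[str, list[Optional[float]]]] = []
        self.labels: list[str] = []               # one label per point

    def add_point(self, label: str, **values: list[Optional[float]]) -> None:
        for name, reps in self.replication.items():
            if len(values[name]) != reps:
                raise ValueError(f"variable {name} needs {reps} values")
        self.rows.append(values)
        self.labels.append(label)

    def _present(self, point: int, name: str) -> list[float]:
        # None marks a missing value; it is left out of the point statistics.
        return [v for v in self.rows[point][name] if v is not None]

    def n_replicates(self, point: int, name: str) -> int:
        return len(self._present(point, name))

    def point_mean(self, point: int, name: str) -> float:
        vals = self._present(point, name)
        return sum(vals) / len(vals)

    def point_variance(self, point: int, name: str) -> float:
        # Sample variance (divisor n - 1); needs at least two present replicates.
        vals = self._present(point, name)
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)

    def point_sd(self, point: int, name: str) -> float:
        return math.sqrt(self.point_variance(point, name))


dm = DataMatrix({"A": 2, "B": 1})           # A has a replication of two
dm.add_point("P1", A=[4.0, 6.0], B=[1.0])
dm.add_point("P2", A=[3.0, None], B=[2.0])  # second replicate of A is missing
print(dm.point_mean(0, "A"), dm.point_variance(0, "A"))  # 5.0 2.0
print(dm.n_replicates(1, "A"), dm.point_mean(1, "A"))    # 1 3.0
```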

    Data matrices are, most commonly, read from cards but they may also be generated from other data matrices using edit operations, or generated using the random variable generation functions available as parts of the arithmetic operations. Arithmetic operations may be used to define new variables in the reading stage or in the editing stage and the inclusion of points in the data matrix may be made conditional on the values of the variables involved. Thus matrices may be formed containing those points that show specified properties. Data to be analysed in several different arrangements need be presented to the system only once and the reorganisation achieved using edit operations.
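    A minimal sketch of such an edit operation follows. It assumes that a point can be modelled as a Python dictionary of variable values. The function `edit` and its arguments are hypothetical. Derived variables are computed first, then the inclusion condition decides whether the point enters the new matrix. The source matrix is left untouched, so it can be rearranged again later without re-presenting the data.

```python
from typing import Callable

Point = dict[str, float]


def edit(points: list[Point],
         derive: dict[str, Callable[[Point], float]],
         include: Callable[[Point], bool]) -> list[Point]:
    """Form a new data matrix from `points` without altering the original."""
    new_matrix: list[Point] = []
    for p in points:
        q = dict(p)                      # copy: the source matrix stays intact
        for name, f in derive.items():
            q[name] = f(q)               # later definitions may use earlier ones
        if include(q):                   # conditional inclusion of the point
            new_matrix.append(q)
    return new_matrix


data = [{"X": 1.0, "Y": 2.0}, {"X": 4.0, "Y": 1.0}, {"X": 3.0, "Y": 5.0}]
big = edit(data, derive={"S": lambda p: p["X"] + p["Y"]},
           include=lambda p: p["S"] > 4.0)
print(big)  # [{'X': 4.0, 'Y': 1.0, 'S': 5.0}, {'X': 3.0, 'Y': 5.0, 'S': 8.0}]
```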

  • Cooper, Brian "ASCOP - A Statistical Computing Procedure" J. Royal Stat. Soc. Series C (Applied Statistics) 16(2) 1967 pp100-110. Abstract: as for the 1966 technical report above.
  • Cooper, Brian "Basic subroutine for the input of numbers, words, and special characters" Atlas Lab, Chilton September 1967
    External link: Online copy at Chilton
    Extract: Introduction
    The purpose of this paper is to describe a basic all-purpose format-free input subroutine and to show that such subroutines can be written with both efficiency and flexibility. A second purpose is to encourage the use of such subroutines to improve the often arbitrary presentation rules users normally must follow to communicate with a program. The subroutine reads one card at a time and assembles the information as a list of words, numbers, and special characters. The rules of assembly are defined in terms of two arrays of constants and may therefore be changed by program. The various decisions are taken quickly with reference to these arrays, and the programming is not machine dependent.

    A great number of programs written, particularly if they pretend any generality, offer a number of different options which the user may select. The rules the user is expected to follow to make his selection often frighten potential users away. For example: if Blogg's analysis is required punch PQR in columns 14, 27 and 38 of card 4. If extra output is required punch 9 in column 54 of card 3, otherwise punch 8. If a fifth card containing a title is to be presented punch ZEBRA in columns 11 to 15 of card 2. Rules of presentation are often more difficult to understand than the numerical method employed in the program, and much time, both users' and computers', is wasted because little thought is given to lightening the users' task. Some programs, of course, do allow flexibility in the way information is presented but I think that it is true to say that there is considerable room for improvement in this respect. Free-field format subroutines have been available for many years but the problem programmer has largely ignored them and has stuck to the standard FORTRAN format-bound instructions. This paper describes a particular subroutine of this kind which has been in use for some time as the basic input subroutine for a large statistical and data filing system called ASCOP (see Cooper, 1967) and argues that the particular organisation of this subroutine has a large number of advantages. Use of such a subroutine enables the programmer to relax the restrictions he might otherwise insist on in the presentation of information to the program. Instead of insisting that the required information be punched in a rigid format, with a coding scheme for the selection of the required options, he can allow the user freedom of preparing information as though he was typing instructions for a subordinate. The selection of Blogg's method can be made conditional on the appearance of the word BLOGGS somewhere in the specification and a title can be introduced by the word TITLE itself. 
The assumption of default settings for parameters the user is not concerned with can be made more easily, and the user introduces only that subset of the specification he is concerned with when presenting his particular problem. In fact the programmer may go so far as to allow alternative means of introducing the same information. Different users often use different names for the same mathematical technique, and it is not hard to allow the use of two or more different words to refer to the same analysis.
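    The argument can be illustrated with a small Python sketch of keyword-driven option selection. The vocabulary (`BLOGGS`, `REGRESS`, `EXTRA`, `TITLE`), the option names and the defaults are invented for illustration. The sketch also simply splits the text on blanks rather than using an input subroutine like CARD.

```python
# Hypothetical vocabulary: several words may name the same analysis.
SYNONYMS = {"BLOGGS": "BLOGGS", "BLOGG": "BLOGGS",
            "REGRESSION": "REGRESSION", "REGRESS": "REGRESSION"}
DEFAULTS = {"analysis": "STANDARD", "title": None, "extra_output": False}


def parse_spec(text: str) -> dict:
    """Pick options out of free-format text; unmentioned options keep defaults."""
    options = dict(DEFAULTS)
    words = text.upper().split()
    for i, word in enumerate(words):
        if word == "TITLE":
            # Everything after the word TITLE is taken as the title itself.
            options["title"] = " ".join(words[i + 1:])
            break
        if word in SYNONYMS:
            options["analysis"] = SYNONYMS[word]
        elif word == "EXTRA":
            options["extra_output"] = True
    return options


print(parse_spec("use the Bloggs method with extra output TITLE crop trial"))
# {'analysis': 'BLOGGS', 'title': 'CROP TRIAL', 'extra_output': True}
print(parse_spec("regress y on x"))
# {'analysis': 'REGRESSION', 'title': None, 'extra_output': False}
```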

    Extract: General description
    The name and argument list of the subroutine to be described is as follows:

          SUBROUTINE CARD (FLA, IFX, IWS, NIT, IND)

    Subroutine CARD reads one card, performs a left to right scan and assembles the information as a list of numbers, words, and special characters in the array FLA. Numbers are in normal floating-point form and the words and special characters are stored in the appropriate FORTRAN alphanumeric form. Integer array IFX contains indicators enabling the type of each item to be identified, and integer array IWS contains the columns on the card on which successive items terminate. The card is read initially with format (80A1) into an array in COMMON, and the text of the card is therefore available to the calling routine as well. This array is left intact during the scan, so it is possible to re-assemble the information contained on the last card read. The number of items read is supplied in the integer scalar NIT, and IND takes one of a number of values according to the type of card read: for example, error free, containing special characters, or blank. Additional facilities include the reading of numbers to a base other than ten, and the continuation of information onto further cards by the use of a continuation character punched as the last item on each card to be continued. The continuation character is usually $.
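    The continuation convention can be sketched at the level of whole card images. The Python generator below is a simplification, and `logical_records` is a hypothetical name. It joins raw text rather than assembled items. In CARD itself the continuation character is recognised during the item scan.

```python
from typing import Iterable, Iterator


def logical_records(cards: Iterable[str], cont: str = "$") -> Iterator[str]:
    """Join card images whose last non-blank character is `cont`."""
    pending: list[str] = []
    for card in cards:
        text = card[:80].rstrip()                # a card has at most 80 columns
        if text.endswith(cont):
            pending.append(text[:-1].rstrip())   # drop the continuation mark
            continue
        pending.append(text)
        yield " ".join(pending)
        pending = []
    if pending:                                  # input ended on a continued card
        yield " ".join(pending)


cards = ["1.5 2.5 3.5 $", "4.5 5.5", "7.0"]
print(list(logical_records(cards)))  # ['1.5 2.5 3.5 4.5 5.5', '7.0']
```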

    Extract: Organisation

    The subroutine uses two arrays of integers, known as the character
    integers and the decision integers, in the assembly process.
    The use of these two sets of integers makes the program itself machine
    independent and to a large extent card-code independent. Each possible
    character in the computer's vocabulary is allocated to one of ten groups,
    and associated with each character is an integer in the character integer
    array ICHAR. In the subroutine described here we assume there are 64
    different characters but this number can be easily changed if more
    characters are possible. The character with internal value I has
    ICHAR(I+1) associated with it, and the value of ICHAR(I+1) contains
    two pieces of information. The units and tens part is the number of the
    group to which the character is allocated, and the remaining part is 100
    times the value the character is to have when used in assembling items.

    For example, on Atlas the character 2 has internal code 18 so that
    if ICHAR(19) is set to 201 the character 2 is allocated to character
    group 1 and the numerical value 2 is used in assembling this character
    as part of a number.
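    Packing and unpacking a character integer is simple integer arithmetic, as a minimal Python sketch shows. `pack` and `unpack` are hypothetical names. Allocating every unused code to the illegal-character group (10) is an assumption. Python lists are indexed from 0, so the list position here is the internal code I itself rather than FORTRAN's I+1.

```python
def pack(group: int, value: int) -> int:
    """Character integer: units and tens hold the group, the rest is 100 * value."""
    if not 1 <= group <= 10:
        raise ValueError("group must be in 1..10")
    return 100 * value + group


def unpack(ichar: int) -> tuple[int, int]:
    """Split a character integer into (group, value)."""
    return ichar % 100, ichar // 100


# 64 internal codes; every code is assumed illegal (group 10) until allocated.
ICHAR = [pack(10, 0)] * 64
ICHAR[18] = pack(1, 2)   # Atlas code 18 is the digit 2 (FORTRAN's ICHAR(19))
print(ICHAR[18], unpack(ICHAR[18]))   # 201 (1, 2)
```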

    The ten groups used in CARD are defined in Table 1.

    The card is scanned from left to right and the current state of
    assembly plays a vital part in determining the processing of the next
    character in the scan. Seven assembly states are defined as follows:

    1. No item started.
    2. Word started.
    3. Number started but before the decimal point.
    4. Number started but after the decimal point.
    5. Exponent indicator (usually the character E) read after number.
    6. Exponent indicator passed after a number.
    7. Exponent started.

    The scan always begins in state 1 and if, for example, a letter is
    read the state moves to state 2. The state remains 2 until a character
    capable of changing the state is read, that is until a character which
    cannot form part of a word is read.
    The scan section of the program is divided into 21 parts and the path
    through these parts is decided according to the characters read and the
    values of the decision integers. The decision integers are stored in
    a 10 × 7 array JDIS. When a new character is encountered its group
    is quickly determined from the character integer array. This together with
    the assembly state determines which decision integer is appropriate.
    As is seen from Table 1 the value of this integer is the number of the
    part of the program to be obeyed.

    Table 1: The standard values of the decision integers

    Character groups: 1 Digits, 2 Letters, 3 Exponent indicator,
    4 Decimal point, 5 Plus sign, 6 Minus sign, 7 Special characters,
    8 Separators, 9 Continuation character, 10 Illegal character.

                              Character group
    Assembly state   1   2   3   4   5   6   7   8   9  10
    1                3   2  11   4   1   1  18  19  20  21
    2                9   6   6   9   9   9   9   9   9  21
    3                7  10  10   5  10  10  10  10  10  21
    4                8  10  10  21  10  10  10  10  10  21
    5               13  12  12  13  13  13  12  13  12  21
    6               14  17  17  17   1   1  17  19  17  21
    7               15  16  16  21  16  16  16  16  16  21
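    Read as data, Table 1 reduces each decision to a single indexed lookup. The Python sketch below copies the table values. The function name `decision` and the row-per-state layout are assumptions. A check over the table confirms that each of the 21 program parts is reached from at least one (state, group) pair.

```python
# Table 1 as data: one row per assembly state (1-7), one column per
# character group (1-10). CARD dimensions JDIS as 10 x 7; this layout is
# chosen here only for readability.
JDIS = [[3, 2, 11, 4, 1, 1, 18, 19, 20, 21],
        [9, 6, 6, 9, 9, 9, 9, 9, 9, 21],
        [7, 10, 10, 5, 10, 10, 10, 10, 10, 21],
        [8, 10, 10, 21, 10, 10, 10, 10, 10, 21],
        [13, 12, 12, 13, 13, 13, 12, 13, 12, 21],
        [14, 17, 17, 17, 1, 1, 17, 19, 17, 21],
        [15, 16, 16, 21, 16, 16, 16, 16, 16, 21]]


def decision(state: int, group: int) -> int:
    """Number of the program part to obey for `group` met in `state`."""
    return JDIS[state - 1][group - 1]


print(decision(1, 2))   # 2: a letter with no item started begins a word
print(decision(2, 1))   # 9: a digit after a word terminates the word
print(sorted({p for row in JDIS for p in row}) == list(range(1, 22)))  # True
```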

    Details of the 21 program parts are given below:

    Part  State    Action
          change
       1           Note the reading of a sign (character type 5 or 6).
       2  2        Start a word; treat a previously read sign as a special
                   character; note in IW the present column on which the
                   word begins.
       3  3        Start a number before the decimal point; set AIN equal to
                   the value of the digit. (The number will be built up in AIN.)
       4  4        Start a number with a decimal point; set AIN equal to zero;
                   set POW = 1.0.
       5  4        Decimal point read whilst in the middle of a number;
                   set POW = 1.0.
       6           Continue word; do nothing.
       7           Continue number before decimal point; multiply AIN by BASE
                   and add the value of the current digit. (This operation
                   should be protected by a test for overflow; this test is
                   machine dependent.)
       8           Continue number after decimal point; divide POW by BASE,
                   multiply by the value of the current digit and add to AIN.
       9  1*       Terminate word; the word begins on column IW and ends on the
                   current column; call subroutine WORD to pack the letters
                   appropriately.
      10  1*       Terminate number; store AIN taking account of any previously
                   read sign.
      11  2 or 5   Exponent indicator read; take it as an exponent and set the
                   state to 5 if it follows a number; take it as the beginning
                   of a word and set the state to 2 if it follows a sign or a
                   word or is the first item.
      12  2        Take exponent indicator as beginning a word.
      13  6*       Exponent indicator passed.
      14  7        Begin exponent; set IEX to the current digit.
      15           Continue exponent; multiply IEX by IFIXF(BASE) and add the
                   value of the current digit.
      16  1*       Terminate exponent and number. (This operation should be
                   protected by a test for overflow based on the value of the
                   exponent; this test is machine dependent.)
      17  1*       Store exponent indicator as a one-character word.
      18  1        Assemble special character; set IND to 2.
      19           Ignore character; no action.
      20           Stop all processing.
      21           Illegal character read; output diagnostic; set IND to 4.
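    Parts 3, 4, 5, 7 and 8 amount to Horner-style evaluation of the digits in base B = BASE. In the worked summary below, d_1 ... d_k are the digits before the point and f_1 ... f_m those after it; this notation is introduced here and is not Cooper's. Part 4 starts in the same way as part 3 but with AIN set to 0.

```latex
\begin{aligned}
\text{parts 3, 7:}\quad & \mathrm{AIN} \leftarrow d_1,\quad
  \mathrm{AIN} \leftarrow B\cdot\mathrm{AIN} + d_i \;(i = 2,\dots,k)
  \;\Longrightarrow\; \mathrm{AIN} = \sum_{i=1}^{k} d_i\,B^{\,k-i} \\
\text{parts 5, 8:}\quad & \mathrm{POW} \leftarrow 1,\quad
  \mathrm{POW} \leftarrow \mathrm{POW}/B,\quad
  \mathrm{AIN} \leftarrow \mathrm{AIN} + \mathrm{POW}\cdot f_j
  \;\Longrightarrow\; \mathrm{AIN} \text{ gains } \sum_{j=1}^{m} f_j\,B^{-j} \\
\text{octal } 17.4:\quad & 1 \;\to\; 8\cdot 1 + 7 = 15 \;\to\; 15 + 4\cdot 8^{-1} = 15.5
\end{aligned}
```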

    Some program parts (marked *) cause the assembly state to be changed and the decision
    integers re-inspected with the new state. This ensures that characters encountered
    in certain positions can properly fulfil the two functions of terminating an item
    and beginning a new item. For example if we punch CAR4 in consecutive columns on a
    card the characters CAR will contribute to a word and the state will be 2 when the
    digit 4 is encountered. If the decision integers are set appropriately the action will
    be to terminate the word and to reset the state to one.
    Re-inspection of the decision integers at this point ensures that a number
    is started with the value 4.
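    Putting the two arrays and the 21 parts together gives a table-driven scanner. The Python sketch below is a reconstruction under stated assumptions, not Cooper's FORTRAN:
    * a small function `group` stands in for the ICHAR array, and only 0 to 9 act as digits;
    * the exponent is assumed to scale the number by powers of BASE;
    * words are returned as text rather than packed by WORD, and no diagnostic is printed for an illegal character;
    * part 12 is assumed to re-inspect the character;
    * a sign or E left pending at the end of the card is flushed as an item.

    Starred parts set a flag that makes the loop look up the decision integer again for the same character. This is how CAR4 splits into a word and a number.

```python
# Decision integers of Table 1: JDIS[state - 1][group - 1] is the part to obey.
JDIS = [[3, 2, 11, 4, 1, 1, 18, 19, 20, 21],
        [9, 6, 6, 9, 9, 9, 9, 9, 9, 21],
        [7, 10, 10, 5, 10, 10, 10, 10, 10, 21],
        [8, 10, 10, 21, 10, 10, 10, 10, 10, 21],
        [13, 12, 12, 13, 13, 13, 12, 13, 12, 21],
        [14, 17, 17, 17, 1, 1, 17, 19, 17, 21],
        [15, 16, 16, 21, 16, 16, 16, 16, 16, 21]]
# Assumed allocation of non-alphanumeric characters to groups 4-9.
OTHER = {".": 4, "+": 5, "-": 6, "*": 7, "/": 7, "(": 7, ")": 7, "=": 7,
         " ": 8, ",": 8, "$": 9}


def group(ch):
    """Stand-in for the ICHAR lookup (groups 1-10)."""
    if ch in "0123456789":
        return 1
    if ch == "E":
        return 3
    if "A" <= ch <= "Z":
        return 2
    return OTHER.get(ch, 10)            # anything else is illegal


def scan_card(text, base=10, jdis=JDIS):
    """Return (items, IND, continued) for one card image."""
    line = text.ljust(80)[:80] + " "    # extra blank ends an item in column 80
    items = []                          # ("word" | "number" | "special", value)
    state, sign, prev, ind = 1, None, None, 0
    ain = pw = 0.0
    nsign = esign = 1
    iex = iw = 0
    for col, ch in enumerate(line):
        g = group(ch)
        while True:                     # repeats only after a starred part
            part, again = jdis[state - 1][g - 1], False
            if part == 1:                               # note a sign
                sign = ch
            elif part == 2 or (part == 11 and (prev != "number" or sign)):
                if sign:                                # sign before a word
                    items.append(("special", sign))
                    sign = None
                iw, state = col, 2                      # start a word
            elif part in (3, 4):                        # start a number
                nsign, sign = (-1 if sign == "-" else 1), None
                ain = float(ch) if part == 3 else 0.0
                pw, state = 1.0, part
            elif part == 5:                             # decimal point mid-number
                pw, state = 1.0, 4
            elif part == 7:                             # AIN = AIN*BASE + digit
                ain = ain * base + int(ch)
            elif part == 8:                             # POW /= BASE; AIN += POW*digit
                pw /= base
                ain += pw * int(ch)
            elif part == 9:                             # terminate word*
                items.append(("word", line[iw:col]))
                prev, state, again = "word", 1, True
            elif part == 10:                            # terminate number*
                items.append(("number", nsign * ain))
                prev, state, again = "number", 1, True
            elif part == 11:                            # E straight after a number
                iw, state = col, 5
            elif part in (12, 13):                      # E starts a word / E passed
                state, again = (2 if part == 12 else 6), True
            elif part == 14:                            # begin exponent
                esign, sign = (-1 if sign == "-" else 1), None
                iex, state = int(ch), 7
            elif part == 15:                            # continue exponent
                iex = iex * base + int(ch)
            elif part == 16:                            # terminate exponent and number*
                kind, value = items[-1]
                items[-1] = (kind, value * float(base) ** (esign * iex))
                prev, state, again = "exponent", 1, True
            elif part == 17:                            # lone E stored as a word*
                items.append(("word", "E"))
                prev, state, again = "word", 1, True
            elif part == 18:                            # special character
                if sign:
                    items.append(("special", sign))
                    sign = None
                items.append(("special", ch))
                prev, ind = "special", max(ind, 2)
            elif part == 20:                            # continuation: stop
                return items, ind, True
            elif part == 21:                            # illegal character
                ind = 4
            if not again:                               # parts 6, 19: no action
                break
    if state == 6:                      # card ended just after an E (assumed)
        items.append(("word", "E"))
    if sign:                            # a trailing lone sign (assumed)
        items.append(("special", sign))
    return items, ind, False


print(scan_card("CAR4")[0])     # [('word', 'CAR'), ('number', 4.0)]
print(scan_card("X = -1.5E2"))  # ([('word', 'X'), ('special', '='), ('number', -150.0)], 2, False)
```

    Because the table is an ordinary argument, a modified copy of `JDIS` can be passed in at run time, and the assembly rules change without any change to the scanning code.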

    The advantages of this fragmentation of the program are:

    1. The structure is clear and easily changed or augmented.
    2. Assembly rules can be changed by program.
    3. The coding is not machine dependent nor does it assume a particular
      internal character code.
    4. By use of an array of integers and the FORTRAN GO TO statement, decisions
      are taken quickly and the decisions are clearly defined.
    5. The subroutine can be easily tailored to particular requirements and
      unwanted facilities discarded.
    6. The same card may be re-processed.

    Extract: Changes in the rules
    Many examples of changes in the assembly rules will have occurred to the reader already. However, a number of examples are described below to illustrate the flexibility of the subroutine. The flexibility is particularly apparent when we remember that such changes can be made during the execution of the calling program.

    Changing the decision integer for state 2, character type 1, from 9 to 6 causes a number encountered immediately after a word to be taken as part of the word. That is, the change would cause the sequence CAR4 to be taken as one word rather than as a word followed by a number. Similarly, changing the decision integer for state 2, character type 7, from 9 to 6 allows a special character that immediately follows a word to be accepted as part of the word.
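    The effect of the change can be checked with a cut-down, self-contained Python sketch. It models only three states (no item, word, integer) and three character groups (digit, letter, blank). Its entries reuse CARD's part numbers, but the reduced table itself is an illustrative assumption.

```python
# Cut-down decision table (an illustrative assumption, not CARD's Table 1):
# rows are states 1 (no item), 2 (word), 3 (integer); columns are groups
# 1 (digit), 2 (letter), 3 (blank). Entries reuse CARD's part numbers:
# 2 start word, 3 start number, 6 continue word, 7 continue number,
# 9 terminate word*, 10 terminate number*, 19 ignore.
JDIS = [[3, 2, 19],
        [9, 6, 9],
        [7, 10, 10]]


def group(ch):
    if ch in "0123456789":
        return 1
    if "A" <= ch <= "Z":
        return 2
    return 3                               # everything else acts as a blank here


def scan(text, jdis):
    items, state, start, value = [], 1, 0, 0
    for col, ch in enumerate(text + " "):  # trailing blank ends the last item
        while True:
            part = jdis[state - 1][group(ch) - 1]
            if part == 2:
                start, state = col, 2
            elif part == 3:
                value, state = int(ch), 3
            elif part == 7:
                value = value * 10 + int(ch)
            elif part in (9, 10):          # starred: terminate, then re-inspect
                items.append(text[start:col] if part == 9 else value)
                state = 1
                continue
            break                          # parts 6 and 19 need no action
    return items


print(scan("CAR4 X1", JDIS))               # ['CAR', 4, 'X', 1]
relaxed = [row[:] for row in JDIS]
relaxed[1][0] = 6                          # state 2, digit: 9 -> 6
print(scan("CAR4 X1", relaxed))            # ['CAR4', 'X1']
```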

    The second example is particularly interesting. A misunderstanding of the use of continuation cards when presenting data cards to a program using CARD was responsible for the inclusion of the character $ at the end of each of a large number of records in card image form on magnetic tape. The simplest change causing these additional characters to be ignored was to reallocate the character $ to type 8 (separators) for the duration of the reading of the data. In this case the change required recompilation, but access to the decision and character integers could just as easily be given to the user of a program. The resulting gain in program flexibility makes this well worth while.
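    Reallocating a character for the duration of a read maps naturally onto a temporary change that is always undone. The Python sketch below assumes a dictionary stands in for the character integers. `GROUPS`, `reallocated` and `split_items` are hypothetical names, and `split_items` only imitates CARD's handling of separators (group 8) and the continuation character (group 9).

```python
from contextlib import contextmanager
from typing import Iterator

# Hypothetical stand-in for the character integers: character -> group.
GROUPS = {" ": 8, ",": 8, "$": 9}


@contextmanager
def reallocated(table: dict[str, int], ch: str, group: int) -> Iterator[None]:
    """Move `ch` to another group for the duration of a with-block."""
    old = table[ch]
    table[ch] = group
    try:
        yield
    finally:
        table[ch] = old          # restore the standard allocation afterwards


def split_items(card: str, table: dict[str, int]) -> tuple[list[str], bool]:
    """Split on separators; return (items, continued)."""
    items, current = [], ""
    for ch in card:
        g = table.get(ch, 0)     # 0: any other character, part of an item here
        if g == 9:               # continuation character: stop processing
            if current:
                items.append(current)
            return items, True
        if g == 8:               # separator ends the current item
            if current:
                items.append(current)
            current = ""
        else:
            current += ch
    if current:
        items.append(current)
    return items, False


print(split_items("1.5 2.5$", GROUPS))          # (['1.5', '2.5'], True)
with reallocated(GROUPS, "$", 8):
    print(split_items("1.5 2.5$", GROUPS))      # (['1.5', '2.5'], False)
print(GROUPS["$"])                              # 9
```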

    A final example concerns the use of comment cards presented with the data to a program. To achieve this the comment card must be recognised as such by CARD and the information on the card assembled in alphanumeric form. It was decided that the character * should be punched as the first item on the card to indicate a comment and this character was allocated to group 2. The calling program inspected FLA(1) for every card read by CARD and if this was * the card was printed and otherwise ignored. With these changes a card beginning with * followed by a space would be treated as a comment card and the comment printed.
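    The caller's side of the comment-card convention can be sketched in Python. `first_item`, `read_cards` and `LETTERS` are hypothetical. `first_item` is only a rough stand-in for CARD: it returns the leading word, treating `*` as a letter. That is enough to reproduce the rule that `*` followed by a space marks a comment while `*STAR` does not.

```python
# Letters as in group 2, with '*' reallocated to that group (hypothetical set).
LETTERS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ*")


def first_item(card: str) -> str:
    """Leading word of the card; a simplification that ignores numbers."""
    text = card.lstrip(" ")              # leading blanks are separators
    end = 0
    while end < len(text) and text[end] in LETTERS:
        end += 1
    return text[:end]


def read_cards(cards: list[str]) -> list[str]:
    """Print comment cards; return the other cards for normal processing."""
    data = []
    for card in cards:
        if first_item(card) == "*":      # the caller's test on FLA(1)
            print(card.rstrip())
        else:
            data.append(card)
    return data


rest = read_cards(["* TRIAL RUN 3", "1.5 2.5", "*STAR 4"])  # prints: * TRIAL RUN 3
print(rest)                                                  # ['1.5 2.5', '*STAR 4']
```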

    Extract: Efficiency
    At first sight the reader might believe that a lot of work is performed to achieve a relatively simple result. It must be remembered, however, that the decisions are taken quickly by a little integer arithmetic and the GO TO statement, and that this work replaces the work normally involved in interpreting a Format statement. The integer arrays take up a total of 134 locations, but using them reduces the size of the program itself by more than this. A possible source of inefficiency is the way in which a particular FORTRAN compiler handles the initial reading of the card in (80A1) form. Some compilers are more efficient than others in this respect, and it may repay the effort to replace the single input statement with a machine-code subroutine that reads a card in this particular format. With this modification, considerable improvements in reading speed over the usual FORTRAN statements have been achieved.
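    The figure of 134 locations agrees with the array sizes given in the Organisation extract, assuming one location per integer. That gives 64 character integers (one for each of the 64 assumed characters) plus the 10 × 7 decision integers:

```latex
\underbrace{64}_{\text{ICHAR}} \;+\; \underbrace{10 \times 7}_{\text{JDIS}}
  \;=\; 64 + 70 \;=\; 134 \text{ locations}
```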

  • Churchhouse, R F "A Computer for all Purposes" Quest Vol 1, No 3, July 1968
    Extract: ASCOP
    statistical program package, ASCOP (B. E. Cooper)
    ASCOP is a comprehensive statistical system. It has good editing and checking facilities, and data presented for it can be stored on magnetic tape for later use. Instructions are in the form of English sentences or Fortran-like equations, and may be formed into subroutines. The system is being extended continually, for example, to incorporate tabulation and graph plotting facilities, making it useful in the survey analysis field. It is currently being implemented on a number of other computers. ASCOP is useful for anyone who wishes to perform statistical analyses on data.

  • Cooper, Brian E "The Continuing Development of a Statistical System"
          in R.C. Milton and J. A. Nelder (Eds.) "Statistical Computation" Academic, New York, 1969
  • Schucany, W. R.; Minton, Paul D.; Shannon, Stanley B. "A Survey of Statistical Packages"
    Extract: ASCOP
    ASCOP is a statistical and data management computing system developed and written by B. E. Cooper of the Atlas Computer Laboratory, Science Research Council, Chilton, Didcot, Berks., England. This is a compiler that gives the user the capability to perform a wide range of data-editing operations, in addition to many of the standard statistical analyses. Supplementary FORTRAN routines can be added to ASCOP quite easily.

          in [ACM] ACM Computing Surveys (CSUR) 4(2) June 1972
  • Atlas Computer Laboratory (guide) 1973
    External link: Online copy at Chilton
    Extract: ASCOP
    in Statistics: The ASCOP system, an integrated system for data management and statistical analysis.

