Batched data analysis system 

David Armor, Department of Social Relations, Harvard University, Cambridge, Mass

System for data analysis specifically tailored to the social sciences

Related languages
DATA-TEXT => JANUS   Experience with Influence

  • Couch, A. S. The DATA-TEXT System, A Computer Language For Social Science Research, Designed For Numerical Analysis Of Data And Content Analysis Of Text: Brief Summary Of Operating System And Re-Programming Proposal. Abstract: To make computer methods of research more readily available to social scientists, the Data-Text system was devised as a higher-level control language, and a large, complex interpreter was written. The vocabulary of Data-Text was modeled on Fortran, especially for arithmetic statements and input controls, but several new transformation and list-processing operations were added for the kinds of data manipulation common in social science research. In addition, the Data-Text system contains an integrated set of generalized programs for commonly required statistical analyses, such as: basic statistics, frequency distributions, cross-tabulation of contingency tables, correlations, factor analysis, factor rotation, graphical plotting, T-tests between different groups, and F-tests for analysis of variance. On the text analysis side, the Data-Text system can perform a variety of text processing procedures, including word input tagging, total word frequencies, user-defined concept counts, comparative frequency tabulations, concordance searches, and certain co-occurrence computations.

    The present version of the Data-Text system exists as a FAP and Fortran II program for the IBM 7090/7094. The system normally operates under the standard FMS monitor. A new version has recently been completed that will operate under an IBSYS monitor and in Direct Couple environments.

    If financial support is available, the author proposes to make the Data-Text system available on third-generation time-sharing computers, so that it can interact with users at remote consoles. The four major goals of reprogramming the system are: computer independence, achieved by using Fortran IV or perhaps a standardized version of PL/I; interactive operation at remote consoles; increased execution efficiency, achieved by developing a compiler rather than an interpreter; and provisions for system growth.
          in XXIX SHARE Meeting August 1967 Miami, Florida
  • Sammet, Jean E. "Computer Languages - Principles and History" Englewood Cliffs, N.J. Prentice-Hall 1969. Extract: The DATA-TEXT System
    The DATA-TEXT System, developed in the Department of Social Relations at Harvard University, is a system to aid people who are doing social science research. It was implemented on the IBM 7090/94. The user is allowed to specify information about the data and invoke a number of specific routines to do calculations for him. In some cases, the raw data (assumed to be on punch cards) can be used directly; whereas in other cases it must be modified somewhat before being used as input to do statistical analyses. [...] In some cases it is necessary to transform the data. The user can also cause the reading of text material and request various kinds of content analyses [...].
  • Stock, Karl F. "A listing of some programming languages and their users" in RZ-Informationen. Graz: Rechenzentrum Graz 1971 70. Abstract: 321 programming languages, each listed with the computer manufacturers on whose machines it can be used. Index of the 74 computer firms; ranking of the programming languages by the number of manufacturers on whose machines each is implemented; ranking of the manufacturers by the number of programming languages they use.
  • Armor, D. J. "The Data-Text System - An application language for the Social Sciences"
          in [AFIPS] Proceedings of the 1972 Spring Joint Computer Conference SJCC 40
  • Armor, David J. "DATA-TEXT Primer", Free Press 1972.
  • Armor, David J. and Couch, Arthur S. "DATA-TEXT primer; an introduction to computerized social data analysis" New York, Free Press 1972
  • Schucany, W. R.; Minton, Paul D.; Shannon, Stanley B. "A Survey of Statistical Packages"
          in [ACM] ACM Computing Surveys (CSUR) 4(2) June 1972
  • Sammet, Jean E. "Roster of Programming Languages for 1973" p147
          in ACM Computing Reviews 15(04) April 1974
  • Stamen, Jeffrey P. and Robert M. Wallace "Janus: A data management and analysis system for the behavioral sciences" pp273-282. Extract: INTRODUCTION
    In the middle sixties there was a revolution in behavioral science computing brought about by the introduction of software systems, or 'packages', on second and third generation batch equipment (Most notably, BMD, SPSS, OSIRIS, DATA-TEXT). These systems offered the analyst a higher-level language designed specifically for the problems of behavioral science data handling and analysis, thus freeing him from the details of programming, data reformatting and using subroutine libraries.
    A short time later a number of data-management and analysis systems appeared on time-shared computers (most notably ADMINS, DATANAL, TRACE, IMPRESS, TROLL). These systems seemed to hold further promise for the behavioral scientist wanting to analyze data. An analyst would now be able to interact with his data: to test hypotheses, explore for and formulate new hypotheses, test again and so on. In addition, because of immediate feedback on errors, these interactive systems were expected to reduce the learning investment needed to communicate with the computer. Unfortunately, for the broader behavioral science community, the promise of interactive systems is still just that: a promise. A number of factors contributed to this situation, among which were: 1) time-shared computers were not widely available; 2) the cost of using these interactive systems was high compared with the batch systems; 3) the interactive systems did not, in general, have the same breadth of capabilities in both data handling and statistics as the batch systems; and 4) analysis techniques that took advantage of the power of interactive computing were just beginning to be developed. Going into the middle seventies, we feel the situation is ripe for change. The Cambridge Project is a joint effort by computer scientists, behavioral scientists, and statisticians from M.I.T. and Harvard to bring about the change.
    Janus is an attempt to provide a powerful interactive data handling and analysis tool for the behavioral scientist. Its design grew out of experience with two interactive systems, ADMINS Mark III and DATANAL, and one batch system, DATA-TEXT. In addition, Janus was influenced by systems and ideas from outside of the behavioral science tradition; for example, the relational data work of S. D. McIntosh and D. M. Griffel and that of E. F. Codd. Janus is one of the subsystems being developed for the Cambridge Project Consistent System (CS). The CS also contains other data analysis programs and subsystems, modeling programs, an urban-planning subsystem, an econometrics analysis subsystem and others.
          in [ACM] Proceedings of the 1972 Annual Conference of the ACM
  • Stock, Marylene and Stock, Karl F. "Bibliography of Programming Languages: Books, User Manuals and Articles from PLANKALKUL to PL/I" Verlag Dokumentation, Pullach/Munchen 1973 169. Abstract: PREFACE AND INTRODUCTION
    The exact number of all the programming languages still in use, and those which are no longer used, is unknown. Zemanek calls the abundance of programming languages and their many dialects a "language Babel". When a new programming language is developed, only its name is known at first and it takes a while before publications about it appear. For some languages, the only relevant literature stays inside the individual companies; some are reported on in papers and magazines; and only a few, such as ALGOL, BASIC, COBOL, FORTRAN, and PL/1, become known to a wider public through various text- and handbooks. The situation surrounding the application of these languages in many computer centers is a similar one.

    There are differing opinions on the concept "programming languages". What is called a programming language by some may be termed a program, a processor, or a generator by others. Since there are no sharp borderlines in the field of programming languages, works were considered here which deal with machine languages, assemblers, autocoders, syntax and compilers, processors and generators, as well as with general higher programming languages.

    The bibliography contains some 2,700 titles of books, magazines and essays for around 300 programming languages. However, as shown by the "Overview of Existing Programming Languages", there are more than 300 such languages. The "Overview" lists a total of 676 programming languages, but this is certainly incomplete. One author has already announced the "next 700 programming languages"; it is to be hoped that, for reasons of compatibility, users will be spared such a great variety. The graphic representations (illustrations 1 & 2) show the development and proportion of the most widely used programming languages, as measured by the number of publications listed here and by the number of computer manufacturers and software firms that have implemented the language in question. The illustrations show FORTRAN to be in the lead at the present time. PL/1 is advancing rapidly, although PL/1 compilers are not yet seen very often outside of IBM.

    Some experts believe PL/1 will replace even the widely used languages such as FORTRAN, COBOL, and ALGOL. If this does occur, it will surely take some time, as shown by the chronological diagram (illustration 2).

    It would be desirable from the user's point of view to reduce this confusion of languages to the most advantageous ones. The languages that are still maintained should incorporate the special features and advantages of the otherwise superfluous languages. Obviously such demands are not in the interests of computer manufacturers, especially when one considers that a FORTRAN program can be executed on nearly all third-generation computers.

    The titles in this bibliography are organized alphabetically by programming language, within a language chronologically, and within a given year alphabetically. Ahead of the first programming language in the alphabet, under the heading AAA, are listed works covering several languages, together with general papers on programming languages and on the theory of formal languages.
    As far as possible, the titles are based on autopsy (direct examination of the item). However, the bibliographical description of some titles will not satisfy bibliographic documentation standards, since they are based on inaccurate information in various sources. Translated titles whose original titles could not be found through bibliographical research were not included. Since many libraries do not hold the papers cited, all magazine essays should have been listed with the volume, the year, the issue number and the complete page range (e.g. pp. 721-783), so that interlibrary loans could be handled quickly. Unfortunately, these data were not always found.

    It is hoped that this bibliography will help the electronic data processing expert, and those who wish to select the appropriate programming language from the many available, to find a way through the language Babel.

    We wish to offer special thanks to Mr. Klaus G. Saur and the staff of Verlag Dokumentation for their publishing work.

    Graz, Austria, May 1973
  • Slysz, William D. "An evaluation of statistical software in the social sciences" pp326-332
          in [ACM] CACM 17(06) (June 1974)
  • Sammet, Jean E. "Roster of programming languages for 1976-77" pp56-85
          in SIGPLAN Notices 13(11) Nov 1978