PROTEXT(ID:7735/)





References:
  • Rudall, B.H. "A command language for text processing" in The computer in literary and linguistic research, R. A. Wisbey (Ed.), Cambridge University Press, New York, 1971, pp. 281-288
  • Wachal, R. S. Review of Rudall 1971. Abstract: This paper describes some of the design features of a programming language, PROTEXT, "for the limited manipulation of a textual corpus." The language is designed with the needs of a literary data processing user in mind. The intended chief consumer is the naive user, although the author sees the specialist using PROTEXT as "an environmental language which will allow him to organize and prepare texts, and to carry out routine processing before his more detailed algorithm in a higher-level or list-processing language can be executed."

    Facilities are provided for storage (temporary and archive) and retrieval of texts and subtexts and for various kinds of searches and counts. There do not appear to be any updating or editing facilities, so presumably one must begin with an accurate text. The results of searching and counting are stored in a results file that can also be manipulated as a text, but one laments the apparent lack of any kind of sort command.

    Two interesting features of the language deserve special attention. First, the user is allowed to define the text symbols used in such a way that two or more successive symbols can be represented by a unique bit pattern -- a very useful feature in linguistic and literary data processing. Second, the user has some control over the levels within the text (word, line, paragraph) and the choice of delimiters used to mark them. Thus, an appropriate subtext can be retrieved using ordinal numbers applied to the desired level (character, word, line, etc.). This permits very flexible searching, since one can input the patterns, appropriately delimited, as if they were text, and search any extent and level of the real text on the basis of any extent and level of the pattern text. Furthermore, any pattern can contain an asterisk, which accepts any value in the entire user-defined symbol set.

    The language is difficult to evaluate in that the article is brief, condensed, and selective. Some features are carefully defined (BNF), and some examples are given. (Fuller documentation seems to be available.) Various designs and implementations are being worked on to provide batch and interactive versions for the ICE 803 and 4130, initially as preprocessors in ALGOL and later in machine language.

    As far as one can tell, PROTEXT appears to be an English-like language that is flexible, powerful, and useful. Unfortunately, the history of programming languages has shown us that the road between language design and widespread use is not only long but is also characterized by a very high accident rate. In this case, we might wish that were not so.
          in ACM Computing Reviews 14(04) April 1973
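
The two features singled out in the review (user-defined multi-character symbols, and searching at a chosen text level with an asterisk wildcard) can be illustrated with a small Python sketch. This is not PROTEXT syntax and not Rudall's implementation, neither of which is given here. The function names, the greedy longest-match rule, the use of a space as the word delimiter, and the reading of the asterisk as matching exactly one symbol are all assumptions made for illustration.

```python
def tokenize(text, symbols):
    """Split text into symbols, preferring the longest user-defined symbol.

    Assumption: greedy longest match; the source does not say how
    PROTEXT resolves overlapping symbol definitions.
    """
    ordered = sorted(symbols, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for s in ordered:
            if text.startswith(s, i):
                out.append(s)
                i += len(s)
                break
        else:
            # Undeclared character: keep it as a symbol of its own.
            out.append(text[i])
            i += 1
    return out


def split_level(symbols, delimiter):
    """Group a symbol sequence into units (e.g. words) at a chosen delimiter."""
    units, current = [], []
    for s in symbols:
        if s == delimiter:
            if current:
                units.append(current)
            current = []
        else:
            current.append(s)
    if current:
        units.append(current)
    return units


def matches(pattern, unit):
    """True if unit matches pattern; '*' accepts any single symbol (assumed)."""
    return len(pattern) == len(unit) and all(
        p == "*" or p == u for p, u in zip(pattern, unit)
    )


def search(units, pattern):
    """Return the 1-based ordinals of the units that match the pattern."""
    return [n for n, u in enumerate(units, start=1) if matches(pattern, u)]


# Hypothetical symbol set: "ch" and "th" each count as one symbol.
symbols = {"ch", "th"}
words = split_level(tokenize("the chat that", symbols), " ")
pattern = tokenize("*at", symbols)   # the pattern is entered as if it were text
print(search(words, pattern))        # prints [2, 3]
```

Because "ch" and "th" are single symbols under this definition, the three-symbol pattern `*at` matches both "chat" and "that", and the result is reported as word ordinals in the way the review describes retrieval by level.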