DASH (ID:7210/das005)
String patterns in ALGOL 60

String manipulation extensions to ALGOL 60. Designed to incorporate both SNOBOL string pattern mechanisms and the Wirth/Hoare record extensions.
References:

Introduction

DASH is designed to allow a user to handle strings and perform efficient arithmetic in the same language. There are a number of languages designed to handle strings (Farber, Griswold and Polonsky, 1964 and 1966; Guzman and McIntyre, 1966), but their arithmetic facilities are slight. On the other hand, on many computers the only "general purpose" languages at present implemented (e.g. ALGOL 60, FORTRAN) are almost purely arithmetic. This hampers both users and those whose job is computer science education.

The procedures to be described aim to provide string processing of the SNOBOL type (Farber et al., 1966) to the extent that is reasonably possible within ALGOL. The description is informal, and some details are omitted for brevity. The reader will benefit from a knowledge of SNOBOL, though it is hoped an understanding can be gained without this knowledge. The superimposition of procedures of this type on ALGOL will be rendered easier and more natural with a facility such as the proposed record handling of Wirth and Hoare (1966).

in The Computer Journal 10(3) 1967

In general, string processing systems deal with data which is in the form of unstructured strings of characters. COMIT (Yngve, 1962), SNOBOL-3 (Farber, Griswold and Polonsky, 1966) and SNOBOL4 (Griswold, Poage and Polonsky, 1968) are three well-known string processing languages. Typical of the types of operation possible in these languages are matching, insertion, replacement and concatenation of strings and substrings.

With the increasing usage of computers in many different fields, the distinction between numeric and non-numeric applications is becoming less apparent, as for example in information retrieval problems. Consequently it seems desirable that a single programming system should incorporate efficient numeric and non-numeric capabilities. The SP/1 system described here has been designed and implemented as a string processing system embedded in FORTRAN-IV.
To avoid adding to the diversity of programming systems already in existence, and since SNOBOL is a well-known language whose syntax is readily adaptable to a FORTRAN environment, the operations provided in SP/1 are similar to those available in SNOBOL-3. Unlike, for example, DASH (Milner, 1967), which is a string processing extension embedded in ALGOL, SP/1 is both a syntactic and semantic extension to FORTRAN. The string processing statements can be represented by a set of macros which are expanded into FORTRAN statements by a macro generator (Macleod and Pengelly, 1969) prior to compilation. The macros have been designed so that there is a close similarity between the syntax of the corresponding SP/1 and SNOBOL-3 statements. For example, the SNOBOL-3 statement

    REPEAT E "(" *V* ")" = V /S (REPEAT)

deletes all the pairs of left and right parentheses from a string E. The corresponding SP/1 statement is [...]

In addition, SP/1 provides a data type known as an association which may have a range of alternative values associated with it. This data type is in some ways similar to the pattern type in SNOBOL4 and the assertion type in AXLE (Cohen and Wegstein, 1965).

A further distinctive feature of SP/1 is that strings are stored as sequences of atoms, where an atom is the smallest meaningful unit of the string. The size of an atom is determined on input as shown below, but normally an atom may be regarded as a single character symbol or as a group of consecutive alphameric characters. The latter could be the case, for example, in text processing, where one atom would be equivalent to a word of text. This approach allows the processor to operate on strings composed of text words while retaining the capability to manipulate strings of individual symbols where required. This provides faster operation with a considerable saving in storage requirements in text processing applications, where the smallest logical unit of information is a word of text.
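The atom idea described above can be illustrated with a minimal Python sketch. It is not SP/1 code and does not reproduce Macleod's storage scheme: the names `split_atoms` and `AtomTable` are hypothetical, the treatment of punctuation and spaces in each mode is an assumption, and a Python dictionary stands in for the hash table that the source says the real storage method involves.

```python
import re


def split_atoms(text, mode):
    """Split text into atoms (hypothetical helper, not part of SP/1).

    mode "character": every character is one atom (assumed to include spaces).
    mode "text": each run of letters or digits is one atom; any other
    non-space symbol is an atom by itself, and spaces are dropped (assumption).
    """
    if mode == "character":
        return list(text)
    if mode == "text":
        return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]", text)
    raise ValueError("mode must be 'character' or 'text'")


class AtomTable:
    """Stores each distinct atom once and represents strings as id lists."""

    def __init__(self):
        self._ids = {}     # atom -> integer id (dictionary used as hash table)
        self._atoms = []   # integer id -> atom

    def intern(self, atom):
        # Assign the next free id the first time an atom is seen.
        if atom not in self._ids:
            self._ids[atom] = len(self._atoms)
            self._atoms.append(atom)
        return self._ids[atom]

    def encode(self, text, mode):
        return [self.intern(atom) for atom in split_atoms(text, mode)]

    def decode(self, ids):
        return [self._atoms[i] for i in ids]


table = AtomTable()
ids = table.encode("THE CAT SAW THE DOG.", "text")
print(ids)                 # the repeated word THE shares one id
print(table.decode(ids))
```

In this sketch a string held in text mode costs one small integer per word, and a repeated word is stored only once, which is one way to picture the storage saving the passage claims for text processing.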
Thus, essentially, there are two modes of operation, character and text, corresponding to the two types of storage. In the current version of SP/1 mixed mode operations are not allowable. The method of string storage, which involves a hash table, is described elsewhere (Macleod, 1969a).

in The Computer Journal 13(3)
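For readers unfamiliar with SNOBOL, the effect of the SNOBOL-3 statement `REPEAT E "(" *V* ")" = V /S (REPEAT)` quoted in the SP/1 introduction can be sketched in Python. This is an illustration only, neither SP/1 nor DASH code: the function name `delete_paren_pairs` is hypothetical, and the sketch assumes that `*V*` binds the shortest string between the leftmost "(" and the next ")".

```python
def delete_paren_pairs(e):
    """Repeatedly replace "(" V ")" by V until the match fails."""
    while True:
        left = e.find("(")
        if left < 0:
            return e                      # no "(" left: the match fails
        right = e.find(")", left + 1)
        if right < 0:
            return e                      # no ")" after it: the match fails
        v = e[left + 1:right]             # the text bound to V (assumed shortest)
        e = e[:left] + v + e[right + 1:]  # replace the matched part by V
        # The "/S (REPEAT)" success branch corresponds to looping again.


print(delete_paren_pairs("(A+(B*C))"))    # A+B*C
```

Each pass removes exactly one pair, just as each successful execution of the SNOBOL-3 statement does before branching back to REPEAT. An unmatched parenthesis that cannot take part in a pair is left in place.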