LINGOL(ID:617/lin010)for LINguistics Oriented Language Vaughan Pratt MIT 1973 Natural language processing language People: Related languages
References: in Proceedings of the Third International Joint Conference on Artificial Intelligence IJCAI-73, Stanford, CA: Stanford University 1973 view details Survey of the Problem Domain. There is little agreement on the extent to which syntax should be a consideration in the design and implementation of programming languages. At one extreme, it is considered vital, and one may go to any lengths [Van Wijngaarden 1969, McKeeman 1970] to provide adequate syntactic capabilities. The other extreme is the spartan denial of a need for a rich syntax [Minsky 1970]. In between, we find some language implementers willing to incorporate as much syntax as possible provided they do not have to work hard at it [Wirth 1971]. In this paper we present what should be a satisfactory compromise for a respectably large proportion of language designers and implementers. We have in mind particularly (i) those who want to write translators and interpreters (soft, firm or hardwired) for new or extant languages without having to acquire a large system to reduce the labor, and (ii) those who need a convenient yet efficient language extension mechanism accessible to the language user. The approach described below is very simple to understand, trivial to implement, easy to use, extremely efficient in practice if not in theory, yet flexible enough to meet most reasonable syntactic needs of users in both categories (i) and (ii) above. (What is ?reasonable? is addressed in more detail below) , Moreover, it deals nicely with error detection. One may wonder why such an ?obviously? utopian approach has not been generally adopted already. I suspect the root cause of this kind of oversight is our universal preoccupation with BNF grammars and their various offspring: type 1 [Chomsky 1959], indexed [Aho 1968], macro [Fischer 1968], LR(k) [Knuth 1965], and LL(k) [Lewis 1968] grammars, to name a few of the more prominent ones , together with their related automata and a large body of theorems. I am personally enamored of automata theory per se, but I am not impressed with the extent to which it has so far been successfully applied to the writing of compilers or interpreters. Nor do I see a particularly promising future in this direction. Rather, I see automata theory as holding back the development of ideas valuable to language design that are not visibly in the domain of automata theory. Users of BNF grammars encounter difficulties when trying to reconcile the conflicting goals of practical generality (coping simultaneously with symbol tables, data types and their inter-relations, resolution of ambiguity, unpredictable demands by the BNF user, top-down semantics, etc.) and theoretical efficiency (the guarantee that any translator using a given technique will run in linear time and reasonable space, regardless of the particular grammar used). BNF grammars alone do not deal adequately with either of these issues, and so they are stretched in some directions to increase generality and shrunk in others to improve efficiency. Both of these operations tend to increase the size of the implementation "life-support" system, that is, the software needed to pre-process grammars and to supervise the execution of the resulting translator. This makes these methods correspondingly less accessible and less pleasant to use. Also , the stretching operation is invariably done gingerly, dealing only with those issues that have been anticipated, leaving no room for unexpected needs. I am thinking here particularly of the work of Lewis and Stearns and their colleames on LL(k) grammars, table grammars, and attributed translations. Their approach, while retaining the precision characteristic of the mathematical sciences (which is unusual in what is really a computer-engineering and human-engineering problem) , is tempered with a sensitivity to the needs of translator writers that makes it perhaps the most promising of the automata-theoretic approaches . To demonstrate its practicality, they have embodied their theory in an efficient Algol compiler. A number of down-to-earth issues are not satisfactorily addressed by their system deficiencies which we propose to make up in the approach below; they are as follows. (i) From the point of view of the language designer, implementer or extender, writing an LL(k) grammar, and keeping it LL(k) after extending it, seems to be a black art, whose main redeeming feature is that the life-support system can at least localize the problems with a given grammar. It would seem preferable, where possible, to make it easier for the user to write acceptable grammars on the first try, a property of the approach to be presented here. (ii) There is no "escape clause" for dealing with non-standard syntactic problems (e.g. Fortran format statements), The procedural approach of this paper makes it possible for the user to deal with difficult problems in the same language he uses for routine tasks. (iii) The life-support system must be up, running and debugged on the user's computer before he can start to take advantage of the technique. This may take more effort than is justifiable for one-shot applications. lie suggest an approach that requires only a few lines of code for supporting software. (it ) Lewis and Stearns consider only translators, in the context of their LL(k) system; it remains to be determined how effectively they can deal with interpreters. The approach below is ideally suited for interpreters, whether written in software, firmware or hardware. in [ACM SIGACT-SIGPLAN] Proceedings of the ACM Symposium on Principles of Programming Languages, Boston, October 1973. Association for Computing Machinery. view details in [ACM SIGACT-SIGPLAN] Proceedings of the ACM Symposium on Principles of Programming Languages, Boston, October 1973. Association for Computing Machinery. view details in IJCAI-75 Proceedings of the 4th International Joint Conference on Artificial Intelligence, 1975 Tbilisi, USSR view details Vaughan Pratt did another Lisp-based CF parser called Lingol while at MIT in the 1970's. Lingol is a full CF parser with additional hints and extensions on the grammar rules. Unlike Woods's parser, Lingol is bottom-up, and generates _all_ parses in O(n^3) time and space. For ambiguous sentences, you get a fully factored parse tree in which the potentially infinite number of different parses are stored in cubic space. Lingol uses the Lisp 'hints' to decide which of the ambiguous parses are to be selected. Lingol uses no backup. I extended Lingol to handle ambiguity in the word morphology level, which is required if there is noise (e.g., misspellings) in the input. (I still have a version of Lingol for Common Lisp if anyone is interested. I'll have to check with Vaughan re copyright, however.) in IJCAI-75 Proceedings of the 4th International Joint Conference on Artificial Intelligence, 1975 Tbilisi, USSR view details |