RIGAL(ID:1357/rig001)

Structured language for compiler writing 


for RIGA Language

Language for compiler writing. Data strucures are atoms, lists/trees. Control based on pattern-matching


People:
Structures:
Related languages
Floyd - Evans language => RIGAL   Influence

References:
  • Auguston, Mikhail "Programming Language RIGAL as a Compiler Writing Tool", Inst of Math and CS of Latvia U, 1987. view details
  • Auguston, Mikhail "Rigal Programming System: Language Description" University of Latvia, Institute of Mathematics and Computer Science 1987 view details Abstract: A new programming language for compiler writing is described. The main data structures are atoms, lists and trees. The control structures are based on advanced pattern matching. All phases of compilation, including parsing, optimization and code generation, can be programmed in this language in short and readable form. Sample compiler written in RIGAL is presented. Extract: Description
    Programming language RIGAL is intended to be a tool for parsing ( context checking, diagnosing and neutralization of errors included ), for code optimization, code generation, static analysis of programs, as well as for the programming of preprocessors and convertors.

    Almost all the systems envisaged to solve compiler construction problems contain means to describe context-free grammar of the source language. Earlier systems, like the Floyd - Evans language, [1] present tools to work with stack, which is used for parsing. Parsing methods for limited grammar classes are implemented in the languages and systems of later generations (usually LL(1) or LR(1) ).

    Such systems as YACC (Johnson [2]), CDL-2 ( Koster [3]), SHAG (Agamirzyan [4]) and many others make use of synchronous implementation of parsing and different computations, e.g., formation of tables, context checking, etc. Usually these actions are performed by call of semantic subroutines, written in some universal programming language ( e.g., in Pascal or C).

    Attribute grammars advanced by Knuth [5] have greatly influenced development of systems for compiler construction. Systems, like, SUPER (Serebryakov [6]), ELMA (Vooglaid, Lepp, Lijb [7]), MUG2 (Wilhelm [9]) are based on the use of attribute grammars not only for parsing, but for code generation as well.

    Pattern matching is a convenient tool for programming of parsing, optimization and code generation. The REFAL programming language [10], acknowledged for translator writing, may serve as a good example.

    Vienna method for defining semantics of programming languages [11] suggests the usage of labelled trees in order to present the abstract syntax of programs. Representation of compilation intermediate results in the tree form has become usual (see [12]).

    Dependence of control structures in the program from data structures used for program's work is one of the basic principles in programming. The recursive descent method could be considered to be the application of dependence principle.

    The above mentioned ideas and methods were taken into account when creating RIGAL language.

    The language possesses few basic notions. Data structures contain atoms, lists and trees. Advanced mechanism of pattern matching lies at the basis of control structures.

    The fact that RIGAL is a closed language makes RIGAL distinctive. That means that almost all the necessary computations and input-output could be executed by internal means and there is no need to use external semantic subroutines. Therefore the portability of RIGAL programs to other computers is increased.

    Means for work with trees, different patterns including, enable both programming of parsing algorithms and optimization phases and code generation as well. The language supports design of multipass translators. Trees are used as intermediate data.

    The language allows to split the program into small modules (rules) and presents various means to arrange interaction of these modules. Pattern matching is used for parameter passing.

    RIGAL supports attribute translation scheme and easy implementation of synthesized and inherited attributes is possible. The problem of global attributes is solved by usage of special references.

    Lexical analysis is a separate task and requires special language facilities for description as it is, for example, in LEX/YACC [2] system. In the current implementation of RIGAL two scanners are included that accept lexics of Pascal, C, Modula-2 and RIGAL.

  • Auguston, Mikhail "Programming language RIGAL as a compiler writing tool", ACM view details
          in [SIGPLAN] SIGPLAN Notices 25(12) December 1990 view details
  • Auguston, Mikhail "RIGAL - a programming language for compiler writing", Baltic Computer Science, Selected Papers LNCS 502, Springer Verlag, 1991, pp.529-564. view details
          in [SIGPLAN] SIGPLAN Notices 25(12) December 1990 view details
  • Auguston, Mikhail and Vadim Engelson, "The Programming Language RIGAL, a Tool for Compiler Writing" pp.167-176 view details Abstract: The programming language RIGAL is a tool for parsing, (context condition checking, error recovery), code optimization, code generation and static analysis of programs as well as programming of preprocessors and translators. In this paper we summarize our experiences of writing compilers and other software in RIGAL and discuss how the RIGAL constructs are used in writing different phases of a compiler. An advanced type system with compile-time and run-time checks is also described. Extract: Survey of CCSs
    Compiler construction systems usually are based on the specification of the context-free grammar of the source language. While tools such as YACC [1,3] and CDL-2 [6] do perform parsing and attribute computation synchronously, the semantic actions are written in a different language (e.g. C). Attribute grammars are used both for parsing and code generation, e.g. in MUG2 [7]. Trees with labelled
    branches present conveniently the abstract syntax of the program in the Vienna semantic definition method [9]. Languages like ML [4] and LISP have a number of operations with lists and trees, but they lack syntactic elements to describe context-free grammars. The above mentioned ideas were taken into account when creating RIGAL. Initially RIGAL had no type declarations [2], like LISP. This paper presents an attempt to add types and type checking to RIGAL. The authors have implemented the RIGAL programming environment that integrates a syntax checker, an interpreter and a compiler. The syntax checker prepares an intermediate code for the interpreter, which is then used for debugging. The debugged program is finally compiled to PASCAL or C. The compiled code has about five times higher performance than the interpreted one. Extract: Overview of the RIGAL language
    2. Overview of the RIGAL language
    In this section a new version of the language is presented. This variant has not been implemented yet, and this paper is the first presentation of the new syntax. The main purpose of this description is to introduce the type declarations in RIGAL.

    2.1 Basic operations
    There are three general types of value in RIGAL: #atom, #list and #tree. #atom is either #str (string) or #int (integer).

    Constants of type #str are quoted strings like ’Hello’; #int constants are integer numbers, like 1994.

    Operations on lists are: the constructor {el1 el2...}, access by index list[index], append list!.element and concatenation list1!! list2.

    Operations on trees are: the constructor
    <.label1:el1, label2:el2,....> (all labels must be different), access by label tree.label (returns the appropriate element), and append tree1++tree2 (The branches of tree2 replace the branches of tree1 if the labels are equal).

    Variables must be declared with their corresponding type, e.g.
    The_list:#list; M:#int.

    It is also possible to declare the structure of a variable; RIGAL patterns describe lists and trees constructed from objects of other types: Rec:<.’a’:#t1,’b’: #t2.> where #t1 and #t2 are some types.

    This is discussed in chapter 3 in more detail.

    Expressions can be constructed from constants and variables using the operators ==, <> (for trees, lists and atoms), +, -, *, div, mod (for integers), <, >, <=, >= (for integers and strings), and, or, not (for all data types), [], !., !! (for lists), ., ++ (for trees).

    There is a special atom null that represents the boolean value "false" or the result of an incorrect operation. Any value except null represents "true" if it is an argument to a logical operation. The atom 't' is returned by logical operations and represents "true".

    The accumulative assignments can be written in short form, like
    A_list!.=N; A_list!!= Other_list; A_tree++= Other_tree; Counter+=1.

    2.2 Rules.
    The rule in RIGAL is similar to procedure or function in conventional languages. At the same time it corresponds to a grammar rule. The execution of the rule is based on the pattern matching. In the rule
    #r1 'a' 'b' ##
    the pattern consists of the atoms 'a' and 'b'. The rule may be called as #r1('a' 'b'). If the arguments match the pattern we say that the call of #r1 was successful. The calls #r1('a') and
    #r1('a' 'c') will thus fail.

    This is an example of function definition in PASCAL: function sum (A:integer;B:integer):integer; begin sum:=A+B; end; The same function can be written in RIGAL as
    #sum -- rule name
    : #int; -- type of returned value
    A,B:#int -- variable declarations
    # -- end of declaration part
    A B -- pattern. Variables match any object.
    / return A+B /## -- statement. Rule terminates with the symbol ##

    The result of #sum(5 2) is 7. Note that the rule names start with the # symbol. RIGAL keywords begin with a lowercase letter.

    A variable can be used as a pattern which matches any object, like in the rule #sum. The types of all variables must be specified. The type of the return value may be specified, otherwise it is assumed that the rule returns null. Also the type of input data may be specified (using the name of the pseudo-variable Input).

    The statements may be inserted before, between and after patterns and they are separated by ';'.

    The rule name can be used as a pattern element, like in the Extended Backus Naur form. The value returned by the rule may be assigned to a variable:
    #r2: #int; A,M:#int #
    'a' A=#r3 'd' /M=A+2;return M/.
    #r3: #int #
    'b' 'c' / return 2/.
    A call such as #r2('a' 'b' 'c' 'd') is successful and returns the value 4. The rule name
    #r3 is used as a pattern element in rule #r2 above. The result returned by #r3 is assigned to A.

    2.3 Compound patterns
    Sequences of elements, lists and trees can be recognized by compound patterns (S1...Sn denote arbitrary patterns in the following):
    (* S1... Sn *) describes repetition of the sequence S1...Sn zero or more times;
    (+ S1... Sn +) describes repetition of the sequence S1...Sn one or more times;
    (S1!...!Sn) means that the patterns S1...Sn are applied until one of them succeeds;
    [S] means that the pattern S is optional;
    {S1 S2 ... Sn} matches a list whose elements match S1,S2,...,Sn respectively;
    <.'a1':P1,'a2':P2,...,'an':Pn.> matches a tree T such that T.'a1' matches P1, then
    T.'a2' matches P2 etc.
    The patterns are applied in the same order as they are written; this way the tree traversal is controlled.

    The iterative tree pattern <* S1:S2 *> matches a tree such that each branch label name matches S1 and the corresponding leaf matches S2.

    Additional patterns help to simplify the error handling and parsing of non-LL(1) grammars: satisfy(expression) matches the object if expression yields true. The object is available as the variable This in the expression. This is a special pseudo-variable name. look(expression) executes in the same way, but it does not advance along the input sequence, it only "looks ahead".

    Example:
    #digit: #int; E,This: #str #
    E=satisfy((#length(This)== 1)
    and (This>= '0') and (This<= '9'))
    /return #ord(E)-#ord('0')/.
    This rule checks whether the input string represents a digit, and returns the numerical value of the digit if that is the case.

    2.4 Statements
    The assignment is written as variable=expression. The left hand side may also be a list or tree element:
    a[7]=3; b.'k'='x';

    The return terminates the rule execution with success and returns its argument.

    Rules can be used in expressions, like A=#sum(3 7)+#sum(8 9). They can also be used as statements in which case the returned value is ignored: #prepare(3 7);

    The statement print expression expression...; prints the values.

    There are also common if and loop statements, forall for iteration over lists and trees as well as constructs saving and loading values in (from) a file.

    2.5 Whole program structure
    The program is a set of rules. Each rule has the form:
    #rule_name [: [returned_value_type]; [variable_declaration; ... ]#] pattern ##

    The program starts from the first rule in the code. All the variables are local, thus during the execution each rule instance has its own set of variable values. However, variables of other rules can be referred too: last #r V refers to the variable V in the last (and still active) instance of the rule #r. External link: CiteSeer
          in Proceedings of Nordic Workshop on Programming Environment Research, Lund, June 1-3, 1994 view details
    Resources