PIVOT Input language(ID:7160/)INput language for the PIVOT theorem proverSystem for proving theorems "The input language for PIVOT is a simple statement-oriented language reminiscent of ALGOL W" Deutsch, Stanford, 1973 Places People: Related languages
References: INPUT LANGUAGEProgram structurePIVOT uses an algebraic, statement-oriented language for the programs to be analyzed. The author rejected the obvious alternative, LISP, on the grounds that it was not particularly suitable for transformation into flowchart form, that the published programs chosen as benchmarks were written in ALGOL already, and that using a parser which provided the ability to define the syntax of the language would allow experimentation with the idea that the language structure could affect ease of verification. PIVOT programs are organized in terms of modules. A procedure module, or simply procedure, consists of a header giving the procedure name, declarations for the input and output variables and structured local variables, and a body containing assertions and executable statements. (Procedure modules are like ALGOL procedures, except that textual nesting is not allowed and all parameters are transmitted by value.) A declaration module has a header and declarations only. Declaration modules are similar to macro-instructions: the INSTANCE statement inserts a copy of a declaration module in-line. Declaration modules are not especially useful in PIVOT at present: their existence is motivated by the "structured programming" principle that definitions which are required in many places should only be written in a single place. Within each module statements are numbered. The numbers are strictly for editing and sequencing purposes. Intra-program references are not by number, as in BASIC for example, but by label. The only semantic significance of statement numbers is that numbers below 100 are reserved for declarations and numbers above are reserved for non -declarations. The semantics of variables are especially simple. All variables are local to the procedure module in which they appear. Undeclared variables are assumed to be local scalars. This requires that all information accessible to a procedure be passed to it by the caller. Of course, this attitude is not satisfactory for a real programming system; it was adopted in order to defer questions relating to global side-effects until a later stage of the research. <procedure-head> ::= PROCEDURE <id> <formal-parameters> INSTANCE <id> <optional-parameters> | <declaration-mode> ::= ARRAY [ ( <number-of dimensions> ) ] | <number-of-dimensions> ::= <num> <statement> ::= [ : <label> : ] <statement-body> LEMMA <q-relation-list> | <simple-list> ::= <assignment> ${; [ ; <branch> ] <assignment> ::= <left-hand-side> <q-relation-list> ::= <q-relation> ${, (q-relation>} IF <rel-expression> <non-q-relation> ::= <disjunction> [ IMP <disjunction> ] NOT <relation> | <rel-symbol> ::= = | # | < [ = ] | > [ = ] ( <general-expression> ${, <arith-expression> ::= <term> ${+ <term> | - <term>} CREATE <record-type> | <record-type> ::= <id> <expression>} "j" | <field-name> ::= <id> DeclarationsFigure II-1 gives the syntax of declarations. A variable (name) may be declared as one of four types: an array of a given number of dimensions, a scalar (unbounded integer), a pointer to a record of a previously defined type, or a record type. Record types carry a list of field names (formal parameters): the field selection operator takes a record variable on the left and a field name appropriate to that record on the right. Neither field names nor array names are values, i.e., they cannot be computed, so the relationship between record pointers and field names is essentially the same as between array indices and array names. These simple semantics allow expression of many programs that manipulate structured data, but avoid the more difficult questions concerning design of data structuring facilities in programming languages, such as the proper treatment of coercions and unions. To cover the need for abbreviation, the LET statement provides a macro-definition capability at the expression level. For example, one could define an abbreviation for the maximum of two quantities as INSTANCE calls for insertion of a copy of a declaration module with actual parameters substituted for formal. As with LET, the substitution is macro-like. In particular, variables in the declaration module which are not formal parameters of the module are treated like any other variables of the procedure in which the INSTANCE statement appears, since the statements of the declaration-module are literally copied on-line to effectively replace the INSTANCE statement. The intended use of declaration modules and INSTANCE is to provide a library of parameterized assertions. For example, suppose one wants to have the idea of an array being sorted readily available. The following declaration module would serve: StatementsThe syntax of statements other than declarations appears in Figure II-2. Groups of ASSERT statements mark points in the program with two properties: the asserted relations are to be assumed for all control paths leading away from the point, and the relations are to be proved necessarily true for all control paths leading to the point. DECLARE statements are like ASSERTs except that they are implicitly copied as part of every interior group of ASSERTs. (This provides the notational convenience mentioned in the introductory section on theoretical background.) For cases where it is temporarily necessary to override a DECLARE, the CANCEL statement appearing in an ASSERT group cancels an enclosing DECLARE statement with the specified label. The LEMMA statement allows the user to provide "hints" for PIVOT: a LEMMA must be proved for all incoming paths and is assumed for all outgoing paths, but does not itself delimit paths. The ASSUME statement allows the user to specify truths (axioms) to be accepted on faith. The semantics of the control statements are simple. For tests, there are IF statements and ELSE statements (the THEN clause is part of the IF). Loops are a little less conventional. The LOOP statement signals the beginning of a loop, but the WHILE statement (or REPEAT, which means "repeat forever") actually begins the body. This arrangement allows insertion of assertions between the LOOP and the WHILE, which are checked both before the first entry into the loop and after each iteration. (Earlier versions of PIVOT did not provide this facility, which made it necessary to write two copies of the assertions if these semantics were desired.) For abnormal exit from a loop, the EXIT statement looks for an enclosing statement with the specified label and branches to the statement following the end of the loop body, if a LOOP statement, or the END, if a BEGIN. The NEXT statement searches for an enclosing IF or ELSE IF and effectively revokes the decision, forcing the succeeding ELSE to be executed. EXIT and NEXT are present in some form in several modern programming languages and seemed conceptually cleaner than GOTO: for one thing, they make the problem of detecting unreachable statements trivial: Doing all control in terms of BEGIN-END pairs also makes possible attractively indented listings of procedures which indicate the logical structure. Early versions of PIVOT used GOTO, which was not much harder to analyze for purposes of path generation but which produced less readable programs. This leaves assignments and the "..." statement. Assignments to simple variables, array elements, or structure components are allowed. Assignment to a variable declared as a record pointer just changes the pointer: this is like LISP and most other languages that recognize the existence of pointers. "..." is used to indicate a gap in the program. Its only effect is to prevent verification from being attempted on any paths that pass through it. In the future, "..." will prompt PIVOT to analyze the surrounding program for qualities which the missing code must possess, such as requirements that certain variables be changed, or even the requirement that a certain assignment be done. This statement was added to PIVOT very recently, as an implementation of the idea suggested in Appendix C. Extract: Representation and Simplification REPRESENTATION AND SIMPLIFICATION Expression overviewLike King's system, PIVOT represents expressions internally in a simplified canonical form. Each operator has an associated routine which is responsible for producing the canonical expression resulting from applying that operator to operands, e.g., there is a routine MKPLUS which takes a list of expressions and produces their canonical sum. This organization reduces the size of intermediate expressions (an important factor since a good deal of the processing involves scans over entire expressions) and, since most simplifications are inherently local anyway, provides an inexpensive unifying framework within which new operators and their associated simplification rules can easily be added. Restricting the variety of forms for the same expression can also cut processing time: for example, keeping the terms of a sum sorted reduces the time to add two existing sums (of m and n terms) and combine like terms from O(m*n) to O(m+n). It is worth noting that these operator routines, particularly those for addition, multiplication, and logical connectives, were a major speed bottleneck in the early versions of PIVOT and had to be rewritten: in the process, they were tuned for the relatively simple expressions that form the great majority of those actually encountered in proofs. PIVOT, unlike King's system, relies heavily on the ability to store and retrieve properties of expressions, i.e., to use expressions as keys in tables. For example, computing the negation of a logical expression is a frequent and moderately time-consuming operation: it would be convenient to compute the negation only once, store it in a table, and then look it up when it was needed thereafter. A problem arises from the fact that hash tables, which afford the most rapid lookup procedure, require a fast mapping of the key into a small range of integers. Doing this for an expression, in the absence of any other stratagem, requires examining the entire expression, if not to compute the hash code, at least to compare the expression to the key in the table. However, if means can be found to ensure that each expression actually only appears once in storage, then the location of the expression can be used as the key and lookup can be extremely fast. It appears that the extra time required to ensure this property using PIVOT' s technique (which is described in the next paragraph) is far outweighed by the reduction in space and the increased speed with which properties of expressions can be determined compared to more conventional methods. As Chapters III and IV make clear, the entire organization of PIVOT assumes that looking up expressions in tables is cheap. (The QA4 group uses a different technique called a "discrimination net" to achieve similar ends; their application requires more general kinds of matching than strict identity when looking for a matching key.) The PIVOT technique for assuring expression uniqueness is similar to techniques used in compilers to detect common subexpressions. PIVOT expressions are LISP list structures, and all such structures are built up from two-component nodes (e.g., the head node of the expression (PLUS 3 X) has as components the symbol PLUS and a pointer to the sublist (3 X)), so the problem reduces to ensuring that no two nodes exist with the identical components. PIVOT implements this with a function (HCONS cl c2) which constructs a new node with the given components if none exists, or returns the existing node if there is one. If c2 is NIL (the list terminator), HCONS looks up cl in a hash table, using the location as the key. (The locations of literal symbols like X are guaranteed to be unique by LISP.) The value found in the hash table is a node with the key as first component and NIL as second. If c2 is not NIL, it is looked up in a different hash table, where the value is a list of nodes with c2 as second component: this list is searched linearly for a node with the proper first component (cl). Thus, after constructing the list (PLUS 3 X), the NIL table would have the following entry: All routines in PIVOT which construct expressions use HCONS, either directly, or to copy the expression before returning it, so all expressions are guaranteed unique representation. This hashing scheme was partly a result of the presence in BBN-LISP of a hashing function that only took a single pointer as the hash key, and the absence of any facilities for building a two-key hashing function: the latter would probably have been preferable to the scheme just described, although not greatly since the average length of the lists is only about 3 entries. Arithmetic expressionsThe syntax of internal arithmetic expressions appears in the figure below. Conditional and choice expressions will be discussed in the next section. Note that the internal syntax is quite a bit simpler than the external: for example, there is no subtraction operator. Note also that some simplification, such as combining constant terms, is implicit in the syntax. These syntax equations, and the others in this chapter, refer to LISP lists rather than strings of characters; in particular, parentheses refer to list boundaries (as is customary in the external representation of LISP data). <arithmetic-expression> ::= (PLUS <num> <term) $<term>) | <num> | <term> <term> ::= (TIMES <num> <primary> $<primary>) | < primary> <primary> ::= <id> | <conditional> I <choice> | (<function> $<arithmetic-expression>) <function> ::= EL | QUOTIENT | REMAINDER | IEXPT | <user-defined-function> Simplifications for addition include combining constant terms, combining other like terms (e.g., adding X and (TIMES 2 X) to produce (TIMES 3 X)), and eliminating terms that sum to zero. After simplification, to produce the canonical form the non-constant terms are sorted using an ordering which initially disregards constant factors, and places variables first, then other primaries. (This is the same as King's ordering: its advantages over an arbitrary ordering based on, for example, the locations of their terms, are that it places variables and negatives of variables first, which is useful in determining whether an equality can be solved for a variable, and that it makes collecting like terms easy. It is not clear that these advantages outweigh the expense of sorting according to such a complex ordering.) If the final sum is just a constant or a variable, the canonical form is the constant or variable by itself rather than a PLUS expression. King's canonical form essentially keeps all expressions in the form (PLUS <num> $(TIMES <num> S<primary>)); the PIVOT form seems preferable since it conserves space in expressions not involving multiplication, but it admittedly complicates the operator routines for addition and multiplication. The decision to always have a constant term in a sum was a compromise with King's form. Simplifications for multiplication are essentially the same as those for addition: like factors are combined to produce exponentiation (lEXPT). Products of sums are multiplied out, e.g., multiplying (PLUS -1 X) by Y produces (PLUS 0 (TIMES -1 Y) (TIMES 1 X Y)). Although this occasionally produces larger expressions, it simplifies the internal syntax drastically. EL is the subscripting operator and has no speical simplification rules. Division recognizes x/x=>l, x/l=>x, and 0/x=>0; if the denominator is a constant or a simple variable, and the denominator divides each term of the numerator, simplification also occurs. Exponentiation of a sum or a product to a constant power results in explicit multiplying-out; the only other transformations for the IEXPT operator are 0**x=0, l**x=l, and x**l=x. >PIVOT currently has no provision for user-defined simplifications, but extending it to deal effectively with user-defined functions and domains will require such provision. The main problem is not how to express such rules but how to incorporate them into the program in an efficient manner. It is the author's opinion that this will involve user-defined canonical forms as well as some kind of efficient pattern detection mechanism; this question will receive further attention in the last chapter of this thesis. Logical expressionsThe syntax of internal logical expressions appears below. Conditional and choice expressions are actually arithmetic expressions, but their processing and semantics are more closely related to those for logical expressions. <conditional> ::- (LCOND ${<1ogical-expression> <arithmetic-expression>}) <logical-expression> ::= <quantifier> | <disjunction> | <conjunction> <disjunction> ::= (OR (conjunction> <conjunction> $<conjunction>) | <logical-primary> <conjunction> ::= (AND <disjunction> <disjunction> <logical-primary> <logical-primary> ::= <quantifier> | (<relationb- symbol> <arithmetic-expression> <num>) <relation-symbol> ::= EQUAL | NEQUAL | LEQ | CEQ LCOND stands for "linear conditional" and represents an expression which may have different values depending on the truth of some condition or conditions. Unlike the familiar IF-THEN-ELSE of programming languages, however, the logical-expressions of a PIVOT conditional are always mutually exclusive, so that in principle their order is unimportant and they could be evaluated in parallel. (In fact, for canonicalization, the clauses are sorted on their logical-expressions.) (This format is called "linear" somewhat in the sense of "linearly independent".) Thus the ALGOL conditional expression IF pl THEN el ELSE IF p2 THEN e2 ELSE e3 would become (LCOND pl el p2∼pl e2 ∼p2∼pl e3). In the early versions of PIVOT, the conditional expression did have sequential (ALGOL-like) semantics, but it turned out that this was ill-suited to processing by the LISP mapping functions and also that the linear (independent) conditions like were usually what was wanted. Three kinds of simplification apply to conditionals. One is de-nesting: (LCOND pl el p2 . Another is removal: if any condition is NIL (false), that clause is removed, and if a condition is T (true: this implies that all the others are NIL by exclusiveness), its associated expression becomes the value of the entire conditional. Another is merging of clauses with identical consequents: (LCOND pl el p2 e2 p3 el) A fourth transformation related to conditionals is carried out by a pre-processor that precedes every operator routine: if op is any arithmetic operator (PLUS, TIMES, EL, QUOTIENT, REMAINDER, IEXPT, or a user-defined function), then (op ... (LCOND pl el -. pn en) ...) becomes (LCOND pl (op ... el ...) ... pn (op ... en ...)), i.e., the operator is pushed inside the conditional. (This procedure can lead to a proliferation of cases, for example if both terms of a sum involve conditionals, but PIVOT is sufficiently successful at avoiding conditionals through case analysis that this difficulty does not seem to arise in practice.) In this way, the conditional eventually becomes an argument of a predicate (in the current PIVOT, always the first argument of arithmetic relation), where it undergoes a similar transformation and vanishes: (relation-symbol (LCOND pl el ... pn en) .num) becomes (OR (AND pl (relation-symbol el .num)) ... (AND pn (relation-symbol en .num))). King's system has all of these transformations except clause merging, but it is not clear from his thesis whether his conditionals are sequential or parallel. The alert reader will have remarked that there is no NOT operator in PIVOT. It happens that the negation of a canonical expression is always another canonical expression without needing a NOT operator: AND and OR become each other, likewise EQUAL and NEQUAL, LEQ and GEQ. (The last arises from the fact that for integers, ∼(x≤y) ≡ x>y ≡ x≥y+1.) In fact, the PIVOT internal expression syntax does allow for NOT, and there is a routine associated with it, in expectation of future additions such as user-defined predicates. Simplifications for AND include the obvious ones: A less obvious set, discovered by King, results from the representation of arithmetic relations in the form (<relation-symbol> <arithmetic-expression> <num>). Any conjunction of relations of this form with the same <arithmetic-expression> can be simplified as follows: (1) If any of the relations is an equality, then substitute the numeric value obtained from that relation into all the others: if they are all true, then the simplified conjunction is the equality alone, otherwise NIL (false). (2) If more than one of the relations is a LEQ, drop all such relations except the one with the smallest numeric part ( (3) Similarly, if more than one relation is a GEQ, drop all such relations except the one with the greatest numeric part. (4) If the upper and low bounds have crossed the result is NIL. If they are equal, replace them by an equality and do case 1. (5) Otherwise, the remaining relations are all inequalities. If any fall outside the range delimited by the upper and lower bounds, delete them. If one falls on an endpoint, delete it and "lighten" the bound, e.g., . (This transformation relies on properties of the integers and is not applicable to real arithmetic in general.) This may produce a crossover or equality as in case 4. PIVOT actually carries out this process whenever it forms a conjunction that involves two or more arithmetic relations with the same first argument (non-constant part). The remaining transformation for AND arises when one of the conjuncts is a disjunction. In this case, each disjunct is compared against every simple conjunct to see whether it is either incompatible with that conjunct, in which case the disjunct is eliminated ( The simplifications for OR are exactly analogous to those for AND. Simplifications for arithmetic relations are those of King (dividing out constant factors, and recognizing that certain equalities like 4x+2y=3 have no integer solutions), and two more. One is conversion of x/k≥n to x≥k*n, where k and n are integers, k positive. The other is conversion of x*y=0 to x=0!y=0, where at least one of x and y is not a sum (PIVOT' s factoring abilities are rather weak). This long list of simplifications may seem haphazard, but it was extended from King's list by hand-proof of several complex examples, and every simplification in PIVOT receives sufficient use to warrant retaining it in view of the fact that the same result often cannot be obtained without it. The list is certainly not "complete" in the formal sense of the term; the author believes that this is not a very important criterion for judging the merits of such systems. This question is discussed at length in Chapter VII. Quantifiers<choice> ..= (CHOICE <bound-variable-list> < logical-expression > < arithmetic-expression >) <quantifier> ::= (FA <bound-variable-list> <domain> <disjunction>) | <logical-expression>) Quantifier simplifications will be discussed in terms of FA (universal quantifier) although EX (existential quantifier) is treated exactly the same way. PIVOT tries to divide the expression in the range of a quantifier into a "domain" and a "body" for purposes of simplification and as an eventual guide to forming instances of universally quantified assertions. (This representation is motivated by the frequent occurrence in program proofs of quantifiers which describe properties of intervals or subarrays.) If the body of a FA is c1!c2!...!cn, then ci is a "domain predicate" if it is of the where v is a bound variable and e is any expression not involving v (but possibly involving other bound variables). The negated disjunction of all domain predicates becomes the domain and the disjunction of the remaining cj becomes the body, using the equivalence of the three expressions: If any domain predicate is an equality (inequality before negating), e is substituted for v throughout the domain and the body and v is deleted from the list of bound variables. If the body reduces to a logical constant b (T or NIL), then the entire quantifier expression is a constant whose value depends on the type of quantifier and whether there are any values of the bound variables that satisfy the domain expression, according to the following table:
Unlike resolution-based systems, which move quantifiers to the outermost level, PIVOT moves quantifiers as far in as possible. This involves moving a FA inside an AND, moving a disjunct of an OR outside a quantifier if none of the bound variables occur in it, and one more complex transformation which was discovered in the course of trying out examples: converting ( , where c [ e:x ] represents the result of substituting e for x in c. One obvious initial transformation, namely converting a body to conjunctive form for FA or disjunctive form for EX, is only carried out if such a transformation does not cause a size explosion (specifically, if it no more than doubles the length of the expression). There is a computationally expensive problem associated with converting an expression with bound variables to canonical form. One would like the canonical form to be independent of the names of the bound variables, so that e.g., (FA (BV X) (OR (P V) (P X))) and (FA (BV U) (OR (P U) (P V))) would have the same canonical representation. As this example shows, this is an extremely difficult task; in general, it involves finding a canonical form for an arbitrary labeled tree. PIVOT does not attempt to deal with this problem: the two expressions exhibited above are actually both in canonical form as far as PIVOT is concerned. The CHOICE function is an attempt to clear up a fuzzy area in King's definition of division. One would like to define x/y according to the Fundamental Theorem of Arithmetic, i.e., "the q such that x=q*y+r where 0≤r<|y|". Leaving aside some complications introduced by the possibility of a negative denominator, King's definition was a conditional: "if q*y<x<(q+l)*y then q", where q was a newly-generated variable name. (For positive y, there is always a q that satisfies the condition.) Making q a new variable implicitly recognizes that q is local to this subexpression; this function is more properly provided by a bound variable. In fact, there are situations where this omission of an implicit quantifier produces errors. Consider the generic expression When a CHOICE expression becomes the argument of a predicate, (r (CHOICE (BV x) (p x) (f x))), there are two plausible transformations to use to eliminate the CHOICE. One possible transformation produces We want to show The second derivation: However, neither [ 1 ] nor [ 2 ] is an ideal translation. Consider what happens when a clause which originally contained a CHOICE is asserted as part of a theorem. If the quantifier resulting from the CHOICE is EX, it can be removed, and its variables replaced by Skolem constants; this is preferable to the FA case since the theorem prover is better equipped to deal with unquantified clauses. Whether EX or FA results at the clause level depends both on the original quantifier used to replace the CHOICE and on the number of times the subexpression is negated: neither EX nor FA as the original quantifier can guarantee EX at the clause level. Therefore, since we have just proved that we can use either EX or FA and produce logically equivalent expressions, we invent a new "hybrid" quantifier FU which has the desirable properties of both EX and FA: and FU can be eliminated at the clause level like EX. This hybrid quantifier is unique to PIVOT and is the result of many frustrating attempts to understand the semantics of CHOICE. |