Refine(ID:1179/ref004)


Cordell Green et al, Stanford U. High-level wide-spectrum specification language. Set-theoretic data types (sets, mappings, sequences), first-order logic (universal and existential quantification). Syntax-directed pattern matching, type inference, and a declarative transformation operator (pre- and post-conditions). Implemented as algorithms that build and transform annotated abstract syntax trees. Compiled into Common Lisp.
Available from Reasoning Systems, 3260 Hillview Ave, Palo Alto, CA 94304. (See CHI). info: help@reasoning.com

Places
Structures:
Related languages
CHI => Refine   Evolution of

References:
  • D.R. Smith et al., "Research on Knowledge-Based Software Environments at Kestrel Institute"
    Introduction
    Today's programmer weaves a diverse collection of materials into the complex tapestry called a program. The materials are logical in nature and may include a model of the application domain, detailed knowledge of the particular problem being solved, general programming knowledge, constraints imposed by the targeted hardware and software environment, efficiency considerations, and so on. The weaving of these materials is strongly influenced by the cognitive, notational, and manipulative tools used by the programmer. The problem is that, once constructed, a program typically undergoes a continual process of modification to repair defects and support new uses. How can programmers be very productive and have their products remain supple under these conditions?

    The diversity of materials going into the construction of programs suggests that we begin to factor the programming process in order to lessen the amount of detail that the programmer needs to be concerned with at one time. The complexity of producing programs suggests that we begin to formalize and mechanize the programming process. In pursuit of these related goals of factorization and mechanization of the programming process, Kestrel Institute has as its primary focus research on knowledge-based software environments. That such systems are knowledge-based means that some of the diverse logical materials of the programming process are factored out and represented formally in a knowledge base. That they are software environments means that they are intended to provide automated support to the entire software lifecycle.

    In this paper we present a summary of the CHI project which extended from 1978 to mid-1984 under the leadership of C. Green at Systems Control Technology (1978-1981) and Kestrel Institute (1981-1984). The objective was to perform research on knowledge-based software environments. Towards this end, key portions of a prototype environment, called CHI, were built that established the feasibility of the approach. One result of this research was the development of a wide-spectrum language that could be used to express all stages of the program development process in the system. Another result was that the prototype compiler was able to synthesize itself from a very-high-level description of itself. In this way the system bootstrapped itself into existence. We will use the term "CHI system" to refer to the research prototypes that were constructed during this period.

    We feel that the approach to knowledge-based software environments presented here is promising not only as long-term research, but also because a significant technology transfer has already taken place to the world outside the laboratory. Namely, Reasoning Systems, Inc. has developed a commercial knowledge-based software environment called REFINE, based on the principles and ideas presented in this paper and demonstrated in the CHI prototypes. The REFINE system includes the functionality described in this paper, and is employed in applications at Reasoning Systems as well as other sites. An overview of this related work at Reasoning Systems is given in Section IV-C. In place of the CHI research prototypes, Kestrel now uses REFINE as the foundation for its current research. Kestrel's current research has since grown to include not only greater depth on the topics mentioned herein but new topics in the area of knowledge-based design environments. Kestrel's current research is surveyed in Section IV-D.

    By way of introduction, let us examine one kind of factorization of the programming process, its motivation, and an approach to its realization in CHI. One of the reasons for the complexity of programs is that they typically incorporate certain optimization decisions. For example, a solution to a user's problem may be concisely expressed using recursion; however, for the sake of efficiency it is implemented using iteration controlled by a stack. Once the decision has been made to use a stack, other uses for it may be found and it quickly becomes an integral part of the program, related to other parts in complex ways. From a recent report: "These optimizations spread information and introduce efficient but complex implementations for simpler abstractions. Both of these effects exacerbate the maintenance problem by making the system harder to understand, by increasing the dependencies among the parts, and by delocalizing information."

    The incorporation of optimizations in programs is essentially forced upon the programmer by the nature of the compilers for current high-level languages. A typical high-level language compiler maps each data and control structure into a unique machine-level code scheme, independently of the particular way it is used in context. The need for efficiency forces the programmer to make detailed implementations that often obscure the basic structure of a program. In contrast, knowledge-based compilers allow a program to be expressed in a very-high-level language (VHLL), i.e., in terms of constructs that may be more natural to the problem, but whose translation into machine-level code depends on usage and context. For example, the compilation tools for CHI's VHLL allow programs to be written in terms of sets and mappings. Depending on how a particular set is used, it may be compiled into any one of a number of implementations, such as bit vectors, lists, or hash tables. So the use of a VHLL helps to factor out certain optimization decisions from the programming process. The programmer can write clearer, more concise programs in terms that are natural to the application domain, and then later can aid the compiler in choosing efficient implementations for the control and data structures in the program.
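    To make this concrete, here is a minimal sketch in Python (purely illustrative; the usage-hint names are invented and this is not CHI's actual selection machinery) of how usage information might drive the choice among the three set implementations mentioned above:

        # Toy usage-driven choice among set implementations; all keys below
        # are hypothetical illustration names, not CHI constructs.
        def choose_set_representation(usage):
            universe = usage.get("dense_integer_universe")  # size of a small dense universe, or None
            if universe is not None and universe <= 64:
                return "bit vector"   # constant-time ops, fits in a machine word
            if usage.get("mostly_membership_tests") and usage.get("expected_size", 0) > 100:
                return "hash table"   # large set, lookup-heavy usage
            return "list"             # small or enumeration-heavy sets

        print(choose_set_representation({"dense_integer_universe": 32}))   # bit vector
        print(choose_set_representation({"mostly_membership_tests": True,
                                         "expected_size": 5000}))          # hash table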

    In this paper we describe the overall nature of the work done on the CHI project, give highlights of implemented prototypes, and describe the implications that this work suggests for the future of software engineering. A rationale for the design of the CHI system is presented next in Section II. A description of various research prototypes built during the CHI project and examples of their behavior are presented in Section III. The origins of CHI in the PSI project at Stanford University, as well as current research projects at Kestrel Institute and commercial applications of this technology at Reasoning Systems, are described in Section IV.
    Rationale for the design of CHI

    In this section we introduce the main principles underlying the design of CHI. From these principles we then derive the overall structure and characteristics of the kind of knowledge-based software environment that provided the conceptual framework for the implementation and research efforts of the CHI project.

    A. General Principles

    Perhaps the most important principle guiding the design of CHI was to construct an integrated environment for research in knowledge-based programming. There are several aspects to this. First, there was an emphasis on implementation, not only to test concepts and hypotheses, but also as a vehicle for eliciting and exploring research issues. Second, a laboratory for research in programming environments requires instrumentation not generally available. The idea was to build up the tools needed to support productive research in this area.

    A related principle was self-application. Since CHI was a system for constructing programs, what could be more natural (and a good test of the system) than to have CHI bootstrap itself? Upon analysis [30], self-application seemed to depend on two prior principles: self-description and uniform closure. A system is self-described if every component of the system has a description within the system. Uniform closure in a system means that every component of the system is uniformly accessible by every other component. That is, every component can interact with or modify any other component in a uniform way. Taken together, these two principles mean that every system component has uniform access to the descriptions of all system components, including itself.

    Another principle was the formal use of stepwise refinement. The idea in stepwise refinement is to start with an abstract statement of the problem or an abstract solution to the problem, called a specification. At each step a part of the specification is decomposed or refined into more detailed parts. This decomposition/refinement step embodies a design decision. Both data and control structures are refined in this way until the entire specification has been rewritten into the target programming language. For our purposes the important notions are that these refinements are explicit, that their rationale can be made explicit, and that there typically exist alternative refinements of a given specification.

    The alternative refinements that can be applied to a partially refined specification give rise to a tree-shaped "search space." Each node in this tree represents a more or less refined specification, with the root representing the initial specification. Arcs between nodes represent refinements. Alternative refinements are represented by the directed arcs from a node to the nodes representing the refinements of that node. Leaves represent programs in the target language.

    The process of synthesizing code from a given specification can be viewed as a search through this tree for a leaf representing a program with satisfactory performance characteristics. Subtrees represent families of programs that share a common heritage (sequence of design decisions) and, thus, have common characteristics. This fact is useful in constructing heuristics for guiding the search.
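    This search can be rendered as a small best-first sketch under assumed interfaces (refinements, cost, and is_program are stand-ins for the rule applier, the cost estimator, and the leaf test; none of this is CHI code):

        import heapq

        def synthesize(initial_spec, refinements, cost, is_program):
            # Frontier ordered by estimated cost; the counter breaks ties so
            # heapq never has to compare specifications directly.
            frontier = [(cost(initial_spec), 0, initial_spec)]
            tie = 1
            while frontier:
                _, _, spec = heapq.heappop(frontier)
                if is_program(spec):              # leaf: fully refined to target level
                    return spec
                for child in refinements(spec):   # alternative design decisions
                    heapq.heappush(frontier, (cost(child), tie, child))
                    tie += 1
            return None                           # space exhausted, no acceptable program

    Heuristics derived from the common characteristics of a subtree would enter through the cost function, steering the search toward promising families of implementations.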

    It may be difficult to foresee the effects of a series of design decisions. If a partially refined specification has unacceptable characteristics, then the synthesis process backs up and explores another sequence of refinements. If the space of possible implementations is large enough, then a programmer will not be able to manually explore many alternatives. We believe there are many applications where a machine can search and find better implementations of a specification than can humans.

    The formalization of stepwise refinement provides two important factorizations of the programming process. First, there is the factorization of the programming process into formulating specifications and implementing them via refinements. Second, when refinements are represented explicitly, there is a factorization of programming knowledge. Since different sequences of refinements generally result in different implementations of the same initial specification, a relatively small number of refinements may be able to compactly encode a wide range of implementations.

    B. System Structure Suggested by These Principles

    Starting from the above principles, we derive the overall structure and characteristics of a knowledge-based software environment. These characteristics were worked out early in the CHI project and provided a sturdy foundation for later research and implementation efforts.

    Several key system characteristics derive from the principle of stepwise refinement. First, a language is needed for expressing very-high-level specifications of programs, low-level (or target level) programs, and partially refined specifications that contain a mixed range of constructs from very-high-level to low-level. This suggests the need for a wide-spectrum language. Since the system is to be self-applied, this language also serves as the system language. Also, a language is needed for representing refinements. Refinements can be formalized in terms of transformation rules that embody a single design decision and rewrite one piece of code into another. Transformation rules typically consist of a source pattern, SP, a target pattern, TP, and an enabling condition, C. Applying a transformation rule to a given specification S involves matching the source pattern SP with a part of S. If the enabling condition C holds, then S is refined by replacing the part of S that matches SP with TP. A matching algorithm is required in order to appropriately instantiate SP, C, and TP. The representation of transformation rules requires the ability to express patterns. One way to express patterns is via metalanguage expressions where the varying parts of the pattern are represented by metavariables. Second, the use of stepwise refinement suggests that much of the system's knowledge about programming will be captured in the form of transformation rules. This suggests that a knowledge base of such rules is an important component of the system, and that acquiring and organizing these rules are important tasks in developing and using the system. Third, the system must provide facilities to support the refinement of specifications into programs. This implies the need for a good user interface, mechanisms to support the search for efficient implementations of a specification, tools for analyzing partially refined specifications, etc.
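    The core of rule application can be conveyed by a toy matcher over nested terms, with metavariables written as strings beginning with '?' (a sketch of the general SP/C/TP scheme only, not the CHI pattern language):

        # Rule = (source_pattern, condition, target_pattern); terms are nested tuples.
        def match(pattern, term, bindings):
            """Unify pattern against term, extending bindings; None on failure."""
            if isinstance(pattern, str) and pattern.startswith("?"):  # metavariable
                if pattern in bindings:
                    return bindings if bindings[pattern] == term else None
                return {**bindings, pattern: term}
            if isinstance(pattern, tuple) and isinstance(term, tuple) \
                    and len(pattern) == len(term):
                for p, t in zip(pattern, term):
                    bindings = match(p, t, bindings)
                    if bindings is None:
                        return None
                return bindings
            return bindings if pattern == term else None

        def instantiate(pattern, bindings):
            if isinstance(pattern, str) and pattern.startswith("?"):
                return bindings[pattern]
            if isinstance(pattern, tuple):
                return tuple(instantiate(p, bindings) for p in pattern)
            return pattern

        def apply_rule(rule, term):
            source, condition, target = rule
            b = match(source, term, {})
            if b is not None and condition(b):     # enabling condition C
                return instantiate(target, b)      # rewrite SP-match into TP
            return term                            # rule not applicable

        # Example: x + 0 -> x, enabled unconditionally.
        rule = (("+", "?x", 0), lambda b: True, "?x")
        print(apply_rule(rule, ("+", ("*", "a", "b"), 0)))   # ('*', 'a', 'b')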

    The principles of self-description and uniform closure suggest several other characteristics of the system. In order to achieve self-description, a language is needed to describe system components and a knowledge base is needed to store the descriptions. The existence of such a description knowledge base allows us to further elaborate on the system support for stepwise refinement of specifications. Since the system is to be self-applied, the code generated by the stepwise refinement process can be system code. This suggests that the same knowledge base be used as the medium of the refinement process, and that specifications and their refinements be represented via the same kinds of descriptions as system components. System support for searching the tree of alternative implementations can be provided by a context mechanism that records states of the knowledge base such that it can be returned to an earlier state when necessary.

    The description of a system component might include its structure, implementation, specification, type, properties, relationships to other objects, and other kinds of annotations. The description language thus has the character of a metalanguage, since it can refer to expressions in the (wide-spectrum) system programming language. Uniform accessibility to these descriptions suggests the use of a single coherent language that incorporates the wide-spectrum features needed for stepwise refinement, the patterns needed to express transformation rules, and the metalanguage needed to describe system components. Uniformity also suggests that this language be used for accessing the knowledge base.

    In summary, the principles discussed above can be realized by a system organized about a knowledge base that contains descriptions of system components, programming knowledge in the form of transformation rules, and various stages of the refinement process, and utilizing a wide-spectrum language that is capable of expressing patterns and other metalevel concepts. In the next section we describe a prototypical implementation of a system of this kind.
    V Language
    In this section we give a brief description of some of the implemented features of the programming subset of the V language. V is a wide-spectrum language, meaning that it contains a range of constructs from very-high-level to low-level. One use for V is specifying application programs, which can then be implemented using program synthesis tools. A variety of programming constructs, both high- and low-level, are present in V for this purpose. The low-level constructs are included not to encourage a user to program in terms of them, but rather to allow the representation of all levels of program description generated during synthesis (including the target implementation) within V. Since V was also designed as a language with which to describe a knowledge-based programming environment, it necessarily contains constructs used to specify programming knowledge, system tools and their interrelationships, and even the knowledge-base objects with which the description knowledge base (DKB) represents instances of V constructs. Experience with the system code of CHI indicates that program specifications in V are typically 5-10 times shorter than their equivalents in standard high-level languages such as Lisp.

    The V language may be viewed as comprising a hierarchy of levels of description. The lowest level is a procedural subset used to express program specifications. The next level contains constructs used to describe the CHI objects used by the system to represent V expressions. The third level has constructs used to specify changes (creation, modification, and transformation) to these objects. The fourth and highest level contains constructs that can describe the use of structured collections of changes to achieve end goals, and the constraints imposed on these processes of change by CHI and the application at hand. These levels may be mixed freely in V specifications, but it is useful in presenting the language to describe V starting at the lowest level of the hierarchy and then proceeding to the higher levels. The next four sections describe these levels from the bottom up, with examples of V constructs at each level.

    1) The Procedural Component of V: The procedural component of V is a strongly-typed block-structured language, similar in appearance to Pascal or Ada but containing a more powerful set of constructs. This is the primary portion of V used to write target programs. Programs are composed of type, variable, and function declarations, followed by an imperative portion written in terms of standard control structures. The data types provided by V include set, sequence, mapping, relation, product (record) types, and a host of lower level types such as array, integer, boolean, and atom. V includes primitive operations for input/output, assignment, and knowledge-base access and modification. The control structures in V include sequential composition, enumerations, various loop constructs, if statements, concurrency, blocks (which are permitted to return values), and function calls. V also includes first-order logical constructs for specifying relationships and actions without explicitly specifying how to accomplish them.

    […]

    2) Operations on Objects: The preceding section outlined the component of V used to specify procedural programs. In order to express specifications of the DKB objects that represent these and other V expressions, V must have constructs for describing these objects.

    An object is described in V by means of a logic expression that specifies its property-value relationships. Two main notational devices are used. Since properties are treated as functions (mappings), a functional notation is used to reference the value of a property. Thus, the V expression P(X) denotes the value of the property P applied to the object X. In a related notation, each generic object G has an associated predicate with the same name that determines whether a given object is an instance of G. For example, member-op(a) is true if object a is a membership operation.
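    A small model of these two notational devices (the object names and properties below are invented for illustration, and the dict-of-dicts knowledge base is an assumption, not the DKB's representation):

        objects = {
            "a": {"instance-of": "member-op", "arg": "s1"},
            "s1": {"instance-of": "set-variable", "element-type": "integer"},
        }

        def prop(p, x):
            """P(X): the value of the property P applied to the object X."""
            return objects[x].get(p)

        def generic_predicate(generic):
            """For a generic object G, a predicate G(a) testing instance-hood."""
            return lambda a: prop("instance-of", a) == generic

        member_op = generic_predicate("member-op")
        print(member_op("a"))               # True: object a is a membership operation
        print(prop("element-type", "s1"))   # integer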

    […]

    The reader may have noticed that without further specification, the semantics of satisfy are ambiguous. For example, the predicate in (1) would be satisfied if an integer variable I already exists in the knowledge base. Several conventions are used to resolve such ambiguities. First, each existentially quantified variable in a satisfy statement results in the creation of a new object. Thus, (1) results in the creation of a new integer variable I. Second, a frame convention is adopted to the effect that no condition that is true in the initial state will be false in the next state unless it is inconsistent with the predicate in the satisfy statement. For example, the predicate in formula (2) could possibly be satisfied by setting loc-vars(P) to the singleton set containing I. However, this would violate the frame convention, since the membership of other elements in loc-vars(P) in the initial state is consistent with the membership of I in loc-vars(P). By the frame convention, the correct interpretation is to simply add I to loc-vars(P).
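    The two conventions might be modeled as follows (a much-simplified reading of satisfy; the knowledge-base shapes and names are assumptions):

        import itertools

        _fresh = itertools.count()
        kb = {"P": {"loc-vars": {"J", "K"}}}   # P already has local variables J and K

        def satisfy_new_var(var_type):
            """Existential quantification: create a brand-new object of the given type."""
            name = f"{var_type}-{next(_fresh)}"
            kb[name] = {"type": var_type}
            return name

        def satisfy_membership(obj, prop_name, element):
            """Frame convention: asserting I in loc-vars(P) adds I, leaving J and K alone."""
            kb[obj].setdefault(prop_name, set()).add(element)

        i = satisfy_new_var("integer-variable")   # in the spirit of formula (1)
        satisfy_membership("P", "loc-vars", i)    # in the spirit of formula (2)
        print(kb["P"]["loc-vars"])                # J, K, and the new variable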

    There is also a pragmatic issue that should be mentioned. Each V construct is compiled by transformation rules (discussed in the next section) into lower level constructs and ultimately into Lisp code. This approach allows the incremental development of V. The language can be extended by adding transformation rules to the compiler that explicate the meaning of new constructs. Also, the meaning of a particular complex construct can be built up incrementally in the compiler. For example, the satisfy construct is intended to be used for specifying small changes to the knowledge base by means of a logical description. For this purpose it is a clear and concise construct. However, as it is defined, one can use it to specify the halting problem and other unsolvable problems. Thus, code containing a satisfy construct can be arbitrarily difficult (and sometimes impossible) to compile. Despite this difficulty, satisfy is a very useful construct and a set of transformation rules was developed that handles its intended uses. As new uses arise, new transformation rules are developed and added to the compiler. The open-ended nature of compiling satisfy expressions provides an opportunity for future research on synthesizing code for achieving complex state changes.

    3) Rules and the Rule Compiler: Basic knowledge-base operations can be combined with the procedural constructs of V to specify more complex and structured changes to the knowledge base. Program transformation rules could be implemented using if-statements and other control structures, and the desired changes could be accomplished by a sequence of the operations described above. However, this procedural representation of rules would be markedly inferior to the declarative formalism provided by V rules. The primary reasons for this are 1) the nonuniformity of such procedural representations, and 2) their lower descriptive level. Strong evidence for this second point is that the rule compiler, in translating V rules, transforms them into an intermediate form exactly like the procedural representation proposed above, at which point this intermediate form is roughly 5-10 times as long as the original declarative V rule form.

    V rules have the form:
    'rule' name parameter-list transform
    where transform is a transform statement of the form:
    precondition ~ postcondition.

    The parameter list of a rule contains one or more variables that denote the arguments in the body of the rule; these are treated as bound in the context of the rule. Other variables may be used to name (and subsequently refer to) additional objects of interest in the rule; these variables are free. Both the precondition and postcondition of a V rule are predicates, written as conjunctions, on the state of DKB. The constituent conjuncts of the precondition and postcondition are predicates that test the property-value relationships among the arguments and other objects referenced in the rule.
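    Schematically, such a rule might be modeled as a pair of conjunct lists tested against and then asserted into the knowledge base (shapes assumed for illustration; V rules are compiled into efficient code, not interpreted like this):

        def make_rule(precondition, postcondition):
            """precondition/postcondition: bindings -> list of (obj, property, value) conjuncts."""
            def apply(kb, bindings):
                # Precondition: every conjunct must already hold in the knowledge base.
                if all(kb.get(o, {}).get(p) == v for (o, p, v) in precondition(bindings)):
                    # Postcondition: assert each conjunct, updating the state of the KB.
                    for (o, p, v) in postcondition(bindings):
                        kb.setdefault(o, {})[p] = v
                    return True
                return False
            return apply

        # Hypothetical rule: if ?x is a recursive function, mark it for iteration.
        rule = make_rule(
            precondition=lambda b: [(b["?x"], "style", "recursive")],
            postcondition=lambda b: [(b["?x"], "target-style", "iterative")],
        )
        kb = {"f": {"style": "recursive"}}
        print(rule(kb, {"?x": "f"}), kb["f"])   # True, f now carries target-style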

    5) Data Structures and Data Structure Synthesizer: This section contains a brief, informal overview of the data type facilities provided by the V language, and of the data structure synthesizer (DSS) system.

    V Data Types: The high-level types that V supports are sets, sequences, mappings, and product types (sometimes called records or plexes). A robust set of operations is provided for each type. Set types in V are unordered, nonrepeating collections of homogeneous elements. Mappings are unary, single-valued functions. Sequences are mappings from an initial subrange (possibly empty) [1..n] of the natural numbers to values. Sequences can have repeated values; that is, two or more indexes in [1..n] can be mapped to the same value. The sequence type constructor encapsulates the total order property that is common to many lower level types such as strings, lists, stacks, queues, etc. In V, one can define all of these types in terms of sequences. Product types are tuples with a positive number of typed components. Product types are written T1 x ... x Tn, where T1, ..., Tn are the types of the components of the tuples. V also includes the following lower level types: arrays, dotted pairs, integers, booleans, and atoms.
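    The view of sequences as mappings from an initial subrange of the naturals is easy to model (illustrative only; V's actual operations are richer):

        def make_sequence(*values):
            """Represent a sequence as a mapping {1: v1, 2: v2, ..., n: vn}."""
            return {i + 1: v for i, v in enumerate(values)}

        s = make_sequence("a", "b", "a")   # repeated values are permitted
        print(s[1], s[3], len(s))          # indexes 1 and 3 map to the same value

        # Stacks, queues, strings, etc. specialize this by restricting the operations:
        def push(seq, v):                  # stack push = extend the subrange by one
            seq[len(seq) + 1] = v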
    The PSI Project

    The PSI system was the forerunner of CHI. The designs of PSI and CHI have much in common, most notably the use of a knowledge base of program transformation rules to represent programming knowledge, and tools for searching the space of alternative program implementations generated by the rules to find efficient target level programs.

    The operation of PSI was composed of two phases: the acquisition phase, during which an internal model of a program was acquired, and the synthesis phase, which produced an efficient Lisp implementation of the program. In the acquisition phase, PSI used tools for acquiring a program specification by means of natural language dialog with the user, example input-output pairs, partial traces, or a high-level specification language.

    The synthesis phase of PSI used two subsystems. The "coder," called PECOS, contained rules for implementing both high-level operations and data structures [1]. PECOS utilized several different implementations for each of a set of high-level data types, including sets and mappings. A variety of methods for high-level operations like enumerations, sorting, and searching were codified.

    The "efficiency expert," called LIBRA, was used to estimate the efficiencies of alternative (partial) implementations, and to guide the search for an efficient (with respect to a user definable cost function) target level program. LIBRA used heuristics and analytic cost estimation as a basis for comparing alternatives. Both upper and lower bounds were used for execution-cost estimates, allowing the application of a branch-and-bound method to eliminate implementation decisions. LIBRA also used techniques for managing compilation resources by assigning priorities to decisions.
    Related Research Systems
    We describe below a few examples of research systems that explore areas related to the CHI project. In the area of languages, SETL was one of the first very-high-level languages. SETL is based on the use of sets and set operations, and its compiler is capable of choosing alternative implementations based on performance. SETL differs from CHI in that it makes limited use of logic constructs and does not provide a programming environment. The CIP project in Munich developed one of the first wide-spectrum languages, called CIPL. Although the CIP effort has emphasized linguistic concerns and the formalization of transformational programming, a prototype compiler has been built for student use. Because of its first-order logic constructs, CHI is related to logic programming systems. It differs in that logic programming generally relies on a general-purpose theorem prover (which CHI does not). Also, in our view, logic programming systems (such as PROLOG) do not treat logic as a very-high-level language, in that logic programs are not compiled into alternative implementations of varying efficiency. Specification languages are widely used and several systems [...]. Zave and Schell describe a system called PAISLey that provides an executable specification language for modeling asynchronous parallel processes.

    Concluding remarks
    Under the leadership of C. Green, the CHI project developed through the individual and combined efforts of many people. J. Phillips outlined many of the guiding principles of the CHI project and did much of the early implementation. T. Pressburger built later versions of the knowledge-base manager and context mechanism, G. Kotik designed and implemented the data structure synthesizer, and S. Westfold developed the rule compiler, logic assertion compiler, and pattern language. Other members of the CHI project included S. Angebrandt, B. Kedzierski, B. Mont-Reynaud, and S. Tappel. In addition to the authors, current Kestrel researchers include C. Green, T. Brown, J. Coursey, A. Goldberg, R. Jullig, D. King, T. Pressburger, and B. Riemenschneider.

    We feel that the CHI project has been very successful. The research prototypes established the feasibility and value of both the general principles underlying CHI and the CHI approach to their realization. Based on the key ideas of these prototypes, a commercial and robust system (REFINE) has been built and is in routine daily use. REFINE is the sole tool used in its own development and has been used to develop a number of applications.

    In order to give the reader a feeling for how the CHI approach to software development works, we briefly describe some of our experiences. As mentioned above, the rule compiler was an early example of self-application. It was initially implemented in Lisp, but it became clear that the rule compilation process could itself be viewed as the application of a collection of transformations, so the rule compiler was rebuilt in terms of rules. Initially the control code was also expressed in Lisp, but has since been rewritten in V. The benefits of expressing the rule compiler in V rather than Lisp included a factor of 5-10 compaction in the size of the code and greater understandability and maintainability. One benefit of the self-application of the rule compiler has been that, as new rules are added to the compiler, it can sometimes compile a faster version of itself. Thus, we have the phenomenon of a knowledge-based system that can actually speed up with the addition of new knowledge.

    More recently a parser generator has been built. Since the theory of parsing is well formalized at this point, the parser generator is mostly coded in procedural V, making heavy use of logical constructs and the set and mapping data types. The code bears many similarities to the mathematical treatment of parsing given in current textbooks.

    As another example, the description knowledge base (DKB) has undergone continual evolution. There are a number of constraints that DKB is expected to maintain. The logic assertion compiler is used to compile these constraints into demons and procedures for computing implicit property values as mentioned in Section III-A-1.

    As a final example, there is currently a project underway to develop a project management assistant (PMA). The PMA exploits (and tests) most of the capabilities of the system. The objective is to provide tools to assist in the management of a software project. Objects and generic objects in the DKB are used to model the various entities, activities, and relationships of a software project. Logic assertions are used to model constraints on the ordering of tasks, assignment of personnel, and so on. Tools for scheduling and other highly algorithmic tasks are coded in procedural V. An interactive stepwise refinement approach is taken to the development of plans. Rules are used to encode the transformation of tasks into subtasks and the graphics capability of the Symbolics 3600 is used to facilitate interaction with the user/planner.

    One particularly successful aspect of the CHI project was the development of the wide-spectrum language V. As the representation language of the knowledge base, V is used to describe conceptual models of domains, formal requirements and specifications, programs, derivation histories, transformation rules, relationships between objects, and properties of objects. V is also the query and access language of the knowledge base. As the programming language of the system, V accommodates a range of programming styles that effectively span the spectrum of current programming practice and theory. Not only can one program as in a conventional high-level language such as Pascal, Lisp, or Ada, but also one can program in terms of the very-high-level logic, data, and control structures of V, while realizing that these must be mapped down to particular implementations during the compilation process. Also, one can program in terms of assertions and constraints to be maintained dynamically in the database, thus allowing an object-oriented style of programming. Finally, one can use the system for program synthesis from input-output specifications. This takes us to the furthest reaches of current research in knowledge-based programming -- the possibility of an automated system engaging in the creative process of designing novel algorithms from descriptions of problems. Thus, the approach to knowledge-based programming environments suggested by the CHI project provides a unified framework that encompasses the past, present, and future of programming, and provides not only an important research environment and pedagogical tool, but also a practical bridge into that future.
          in IEEE Transactions on Software Engineering, November 1985