• Fehder, P. F., "HQL: A Set-Oriented Translaction Language for Hierarchically-structured Data Bases" view details
          in Proceedings of the 1974 ACM Annual Conference San Diego, November, 1974 view details
  • Barron, C. Housel , Nan C. Shu "A high-level data manipulation language for hierarchical data structures" pp155-169 view details Abstract: In this paper we assert that the hierarchical view of data will continue to be popular for a broad class of applications and users. In particular, some of these applications require complex data manipulation which, heretofore, has been dealt with procedurally. In this light, a nonprocedural language, CONVERT, is proposed as a high-level DBMS interface. CONVERT is meant to provide users with a tool for performing complex data manipulation and query of hierarchical data abstractions, called “Forms”. Included in the paper are a description of the Form data abstraction and the CONVERT language, as well as a complete illustrative sample application.
    Extract: Introduction
    In a previous paper [I] we presented a high-level nonprocedural language, CONVERT, which was designed for specifying translation and restructuring~for the purpose of data conversion. This paper explores the use of an enhanced version of CONVERT as a high-level user interface for data base management systems (DBMS's) for query and data manipulation. ., our studies on data translation, we found.hierarchical data structures to be the most /revalent in today's environment. Therefore, CONVERT was based on a hierarchical data abstraction referred to as a Form [I]. A Formis a two dimensional tabular representation of a hierarchical data structure and is described in depth in section 2. The CONVERT language, then, consists of a few conceptually simple operations which produce an output Form from one or more input Forms.
    Recently, there has been a great deal of interest in high~level, nonprocedural data base languages because they enable users to state their requests in a natural way without regard to low level details (e.g. access paths, etc.) of~the particular DBMS. In addition, they offer a basis for achieving greater data independence, global optimization, integrity checking, and security. However, most of these languages, for example, SEQUEL [2], SQUARE [3], DSL-ALPHA [4], and QUEL [5] to name a few, are based on the relational data model [6].
    The motivation for exploring the use of CONVERT as a DBMS user interface lies in our belief that hierarchical data models will continue to be a popular user (application) view for the indefinite future. Also, we feel that existing hierarchica~onprocedural languages lack the capability required for a number of significant applications, particularly those involving complex data manipulation for report~generation.
    Extract: Why Another Language?
    Why Another Language?
    CONVERT is certainly not the first nonprocedural language developed for hierarchical DBMS's. For example, HQL [7] is a highly user oriented query language for hierarchical structures, similarly, MRI's System 2000 [8] offers a powerful nonprocedural query and data manipulation language. However, these and other similar languages are designed to operate within a given hierarchical data base structure which has been predefined in the system by the data base administrator.
    The uniqueness of CONVERT lies in its power to specify operations over a family of hierarchies and in its ability to specify that a new hierarchical structure is to be generated from existing ones. This concept is also central tothe relational model, in which the relational operators (join, projection, etc.) may be applied to a family of base relations in order to produce a new relation which complies with a particular user view.
    Extract: CONVERT and the Different Data Models
    CONVERT and the Different Data Models
    Hierarchical systems.
    One of the main difficulties in hierarchical systems is in developing applications which require a view of the data base (application view) which radically departs from the schemas developed during data base design (native views). For example, such an application view may imply a reorganization of the data base requiring combinations of data from several hierarchies. Currently, such applications must be designed to procedurally "navigate" via access pointers through the various native data base st@uctures in order to collect the required information. As an alternative, we propose using CONVERT as a means of prescribing how the particular application view is to be derived from the native views. One may consider this prescription as resulting in a temporary file for use by an application program (e.g. report formatter), assuming no updating. Alternatively, CONVERT could be viewed as a more interactive interface in which instances of the application view are generated dynamically from the source views.
    Network Systems.
    Most of the previous discussion also applies here. The main difference is that the predefined data base schemas in the data base are not restricted to hierarchical structures; thus, CONVERT cannot be used directly. It should be mentioned that a nonprocedural interface for a network DBMS has yet to be demonstrated, although McGee's work [9] has taken a nice step in that direction. The problem, of course, is due to the complexity of the network model in that the potentially large number of access paths within a network structure makes the formalization required for nonprocedural query and data manipulation difficult. One approach for overcoming this problem is to Use a subschema definition facility for logically partitioning the network into a set Of "logical views" which reflect simpler data models (i.e. hierarchical or rel@tional). The facility of IBM's Information Management System [10] (IMS) for defining ~ogical hierarchical structures in terms of several inter-connected physical hierarchie~ is an example of this approach. Given such a facility, the network could be viewed as a family of hierarchies, thus permitting the use of CONVERT.
    Relational Systems.
    It is not possible to fully implement CONVERT on a relational system which adheres strictly to the relational model as presented by Codd [6]. This is because (a) rela£ions preclude hierarchical structure, since they must be in at least "first normal form"i (i.e. only simple domains); (b) all relations must have unique keys, thus preventing theloccurrence of duplicate tuples; and (c) there is no ordering property among tuples of a relation. It is possible to simulate the different hierarchical levels via a set of relations where each relation corresponds to a node in the hierarchy and contains among its ~ttributes the keys of all the ancestor nodes (relations) in the hierarchy. (note: th~s is a sufficient but not necessary condition). However, in CONVERT, instances Df ~a Form may be ordered and duplicates are allowed (i.e. keys need not exist).
    In the remainder of the paper, section 2 describes terms and the underlying concept of the Form data abistraction. Section 3 describes the CONVERT language, and section 4 presents a sample application which illustrates the semantics of the operators and the derivation of a newl application view from several native views. Section 5 presents the summary and concluslions. Extract: The CONVERT Language
    The CONVERT Language
    The CONVERT language presented below differs slightly from that described in [I]. Here, we conoentrateion describing the semantics of the operations. The reader is referred to [I] for further discussions about the various operations. The basis of CONVERT is a set of Forms consisting of SELECT, CASE-select, SLICE, CONSOLIDATE, GRAFT, SORT, MERGE, ELIM DUP, and a set of aggregate functions (COUNT, SUM, AVG, MAX, and MIN). These operations--require one or more Forms as input operands and produce an output Form as the result. The general format for all Form operations is Operator ( Operands )
    The input Form operands can be defined in terms of Form operations allowing operations to be nested. To save the results of a Form operation, an Assignment statement is provided having the format
    Target-Form <- Form-operation
    Target-Form may consist of a simple name which is not defined in a FORM definition (e.g. F2 <- ...), in which case all Component names are inherited from the schema defined by the Form-operation. Alternatively, a Form definition may be specified in order to assign new names. This can be specified in two ways as shown below (see Figure 1).
    The FORM statement is provided in CONVERT for declaring Form definitions (schemas), and is the only datal definition statement. Field definitions are provided elsewhere [12]. Components can be grouped under one name for referencing ~onvenience, e.g. DATE(MO,DA,YR). In defining Forms, the repeating Components such as EMPL are enclosed by parentheses. A r!epeating Field, f, is viewed as a Group with one Field, and is denoted as (f). It ;should be mentioned that the schema of the Target-Form, if specified explicitly, must be consistent with that of the resulting Form operation (except for CONSOLIDATE, discussed later).
    Now we shall describe the Form operations, appealing to thedefinitions of section 2 and also to the e~amples of the next section (reference to "Step n" pertains to the examples in the next section).
    The following canventions are'used in the descriptions given below. Brackets [..] denote an optional ~lause; alternative constructs are enclosed in braces {..} and separated by the vertical bar (e.g. { albl..}). The default alternative, if any, will be underlined. The capital letter, F, (ana FI, F2, ...) indicate~a Form specification (i.e. a Form name o~ Form operation); keywords are capitalized andother syntactical constructs are in lower case.
    A. SELECT i
    [WHERE selection-criteria ] )
    The SELECT operation can be viewed as two basic processes~ I) pruning out all (sub)sections which fail to meet the "WHERE clause" selection-criteria, and 2) mapping the (sub)sedtions w~ich remain from I) to target instances according to the Component selection list el,.$.,en. If the WHERE clause is omitted, all (sub) sections are selected. If no Component selection list is specified, all Components (columns) are selected, i
    If the ALL EXCEPT clause is absent, e1,...,en, specifies the Components to be included in the result and the order of their occurrence. An ej may be an arithmetic expression consisting of the operators +, -, /, and *, and operands which may include Fields, aggregate functions over repeating fields, and literals. All Fields used in arithmetic operations in ej must be twins. If an aggregate function and Field are referenced in the same expression~ the result of the aggregation and the referenced Field(s) must be inlone to one correspondence.

    An ej which may also consist of a Field assignment may be used to rename a selected Component or give a name to an expres~n (e.g. CMARG <- CEST-CACT).
    The ALL EXCEPT clause indicates all columns are to be mapped to the result with some exceptions. An ej may be a simple component name which indicates that the component is to be omitted from the result (see Steps 2,9,11,12). Alternatively, an ej may consist of a Field assignment which defines a result Field as an expression of (0 or more) Fields from the selected file (see Step 9).
    The selection-criteria consists of a conjunction of predicates of the form, pl AND p2 ... AND pn. A predicate pj may specify a "Field-criteria" which compares single values or "set-criteriaw" which tests sets of values. Field-criteria predicates have the form, "fl comp op f2," where comp op~s one of the comparison operators (<, =, -~, etc.), and f~ and ~2 are arithmetic e~pressions (as described for Component selection) or character strings.
    Set-criteria have the form "sq [NOT] set comp op s2" where set comp op may be IN (sl is a subset of s2), INTERSECT, or SAME AS (sl -= s2)o The set ~pera~ds sl and s2 can be specified as a list of constants, a one column Form (i.e& a Form name or operation which returns a one column Form), or a "SET-clause". The SET-clause is provided for referencing sets of values within a given section of a Form7 its syntax is given as SET ( f [ FOR EACH gp-name ] )
    f is a Field name which identifies the column containing the values of the set, and the FOR EACH clause limits the scope of reference to instances of ehe Group, gp-name. If the FOR EACH clause is absent, all values of f in a given section are referenced. With respect to Figure I, the operation
    In this example, SET(MNO FOR EACH PROJ) references threes sets in the first section, {M~ , {M8,M15}, and {M2}. If the FOR EACH clause were omitted, 3nly one set containing all values of MNO (in the section) would have been referenced.
    Now we focus on how instanceswhich fail to meet the selection-criteria are identified. For sake of brevity we will treat only the case where each predicate, pj, contains only one Component from the selected Form, F (although Fields from other Forms may occur). These semantics can be simply stated in terms of the definitions of section 2. That is, consider any Group, G(Cl,...,Cn), whose Group instance, I(G), is defined by the sequence , and each I(Cj) (j=1,...,n) is given by the sequence of instances (m at least I). Then the selection algorithm is given as follows:
    1) Mark all Component instances (fields or sets) which fail to satisfy the selection criteria.
    2) If all instances, , are marked, then mark I(Cj).
    3) If any Component instance, I(Cj), of I(G) is marked, then mark I(G).
    4) Recurse on 2), 3) until no more instances can be marked and select the unmarked instances.
    Although not presented here, a limited form of the disjunctive "OR" is permitted in the selection-criteria. Some of the conceptual difficulties of using the "OR" in an unrestricted way, With respect to hierarchical structures, have been demonstrated bv Hardgrave [13]. The desired effect of OR (union) can be accomplished with combinations of the operators SELECT, MERGE, and ELIM DUP. In general, the selection-criteria may span multiple Formslas long as thereare'sufficient equal (=) field-criteria to "tie" one Form to another (illustrated in Step 2).
    B. CASE-select
    CASE( FROM F WHEN case-expr comp_op
    ci : case-Select(1),
    c2 : case-Select(2),
    . @ ,
    cn : case-Select(n)
    [,OTHERS : case-Select(n+1)] );
    In the above construct, F is the source Form of the:CASE selection. "case-expr" is an arithmetic expression containing one or more Fields of F, and comp op is a comparison operation as given previously. Each cj (j=1,...,n) is a constant or ~ list of constants, and case-select(j) is a selection operation with the same format as the SELECT operation in A, except the "FROM F" specification is not given.
    In the SELECT o~eration (described in A), the mapping of input to output is uniform for all sections in F. The CASE-select operation allows this mapping to vary from section to section depending on the evaluation of case-expr. For each (sub)section, the tests "case-exp~ comp_op cj" (j=1,, .... ,n) are performed until "true" is returned for some j, in which case case-select(j) is chosen to map the components of that section to the output. If none of the tests return "true" and the OTHERS option is specified, case-select(n+1) is[chosen; otherwise no mapping is performed for the current section.
    Regardless of the "Case" selected, all output sections must conformto a common hierarchical structure.
    The Fields contained in case-expr determine the Components which can be included in the varying mappingls and the WHERE selection criteria. Specifically, if case-expr contains Fields of Group G, then only those Groups and Fields contained in G can participate. Other I mappings must be identical in all the case-select's.--In the previous example, the second I Field selected varies from case I to case 2. This is permitted since CEST and CACTI are both contained in the Group PROJ, which includes PJNO as one of its Field Componentls. The mapping of DNO is (and must be) invariant in both cases. By using the ALL EXCEPT option in the case-select's, only the varying mappings (ej's) need be specified expliciitly.
    C. SLICE [
    SLICE(fl,f2,...,fn FROM F)
    The objective ~f SLICE is to provide the facility for generating "flat" tables from hierarchical Form structures. In the above construct, fl,...,fn are Fields contained in F suc H that for any two fields fi and fj, the following condition holds.
    If fi (fj) is a Component of Group G, then fj (fi) must be contained in G.
    (a) Target-Form <- CONSOLIDATE(F) ;
    (b) CONSOLIDATE (F1 AS Target-Form)
    The function of CONSOLIDATE is opposite of SLICE in that it produces an output Form with more hierarchical structure than the input Form in order to remove data redundancy. The input Form, however, does not have to be a relational table. CONSOLIDATE varies from the other operations in that the schema of the output must be specified explicitly in order to define how the consolidation is to be carried out. Format (b) may be used if it is desired to nest a CONSOLIDATE within another operation. The semantics is illustrated by example.
    E. GRAFT(FI,...,Fn ONTO Fm [AT f] WIIERE match-conditions )
    GRAFT is similar to the "join" operation in the relational model. Its function is to combine two or more Forms based on the specified "match-conditions". Each Fj (j=l,...,n), referred to as the branch Forms, is combined with the root form, Fm. In terms of trees, lone can view the operation as "grafting" the HG def~n~ by the branch Forms onto the HG defined by the root Form.
    First, we will describe GRAFT with only one branch Form (n=1). The match-conditions can be stated in terms of either "Equal-conditions" or as "Prevail-conditions".
    Equal-conditions are stated as
    (i) fl = gl AND f2 = g2 ... AND fn = gn
    where fi (i=1,.,.,n) are twin Field Components of FI (i.e. Fields in the root node of the HG of FI), and gi (i= ,~...,n) are twin Field Components of Fm or of some Group contained in Fm, With Equal-conditions, each equality predicate in--the conjunct must be satisfie--~ before an output can result. If the predicates evaluate true, then the FI section is concatenated with the (sub) section of Fm and output; however, the common Fields (those uSed in the predicates) are recorded only once. The match Fields in the root Form determine the Group to be expanded. For example, the result of F3 <- GRAFT(FI ONTO F2 WHERE FI.A = F2.A); is given in FigUre 4.
    F. SORT(F BY fll [{ASCIDES} ],...,fn [{ASCIDE$}] ) fi (i=1,...,n) are Fields contained in the Form F, and specify the sort keys. If some fi is a Field Component of some Component Group, then only the subsections within the Group instances are sorted. For example, if SORT(DEPT BY DNO,ENO DES) were specified for DEPT of Figure I, then the result would be sorted by DNO, and within each DEPT section the EMP subsections would be sorted by ENO in descending order (e.g. within the section where DNO= 'DI', then the EMP subsections would be <80,8> <67,9> <43,2> <19,7>).
    G. MERGE(FI,...,Fn) Sections from the homogeneous Forms are combined into one output Form having the same structure. If MERGE is nested within another operation~ the Component names of the result are inherited from the schema of the first Form in the list (i.e. FI). In general, the order of the output is unpredictable.
    H. ELIM DUP (F) This operation copies the sections of F to the output with duplicates eliminated.
    I. Aggregate Functions
    function (f [FROM F] [FOR EACH gp-name] ) "function" is one of the functions SUM, ~L~X, MIN, COUNT or AVG. "f" is the Field to be aggregated within the Form F, and gp-name is a Group name (possibly qualified) which defines the scope of aggregation.
    As shown in previous examples and in section 4, these functions can be used in arithmetic expressions within SELECT lists and also in the selection criteria. In that context, the FROM clause may be omitted, since the Form name is specified in the FROM clause of the SELECT. However, aggregate functions can also be used as self-contained Form operations. If the FOR EACH clause is missing, f is applied to all instances of f in F, returning a scalar (i.e. a one column, one row Form). For example, in Figure I, SUM(UF FROM DEPT) would result in 3.5. If the FOR F~C}! clause is specified, a vector is returned. The operation SUM(UF FROM DEPT FOR EACH DEPT.PROJ) would return a vector <.2,.8,.4,.7,.9,.5>, while SUM(UF IN DEPT FOR EACH DEPT) would yield <1.4, 2.1>.
    More CONVERT Statements
    Up to this point we have really discussed only the CONVERT statements for the purpose of data restructuring. Additionally, in the context of a DBMS interface, we need statements for updating and outputting existing Forms.
    J. ADD ( FI TO gp-name [IN F2 WHERE add-crit] ) ;
    The function of this statement is to add sections to F2 or subsections to some Group "gp-name" contained in F2. The Form FI contains the (sub) sections to be added. If gp-name equals F2 then sections are to be added to F2 from FI and the "IN-WHERE" clause is omitted (e.g. ADD(NEWDEPTS TO DEPTS);). The "add-crit" is a conjunction of equal predicates. The schema of FI must be a subschema of F2. Specifically, FI must contain gp-name as one of its Components. The subsections of gp-name in FI will be added to Component instances of gp-name in F2 depending on the add-crit. Components other than gp-name in FI are used in the add-crit to locate the specific component instance of gp-name in F2 being updated. For example, suppose we want to add employees to departments 'DI' and 'D3' (Figure I), where FI is defined as
    K. DELETE ( gp-nam e [IN F] [WHERE delete-crit] ) ; The function of !this statement is to cause deletion of sections in F or subsections of a Group "gp-name" I contained in F subject to certain criteria, "delete-crit". If gp-name equals F (the Form name), then the IN phrase may be omitted. If the WHERE clause is omitted, all the !instances of gp-name are deleted. The delete-crit has the same form as field-crit for SELECT. We illustrate with several examples with reference to Figure I.
    L. Updating an Existing Form "in Place" Update-in-place can be performed on an existing Form by utilizing the SELECT and CASE operations following the keyword "UPDATE". In updatingt however, the ALL EXCEPT version of the SELECT's must be used, since Component selection has no meaning. For example, the request~ "increase DBUDG for DEPT for department DI" could be stated as
    N Output Statement@ {DISPLAY IPRIN~} F ; The DISPLAY and pRINT operators designate on-line and remote printer, respectively. A predefined format for the Forms is assumed. In the above statement, F may be either a Form name or a Form operation.
    This concludes the description of the CONVERT language. Extract: Summary and Conclusions
    Summary and Conclusions
    In this paper we assert that the hierarchical view of data will continue to be popular for a broad class of applications and users. In particular, some of these applications require complex data manipulation which, heretofore, has been dealt with procedurally. In this light, a nonprocedural language, CONVERT, is proposed as a high-level DBMS interface. CO}~ERT is meant to provide users with a tool for performing complex data manipulation and query of hierarchical dataabstractions, called "Forms". In section 2, we present the definitions and terms relevant to the Form data model. Section 3 describes the CONVERT language in some detail and gives a number of examples. A previous version of CONVERT was described in the context of data translation [I]. Section 3 gives further semantic detail and describes additional features required due to the change of emphasis, and language revisions resulting from continued developement. Section 4 gives a nontrivial application scenario, and demonstrates the data restructuring facilities of CONVERT by presenting a program which derives a user report from the "native" data base.
    The "Form" data model presents a general but conceptually simple view of hierarchical data structures. Similarly, we believe the basic Form-operations while providing powerful manipulation capabilities, are at the same time, conceptually easy to understand. Herein, lies the attractiveness of the language. Currently, we are implementing CONVERT as a data translator and later hope to test the language on real users.
          in SIGPLAN Notices 11(02) February 1976 also Proceedings of the SIGPLAN '76 Conference on Data: Abstraction, Definition and Structure, Salt Lake City, Utah, USA, March 22-24, 1976 view details