STREMA(ID:5970/str012)

conversational graphic language for application processes based on streams 


Dataflow language: a conversational graphic language for application processes based on streams

IBM Peterlee, Durham, UK


Structures:
Related languages
ISBL => STREMA   Influence

References:
  • Clark, I. A. "STREMA: A Graphic Language for Relational Applications", IBM UK Technical Report UKSC 0084, October 1976
  • Parkin, A. and Coats, R. B. "EDSIM - event based discrete simulation using general purpose languages such as FORTRAN"
          in The Computer Journal 21(2) May 1978
  • Clark, I. A. "STREMA: Specifying Application Processes Using Streams" pp25-30 Abstract: A conversational graphic language for application processes is described. It is based on a concept of a stream as the prime building block, which offers a unified treatment of I/O files, subroutines and data base relations. An application process is designed conversationally by linking components belonging to different streams. Components are roughly analogous to registers in which each item of flowing data resides whilst it is passing through a given stream. The designer's problem of making provision for handling errors which arise in the generation and transfer of data by I/O and processing subroutines is solved by introducing the construct of constraints. Extract: Introduction
    Introduction
    STREMA is a graphic conversational language for specifying and running application processes. The name is a contraction of 'stream-schema', which alludes to the schematic representation of an entire application process as a collection of interconnected streams.
    We concentrate here on how STREMA caters for commercial batch applications, to which it is particularly suited. Commercial data processing involves the manipulation of collections of records, e.g. files. Such processes can be described in two ways: by file, i.e. describing the operations on a given file, or by transaction, i.e. describing the operations necessary to complete a given transaction. Documentation systems usually try to combine these views, for example TAG (Time Automated Grid).
    We aim to encourage a design method whereby the computer helps present these different aspects of the developing design. This would be useful in financial file handling, e.g. insurance, or in order-entry and stock control.
    The chief objective of STREMA is to produce cheaply application software having the following properties:
    (a) easy to probe or audit
    (b) easy to perform ad hoc modification (e.g. extract new reports) without introducing 'bugs'
    (c) easy to understand its normal behaviour
    (d) easy to discover, and adjust, its exceptional behaviour
    (e) easy to interface with a conventional file or subroutine
    (f) easy to interface with a separately developed application.
    STREMA provides a model of the application process which can be altered easily and rapidly. Satisfactory casual use by senior personnel such as inventory controllers, accountants, credit control managers and auditors is a requirement that can and should be met. Without pretending that the ideal is ultimately for them to do without an expert to build their application, their ability to 'play' with the model by themselves, even if rarely exercised, is a prerequisite to achieving the level of understanding to make sensible contributions to the design process they are responsible for initiating. Two assumptions about the technology which STREMA exploits deserve comment.
    Extract: The meaning of the term 'stream'
    The meaning of the term 'stream'
    Since the basic building block of a STREMA process is called a stream, it is important to clarify the concept. It may usefully be compared with the 'stream' of Burge (1975) and Stoy and Strachey (1972), which inputs data to an arbitrary process one item at a time, and also with the 'random number stream' familiar in digital simulations. What earns such processes the title 'stream' instead of 'list' is that no physical list may actually exist. The act of picking off the current item may trigger the production of the next item, and so forth. However, besides enumerating some logical, and possibly unending, list, STREMA streams can directly drive each other, collating their data. Burge's streams simply feed into a program, appearing to the program like a sequential file, although the records that flow are never actually accumulated into a physical file. In Fig. 1(a) we have the simplest notion of a stream, called INPUT, as the source of data to the process PROC. INPUT can be visualised as a section of pipeline. Where INPUT gets its original data from is of no concern at the moment. This is a matter of its definition. We are concerned with how INPUT is being used currently, i.e. how it is to deliver the data it somehow generates within itself.
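    As a rough, hedged illustration of this demand-driven behaviour (STREMA itself is graphical, so the Python below is only an editorial sketch; the names INPUT and PROC simply echo Fig. 1(a)):

        # Illustrative only: a stream modelled as a lazy Python generator. Picking
        # off the current item triggers production of the next; no physical list
        # of records ever exists.
        def INPUT():
            """Hypothetical source stream: yields records one at a time on demand."""
            n = 0
            while True:              # a logical, possibly unending, list
                n += 1
                yield {"item": n}    # each record is produced only when requested

        def PROC(stream):
            """Hypothetical consuming process: sees the stream as a sequential file."""
            for record in stream:
                print(record)
                if record["item"] >= 3:   # stop after a few items for this sketch
                    break

        PROC(INPUT())   # prints {'item': 1}, {'item': 2}, {'item': 3}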

          in The Computer Journal 21(1) February 1978
  • Morrison, J. Paul "Flow-Based Programming: A New Approach to Application Development" van Nostrand Reinhold, 1994 Extract:
    It is described as a graphic conversational language for specifying and running application processes. STREMA uses a relational model and is intended to allow relational data to be treated in a uniform manner with flat files and subroutines. In STREMA, all these are made available to the programmer as "streams", which resemble most closely the processes of FBP. You can specify graphically how "streams" are connected, and what happens to the fields in the records travelling through them - Clark uses the term "component" to describe what a field resides in as it is in transit through a given stream (not to be confused with FBP "components"). Streams drive each other, are described by a "relator", and may be subject to constraints on their components. Components (fields) have values, but they also have status: one of UNDEFINED, VALID or INVALID (similar to DFDM's dynamic attributes). As a record enters a stream, what happens is determined by the stream's "relator", and the constraints on, and status of, the incoming components. Constraints may be such things as bounds on a value, type specifications, or forcing a value not to repeat nor descend in a run. This concept can support processes as diverse as applying subroutines to streams, collating data streams, or getting data from or writing data to a relational table. Combining the concepts of relators and constraints simplifies a lot of the logic conventional programs have to do validating fields and deciding what to do if things go wrong. Clark has done a good job of combining a number of useful concepts into a single framework.
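    The interplay of relators, constraints and component status described above can be sketched informally; the following Python fragment is an editorial illustration only (the class and function names are invented, not STREMA or FBP syntax):

        # Illustrative sketch of a STREMA-style component: a field in transit through
        # a stream carries a value and a status (UNDEFINED, VALID or INVALID), and
        # the stream's constraints decide that status as each record enters.
        from enum import Enum

        class Status(Enum):
            UNDEFINED = 0
            VALID = 1
            INVALID = 2

        class Component:
            def __init__(self, value=None):
                self.value = value
                self.status = Status.UNDEFINED if value is None else Status.VALID

        def bounds(lo, hi):
            """Constraint: value must lie within [lo, hi]."""
            return lambda comp, prev: lo <= comp.value <= hi

        def non_descending():
            """Constraint: value must not descend from the previous record in the run."""
            return lambda comp, prev: prev is None or comp.value >= prev.value

        def apply_constraints(comp, prev, constraints):
            """Mark the component INVALID if any constraint fails."""
            if comp.status is Status.UNDEFINED:
                return comp
            ok = all(check(comp, prev) for check in constraints)
            comp.status = Status.VALID if ok else Status.INVALID
            return comp

        # Example: a quantity field constrained to 0..1000 and non-descending.
        prev = None
        for raw in (5, 12, 7, 2000):
            comp = apply_constraints(Component(raw), prev, [bounds(0, 1000), non_descending()])
            print(raw, comp.status.name)   # 5 VALID, 12 VALID, 7 INVALID, 2000 INVALID
            prev = comp if comp.status is Status.VALID else prev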
  • Johnston, Wesley M.; Hanna, J. R. Paul and Richard J. Millar "Advances in Dataflow Programming Languages" ACM CSUR 36(1) March 2004 Extract: Introduction
    Introduction
    The original motivation for research into dataflow was the exploitation of massive parallelism. Therefore, much work was done to develop ways to program parallel processors. However, one school of thought held that conventional "von Neumann" processors were inherently unsuitable for the exploitation of parallelism [Dennis and Misunas 1975; Weng 1975]. The two major criticisms that were leveled at von Neumann hardware were directed at its global program counter and global updatable memory [Silc et al. 1998], both of which had become bottlenecks [Ackerman 1982; Backus 1978]. The alternative proposal was the dataflow architecture [Davis 1978; Dennis and Misunas 1975; Weng 1975], which avoids both of these bottlenecks by using only local memory and by executing instructions as soon as their operands become available. The name dataflow comes from the conceptual notion that a program in a dataflow computer is a directed graph and that data flows between instructions, along its arcs [Arvind and Culler 1986; Davis and Keller 1982; Dennis 1974; Dennis and Misunas 1975]. Dataflow hardware architectures looked promising [Arvind and Culler 1986; Dennis 1980; Treleaven and Lima 1984; Veen 1986], and a number of physical implementations were constructed and studied (for examples, see Davis [1978], Keller [1985], Papadopoulos [1988], Sakai et al. [1989], and Treleaven et al. [1982]).
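    To make the firing rule concrete, here is a brief editorial sketch in Python (an illustration of the execution model as described above, not any particular dataflow machine; the graph encoding is invented for the example):

        # Minimal illustration of the dataflow execution model: each node fires as
        # soon as all of its operands have arrived, and its result flows along the
        # outgoing arcs to the nodes that consume it.
        import operator

        # Graph for r = (a + b) * (a - b), written as: node -> (op, [input names])
        graph = {
            "sum":  (operator.add, ["a", "b"]),
            "diff": (operator.sub, ["a", "b"]),
            "r":    (operator.mul, ["sum", "diff"]),
        }

        def run(graph, inputs):
            values = dict(inputs)          # tokens that have already arrived
            pending = dict(graph)
            while pending:
                for name, (op, args) in list(pending.items()):
                    if all(a in values for a in args):   # firing rule: operands ready
                        values[name] = op(*(values[a] for a in args))
                        del pending[name]
            return values

        print(run(graph, {"a": 7, "b": 3}))   # {'a': 7, 'b': 3, 'sum': 10, 'diff': 4, 'r': 40}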
    Faced with hardware advances, researchers found problems in compiling conventional imperative programming languages to run on dataflow hardware, particularly those associated with side effects and locality [Ackerman 1982; Arvind et al. 1977; Arvind and Culler 1986; Kosinski 1973; Wail and Abramson 1995; Weng 1975; Whiting and Pascoe 1994]. They found that by restricting certain aspects of these languages, such as assignments, they could create languages [Ackerman 1982; Ashcroft and Wadge 1977; Dennis 1974; Hankin and Glaser 1981; Kosinski 1978] that more naturally fitted the dataflow architecture and could thus run much more efficiently on it. These are the so-called dataflow programming languages [Ackerman 1982; Whiting and Pascoe 1994] that developed distinct properties and programming styles as a consequence of the fact that they were compiled into dataflow graphs-the "machine language" of dataflow computers.
    The often-expressed view in the 1970s and early 1980s that this form of dataflow architecture would take over from von Neumann concepts [Arvind et al. 1977; Treleaven et al. 1982; Treleaven and Lima 1984] never materialized [Veen 1986]. It was realized that the parallelism used in dataflow architectures operated at too fine a grain and that better performance could be obtained through hybrid von Neumann dataflow architectures. Many of these architectures [Bic 1990] took advantage of more coarse-grained parallelism where a number of dataflow instructions were grouped and executed in sequence. These sets of instructions are, nevertheless, executed under the rules of the dataflow execution model and thus retain all the benefits of that approach. Most dataflow architecture efforts being pursued today are a form of hybrid [Iannucci 1988; Nikhil and Arvind 1989], although not all, for example, Verdoscia and Vaccaro [1998].
    The 1990s saw a growth in the field of dataflow visual programming languages (DFVPLs) [Auguston and Delgado 1997; Baroth and Hartsough 1995; Bernini and Mosconi 1994; Ghittori et al. 1998; Green and Petre 1996; Harvey and Morris 1993, 1996; Hils 1992; Iwata and Terada 1995; Morrison 1994; Mosconi and Porta 2000; Serot et al. 1995; Shizuki et al. 2000; Shurr 1997; Whiting and Pascoe 1994; Whitley 1997]. Some of these, such as LabView and Prograph, were primarily driven by industry, and the former has become a successful commercial product that is still used today. Other languages, such as NL [Harvey and Morris 1996], were created for research. All have software engineering as their primary motivation, whereas dataflow programming was traditionally concerned with the exploitation of parallelism. The latter remains an important consideration, but many DFVPLs are no longer primarily concerned with it. Experience has shown that many key advantages of DFVPLs lie with the software development lifecycle [Baroth and Hartsough 1995].
    This article traces the development of dataflow programming through to the present. It begins with a discussion of the dataflow execution model, including a brief overview of dataflow hardware. Insofar as this research led to the development of dataflow programming languages, a brief historical analysis of these is presented. The features that define traditional, textual dataflow languages are discussed, along with examples of languages in this category. The more recent trend toward large-grained dataflow is presented next. Developments in the field of dataflow programming languages in the 1990s are then discussed, with an emphasis on DFVPLs. As the environment is key to the success of a DFVPL, a discussion of the issues involved in development environments is also presented, after which four examples of open issues in dataflow programming are presented.
    Extract: Early Dataflow Programming Languages
    3. Early Dataflow Programming Languages
    3.1. The Development of Dataflow Languages
    With the development of dataflow hardware came the equally challenging problem of how to program these machines. Because they were scheduled by data dependencies, it was clear that the programming language must expose these dependencies. However, the data dependencies in each class of language can be exploited to different degrees, and the amount of parallelism that can be implicitly or explicitly specified also differs. Therefore, the search began for a suitable paradigm to program dataflow computers and a suitable compiler to generate the graphs [Arvind et al. 1988]. Various paradigms were tried, including imperative, logical, and functional methods. Eventually, the majority consensus settled on a specific type of functional language that became known as dataflow languages.
    An important clarification must be made at this stage. In early publications, dataflow graphs are often used to illustrate programs. In many cases, these graphs are simply representations of the compiled code [Dennis and Misunas 1975] that would be executed on the machine, where the graph was generated either by hand or by a compiler from a third-generation programming language. Until the advent of Dataflow Visual Programming Languages in the 1980s and 1990s, it was rarely the intention of researchers that developers should generate these graphs directly. Therefore these early graphs are not to be thought of as "dataflow programming languages."
    3.1.1. What Constitutes a Dataflow Programming Language ?. While dataflow programs can be expressed graphically, most of the languages designed to operate on dataflow machines were not graphical. There are two reasons for this. First, at the low level of detail that early dataflow machines required, it became tedious to graphically specify constructs such as loops and data structures which could be expressed more simply in textual languages [Whiting and Pascoe 1994]. Second, and perhaps more importantly, the hardware for displaying graphics was not available until relatively recently, stifling any attempts to develop graphical dataflow systems. Therefore, traditional dataflow languages are primarily text-based.
    One of the problems in defining exactly what constitutes a dataflow language is that there is an overlap with other classes of language. For example, the use of dataflow programming languages is not limited to dataflow machines. In the same way, some languages, not designed specifically for dataflow, have subsequently been found to be quite effective for this use (e.g., Ashcroft and Wadge [1977]; Wadge and Ashcroft [1985]). Therefore, the boundary for what constitutes a dataflow language is somewhat blurred. Nevertheless, there are some core features that would appear to be essential to any dataflow language. The best list of features that constitute a dataflow language was put forward by Ackerman [1982] and reiterated by Whiting and Pascoe [1994] and Wail and Abramson [1995]. This list includes the following:
    (1) freedom from side effects,
    (2) locality of effect,
    (3) data dependencies equivalent to scheduling,
    (4) single assignment of variables,
    (5) an unusual notation for iterations due to features 1 and 4,
    (6) lack of history sensitivity in procedures.
    Because scheduling is determined from data dependencies, it is important that the values of variables do not change between their definition and their use. The only way to guarantee this is to disallow the reassignment of variables once their value has been assigned. Therefore, variables in dataflow languages almost universally obey the single-assignment rule. This means that they can be regarded as values, rather than variables, which gives them a strong flavor of functional programming. The implication of the single-assignment rule is that the compiler can represent each value as one or more arcs in the resultant dataflow graph, going from the instruction that assigns the value to each instruction that uses that value.
    An important consequence of the single-assignment rule is that the order of statements in a dataflow language is not important. Provided there are no circular references, the definitions of each value, or variable, can be placed in any order in the program. The order of statements becomes important only when a loop is being defined. In dataflow languages, loops are usually provided with an imperative syntax, but the single-assignment rule is preserved by using a keyword such as next to define the value of the variable on the next iteration [Ashcroft and Wadge 1977]. A few dataflow languages offer recursion instead of loops [Weng 1975].
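    To illustrate (an editorial sketch only: Python is not a single-assignment language, so the code merely mimics the discipline, and the function names are invented):

        # Single-assignment style: each name below is bound exactly once, so it
        # denotes a value rather than a mutable variable, and a compiler could
        # draw one arc from each definition to every use.
        import itertools

        def stats(a, b):
            total = a + b            # defined once, used below
            diff = a - b             # defined once, used below
            product = total * diff   # depends only on the two values above
            return product
        # Absent circular references, the three definitions above could appear in
        # any order without changing the result.

        # A Lucid-style loop keeps the discipline by naming the value of the *next*
        # iteration instead of reassigning; sketched here with a generator:
        def naturals():
            i = 1
            while True:
                yield i
                i = i + 1            # stands in for "next i = i + 1"

        print(stats(7, 3))                            # 40
        print(list(itertools.islice(naturals(), 5)))  # [1, 2, 3, 4, 5]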
    Freedom from side effects is also essential if data dependencies are to determine scheduling. Most languages that avoid side effects do so by disallowing global variables and introducing scope rules. However, in order to ensure the validity of data dependencies, a dataflow program does not even permit a function to modify its own parameters. All of this can be avoided by the single-assignment rule. However, problems arise with this strategy when data structures are being dealt with. For example, how can an array be manipulated if only one assignment can ever be made to it? Theoretically, this problem is dealt with by conceptually viewing each modification of an array as the creation of a new copy of the array, with the given element modified. This issue is dealt with in more detail in Section 6.3.
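    The copy-on-modify view of arrays can be sketched as follows (again an editorial illustration in Python, not the mechanism of any particular dataflow implementation):

        # Conceptual single-assignment update of an array: rather than mutating the
        # original, each "modification" yields a new array with one element replaced.
        def updated(array, index, value):
            """Return a fresh copy of `array` with element `index` set to `value`."""
            return array[:index] + (value,) + array[index + 1:]

        a0 = (10, 20, 30)          # tuples are immutable, matching the single-assignment view
        a1 = updated(a0, 1, 99)    # a new value; a0 is untouched
        print(a0, a1)              # (10, 20, 30) (10, 99, 30)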
    It is clear from the above discussion that dataflow languages are almost invariably functional. They have applicative semantics, are free from side effects, are determinate in most cases, and lack history sensitivity. This does not mean that dataflow and functional languages are equivalent. It is possible to write certain convoluted programs in the functional language Lucid [Ashcroft and Wadge 1977], which cannot be implemented as a dataflow graph [Ashcroft and Wadge 1980]. At the same time, much of the syntax of dataflow languages, such as loops, has been borrowed from imperative languages. Thus it seems that dataflow languages are essentially functional languages with an imperative syntax [Wail and Abramson 1995].
    3.1.2. Dataflow Languages. A number of textual dataflow languages, or functional languages that can be used with dataflow, have been implemented. A representative sample is discussed below. (Whiting and Pascoe [1994] presented a fuller review of these languages.) Dataflow Visual Programming Languages are discussed in detail in Section 5.

    Extract: STREMA
    Morrison's [1994] flow-based programming concept, while it does not strictly obey the rules of dataflow, describes a system where nodes are built in arbitrary programming languages which the programmer arranges using a single network editing environment. Morrison [1994] reported empirical evidence that appears to support his assertion that this method is practical in real-world situations.