Textual Data-Flow Language (TDFL)
MIT LCS experimental data-flow language
In this thesis we are concerned with issues arising from the need to achieve concurrency of operation within a computation on a large scale. Several factors contribute toward increasing interest in systems capable of exploiting the concurrency of computation. Concurrency provides the potential for performance improvement through concurrent operation of hardware components such as processors and memory modules. This results in better utilization of total resources and in faster response if a computation has a high level of concurrency. The dramatic progress of technology has made concurrent systems more attractive as an alternative for high performance systems. In particular, systems that have many replicated hardware modules can take advantage of the projected potential of the processing capability of a single chip device which can be very economically produced. Such systems may further offer better fault-tolerance capability and extendibility of system performance.
So far, concurrent programming has not been adequately dealt with in conventional programming languages. It is our belief that future systems must depart from the prevalent view of sequential computation both at the programming language level and at the machine organization level if substantial progress is to be made toward practical large concurrent systems.
The goal of this thesis is to demonstrate that an adequate computation model can provide a basis both for a good programming language and for an architecture that can fully exploit the inherent concurrency in algorithms expressed in the language. To this end, we show how a value-oriented language can be implemented based on a model of concurrent computation known as dataflow schemas and how this implementation can guide the design of an architecture that achieves a high level of concurrent operations.
The model of computation is based on the notion of data driven computation, in the sense that an operation in a computation is executed as soon as all of the required operands become available. Thus, there is no notion of sequential control of execution. Data flow schemas allow many concurrent subcomputations to take place without creating side-effects. The lack of side-effects is essential for several reasons. First, the existence of side-effects among concurrent processes may cause the outcome of the computation to be dependent on the order in which the processes are executed -- that is, the computation is nondeterminate. In most applications, it is desirable to achieve concurrent operation while preserving the uniqueness of the result of the computation. From the semantic point of view, a language that is free of side-effects is easily formalized using denotational semantics. Furthermore, when a computation is expressed in a side-effect-free language, concurrency in the computation is easily recognized as subcomputations which do not depend on results of other subcomputations - and this data dependency is manifest in the program structure.
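The data-driven firing rule described above can be sketched in ordinary code. The following Python fragment is purely illustrative and not part of any system cited in the text; all names in it are invented for the example. It executes each operation as soon as all of its operands are available, with no program counter.

```python
# Minimal sketch of data-driven execution: an operation fires as soon as
# all of its operands have arrived; there is no sequential control.
import operator

def run_dataflow(nodes, inputs):
    """nodes: {name: (function, [operand names])}; inputs: initial tokens."""
    values = dict(inputs)        # tokens produced so far
    pending = dict(nodes)
    while pending:
        # Every node whose operands are all present may conceptually fire
        # concurrently; here we fire each ready set in one pass.
        ready = [n for n, (_, args) in pending.items()
                 if all(a in values for a in args)]
        if not ready:
            raise RuntimeError("deadlock: no node is ready to fire")
        for n in ready:
            fn, args = pending.pop(n)
            values[n] = fn(*(values[a] for a in args))
    return values

# (a + b) * (a - b): the add and subtract have no mutual dependency,
# so they are both ready in the first pass and may run concurrently.
result = run_dataflow(
    {"s": (operator.add, ["a", "b"]),
     "d": (operator.sub, ["a", "b"]),
     "p": (operator.mul, ["s", "d"])},
    {"a": 5, "b": 3})
print(result["p"])  # 16
```

Because the only ordering constraints are the data dependencies themselves, the outcome is the same regardless of which ready node fires first, which is exactly the determinacy property the text describes.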
We introduce a simple value-oriented language that has two important features: streams which are sequences of values communicated between computations and forall constructs in which one can express concurrent operations on components of data structures. A computation expressed in this language is guaranteed determinate unless explicit forms of nondeterminacy are used. In this thesis, we consider a limited form of nondeterminacy that merges two sequences of values in a nondeterminate manner. We discuss limitations of the language in Section 1.4.
The architecture presented in this thesis is based on a form of data flow processor proposed by Dennis and Misunas. We show how the language can be effectively implemented on this architecture such that concurrency of a computation can be exploited. The main extension includes suggestions for the design of the storage of a large number of activations of procedures and data structures such that contentions in accessing data structures can be alleviated.
Extract: Rationale for TDFL
Data Flow Languages
Because the data flow model is graphical in nature, numerous studies have attempted to define textual programming languages based on these models. While it is possible to define an algorithm that transforms programs written in existing sequential programming languages into data flow schemas, such an algorithm is complex because of the semantics of the sequential programming languages. Furthermore, the inherent concurrency of a computation is often hidden from the translator because of additional constraints built into the expressiveness of sequential programming languages. We believe that high level data flow programming languages will allow algorithms for concurrent computation to be expressed easily.
Programming languages based on the data flow concept are sufficiently expressive to encompass conventional programming language constructs such as iterations, while-loops, conditionals, procedures, and data types such as data structures and procedure values. These constructs, however, are embedded in a semantics which is free of both side-effects and the sequential control of execution. The distinctive lack of control transfer primitives such as GOTO's and operations which introduce side-effects allows compilers to easily detect data dependencies between operations in a program. Languages with these characteristics have been shown to have simple denotational formal semantics.
Additional features such as forall constructs, primitives for stream values, and constructs for nondeterminate computations are found to be natural extensions to these languages. The forall constructs allow programmers to specify concurrent operations on all components of a data structure. The notion of stream provides an alternative to the use of coroutines and synchronization primitives for expressing computations passing sequences of values among their component modules.
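The two extensions just described can be approximated in conventional code. The sketch below is an illustration only, not the syntax of any language discussed in the text: `forall` is modeled as a concurrent map over the components of a structure, and a stream is modeled as a generator whose values become available to the consumer one at a time.

```python
# Illustrative sketch (invented names): forall as a concurrent map,
# a stream as a generator of values passed between computations.
from concurrent.futures import ThreadPoolExecutor

def forall(structure, body):
    # Each component is independent of the others, so the body
    # applications may run concurrently; order of results is preserved.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(body, structure))

def naturals(limit):
    # A stream: the producer emits values one at a time; the consumer
    # proceeds as each value becomes available, with no shared state.
    for i in range(limit):
        yield i

squares = forall(range(4), lambda x: x * x)   # [0, 1, 4, 9]
total = sum(naturals(5))                      # 0+1+2+3+4 = 10
```

The point of the sketch is that neither construct requires coroutine protocols or explicit synchronization: the dependencies between producer and consumer, or between independent body applications, carry all the necessary ordering.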
A very important characteristic of these languages is that the determinacy of a computation is guaranteed when the computation is expressed without using primitives or features explicitly provided for situations where nondeterminacy is required. In conventional languages, nondeterminate computations are expressed using semaphore primitives, call and wait primitives, and monitors. The semantics of these primitives, however, are not consistent with the semantics of data flow languages. Because there are significant applications where nondeterminacy is necessary, the formal semantics of languages with nondeterminate primitives is still an active area of research. In this thesis, we have chosen a very primitive form of nondeterminacy which seems essential as a basis for higher level constructs for nondeterminate computations.
The possibility of including streams in our basic data flow model was considered in Chapter 14 when we discussed implementation strategies. Weng discusses the possibility of adding streams to Dennis's basic model in his report [Wen75]. He suggests that streams of tokens be allowed to flow along arcs in a data flow graph. In order to control these streams, additional start-of-stream tokens are required. End-of-stream tokens are only required if the number of items in a stream is unknown when the first entries in the stream are used. This is often the case as far as input and output are concerned.
Extract: MIT Dataflow systems: Dataflow, DFPL, Val
The major group in America is that led by Jack Dennis at MIT, and many of the other groups in the USA have used the MIT work as a basis. Jack Dennis's group are currently partially funded by the Lawrence Livermore Laboratory, and have built a simple version of their data flow machine [Dataflow]. Work is also in progress in the field of programming language design [Val], and in application areas. Kosinski has also worked at MIT looking into the formal semantics of data flow programming languages [DFPL].
Extract: Id languages
Another large data flow research project in America was the one at the University of California at Irvine [Id].
They originally took Dennis's graphical notation as a base language, and designed a high-level language which could be compiled into the MIT notation. Since then they have made many changes to the base language - adding and deleting features to fit more closely the view of data flow that they developed - so that it is now significantly different to Dennis's. Various designs for machine architectures have been produced, and work has been done on simulating them. As far as I am aware there is as yet no intention to build an actual machine. Around 1980 both the main researchers, Arvind and Gostelow, left Irvine, but the project is continuing on a smaller scale with Lubomir Bic being the main faculty member involved. Arvind moved to MIT, and is pursuing his research separately from Jack Dennis there, though obviously there is some interaction between the groups. A report on Arvind's ideas for a data flow computer was given in a paper by Arvind and Kathail in 1981.
The original motivation for research into dataflow was the exploitation of massive parallelism. Therefore, much work was done to develop ways to program parallel processors. However, one school of thought held that conventional "von Neumann" processors were inherently unsuitable for the exploitation of parallelism [Dennis and Misunas 1975; Weng 1975]. The two major criticisms that were leveled at von Neumann hardware were directed at its global program counter and global updatable memory [Silc et al. 1998],
both of which had become bottlenecks [Ackerman 1982; Backus 1978]. The alternative proposal was the dataflow architecture [Davis 1978; Dennis and Misunas 1975; Weng 1975], which avoids both of these bottlenecks by using only local memory and by executing instructions as soon as their operands become available. The name dataflow comes from the conceptual notion that a program in a dataflow computer is a directed graph and that data flows between instructions, along its arcs [Arvind and Culler 1986; Davis and Keller 1982; Dennis 1974; Dennis and Misunas 1975]. Dataflow hardware architectures
looked promising [Arvind and Culler 1986; Dennis 1980; Treleaven and Lima 1984; Veen 1986], and a number of physical implementations were constructed and studied (for examples, see Davis, Keller, Papadopoulos, Sakai et al., and Treleaven et al.).
Faced with hardware advances, researchers found problems in compiling conventional imperative programming languages to run on dataflow hardware, particularly those associated with side effects and locality [Ackerman 1982; Arvind et al. 1977; Arvind and Culler 1986; Kosinski 1973; Wail and Abramson 1995; Weng 1975; Whiting and Pascoe 1994]. They found that by restricting certain aspects of these languages, such as assignments, they could create languages [Ackerman 1982; Ashcroft and Wadge 1977; Dennis 1974; Hankin and Glaser 1981; Kosinski 1978] that more naturally fitted the dataflow architecture and could thus run much more efficiently on it. These are the so-called dataflow programming languages [Ackerman 1982; Whiting and Pascoe 1994] that developed distinct properties and programming styles as a consequence of the fact that they were compiled into dataflow graphs, the "machine language" of dataflow computers.
The often-expressed view in the 1970s and early 1980s that this form of dataflow architecture would take over from von Neumann concepts [Arvind et al. 1977; Treleaven et al. 1982; Treleaven and Lima 1984] never materialized [Veen 1986]. It was realized that the parallelism used in dataflow architectures operated at too fine a grain and that better performance could be obtained through hybrid von Neumann dataflow architectures. Many of these architectures [Bic 1990] took advantage of more coarse-grained parallelism where a number of dataflow instructions were grouped and executed in sequence. These sets of instructions are, nevertheless, executed under the rules of the dataflow execution model and thus retain all the benefits of that approach. Most dataflow architecture efforts being pursued today are a form of hybrid
[Iannucci 1988; Nikhil and Arvind 1989], although not all, for example, Verdoscia and Vaccaro.
The 1990s saw a growth in the field of dataflow visual programming languages (DFVPLs) [Auguston and Delgado 1997; Baroth and Hartsough 1995; Bernini and Mosconi 1994; Ghittori et al. 1998; Green and Petre 1996; Harvey and Morris 1993, 1996; Hils 1992; Iwata and Terada 1995; Morrison 1994; Mosconi and Porta 2000; Serot et al. 1995; Shizuki et al. 2000; Shurr 1997; Whiting and Pascoe 1994; Whitley 1997]. Some of these, such as LabView and Prograph, were primarily driven by industry, and the former has become a successful commercial product that is still used today. Other languages, such as NL [Harvey and Morris 1996], were created for research. All have software engineering as their primary motivation, whereas dataflow programming was traditionally concerned with the exploitation of parallelism. The latter remains an important consideration, but many DFVPLs are no longer primarily concerned with it. Experience has shown that many key advantages of DFVPLs lie with the software development lifecycle [Baroth and Hartsough 1995].
This article traces the development of dataflow programming through to the present. It begins with a discussion of the dataflow execution model, including a brief overview of dataflow hardware. Insofar as this research led to the development of dataflow programming languages, a brief historical analysis of these is presented. The features that define traditional, textual dataflow languages are discussed, along with examples of languages in this category. The more recent trend toward large-grained dataflow is presented next. Developments in the field of dataflow programming languages in the 1990s are then discussed, with an emphasis on DFVPLs. As the environment is key to the success of a DFVPL, a discussion of the issues involved in development environments is also presented, after which four examples of open issues in dataflow programming are presented.
Extract: Early Dataflow Programming Languages
3. Early Dataflow Programming Languages
3.1. The Development of Dataflow Languages
With the development of dataflow hardware came the equally challenging problem of how to program these machines. Because they were scheduled by data dependencies, it was clear that the programming language must expose these dependencies. However, the data dependencies in each class of language can be exploited to different degrees, and the amount of parallelism that can be implicitly or explicitly specified also differs. Therefore, the search began for a suitable paradigm to program dataflow computers and a suitable compiler to generate the graphs [Arvind et al. 1988]. Various paradigms were tried, including imperative, logical, and functional methods. Eventually, the majority consensus settled on a specific
type of functional language that became known as dataflow languages.
An important clarification must be made at this stage. In early publications, dataflow graphs are often used to illustrate programs. In many cases, these graphs are simply representations of the compiled code [Dennis and Misunas 1975] that would be executed on the machine, where the graph was generated either by hand or by a compiler from a third-generation programming language. Until the advent of Dataflow Visual Programming Languages in the 1980s and 1990s, it was rarely the intention of researchers that developers should generate these graphs directly. Therefore these early graphs are not to be thought of as "dataflow programming languages."
3.1.1. What Constitutes a Dataflow Programming Language? While dataflow programs can be expressed graphically, most of the languages designed to operate on dataflow machines were not graphical. There are two reasons for this. First, at the low level of detail that early dataflow machines required, it became tedious to graphically specify constructs such as loops and data structures which could be expressed more simply in textual languages [Whiting and Pascoe 1994]. Second, and perhaps more importantly, the hardware for displaying graphics was not available until relatively recently, stifling any attempts to develop graphical dataflow systems. Therefore, traditional dataflow languages are primarily text-based.
One of the problems in defining exactly what constitutes a dataflow language is that there is an overlap with other classes of language. For example, the use of dataflow programming languages is not limited to dataflow machines. In the same way, some languages, not designed specifically for dataflow, have subsequently been found to be quite effective for this use (e.g., Ashcroft and Wadge ; Wadge and Ashcroft ). Therefore, the boundary for what constitutes a dataflow language is somewhat blurred. Nevertheless, there are some core features that would
appear to be essential to any dataflow language. The best list of features that constitute a dataflow language was put forward by Ackerman and reiterated by Whiting and Pascoe and Wail and Abramson. This list includes the following:
(1) freedom from side effects,
(2) locality of effect,
(3) data dependencies equivalent to scheduling,
(4) single assignment of variables,
(5) an unusual notation for iterations due to features 1 and 4,
(6) lack of history sensitivity in procedures.
Because scheduling is determined from data dependencies, it is important that the values of variables do not change between their definition and their use. The only way to guarantee this is to disallow the reassignment of variables once their value has been assigned. Therefore, variables in dataflow languages almost universally obey the single-assignment rule. This means that they can be regarded as values, rather than variables, which gives them a strong flavor of functional programming. The implication of the single-assignment rule is that the compiler can represent each value as one or more arcs in the resultant dataflow graph, going from the instruction that assigns the value to each instruction that uses that value.
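The mapping from single-assignment bindings to arcs can be made concrete with a small sketch. The following Python fragment is illustrative only (the three-instruction program and all names are invented): because each name is defined exactly once, every use of a name yields one unambiguous arc from its defining instruction to the using instruction.

```python
# Illustrative sketch: under single assignment, each name has exactly one
# definition, so uses translate directly into arcs of the dataflow graph.
program = [            # (target, operation, operand names)
    ("t1", "add", ("x", "y")),
    ("t2", "mul", ("t1", "x")),
    ("t3", "sub", ("t2", "t1")),
]

def arcs(program):
    # Each target appears exactly once, so this map is well defined.
    defined_by = {target: i for i, (target, _, _) in enumerate(program)}
    edges = []
    for i, (_, _, operands) in enumerate(program):
        for name in operands:
            if name in defined_by:        # x and y are external inputs
                edges.append((defined_by[name], i))
    return edges

print(arcs(program))  # [(0, 1), (1, 2), (0, 2)]
```

If reassignment were permitted, a use of `t1` would be ambiguous between definitions, and the compiler could no longer derive the graph from the text alone; this is the property the single-assignment rule buys.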
An important consequence of the single-assignment rule is that the order of statements in a dataflow language is not important. Provided there are no circular references, the definitions of each value, or variable, can be placed in any order in the program. The order of statements becomes important only when a loop is being defined. In dataflow languages, loops are usually provided with an imperative syntax, but the single-assignment rule is preserved by using a keyword such as next to define the value of the variable on the next iteration [Ashcroft and Wadge 1977]. A few dataflow languages offer recursion instead of loops [Weng 1975].
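The `next`-style loop can be imitated in a conventional language. The sketch below is not the syntax of any cited dataflow language; it merely mimics the discipline in Python, where each iteration computes fresh `next_*` bindings from the current ones rather than updating values piecemeal (in a true dataflow loop these would be new tokens on the loop's arcs).

```python
# Illustrative sketch: a loop in single-assignment style, where the values
# for the next iteration are bound as a group ("next n", "next acc") from
# the current iteration's values.
def factorial(limit):
    n, acc = 1, 1
    while n <= limit:
        next_n, next_acc = n + 1, acc * n   # defined only from current values
        n, acc = next_n, next_acc           # step to the next iteration
    return acc

print(factorial(5))  # 120
```

Note that within one iteration nothing is read after being redefined: `next_acc` uses the old `n` and `acc`, so the statement order inside the body is as immaterial as it is elsewhere in a dataflow program.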
Freedom from side effects is also essential if data dependencies are to determine scheduling. Most languages that avoid side effects do so by disallowing global variables and introducing scope rules. However, in order to ensure the validity of data dependencies, a dataflow program does not even permit a function to modify its own parameters. All of this can be avoided by the single-assignment rule. However, problems arise with this strategy when data structures are being dealt with. For example, how can an array be manipulated if only one assignment can ever be made to it? Theoretically, this problem is dealt with by conceptually viewing each modification of an array as the creation of a new copy of the array, with the given element modified. This issue is dealt with in more detail in Section 6.3.
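The conceptual copy-on-modify treatment of arrays can be shown in a few lines. This is a sketch of the semantics only (real implementations share unmodified storage rather than copying, as Section 6.3 elaborates); the function name is invented for the example.

```python
# Illustrative sketch: "modifying" an element never updates the array in
# place; it yields a new array value with the given element replaced.
def array_update(arr, index, value):
    # Tuples are immutable, so the original can never be affected.
    return arr[:index] + (value,) + arr[index + 1:]

a0 = (10, 20, 30)
a1 = array_update(a0, 1, 99)
print(a0)  # (10, 20, 30) -- the original value is untouched
print(a1)  # (10, 99, 30)
```

Because `a0` and `a1` are distinct values, any instruction holding `a0` still sees the data it depended on, and the scheduling guarantees derived from data dependencies remain valid.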
It is clear from the above discussion that dataflow languages are almost invariably functional. They have applicative semantics, are free from side effects, are determinate in most cases, and lack history sensitivity. This does not mean that dataflow and functional languages are equivalent. It is possible to write certain convoluted programs in the functional language Lucid [Ashcroft and Wadge 1977], which cannot be implemented as a dataflow graph [Ashcroft and Wadge 1980]. At the same time, much of the syntax of dataflow languages, such as loops, has been borrowed from imperative languages. Thus it seems that dataflow languages are essentially functional languages with an imperative syntax [Wail and Abramson 1995].
3.1.2. Dataflow Languages. A number of textual dataflow languages, or functional languages that can be used with dataflow, have been implemented. A representative sample is discussed below. (Whiting and Pascoe presented a fuller review of these languages.) Dataflow Visual Programming Languages are discussed in detail in Section 5.