EPL(ID:2012/epl004)

Equational language for parallel scientific applications 


Equational Programming Language.

Szymanski, RPI. Equational language for parallel scientific applications


Structures:
References:
  • McKenney, B. "An EPL scheduler for the Connection Machine" Master's thesis, Dept. of Computer Science, Rensselaer Polytechnic Institute, 1990
  • Sinharoy, B. and Szymanski, B. K. "Data alignment in SIMD machines" Tech. Rep. 91-9, Dept. of Computer Science, Rensselaer Polytechnic Institute, 1991.
  • Szymanski, B. "EPL - Parallel Programming with Recurrent Equations"
          in Szymanski, B. (ed.) "Parallel Functional Languages and Compilers", Addison-Wesley, 1991
  • Bruce McKenney and Boleslaw Szymanski. "Generating parallel code for SIMD machines" pp59-73
    Abstract: Massively parallel SIMD machines rely on data parallelism usually achieved by careful hand coding to support program efficiency. This paper describes parallelization of code generated for SIMD machines by the compiler for the Equational Programming Language, EPL. The language supports architecture-independent scientific programming by recurrent equations. The EPL compiler serves as a programming aid for users of parallel machines by automating data partitioning and computation parallelization based on inherent data dependencies. In support of a Connection Machine architecture, the EPL compiler performs horizontal partitioning of the program, a process that selects a dimension of each data structure to be projected along the processor array. Each processor then holds a single instance of that structure and operations along the projected dimension are done in parallel. The paper describes horizontal partitioning, code generation in MPL and the efficiency of programs generated for the MasPar SIMD machine.
    Abstract: The compiler for the Equational Programming Language EPL provides for translation, correctness checking, scheduling and code generation for EPL specifications. This paper presents the part of the EPL compiler that is responsible for horizontal partitioning, that is, for code scheduling and optimization that implements data parallelism suitable for execution on a Connection Machine.
    The Equational Programming Language, abbreviated EPL, is a simple functional language designed for programming parallel scientific computation. The language is defined in terms of just a few constructs: generalized arrays and subscripts for data structures, recurrent equations for data value definitions, ports for process interactions and virtual processors for execution directives. In EPL a computation is viewed as a collection of cooperating processes. An EPL process is described by a program that consists of only data declarations and annotated conditional equations. The canonical data structure is the array, typically multiply-dimensioned. Structured files are provided for communication with an external environment (in records) and with other processes (through ports). Data items follow a single-assignment rule, i.e., no array element takes on more than one value through the computation. Thus equations, though syntactically reminiscent of assignment statements, are best viewed as assertions of equality and are typically descriptive of recurrence relations.
    The EPL programmer can also define configurations that describe interconnections between ports of different processes. Configurations allow the programmer to reuse the same EPL programs in different computations. They also facilitate computation decomposition.
    In addition to single-valued data structures, EPL programs contain subscripts. A subscript assumes a range of integers as its value. Subscripts give EPL a dual flavor. In the definitional view they may be treated as universal quantifiers and equations are then viewed as logical predicates. In the operational view they can be seen as loop control variables, and each equation then is seen as a statement nested in loops implied by its subscripts.

          in ACM Letters on Programming Languages and Systems, 1(1) March 1992
  • Sinharoy, Balaram; McKenney, Bruce; Szymanski, Boleslaw K. "Scheduling EPL programs for parallel processing"
          in SIGPLAN Workshop on Languages, Compilers, and Run-Time Environments for Distributed Memory Multiprocessors 1992: Boulder, Colorado
  • Can Ozturan, Balaram Sinharoy and Boleslaw K. Szymanski "Compiler Technology for Parallel Scientific Computation" pp201-225 Abstract: There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with the Equational Programming Language, EPL.
    Our approach is based on program decomposition, parallel code synthesis and run-time support for parallel scientific computations. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination and memory optimization.
    In this paper we discuss annotations, configurations, parallel code generation and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.
    Extract: Introduction
    Introduction
    With a constant need to solve scientific and engineering problems of ever-growing complexity, there is an increasing need for software tools that provide solutions with minimal user involvement. Parallel computation is becoming indispensable in the solution of the large-scale problems that arise in science and engineering. While the use of parallel computation has been increasing, its widespread application has been hampered by the level of effort required to develop and implement the needed software. Parallel software often must be tuned to a particular parallel architecture to execute efficiently; thus, it often requires costly redesign when ported to new machines. Parallel program correctness requires the results to be independent of the number and speed of the processors. This requirement can be satisfied only if the parallel tasks are independent of each other or properly synchronized when a dependence exists. Designing proper synchronization is a major source of difficulty in ensuring parallel program correctness. Different categories of parallel architectures have led to a proliferation of dialects of standard computer languages. Varying parallel programming primitives for different parallel language dialects greatly limit parallel software portability. Poor portability of parallel programs has resulted in a duplication of efforts and has limited the use of developed systems.
    Parallel computation can be viewed as an interwoven description of operations that are to be applied to data distributed over the processors, and of data mapping and synchronization that dictate the data movements and the computation order. The traditional programming languages, such as Fortran, C, or C++, cope well with the task of prescribing operations to be performed. However, the description of data mapping and synchronization in such languages is often introduced by ad hoc architecture-dependent extensions. Examples are various synchronization constructs, like busy-wait, locks or barriers, used in programs for shared-memory machines, send and receive with different semantics employed by programs for message-passing architectures, and dimension projection and data broadcast popular in programs for SIMD computers. To avoid such architecture-dependent language definitions, we propose to separate the description of operations to be performed on the data values from the definition of data mapping and synchronization needed to supply these data values to the proper processor at the proper instance of the program execution.
    With this goal in mind, we developed tools [1, 2] that (i) decompose, at least partially, the parallel program into the two (nearly) orthogonal parts described above, (ii) translate the necessary data movements into optimal form customized for the target architecture, and (iii) synthesize an overall parallel computation. Using these tools the user can describe high-level features of a program and synthesize parallel computation from numerical algorithms, program fragments, and data structures that are implemented separately. The tools support (i) parallel task generation and their allocation to the processors, (ii) distribution of data to the processors, (iii) run-time optimization, and (iv) rapid prototyping of different parallel implementations.
    Through the application of transformation techniques, different versions of the same program can be generated from decomposed components. The synthesized computation uses load assignment, data distribution, and synchronization appropriate to the size and type of the target parallel architecture. The computation synthesis is guided by conditional dependence graphs that represent externally accessible information in each of the synthesized fragments. Usage of conditional information in dataflow analysis and parallelization significantly increases the efficiency of the generated parallel code.
    A brief description of the EPL language, its annotations and configurations is given in Section 2. The relationship of EPL constructs and tools to different levels of parallelism is discussed in Section 3. The EPL compiler is discussed in Section 4. Section 4.4 includes an overview of our approach to scalable parallel code generation. A dynamic load management strategy for adaptive scientific computation on SIMD architecture is the topic of Section 5. Finally, conclusions and comparison to other approaches is given in Section 6.
    Extract: Overview of the EPL Language
    Overview of the EPL Language
    EPL is a simple non-strict functional language with type inference designed for scientific computation. Although computationally vast, scientific computations are typically quite regular both in terms of control flow patterns and employed data structures. The data structures used are usually some variations of multidimensional arrays (sparse matrices, grids, jagged-edge arrays, and even some hierarchical structures can be viewed as such). Correspondingly, the EPL language is defined in terms of just a few constructs: generalized arrays and subscripts for data structures, recurrent equations for program specification, ports for process communication, and virtual processors to facilitate mapping of computation onto processors and load balancing.
    A computation is viewed in EPL as a collection of cooperating processes. A process is described by an EPL program that consists of only data declarations and annotated conditional equations. The canonical data structure is a tree with nodes that can repeat and with leaves of primitive types. In its simplest form such a tree can be viewed as a multi-dimensional array, each level of the tree corresponding to a new dimension of the corresponding array. Structured files are provided for communication with an external environment (in records) and with other processes (through ports). EPL enforces a single-assignment rule, i.e., each data element should be defined exactly once (the EPL compiler, however, is free to produce multiple-assignment object code). Thus equations, though syntactically reminiscent of assignment statements, are best viewed as assertions of equality.
    The EPL programmer also defines the process interconnection network (the graph obtained by representing processes as nodes and port interconnections as edges) in the configuration file. Processes along with the configuration files are provided by the user to facilitate the compiler in extracting the coarse grain parallelism in the computation by generating processes and inter-process communication constructs. Configurations also allow the programmer to reuse the same process in different computations.
    2.1    Iterations
    An iteration is a staple of scientific computing. In EPL, iterations are programmed using subscripts. A subscript assumes a range of integers as its value. Subscripts give EPL a dual flavor. In the definitional view, they may be treated as universal quantifiers and equations can be viewed as logical predicates. In the operational view, they can be seen as loop control variables and each equation can be seen as a statement nested in loops implied by its subscripts.
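    The operational view can be made concrete with a small sketch. Below is a hypothetical Python rendering (our own illustration, not EPL or compiler output) of a recurrent equation F[i] = if i==1 then 1 else i*F[i-1]: the subscript i becomes a loop control variable, and the single-assignment rule means each element F[i] is written exactly once.

        # Sketch: operational reading of the recurrent equation
        #   F[i] = if i == 1 then 1 else i * F[i-1]
        # The subscript i implies the loop; single assignment means
        # every F[i] is defined exactly once.
        def evaluate(range_i):
            F = [None] * (range_i + 1)        # 1-based; index 0 unused
            for i in range(1, range_i + 1):   # loop implied by subscript i
                F[i] = 1 if i == 1 else i * F[i - 1]
            return F[1:]

        print(evaluate(5))  # [1, 2, 6, 24, 120]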
    There is a special class of indirect indexes, called sublinear subscripts, that are used in scientific applications so often that a special construct devoted to them has been introduced in EPL. Formally, an indirect index s defined over the subscript i is sublinear to this subscript if it satisfies the following property:
    (0 <= s[1] <= 1) and (s[i] <= s[i+1] <= s[i] + 1) for i = 1, 2, ...
    It immediately follows from this definition that the sublinear subscript s[i] starts with the value of either 1 or 0 and then, with each increase of i, it is either incremented by 1 or kept unchanged. Typically, there is a condition associated with each sublinear subscript. The condition dictates when the subscript increases. This is the way a sublinear subscript is defined in EPL. For example, a sparse matrix S that is a row-major representation of a matrix D can be defined in EPL using a sublinear subscript col[j] as follows:
    subscript: col is sublinear j : D[i,j] != 0;  S[i,col] = D[i,j];
    Sublinear subscripts have an implicit range determined by the number of times the defining condition yields true.
    The sublinear subscripts are convenient in expressing such operations as creating a list of selected elements, operating on sparse matrices, or defining a subset of the given set. Even more important is the fact that in the implementation of a process no new iteration has to be created for computation associated with the sublinear subscripts. Instead, all necessary computation can be nested in the iterations created for subscripts in terms of which the considered sublinear subscript has been defined. Sublinear subscripts are also useful in defining dynamic distribution of data to processors at run-time. An example of such a definition is given in Section 5.2.
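    As a concrete illustration, here is a small Python model (an assumption of ours, not EPL compiler output) of the sparse-matrix example above: the counter col plays the role of the sublinear subscript and advances only inside the existing loop over j, so no new iteration is created.

        # Python model of the sublinear subscript col defined above:
        # col is incremented only when D[i][j] != 0 holds, so packing S
        # nests inside the iterations already created for i and j.
        def pack_sparse_rows(D):
            S = []
            for row in D:                       # iteration over subscript i
                packed = []
                col = 0                         # sublinear subscript, s[0] = 0
                for v in row:                   # iteration over subscript j
                    if v != 0:                  # the defining condition
                        col += 1                # ... incremented by 1
                        packed.append(v)        # S[i, col] = D[i, j]
                    # otherwise col is kept unchanged
                S.append(packed)
            return S

        print(pack_sparse_rows([[0, 3, 0, 7], [5, 0, 0, 0]]))  # [[3, 7], [5]]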
    2.2     Reduction
    A computation that frequently occurs in scientific applications is to apply a binary operation over an entire vector and store the result in the last element of the vector. For example, in scientific computation there is often a need to apply an associative operator (such as +, *, max, min, etc.) selectively on the elements of an array. Scan and Reduce are language constructs in EPL and other parallel languages that allow such operations to be succinctly written. Reduce applied to a vector of values produces a scalar result, whereas Scan results in a vector of partial results. For example, consider a matrix A multiplied by a vector X with the result placed in a vector r. This operation can be written in EPL as:
    Temp[i,j] = if j==1 then A[i,j]*X[j] else Temp[i,j-1] + A[i,j]*X[j];  r[i] = Temp[i, range.j];
    or, even shorter, as
    r[i] = reduce(+, A[i,j]*X[j], j);
    Such operations result in references of the form V[..., range.i, ...], where range.i indicates the range of the reduced/scanned dimension of a multidimensional array V. (In general, the EPL range variable prefix denotes the size of its suffix.) The presence of such references in the program is exploited by the memory optimization and scheduling discussed later. A more detailed description of the language is given in [3].
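    The difference between the two forms can be modeled in a few lines of Python (a sketch with names of our choosing, not generated code): the scan keeps the whole vector of partial sums Temp[i, j], while the reduce keeps only the final value Temp[i, range.j].

        # Python model of the scan and reduce forms of r = A * X.
        A = [[1, 2], [3, 4]]
        X = [5, 6]

        # Scan: Temp[i,j] = if j==1 then A[i,j]*X[j]
        #                   else Temp[i,j-1] + A[i,j]*X[j]
        Temp = []
        for row in A:
            partial, acc = [], 0
            for a, x in zip(row, X):
                acc += a * x
                partial.append(acc)             # vector of partial results
            Temp.append(partial)

        # Reduce: r[i] = Temp[i, range.j] -- only the scalar result is kept
        r = [sum(a * x for a, x in zip(row, X)) for row in A]

        print(Temp)  # [[5, 17], [15, 39]]
        print(r)     # [17, 39]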
    2.3    Configurations
    In our approach a parallel computation is viewed as a collection of cooperating processes. Processes are defined as functional programs. Process cooperation is described by a simple macro dataflow specification, called a configuration. Configurations support programming-in-the-large. The user can experiment with various configurations to find the one that results in the most efficient code.
    The configurator uses the dependence graph created during configuration analysis to generate an architecture-independent parallel description which is fed to the code generator. Configurations define processes (and their aggregates) and ports. Statements of the configuration represent relations between ports in different processes. They are supplied by the user to direct integration of the processes into a parallel computation.
    Processes created dynamically can communicate with ports located at parent, child, and sibling processes; each of those processes is just a copy of the same program, except the parent process, which can be arbitrary.
    [...]
    2.4    Program Decomposition through Annotations
    Annotations provide an efficient way of introducing the user's directives that assist the compiler in program parallelization. Annotations have been proposed in many systems by various researchers [11, 12, 13, 14, 15] and are used mainly as compiler directives. In our approach annotations limit the feasible mappings of computation onto the processors. Hence, they are used only during the decomposition of a process into smaller fragments. This kind of annotation is similar to the ON clause used in the Kali compiler [11], Fortran D [12] or Vienna Fortran [13].
    Annotations have no effect on the result computed by a program. Consequently, sequential programs that have manifested their correctness over many years of usage are good candidates for parallelization through annotations. Being orthogonal to the program description, annotations support rapid prototyping of different parallel solutions for the same problem, which can be helpful in performance tuning.
    In EPL, each equation can be annotated with the name of an array of virtual processors on which it is to be mapped. Virtual processors can be indexed by the equation's subscripts to identify instances of equations assigned to individual virtual processors. Such instances constitute the smallest granule of parallel computation. For example, for the process MVM the following annotation:
    P[i]: r[k,i] = reduce(+, A[i,j]*x[k,j], j);
    will cause the compiler to consider only the tasks that define a sequence of r vector elements. Each task will locally store one row of array A but the vectors x[k, *] must be broadcast to all of those tasks. The above partitioning allocates a slice of the equation defined by a single subscript value. The resultant granularity may be too fine for a target architecture. However, when an annotation is indexed by a sublinear subscript, then the corresponding sublinear expression dictates how the annotated equations are clustered onto the virtual processors. For example, let p be a sublinear subscript of i; then range.p is the number of physical or virtual processors. (This number may be a system constant not even known explicitly to the user; it may depend on the architecture, system load, or it may be defined by the user or compiler directive.) Considering again the previous example of a matrix vector multiplication, we can use an annotation:
    P[p]: r[k,i] = reduce(+, A[i,j]*x[k,j], j);
    It will distribute (or partition) the last dimension of r and A over range.p processors in a block fashion (each processor will hold floor(range.i/range.p) or ceil(range.i/range.p) elements of r and rows of A). In Section 5.2 there is an example in which a different distribution is achieved using a sublinear subscript in an annotation. This distribution balances the load on the processors.
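    The block distribution described here amounts to the following partitioning rule, sketched in Python (the function name and layout are our own illustration):

        # Block partitioning of n = range.i rows over p = range.p processors:
        # each processor holds floor(n/p) or ceil(n/p) consecutive rows.
        def block_partition(n, p):
            base, extra = divmod(n, p)   # the first `extra` processors get one more row
            blocks, start = [], 0
            for proc in range(p):
                size = base + (1 if proc < extra else 0)
                blocks.append(list(range(start, start + size)))
                start += size
            return blocks

        print(block_partition(10, 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]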
    There are similarities as well as differences between the EPL annotations and the Fortran language extensions that have been introduced in many systems, e.g., Vienna Fortran [13, 16, 17], Fortran D [12, 18, 19] and SUPERB [20]. Vienna Fortran provides directives for array-like processor structure definition. The distribution of arrays can be specified at compile-time through the use of a DIST directive with BLOCK or CYCLIC options. INDIRECT directives can be added to indicate runtime distribution. Such a distribution may have a range of valid distributions defined in its declaration. It uses an explicit mapping array to assign a distribution by an executable statement. The assigned distribution can be part of the condition in the source program. In addition to direct distribution definition, an array in Vienna Fortran can inherit a distribution from the definition of its alignment relative to some other array (and vice versa). The DIST directive can be used with options like =AD, TRANSPOSE(A), and PERMUTE(A, PERM) to align an array with, respectively, another array AD, the transposed array A, or the array A with indices permuted according to the given vector PERM.
    Fortran D directives are similar to Vienna Fortran's; however, distribution is separated from alignment. In Fortran D, first the DECOMPOSITION statement is used to declare a problem domain for each computation. The ALIGN statement is then used to describe problem mapping that defines the alignment of arrays with respect to each other. Finally, the DISTRIBUTE statement is used to map the problem and its associated arrays to the physical machine.
    In EPL, by subscripting the annotated virtual processor names and defining the appropriate ranges for the subscripts, the user can distribute the arrays in blocks, columns or rows. The arrays can also be transposed by permuting the subscripts of annotated virtual processors. Unlike Vienna Fortran and Fortran D, EPL does not provide the user with directives to do manual alignment of data. Instead, data alignment algorithms have been developed to facilitate this task automatically (see Section 4.4.1). Hence embedding alignment directives in source programs is not necessary.
          in Scientific Programming, 3(3) 1994
  • Szymanski, B. "Specifying Parallel Programs in Functional Language: the EPL Experience" pp201-223 Abstract: This paper describes the research and development associated with the parallel functional language called EPL - Equational Programming Language - and its compiler. The emphasis is on opportunities and challenges arising from the use of a functional paradigm for specifying parallel programs. The EPL approach is based on a two-level computation description: the specification of the individual processes in EPL and the description of their interactions in a macro data flow language, called configuration. The EPL process specification targets loop and data parallelism, while the interaction description is oriented towards task parallelism. The EPL compiler performs an extensive program analysis and customizes the generated parallel code to specific architectures. The parallel computation decomposition and synchronization is based on the source program annotations provided by the user and on the user's defined configurations.
    The paper provides a general description of the features of the EPL system, the detailed technical results have been published elsewhere. It concludes with a discussion of the general properties required of higher level parallel programming languages and with an assessment of EPL from that point of view. Extract: Why Use a Functional Language for Parallel Programming?
    Why Use a Functional Language for Parallel Programming?
    Parallel computation has become indispensable in the solution of large-scale problems that arise in science and engineering. While the use of parallel computation has been increasing, its widespread application has been hampered by the level of effort required to develop and implement the needed software. Parallel software often must be tuned to a particular parallel architecture to execute efficiently; thus, it often requires costly redesign when ported to new machines. Different categories of parallel architectures have led to a proliferation of dialects of standard computer languages. Varying parallel programming primitives for different parallel language dialects greatly limit parallel software portability.
    Parallel computation can be viewed as an interwoven description of operations that are to be applied to data values, and of data movement and synchronization that dictate the form of data access and the computation order. The traditional programming languages, like Fortran, C, or C++, provide a description of data movements and synchronization through ad hoc architecture-dependent extensions. Examples are various synchronization constructs, such as busy-wait, locks or barriers, used in programs for shared-memory machines, send and receive with different semantics employed by programs for message-passing architectures, and dimension projection and data broadcast popular in programs for SIMD computers.
    Traditional imperative languages over-specify programs and as a result such programs are hard to write and, more importantly, difficult to understand and debug. The over-specification falls into one of two classes: (i) Data storage related: unnecessary details describing storage representation or management. For instance, the program may state that two names must be mapped to the same storage instead of separate spaces, or a particular integer variable may be treated as a list of bits and not as an encoded number. Another example is an explicit deallocation by the program of part of its memory at a given point of the program execution. (ii) Flow of control related: unnecessary details that specify the exact order of program execution or allocation of the program's resources. For instance, the program may direct the compiler to execute two particular loops in parallel or to execute a specific code on a particular processor, etc.
    The functional programming paradigm has been one of the main directions in developing new languages that directly addresses the challenge of parallel programming. Functional languages encourage the use of functional abstractions in place of control abstractions. They employ type inferencing for consistency checking in place of type declarations. The storage related details have no place in functional languages because of their fundamental premise of value transformations. In addition, functional languages can easily separate the description of operations to be performed on data values from the definition of data movements and the synchronization needed to supply these data values to the proper processor at the proper execution instance. Hence, a well-designed functional language can shield the programmer from designing a detailed implementation, a flow of control, or a synchronization, while automatically exploiting parallelism that exists in a program.
    In functional languages, computational abstractions are expressed through functions. A first-order function takes data objects as arguments and produces new data objects as results. What is abstracted by the function is the method used to produce the new objects from the arguments. Much of the elegance of functional languages stems from such semantics and from the absence of operational or machine-dependent details. It can be further advanced if the referential transparency is supported by enforcing the single assignment rule. Under this rule, each variable can be assigned a value only once. Thus, all references to a variable yield the same value, no matter where the reference is placed in the program. Because arguments of a function could be evaluated in any order, functional programming languages exhibit significant amounts of implicit parallelism. The above-discussed properties are also important in sequential programming. However, compilers of conventional languages for sequential machines are less complex and produce more efficient code than the compilers of functional languages. This advantage of conventional languages does not carry to parallel programming.
    Many large codes were written long before parallel processing was easily accessible, and their methods are based on efficient, sequential algorithms. Turning such algorithms into algorithms that can be easily and efficiently parallelized is not merely a job for an "optimizing" compiler, but requires reprogramming at a fairly high level of the algorithm and data structure design. In general, the task of automatically reprogramming computation into parallel programs is extremely difficult and costly except for some very restricted classes of problems. Lower level details of parallel implementation, such as identifying independent threads of control flow, spawning parallel tasks, and deciding low-level details of data representation, can be automated. Functional languages are a good basis for building compilers capable of such automation, thanks to their simple semantics and lack of notion of the execution state.
    Extract: What is EPL?
    What is EPL?
    This paper describes Equational Programming Language (EPL). It is a simple, array-manipulation functional language designed for programming scientific parallel computations. Scientific computations are particularly well suited for parallel processing. It can be claimed that, so far, they have driven parallel applications most aggressively. Although computationally vast, scientific computations are typically quite regular both in terms of control flow patterns and employed data structures. Often such computations are comprised of iterative applications of numerical algorithms to all (or the majority of) the parts of the data structures. We refer to such computations as iterative computations. Typically, the data structures used in scientific computations are some variations of multidimensional arrays (sparse matrices, grids, jagged-edge arrays, and even some hierarchical structures can be viewed as such). Correspondingly, the EPL
    language is defined in terms of a few constructs: generalized, jagged-edge arrays and subscripts for data structures, recurrent equations for data value definitions, ports for process interactions, and virtual processors for execution directives. The canonical data structure is an irregular tree which, in its simplest form, can be viewed as a multi-dimensional array. Structured files are provided for communication with an external environment (in records) and with other processes (through ports). EPL enforces a single-assignment rule, i.e., each data element should be defined exactly once (the EPL compiler, however, is free to produce imperative code including reassignment of variables). Equations, though syntactically reminiscent of assignment statements, are best viewed as assertions of equality. In EPL, a computation is represented by a collection of processes that interact through ports. Such a representation is particularly convenient for expressing task parallelism. Each process is specified with equations and data declarations. Equations may be annotated by virtual processors on which they are to be executed. A process definition is used by the EPL compiler to uncover loop and data parallelism. An integration of different level parallelism in a single computation is one of the design goals of the EPL system.
    2.1. Iterations. One of the most important constructs in any functional language is that of iteration. In EPL, iterations are constructed using subscripts. A subscript assumes a range of integers as its value. Subscripts give EPL a dual flavor. In the definitional view, they may be treated as universal quantifiers and equations can be viewed as logical predicates. In such a view, the compiler must produce a program that for the given input will produce such EPL variable values that all predicates become satisfied. In the operational view, subscripts can be seen as loop control variables and each equation can be seen as a statement nested in loops implied by its subscripts.
    There is a special class of subscripts, called sublinear subscripts, that are used in scientific applications so often that a special construct devoted to them has been introduced in EPL. Formally, an indirect subscript s defined over the subscript i is an array s[i] associated with a condition cond[i] such that:
    s[i] = if cond[i] then s[i-1] + 1 else s[i-1], and s[0] = 0.
    It immediately follows from this definition that the sublinear subscript s[i] starts with the value of either 1 or 0 and then, with each increase of i, is incremented by either 1 or 0. A sublinear subscript has an implicit range determined by the number of times the defining condition yields true.
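    The recurrence here and the inequality form quoted earlier in this entry describe the same class of subscripts; a short Python check (with an arbitrary condition vector of our choosing) makes that concrete:

        # s[i] = s[i-1] + 1 if cond[i] else s[i-1], with s[0] = 0,
        # satisfies 0 <= s[1] <= 1 and s[i] <= s[i+1] <= s[i] + 1.
        def sublinear(cond):
            s = [0]
            for c in cond:                      # cond[1..n] in EPL terms
                s.append(s[-1] + 1 if c else s[-1])
            return s

        s = sublinear([True, False, True, True, False])
        assert 0 <= s[1] <= 1
        assert all(s[i] <= s[i + 1] <= s[i] + 1 for i in range(1, len(s) - 1))
        print(s[1:])  # [1, 1, 2, 3, 3]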
    The sublinear subscripts are convenient in expressing such operations as creating a list of selected elements, operating on sparse matrices, or defining a subset of the given set. Even more important is the fact that in the implementation of a process no new iteration has to be created for computation associated with the sublinear subscripts. Instead, all necessary computation can be nested in the iterations created for subscripts in terms of which the considered sublinear subscript has been defined. Sublinear subscripts are also useful in defining dynamic distribution of data to processors.
    [...]
    2.3. Configurations. In the EPL approach, a parallel computation is viewed as a collection of cooperating processes. Processes are defined as functional programs (either by the user or by a program decomposition induced by the user's annotations, see Section 2.4). Process cooperation is described by a simple macro dataflow specification, called a configuration. Configurations support programming-in-the-large. The user can experiment with various configurations to find the one that results in the most efficient code.
    The configurator uses the dependence graph created during configuration analysis to generate an architecture-independent parallel description which is fed to the code generator. Configurations define processes (and their aggregates) and ports. Statements of the configuration represent relations between ports in different processes. Some of these statements are generated during annotation processing (at the subprogram level, see Subsection 4.2); others must be supplied by the user (to direct how the processes are to be integrated into a computation).
    Processes created dynamically can communicate with ports located at parent, child, and sibling processes (each of those processes is just a copy of the same program or program fragment, except the parent process, which can be arbitrary).
    2.4. Annotations. Annotations provide an efficient way of introducing the user's directives to assist the compiler in program parallelization. To be effective, annotations have to be carefully limited to a few constructs. They also should preserve semantics of the original program. Annotations have been proposed in many systems [4, 5, 9] and are used mainly as compiler directives. In contrast, in our approach annotations are used in decomposing a program into fragments. Annotations in the EPL system are introduced solely to limit the feasible allocations of parallel tasks to processors. Hence, programs decorated with annotations produce the same results as unannotated programs.
    [...]
    In EPL, each equation can be annotated with the name of an array of virtual processors on which it is to be mapped. Virtual processors can be indexed by the equation's subscripts to identify instances of equations assigned to individual virtual processors. Such instances constitute the smallest granule of parallel computation. An example of the use of EPL annotations in a program for the LU decomposition of a matrix is shown in Figure 1. The program follows directly the
    standard definition of LU decomposition with pivoting [10]. The input matrix Ain is decomposed into two matrices: lower triangular L and upper triangular U.
    The program operates on the sequence of matrices A[i, *, *] starting with the input matrix Ain. As discussed in [10], the computation can be partitioned into the following tasks:
    •  diagonal: The n tasks D[j] that operate on the diagonal of the matrix A.
    •  subarray: Tasks T[i, j] that operate on submatrices of the matrix A. The presented annotation imposes this partitioning; a sketch of the two kinds of tasks follows below.
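    A plain-Python sketch of this partitioning (our reconstruction; the annotated EPL program itself is in Figure 1 of the paper): the diagonal task D[j] selects the pivot and scales the column, while the subarray tasks T[i, j] perform the independent submatrix updates.

        # LU decomposition with partial pivoting, organized as the
        # diagonal tasks D[j] and subarray update tasks T[i, j] above.
        def lu_decompose(Ain):
            n = len(Ain)
            A = [row[:] for row in Ain]         # sequence of matrices A[i,*,*]
            perm = list(range(n))
            for j in range(n):
                # Task D[j]: pivot selection and column scaling
                p = max(range(j, n), key=lambda r: abs(A[r][j]))
                A[j], A[p] = A[p], A[j]
                perm[j], perm[p] = perm[p], perm[j]
                for i in range(j + 1, n):
                    A[i][j] /= A[j][j]
                # Tasks T[i, j]: independent rank-1 updates of the submatrix
                for i in range(j + 1, n):
                    for k in range(j + 1, n):
                        A[i][k] -= A[i][j] * A[j][k]
            # Read off lower triangular L (unit diagonal) and upper triangular U
            L = [[A[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
            U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
            return L, U, perm                   # L*U equals the row-permuted input

        L, U, perm = lu_decompose([[2.0, 1.0], [4.0, 3.0]])
        print(L, U, perm)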
          in Specification of Parallel Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 18, American Mathematical Society, Providence, RI, 1994
  • Skillicorn, David B. and Talia, Domenico "Models and languages for parallel computation" pp123-169
          in [ACM] ACM Computing Surveys (CSUR) 30(2) June 1998