SUPERB (ID:6261)

Parallel FORTRAN 


Parallel Fortran for the SUPRENUM machine, and the precursor to Vienna Fortran

Hans Zima, SUPRENUM project, 1988

"SUPERB was the first implemented system that translated sequential Fortran 77 into explicitly parallel message-passing Fortran."


Related languages
FORTRAN 77 => SUPERB   Extension of
SUPERB => Vienna Fortran   Evolution of

References:
  • Zima, H.; Bast, H. and Gerndt, M. "SUPERB: A tool for semi-automatic MIMD/SIMD parallelization" Parallel Computing, 6:1-18, 1988
  • Zima, H. and Chapman, B. "Supercompilers for Parallel and Vector Computers" ACM Press Frontier Series, Addison-Wesley, 1990.
  • Zima, H.; Brezany, P.; Chapman, B.; Mehrotra, P. and Schwald, A. "Vienna Fortran - a language specification" Internal Report 21, ICASE, Hampton, VA, March 1992.
  • Zima, H. and Chapman, B. "Compiling for Distributed Memory Systems" Proceedings of the IEEE, Special Section on Languages and Compilers for Parallel Machines, pp. 264-287, February 1993
  • Zima, H. P.; Brezany, P. and Chapman, B. M. "SUPERB and Vienna Fortran" pp. 1487-1517 Abstract: Distributed-memory systems are powerful tools for solving large-scale scientific and engineering problems. However, these machines are difficult to program, since the data have to be distributed across the processors and message-passing operations must be inserted for communicating non-local data. In this paper, we discuss SUPERB and Vienna Fortran, two related developments with the objective of providing the user with a higher-level programming paradigm while not sacrificing target code performance.
    The parallelization system SUPERB was developed in the German supercomputer project SUPRENUM from 1985 to 1989. It is based on the Single-Program-Multiple-Data (SPMD) paradigm, allows the use of global addresses, and automatically inserts the necessary communication statements, given a user-supplied data distribution. SUPERB was the first implemented system that translated sequential Fortran 77 into explicitly parallel message-passing Fortran.
    As a result of the experiences with SUPERB and related research, the language Vienna Fortran was designed within the ESPRIT project GENESIS, in a joint effort of the University of Vienna and ICASE, NASA Langley Research Center. Vienna Fortran is a machine-independent language extension to Fortran, which includes a broad range of features for the high-level support of advanced application development for distributed-memory multiprocessors. It has significantly influenced the development of High Performance Fortran, a first attempt at language standardization in this area. Keywords: distributed-memory multiprocessor systems, numerical computation, data parallel algorithms, data distribution, program analysis, optimization. External link: Online copy Extract: Introduction
    Introduction
    Since the advent of early distributed-memory multiprocessing systems (DMMPs) such as Caltech's Cosmic Cube and the German supercomputer SUPRENUM less than a decade ago, these architectures have rapidly gained user acceptance and are today offered by most major manufacturers. Current DMMPs include Intel's hypercubes, the Paragon, the nCUBE, Thinking Machines' CM-5, and the Meiko Computing Surface. DMMPs are relatively inexpensive to build, and are potentially scalable to large numbers of processors. However, these machines are difficult to program: the non-uniformity of the memory, which makes local accesses much faster than the transfer of non-local data via message-passing operations, implies that the locality of algorithms must be exploited in order to achieve acceptable performance. The management of data, with the twin goals of both spreading the computational workload and minimizing the delays caused when a processor has to wait for non-local data, becomes of paramount importance.
    When a code is parallelized by hand, the programmer must distribute the program's work and data to the processors which will execute it. One of the common approaches to do so makes use of the regularity of most numerical computations. This is the so-called Single Program Multiple Data (SPMD) or data parallel model of computation. With this method, the data arrays in the original program are each partitioned and mapped to the processors. This is known as distributing the arrays. A processor is then thought of as owning the data assigned to it; these data elements are stored in its local memory. Now the work is distributed according to the data distribution: computations which define the data elements owned by a processor are performed by it -- this is known as the owner computes paradigm. The processors then execute essentially the same code in parallel, each on the data stored locally. Accesses to non-local data must be explicitly handled by the programmer, who has to insert communication constructs to send and receive data at the appropriate positions in the code. The details of message passing can become surprisingly complex: buffers must be set up, and the programmer must take care to send data as early as possible, and in economical sizes. Furthermore, the programmer must decide when it is advantageous to replicate computations across processors, rather than send data.
    A major characteristic of this style of programming is that the performance of the resulting code depends to a very large extent on the data distribution selected. It determines not only where computation will take place, but is also the main factor in deciding what communication is necessary. The communication statements as well as the data distribution are hardcoded into the program. It will generally require a great deal of reprogramming if the user wants to try out different data distributions. This programming style can be likened to assembly programming on a sequential machine -- it is tedious, time-consuming and error-prone.
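    The owner-computes rule and the resulting non-local accesses described above can be sketched as follows. This is an illustrative Python simulation of a 1-D BLOCK distribution for a stencil computation a(i) = b(i-1) + b(i+1); the systems discussed here generated Fortran message-passing code, and all helper names below are hypothetical.

```python
# Sketch of the SPMD owner-computes rule under a BLOCK distribution.
# Each "processor" owns a contiguous block of the global arrays; the
# boundary elements it reads belong to its neighbours and would have to
# be communicated explicitly by hand-inserted send/receive calls.

N = 12          # global array size
P = 4           # number of processors
BLOCK = N // P  # elements per processor under a BLOCK distribution

def owner(i):
    """Processor that owns global index i."""
    return i // BLOCK

def non_local_accesses(p):
    """Indices of b that processor p must receive from neighbours to
    compute its block of a(i) = b(i-1) + b(i+1)."""
    lo, hi = p * BLOCK, (p + 1) * BLOCK - 1
    needed = set()
    for i in range(max(lo, 1), min(hi, N - 2) + 1):  # interior points owned by p
        for j in (i - 1, i + 1):
            if owner(j) != p:
                needed.add(j)
    return sorted(needed)

for p in range(P):
    print(p, non_local_accesses(p))
# → 0 [3] / 1 [2, 6] / 2 [5, 9] / 3 [8]
```

    Each processor needs only one or two boundary elements from its neighbours, which is precisely the communication the programmer (or a tool like SUPERB) has to make explicit.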
    Thus much research activity has been concentrated on providing programming tools for DMMPs. One of the first such tools is SUPERB [42], an interactive restructurer which was developed in the SUPRENUM project starting in 1985. It translates Fortran 77 programs into message-passing Fortran for the SUPRENUM machine [18], the Intel iPSC, and the GENESIS machine. SUPERB performs coarse-grain parallelization for a DMMP and is also able to vectorize the resulting code for the individual nodes of the machine. The user specifies the distribution of the program's data via an interactive language. Program flow and dependence analysis information, using both intraprocedural and interprocedural analysis techniques, is computed and made available to the user, who may select individual transformation strategies or request other services via menus. SUPERB puts a good deal of effort into optimizing the target program, extracting communication from loops whenever possible, and combining individual communication statements (by vectorization and fusion) to reduce the overall communication cost ([16]). Simple reductions are recognized and handled by the system. SUPERB handles full Fortran 77, dealing with common blocks and equivalencing.
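    The message-vectorization optimization mentioned above, combining per-element transfers into block transfers, can be illustrated with a small sketch. The coalescing strategy shown is a simplification for illustration, not SUPERB's actual algorithm.

```python
# Sketch of message vectorization: rather than sending one element per
# loop iteration, runs of contiguous non-local indices are coalesced
# into single block transfers. Simplified illustration only.

def coalesce(indices):
    """Group element indices into (start, length) block messages."""
    blocks = []
    for i in sorted(indices):
        if blocks and i == blocks[-1][0] + blocks[-1][1]:
            # extends the current run: grow the last block
            blocks[-1] = (blocks[-1][0], blocks[-1][1] + 1)
        else:
            blocks.append((i, 1))
    return blocks

# 100 per-element messages become one block message:
print(coalesce(range(200, 300)))                 # [(200, 100)]
# Two disjoint runs become two messages instead of eight:
print(coalesce([5, 6, 7, 20, 21, 22, 23, 24]))  # [(5, 3), (20, 5)]
```

    Since message startup cost dominates on DMMPs, collapsing many small messages into a few large ones is one of the main levers for reducing overall communication cost.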
    Its implementation was completed in 1989 and thus it was the first system which compiled code for DMMPs from Fortran 77 and a description of the distribution of data. SUPERB provides special support for handling work arrays, as are commonly used in Fortran codes, for example to store several grids in one array.
    The experience and success gained with SUPERB and other experimental parallelization systems for DMMPs led to a new focus of research: the provision of appropriate high-level language constructs for the specification of data distributions. Vienna Fortran [8, 43], developed within the ESPRIT project GENESIS in joint work by the University of Vienna and ICASE, NASA Langley Research Center, is a machine-independent language extension to Fortran, which includes high-level features for specifying virtual processor structures, distributing data across sets of processors, dynamically modifying distributions, and formulating explicitly parallel loops.
    This paper will focus on SUPERB and Vienna Fortran, which are discussed in detail in Sections 3 and 4, after an introduction to the basic notation and terminology (Section 2). The rest of the paper deals  with the relationship between Vienna Fortran and HPF (Section 5), an advanced compilation technique for  dealing with irregular data accesses (Section 6), and an overview of related work (Section 7), followed by the  conclusion. Extract: Related Work
    Related Work
    An early attempt to provide higher-level language constructs for the specification of numerical algorithms on DMMPs is DINO [34, 35]. DINO is explicitly parallel, providing a set of C language extensions. Non-local  data may be read and written; thus DINO does not conform to the owner computes paradigm. Remote  accesses are marked by the user. DINO has been fully specified and implemented.  The description of SUPERB in [42] is the first journal publication in the area of compiling Fortran for  DMMPs. Callahan and Kennedy propose a similar compilation approach in [6].
    The concept of defining processor arrays and distributing data to them was first introduced in the programming language BLAZE [25] in the context of shared memory systems with non-uniform access times.
    This research was continued in the Kali programming language [28] for distributed memory machines, which requires that the user specify data distributions in much the same way that Vienna Fortran does. It permits both standard and user-defined distributions. The design of Kali has greatly influenced the development of Vienna Fortran. In particular, the parallel FORALL loops of Vienna Fortran were first defined in Kali and implemented with the inspector-executor paradigm as described in Section 6.
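    The inspector-executor paradigm mentioned here splits an irregular loop into an inspection pass that precomputes which accesses are non-local (building a communication schedule), and an execution pass that performs the gathered communication once and then runs entirely on local data. A minimal Python sketch under assumed simple ownership; all names are hypothetical:

```python
# Minimal sketch of the inspector-executor paradigm for an irregular
# access pattern x[idx[i]]. Hypothetical names; real implementations
# build per-processor message schedules from the indirection array.

def inspector(idx, lo, hi):
    """Inspection pass: find the distinct off-processor indices that the
    loop will touch, forming a reusable fetch schedule."""
    return sorted({j for j in idx if not (lo <= j < hi)})

def executor(x_local, lo, hi, idx, schedule, fetch):
    """Execution pass: gather non-local values once per schedule, then
    run the loop entirely on locally available data."""
    ghost = {j: fetch(j) for j in schedule}   # single communication phase
    def value(j):
        return x_local[j - lo] if lo <= j < hi else ghost[j]
    return [value(j) for j in idx]

# Processor owning global indices [4, 8) of x = [0, 10, 20, ...]:
lo, hi = 4, 8
x_local = [40, 50, 60, 70]
idx = [5, 2, 6, 9, 5]                  # irregular indirection array
sched = inspector(idx, lo, hi)         # [2, 9], fetched once, reused
result = executor(x_local, lo, hi, idx, sched, fetch=lambda j: j * 10)
print(sched, result)                   # [2, 9] [50, 20, 60, 90, 50]
```

    The payoff comes when the same indirection array is used across many iterations of an outer loop: the inspector runs once and the schedule is amortized over all executor passes.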
    The Parti routines and the ARF compiler ([41, 38]), developed by Saltz and co-workers at ICASE, represent techniques developed to handle the kind of codes written for sparse and unstructured problems in  scientific computing. They are designed to handle the general case of arbitrary data mappings, and efficient  techniques were developed for a number of subproblems.
    A commercially available system is the MIMDizer ([30]), which may be used to parallelize sequential Fortran programs according to the SPMD model. The MIMDizer takes a similar approach to SUPERB; it deals with a number of specific Fortran issues, including a very flexible handling of common blocks.
    Fortran D [13] is a Fortran language extension in which the programmer specifies the distribution of data by aligning each array to a decomposition, which corresponds to an HPF template (see Section 5), and then specifying a distribution of the decomposition to a virtual machine. These are executable statements, and array distributions are dynamic only. A subset of Fortran D -- roughly corresponding to SUPERB -- has been implemented for the iPSC/860 [20].
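    The two-level mapping used by Fortran D (and later HPF) composes two functions: an alignment from array cells to decomposition cells, and a distribution from decomposition cells to processors. A sketch of the simple 1-D case, assuming an affine alignment and a BLOCK distribution; all names are illustrative:

```python
# Sketch of the Fortran D / HPF two-level mapping: array element ->
# decomposition (template) cell -> owning processor. Simplified 1-D
# case with an affine alignment and a BLOCK distribution.

TEMPLATE = 16            # decomposition size
PROCS = 4
BLK = TEMPLATE // PROCS  # template cells per processor

def align(i, stride=2, offset=1):
    """ALIGN A(i) WITH T(stride*i + offset): array cell -> template cell."""
    return stride * i + offset

def distribute(t):
    """DISTRIBUTE T(BLOCK): template cell -> owning processor."""
    return t // BLK

def owner_of(i):
    """Composition of the two mappings gives the owner of A(i)."""
    return distribute(align(i))

print([owner_of(i) for i in range(7)])   # [0, 0, 1, 1, 2, 2, 3]
```

    Separating alignment from distribution lets several arrays share one decomposition, so changing the distribution of the decomposition remaps all aligned arrays consistently.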
    The source language for the Crystal compiler built by Li and Chen at Yale University ([26]) is the functional language Crystal, which includes constructs for specifying data parallelism. Thus there is a certain amount of parallelism explicit in the original code. Experimental compilers have been constructed for  the iPSC hypercube and the nCUBE; they place particular emphasis on an analysis of the communication  requirements to generate efficient communication.
    Dataparallel C ([19]) is a SIMD extension of the C language which is a slightly modified version of the original C* for the Connection Machine. Like DINO, it is explicitly parallel and requires the user to  specify a local view of computations. Dataparallel C compilers have been constructed for both shared and  distributed memory machines.
    Cray Research Inc. has announced MPP Fortran [32], a set of language extensions to Cray Fortran which enable the user to specify the distribution of data and work. They provide intrinsics for data distribution  and permit redistribution at subroutine boundaries. Further, they permit the user to structure the executing  processors by giving them a shape and weighting the dimensions. Several methods for distributing iterations  of loops are provided.
    In the Cray programming model, many of the features of shared memory parallel languages have been retained: these include critical sections, events and locks. New instructions for node I/O are provided.
    Other systems include AL, which has been implemented on the Warp systolic array processor [40], Pandore, a C-based system [2], Id Nouveau, a compiler for a functional language [33], Oxygen [36], ASPAR [22], Adapt, developed at the University of Southampton [29], and the Yale Extensions [10]. In a few systems, dynamic data distributions have been implemented within narrow constraints [3, 2].
    The systems described above are not the only efforts either to provide suitable language constructs for mapping code onto DMMPs or to generate message-passing programs from higher-level code. Other important approaches include Linda [1], Strand [12], and Booster [31].
          in Parallel Computing, Vol. 20, 1994
  • Mehrotra, Piyush; Van Rosendale, John; Zima, Hans "High Performance Fortran: History, Status and Future" Technical Report TR 97-8, Institute for Software Technology and Parallel Systems, University of Vienna, September 1997. Extract: SUPERB
    SUPERB was an interactive restructuring tool, developed at the University of Bonn, which translated Fortran 77 programs into message-passing Fortran for the Intel iPSC, the GENESIS machine, and the SUPRENUM machine. The user specified the distribution of the program's data via an interactive language. Program flow and dependence information, using both intraprocedural and interprocedural analysis techniques, was computed and made available to the user, who could select individual transformation strategies or request other services via menus. SUPERB performed coarse-grain parallelization for a distributed-memory machine and was also able to vectorize the resulting code for the individual nodes of the machine. Extract: Conclusion
    Conclusion
    HPF is a well-designed language which can handle most data parallel scientific applications with reasonable facility. However, as architectures evolve and scientific programming becomes more sophisticated, the limitations of the language are becoming increasingly apparent. There are at least three points of view one could take:
    1. HPF is too high-level a language -- MPI-style languages are more appropriate.
    2. HPF is too low-level a language -- aggressive compiler technologies and improving architectures obviate the need for HPF-style compiler directives.
    3. The level of HPF is about right, but extensions are required to handle some applications for some upcoming architectures.

    All three of these alternatives are being actively pursued by language researchers. For example, HPC++ [?] is an effort to design an HPF-style language using C++ as a base. On the other hand, F-- [?] is an attempt to provide a lower-level data-parallel language than HPF. Like HPF, F-- provides a single thread of flow control. But unlike HPF, F-- requires all communication to be explicit using "get" and "put" primitives.
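    The explicit one-sided communication style of F-- can be sketched in Python, with plain dictionaries standing in for the per-image memories; all names here are hypothetical illustrations, not F--'s actual syntax:

```python
# Sketch of explicit one-sided communication in the style of F--'s
# "get"/"put" primitives. Dictionaries simulate per-image memories;
# a real implementation would use remote memory access.

images = [dict(a=i * 100) for i in range(4)]   # one memory per image

def get(image, name):
    """Read a variable from another image's memory (one-sided)."""
    return images[image][name]

def put(image, name, value):
    """Write a variable into another image's memory (one-sided)."""
    images[image][name] = value

# One image reads image 3's copy of 'a', doubles it, and writes it back:
put(3, "a", 2 * get(3, "a"))
print(get(3, "a"))   # 600
```

    Unlike message passing, neither side posts a matching receive: the initiating image alone names the remote data, which is what makes the communication "explicit" yet lower-level than HPF's compiler-managed transfers.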

    While it is difficult to predict where languages will head, the coming generation of SMP-cluster architectures may induce new families of languages which will take advantage of the hardware support for shared-memory semantics within an SMP, while covering the limited global communication capability of the architectures. In this effort the experience gained in the development and implementation of HPF will surely serve us well.
          in Parallel Computing, Vol. 20, 1994