Vienna Fortran (ID:1714/vie002)


Hans Zima, University of Vienna. Data-parallel extension of Fortran 77 for distributed-memory multiprocessors.


Related languages
Fortran 90 => Vienna Fortran   Extension of
Kali => Vienna Fortran   Incorporated some features of
SUPERB => Vienna Fortran   Evolution of

References:
  • Benkner, S.; Chapman, B.M.; Zima, H.P. "Vienna Fortran 90" Proceedings of the Scalable High Performance Computing Conference (SHPCC-92), Williamsburg, VA, USA, 1992, pp51-59 Abstract: Vienna Fortran 90 is a language extension of Fortran 90 which enables the user to write programs for distributed memory multiprocessors using global data references only. Performance of software on such systems is profoundly influenced by the manner in which data is distributed to the processors. Hence, Vienna Fortran 90 provides the user with a wide range of facilities for the mapping of data to processors. It combines the advantages of the shared memory programming paradigm with mechanisms for explicit user control of those aspects of the program which have the greatest impact on efficiency. The paper presents the major features of Vienna Fortran 90 and gives examples of their use. Extract: Introduction
    Introduction
    A number of distributed-memory multiprocessing systems (such as Intel's hypercube series and the NCUBE) are now commercially available. Other such systems are currently under development or have been announced in recent months. These architectures are relatively inexpensive to build, and are potentially scalable to very large numbers of processors. Hence their share of the market is likely to increase in the near future.
    The most important single difference between them and other computer architectures is the fact that the memory is physically distributed among the processors; the time required to access a non-local datum may be an order of magnitude higher than the time taken to access locally stored data. This has important consequences for program efficiency. In particular, the management of data, with the twin goals of both spreading the computational workload and minimizing the delays caused when a processor has to wait for non-local data, becomes of paramount importance.
    A major difficulty with the current generation of distributed memory computing systems, however, is that they generally lack programming tools for software development at a suitably high level. The user is forced to manage all details of the distribution of data and work to the processors. This results in a programming style which is tedious, time-consuming and error prone. It has led to particularly slow software development cycles and, in consequence, high costs for software.
    Thus research is now concentrated on the provision of appropriate high-level language constructs to enable users to design programs in much the same way as they are accustomed to on a sequential machine. Several proposals have been put forth in recent months for a set of language extensions to achieve this, in particular (but not only) for Fortran, and current compiler research is aimed at implementing them.
    The language proposals include the Yale Extensions, Fortran D [6], under development at Rice University (and a proposed extension to Fortran D), Digital Equipment Corporation's High Performance Fortran proposal, the language extensions to f77 planned by Cray Research Inc., and suggested extensions from Thinking Machines. Research in compiler technology has so far resulted in the development of prototype systems which are able to convert programs written using global data references to code for distributed memory systems. These include Kali, SUPERB, and the MIMDizer. These systems require the user to specify the distribution of the program's data. This data distribution is then used to guide the process of restructuring the code into an SPMD (Single Program Multiple Data) program for execution on the target distributed memory multiprocessor. The compiler analyzes the source code, translating global data references into local and non-local references based on the distributions specified by the user. The non-local references are satisfied by inserting appropriate message-passing statements in the generated code. Finally, the communication is optimized where possible, in particular by combining statements and by sending data at the earliest possible point in time.
    Based upon the experience gained with these systems, Vienna Fortran has been proposed as a machine-independent language extension to FORTRAN 77 and to Fortran 90 for writing programs for distributed-memory multiprocessor systems; one of its main aims is to provide the user with a suitable means of specifying the distribution of data in a program. Vienna Fortran programs are written using global indices. It is the task of the compiler to insert communication statements where required. The resulting code is not only simpler, but also considerably more flexible, enabling users to modify a data distribution without major reprogramming. Moreover, the features of Vienna Fortran also provide a powerful and convenient means for specifying parallel algorithms, which may exploit the properties of the chosen data distribution in order to obtain efficient code.
    Since the distribution of data is crucial for performance, Vienna Fortran provides a variety of methods to distribute data. These include direct specification of distributions, distribution by alignment or by referring to the distribution of another array, and the use of mapping arrays to support indirect and irregular distributions. Some of these will be described in the following. Note that in Vienna Fortran, the term "distribution" includes replication, where array elements are mapped to sets of processors. Thus an array distribution is simply a mapping of the elements of the array to (non-empty) sets of processors. For a full description of Vienna Fortran, see [33]. Vienna Fortran 90 (VF90) is a superset of Fortran 90 with Vienna Fortran extensions to do all of the above. Furthermore, it includes language constructs for controlling the mechanisms of dynamic redistribution and distributed argument passing, and provides a parallel loop. In this work, the basic language features of Vienna Fortran 90 are presented.
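    The distribution classes above can be illustrated with ordinary code. The following is a minimal sketch in Python (not Vienna Fortran itself; the function names are invented for illustration) of how the owner of an array element is determined under block, cyclic, and mapping-array distributions:

```python
# Hypothetical sketch: which processor owns element i of an
# n-element array distributed over p processors.

def block_owner(i, n, p):
    """BLOCK distribution: contiguous chunks of ceil(n/p) elements."""
    chunk = -(-n // p)          # ceiling division
    return i // chunk

def cyclic_owner(i, p):
    """CYCLIC distribution: elements dealt out round-robin."""
    return i % p

# An indirect (irregular) distribution is just a user-supplied
# mapping array: mapping[i] names the owner of element i.
mapping = [0, 0, 2, 1, 3, 2, 1, 3]

def indirect_owner(i, mapping):
    return mapping[i]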
    This paper is organized as follows: the following sections introduce the main elements of VF90; features particularly relevant in the context of Fortran 90 are emphasized. This is followed by an example program which illustrates the capabilities of the language. The paper concludes with a discussion of related work.
  • Chapman, B. et al. "Programming In Vienna Fortran" pp31-50
          in Scientific Programming 1(1) (Aug 1992)
  • Chapman, Barbara M.; Mehrotra, Piyush; Zima, Hans P. "User Defined Mappings in Vienna Fortran" pp72-75
          in SIGPLAN Workshop on Languages, Compilers, and Run-Time Environments for Distributed Memory Multiprocessors 1992: Boulder, Colorado
  • Zima, H.; Brezany, P.; Chapman, B.; Mehrotra, P.; Schwald, A. "Vienna Fortran - a language specification" Internal Report 21, ICASE, Hampton, VA, March 1992.
  • Zima, H.P.; Brezany, P.; Chapman, B.M. "SUPERB and Vienna Fortran" pp1487-1517 Extract: Introduction
    Introduction
    Since the advent of early distributed-memory multiprocessing systems (DMMPs) such as Caltech's Cosmic Cube and the German supercomputer SUPRENUM less than a decade ago, these architectures have rapidly gained user acceptance and are today offered by most major manufacturers. Current DMMPs include Intel's hypercubes, the Paragon, the nCUBE, Thinking Machines' CM-5, and the Meiko Computing Surface. DMMPs are relatively inexpensive to build, and are potentially scalable to large numbers of processors. However, these machines are difficult to program: the non-uniformity of the memory, which makes local accesses much faster than the transfer of non-local data via message-passing operations, implies that the locality of algorithms must be exploited in order to achieve acceptable performance. The management of data, with the twin goals of both spreading the computational workload and minimizing the delays caused when a processor has to wait for non-local data, becomes of paramount importance.
    When a code is parallelized by hand, the programmer must distribute the program's work and data to the processors which will execute it. One of the common approaches to do so makes use of the regularity  of most numerical computations. This is the so-called Single Program Multiple Data (SPMD) or  data parallel model of computation. With this method, the data arrays in the original program are each  partitioned and mapped to the processors. This is known as distributing the arrays. A processor is then  thought of as owning the data assigned to it; these data elements are stored in its local memory. Now the  work is distributed according to the data distribution: computations which define the data elements owned  by a processor are performed by it -- this is known as the owner computes paradigm. The processors then  execute essentially the same code in parallel, each on the data stored locally. Accesses to non-local data must  be explicitly handled by the programmer, who has to insert communication constructs to send and receive  data at the appropriate positions in the code. The details of message passing can become surprisingly  complex: buffers must be set up, and the programmer must take care to send data as early as possible,  and in economical sizes. Furthermore, the programmer must decide when it is advantageous to replicate  computations across processors, rather than send data. A major characteristic of this style of programming is that the performance of the resulting code depends  to a very large extent on the data distribution selected. It determines not only where computation will  take place, but is also the main factor in deciding what communication is necessary. The communication  statements as well as the data distribution are hardcoded into the program. It will generally require a great  deal of reprogramming if the user wants to try out different data distributions. 
This programming style can  be likened to assembly programming on a sequential machine -- it is tedious, time-consuming and error prone.
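    The owner-computes rule described above can be made concrete with a small sketch. The following plain-Python fragment (hypothetical, with no real message passing; the send/receive step is reduced to a marked non-local read) shows one processor's share of a nearest-neighbour update over a block-distributed array:

```python
# Illustrative sketch of the owner-computes rule: each processor
# executes only the iterations that assign to elements it owns.

def spmd_shift(a, p, me):
    """One processor's share of b(i) = a(i-1) + a(i), with 'a'
    BLOCK-distributed over p processors; 'me' is this rank."""
    n = len(a)
    chunk = -(-n // p)                    # ceiling division
    lo, hi = me * chunk, min((me + 1) * chunk, n)
    b_local = {}
    for i in range(max(lo, 1), hi):       # owner computes b[i], i in [lo, hi)
        # a[i-1] lives on the left neighbour when i-1 falls in the
        # previous chunk; a real compiler would insert a receive here.
        left = a[i - 1]
        b_local[i] = left + a[i]
    return b_local

# Running every rank and merging the pieces reproduces the global result:
a = list(range(8))
full = {}
for rank in range(4):
    full.update(spmd_shift(a, 4, rank))
```

    Reproducing the sequential result when all ranks' pieces are combined is exactly the invariant an SPMD compiler must preserve.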
    Thus much research activity has been concentrated on providing programming tools for DMMPs. One of the first such tools is SUPERB [42], an interactive restructurer which was developed in the SUPRENUM project starting in 1985. It translates Fortran 77 programs into message-passing Fortran for the SUPRENUM machine [18], the Intel iPSC, and the GENESIS machine. SUPERB performs coarse-grain parallelization for a DMMP and is also able to vectorize the resulting code for the individual nodes of the machine. The user specifies the distribution of the program's data via an interactive language. Program flow and dependence analysis information, using both intraprocedural and interprocedural analysis techniques, is computed and made available to the user, who may select individual transformation strategies or request other services via menus. SUPERB puts a good deal of effort into optimizing the target program, extracting communication from loops whenever possible, and combining individual communication statements (by vectorization and fusion) to reduce the overall communication cost ([16]). Simple reductions are recognized and handled by the system. SUPERB handles full Fortran 77, dealing with common blocks and equivalencing.
    Its implementation was completed in 1989, and thus it was the first system which compiled code for DMMPs from Fortran 77 and a description of the distribution of data. SUPERB provides special support for handling work arrays, as are commonly used in Fortran codes, for example to store several grids in one array. The experience and success gained with SUPERB and other experimental parallelization systems for DMMPs led to a new focus of research: the provision of appropriate high-level language constructs for the specification of data distributions. Vienna Fortran [8, 43], developed within the ESPRIT project GENESIS in joint work by the University of Vienna and ICASE, NASA Langley Research Center, is a machine-independent language extension to Fortran, which includes high-level features for specifying virtual processor structures, distributing data across sets of processors, dynamically modifying distributions, and formulating explicitly parallel loops.
    This paper will focus on SUPERB and Vienna Fortran, which are discussed in detail in Sections 3 and 4, after an introduction to the basic notation and terminology (Section 2). The rest of the paper deals  with the relationship between Vienna Fortran and HPF (Section 5), an advanced compilation technique for  dealing with irregular data accesses (Section 6), and an overview of related work (Section 7), followed by the  conclusion. Extract: Related Work
    Related Work
    An early attempt to provide higher-level language constructs for the specification of numerical algorithms on DMMPs is DINO [34, 35]. DINO is explicitly parallel, providing a set of C language extensions. Non-local  data may be read and written; thus DINO does not conform to the owner computes paradigm. Remote  accesses are marked by the user. DINO has been fully specified and implemented.  The description of SUPERB in [42] is the first journal publication in the area of compiling Fortran for  DMMPs. Callahan and Kennedy propose a similar compilation approach in [6].
    The concept of defining processor arrays and distributing data to them was first introduced in the programming language BLAZE [25] in the context of shared memory systems with non-uniform access times.
    This research was continued in the Kali programming language [28] for distributed memory machines, which requires that the user specify data distributions in much the same way that Vienna Fortran does. It permits both standard and user-defined distributions. The design of Kali has greatly influenced the development of Vienna Fortran. In particular, the parallel FORALL loops of Vienna Fortran were first defined in Kali and implemented with the inspector-executor paradigm as described in Section 6.
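    The inspector-executor paradigm mentioned here can be sketched in a few lines. In this hedged illustration (plain Python; the helper names and the dictionary standing in for message buffers are invented), the inspector scans the irregular index array once to build a reusable communication schedule, and the executor then gathers values from local storage plus the prefetched buffer:

```python
# Sketch of the inspector-executor scheme (hypothetical helper names).

def inspector(indices, owner, me):
    """Inspector: collect the distinct non-local indices this
    processor will read -- the communication schedule."""
    return sorted({j for j in indices if owner(j) != me})

def executor(local, fetched, indices, owner, me):
    """Executor: gather x[indices[k]] from local data plus the
    prefetched buffer, with no further communication."""
    return [local[j] if owner(j) == me else fetched[j] for j in indices]

# One processor's view: rank 0 owns the even indices of x.
x = [10, 11, 12, 13, 14, 15]
owner = lambda j: j % 2
local = {j: x[j] for j in range(len(x)) if owner(j) == 0}
indices = [0, 3, 5, 3]                    # irregular access pattern
schedule = inspector(indices, owner, 0)   # non-local indices, fetched once
fetched = {j: x[j] for j in schedule}     # stands in for message passing
gathered = executor(local, fetched, indices, owner, 0)
```

    The point of the split is that the (expensive) inspector runs once, while the executor can be repeated on every iteration of the surrounding loop with the same schedule.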
    The Parti routines and the ARF compiler ([41, 38]), developed by Saltz and co-workers at ICASE, represent techniques developed to handle the kind of codes written for sparse and unstructured problems in  scientific computing. They are designed to handle the general case of arbitrary data mappings, and efficient  techniques were developed for a number of subproblems.
    A commercially available system is the MIMDizer ([30]) which may be used to parallelize sequential Fortran programs according to the SPMD model. The MIMDizer takes a similar approach to SUPERB; it  deals with a number of specific Fortran issues, including a very flexible handling of common blocks.  The programming language Fortran D [13] proposes a Fortran language extension in which the programmer specifies the distribution of data by aligning each array to a decomposition, which corresponds to an  HPF template (see Section 5), and then specifying a distribution of the decomposition to a virtual machine.  These are executable statements, and array distributions are dynamic only. A subset of Fortran D -- roughly  corresponding to SUPERB -- has been implemented for the iPSC/860 [20].
    The source language for the Crystal compiler built by Li and Chen at Yale University ([26]) is the functional language Crystal, which includes constructs for specifying data parallelism. Thus there is a certain amount of parallelism explicit in the original code. Experimental compilers have been constructed for  the iPSC hypercube and the nCUBE; they place particular emphasis on an analysis of the communication  requirements to generate efficient communication.
    Dataparallel C ([19]) is a SIMD extension of the C language which is a slightly modified version of the original C* for the Connection Machine. Like DINO, it is explicitly parallel and requires the user to  specify a local view of computations. Dataparallel C compilers have been constructed for both shared and  distributed memory machines.
    Cray Research Inc. has announced MPP Fortran [32], a set of language extensions to Cray Fortran which enable the user to specify the distribution of data and work. They provide intrinsics for data distribution  and permit redistribution at subroutine boundaries. Further, they permit the user to structure the executing  processors by giving them a shape and weighting the dimensions. Several methods for distributing iterations  of loops are provided.
    In the Cray programming model, many of the features of shared memory parallel languages have been retained: these include critical sections, events and locks. New instructions for node I/O are provided.  Other systems include AL, which has been implemented on the Warp systolic array processor [40],  Pandore, a C-based system [2], Id Nouveau, a compiler for a functional language [33], Oxygen [36],  ASPAR [22], Adapt, developed at the University of Southampton [29], and the Yale Extensions [10]. In  a few systems, dynamic data distributions have been implemented within narrow constraints [3, 2].
    The systems described above are not the only efforts to provide either suitable language constructs for mapping code onto DMMPs or to generate message passing programs from higher--level code. Other  important approaches include Linda [1], Strand [12], and Booster [31].
          in Parallel Computing, Vol. 20, 1994
  • Mehrotra, Piyush; Van Rosendale, John; Zima, Hans "High Performance Fortran: History, Status and Future" Technical Report TR 97-8, Institute for Software Technology and Parallel Systems, University of Vienna, September 1997. Extract: Conclusion
    Conclusion
    HPF is a well-designed language which can handle most data parallel scientific applications with reasonable facility. However, as architectures evolve and scientific programming becomes more sophisticated, the limitations of the language are becoming increasingly apparent. There are at least three points of view one could take:
    1. HPF is too high-level a language --- MPI-style languages are more appropriate.
    2. HPF is too low-level a language --- aggressive compiler technologies and improving architectures obviate the need for HPF-style compiler directives.
    3. The level of HPF is about right, but extensions are required to handle some applications for some upcoming architectures.

    All three of these alternatives are being actively pursued by language researchers. For example, HPC++ [?] is an effort to design an HPF-style language using C++ as a base. On the other hand, F-- [?] is an attempt to provide a lower-level data-parallel language than HPF. Like HPF, F-- provides a single thread of flow control. But unlike HPF, F-- requires all communication to be explicit using "get" and "put" primitives.

    While it is difficult to predict where languages will head, the coming generation of SMP-cluster architectures may induce new families of languages which will take advantage of the hardware support for shared-memory semantics within an SMP, while coping with the limited global communication capability of the architectures. In this effort the experience gained in the development and implementation of HPF will surely serve us well.