CABAL(ID:7008/cab002)

Parallel CAMAL to run on supercomputers 


for Cambridge And Bath ALgebra-system

New generation of CAMAL, designed as a parallel system to run on the Hitachi supercomputer at Cambridge


Related languages
CAMAL => CABAL   Influence
COBALG => CABAL   Influence

References:
  • Norman, Arthur and Fitch, John "CABAL: polynomial and power series algebra on a parallel computer" pp. 196-203
    Abstract: The work on designing or adapting algebra systems to exploit parallel hardware that we have seen to date has concentrated on the speed-up that can come from concurrent exploration of branches in search trees or overlapped computation in demanding algorithms. In this paper, on the other hand, we are almost exclusively concerned with the fact that large-scale parallel computers are frequently configured with very large amounts of primary memory. Our system distributes very large formulae across the multiple processors in such a parallel computer in such a way that calculations that would otherwise fail for lack of memory can be completed. Our initial system described here is designed to perform simple polynomial and power series calculations, but even in such apparently limited domains the easing of memory restrictions can sometimes prove useful. Results from our system, called CABAL (Cambridge And Bath ALgebra-system), are included both from runs on small networks of conventional workstations and using a Hitachi SR2201 massively parallel supercomputer. The main result is a demonstration of the feasibility of the design.
    Extract: Introduction
    Throughout the history of computer algebra there have been problems that have strained the computing resources of the day, and consequently systems which have been written explicitly to address the limitations. The fact that many algebraic algorithms exhibit rapid growth in computing time and memory needs as problem size increases means that this straining is liable to persist, even though many problems that used to be on the boundary of feasibility can now be solved on cheap desktop computers. Although the time taken to solve problems has generally attracted most of the attention, both in discussions of algorithmic improvements and in particular in the exploitation of parallel hardware, in many cases memory use is actually a more severe limit on what can be computed. As we are concerned with very large address spaces, the techniques of virtual shared memory are not helpful. One of the most obvious ways in which computers have been developing so that memory limits can be relaxed is the introduction of architectures with 64-bit addressability. For data warehousing these systems can be most satisfactory; however, our work is based on an expectation that over the next few years most machines intended for scientific use and with many gigabytes of primary memory will in fact be styled as "massively parallel" computers, with a CPU for every few hundred megabytes of storage. We will, of course, be very happy if we can get some speed-up from the parallel processing that this makes possible, but our main objective will be to work around the awkwardness of having multiple address spaces in use when solving a single problem. This problem has on occasion been noted in other contexts, but is not often recognised. Previous attempts to deal with algebra problems larger than memory have included Schoonschip and COBALG. There are some similarities between these and the system described here, but there are also significant differences.
    The sorts of problem that will interest us most will be those that suffer from severe intermediate expression swell, where huge amounts of workspace are needed during the processing of a problem even when the result may end up of manageable size. Of course the ideal resolution of such problems would be better algorithms that avoid the blow-up, but that ideal cannot always be attained. So in the meantime brute-force reliance on huge amounts of memory can be the only realistic way to get problems solved. Big memory needs can arise in two fundamentally different ways: in the first, as a program runs it generates a very large number of expressions which eventually combine to yield the result. In such cases one could reasonably hope to store each expression on a single processing element, and with good luck that would fit naturally with a strategy whereby expressions could be processed concurrently. The second class of problems, and the one we consider here, works with a relatively fixed number of expressions, but these expressions grow in size as the calculation progresses, before finally collapsing at the very end. In these cases it will be necessary to distribute the representation of a single expression across multiple address spaces, and naturally all processing of such expressions will lead to substantial inter-processor communication. Our view is that the second of these two classes of problem represents the more important challenge for system designers, and so it is the one to which this paper is aimed.
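    As a minimal illustration (not CABAL's actual code or data layout) of what distributing a single sparse expression across several address spaces can look like, the following Python sketch hashes each term's exponent vector to a hypothetical owning node; the node count, names and dict-based storage are assumptions made purely for the example.

        # Illustrative sketch only: spreading one large sparse polynomial over
        # several address spaces by hashing each term's exponent vector to a
        # "node". A real system would use message passing between genuinely
        # separate processes; plain dicts stand in for per-node hash tables.

        NUM_NODES = 4  # pretend we have four processors, each with its own memory

        def owner(exponents):
            """Map an exponent vector, e.g. (3, 0) for x^3, to a node id."""
            return hash(exponents) % NUM_NODES

        def distribute(poly):
            """Split {exponent_vector: coefficient} into one fragment per node."""
            fragments = [dict() for _ in range(NUM_NODES)]
            for exps, coeff in poly.items():
                fragments[owner(exps)][exps] = coeff
            return fragments

        def add_local(fragments, exps, coeff):
            """Adding a term touches only the owning node's hash table."""
            frag = fragments[owner(exps)]
            frag[exps] = frag.get(exps, 0) + coeff
            if frag[exps] == 0:
                del frag[exps]          # keep the representation sparse

        if __name__ == "__main__":
            # p = 2*x^3 + 5*x*y - 7*y^2, exponent vectors over (x, y)
            p = {(3, 0): 2, (1, 1): 5, (0, 2): -7}
            frags = distribute(p)
            add_local(frags, (1, 1), -5)   # cancels the 5*x*y term on its owner
            for node, frag in enumerate(frags):
                print("node", node, "holds", frag)

    No node ever holds the whole polynomial, which is exactly the property the paper is after; the price is that any operation combining terms owned by different nodes implies communication.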
    A study of this nature would be sterile if there were only a few problems that fitted its profile, or if the main problems that were to be addressed were only of narrow specialist interest. However our belief is that this is not a problem in our case: finding symbolic solutions to non-trivial sets of simultaneous linear equations is both manifestly of practical interest and well known to suffer fairly painfully from intermediate expression swell. GCD and factorisation problems degenerate and suffer from the same problem in cases where sparseness-preserving techniques are not successful. There may also be applications where very high order multivariate power series expansions need to be computed in order to get sufficiently accurate results when the symbolic expansion is evaluated numerically, and here again a modest number of very large expressions will be the order of the day.
    This work stands in contrast to the more mainstream work on using parallel computers to achieve higher performance in particular algorithms, especially Gröbner basis calculations and factorisation. The interest of these others has been in high-level algorithm development for such calculations, a fruitful area as shown by Amrhein, Gloor and Kuechlin. We, on the other hand, more like Schreiner and Hong, are interested in the lowest level of representation and basic operations; we are not primarily interested in speed-up (although we would regard any we achieved as a bonus), but in large data sizes. The Multipol project shows an alternative approach to large-scale distribution. In summary, we are concerned with the use of a parallel computer to handle problems whose data is too large to be contained in a single processor and single address space. We show such a design and describe a preliminary implementation which shows encouraging performance.
    Extract: Conclusions
    We have presented a design for, and an initial implementation of, an algebra system intended to perform very large scale calculations on massively parallel computers, not so much with the intention of achieving high performance in time, but to perform calculations which require more address space than is provided on a conventional single processor. The design process has required a major rethink of many of the mainstays of conventional algebra systems. We are led to a distributed polynomial representation by the need for uniform loading when handling sparse polynomials. Similarly, the use of a hash table representation is a consequence of the same sparsity. We cannot exploit advanced algorithms such as Karatsuba multiplication, and have had to develop new variants of classical algorithms.
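    As a sketch of the kind of classical-algorithm variant this implies (an assumption-laden illustration, not the authors' actual routine), the following Python code multiplies two polynomials held as per-node fragments under the same hypothetical term-to-node hashing used in the earlier sketch: each node multiplies its local fragment of one operand against every fragment of the other, and every product term is routed to the node that owns its exponent vector.

        # Sketch of classical (schoolbook) multiplication over distributed
        # fragments. Hypothetical layout: a polynomial is a list of per-node
        # {exponent_tuple: coefficient} dicts, a term being owned by
        # hash(exponents) % NUM_NODES. A real system would overlap the
        # "communication" step with computation via message passing.

        NUM_NODES = 4

        def owner(exps):
            return hash(exps) % NUM_NODES

        def multiply(frags_a, frags_b):
            """Return the product of two distributed polynomials, re-distributed."""
            outbox = [dict() for _ in range(NUM_NODES)]     # terms en route to owners
            for local_a in frags_a:                         # work done "on" each node
                for frag_b in frags_b:                      # every fragment of B is
                    for ea, ca in local_a.items():          # eventually seen by every
                        for eb, cb in frag_b.items():       # node: the communication
                            exps = tuple(x + y for x, y in zip(ea, eb))
                            dest = outbox[owner(exps)]
                            dest[exps] = dest.get(exps, 0) + ca * cb
            # drop terms that cancelled, keeping each fragment sparse
            return [{e: c for e, c in frag.items() if c != 0} for frag in outbox]

    With the distribute helper from the earlier sketch, multiply(distribute(p), distribute(q)) yields the product already partitioned by owner; no single node ever needs to hold the whole of either operand or of the result, at the cost of routing every product term between nodes.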
    It may appear that the CABAL system is closer to the algebra systems of the 1960s or 1970s than it is to those of the 1990s. This should come as no surprise. In the pioneer days the creator of an algebra system was often constrained by the small size of the target computer, and so was constrained in the kinds of solutions that could be adopted. We too are pushing against the limitations of the computers, and are led in ways we did not expect. The system with which we are most familiar, CAMAL, used packed exponent vectors and a distributed representation, with explicit-return memory allocation. We did not start with the intention of re-creating CAMAL, but the parallels emerged as our work progressed. Similarly the COBALG experiment had a similar aim: running problems which were larger than main memory. Here our solutions are significantly different, but still a family resemblance keeps appearing. Schoonschip was faced with a particular architecture, and exploited its peculiar features to deliver higher performance than could have been expected. We have demonstrated with CABAL that it is possible to use massively parallel computers to execute symbolic programs which require more memory than is available on a single processor, and this is a contribution to the way in which distributed memory can be used. It especially pleases us that our design ends up showing speed-up from the use of parallelism even in as unpromising an example as multiplying sparse polynomials, where communication costs will be inherently huge.
    This brush with massive parallelism has of course refreshed our respect for the difficulties of getting synchronisation and flow control correct in parallel algorithms, and the difficulties of debugging when examples that fail may involve quantities of data that exceed the total amount of disc space on some of our computers.
    Our efforts now need to continue with refinements to our basic polynomial package, which are almost bound to reveal ways in which we can speed it up significantly, followed by experiments involving its application to problems in linear algebra and similar areas which generate large polynomials.

          in Proceedings of the Second International Symposium on Parallel Symbolic Computation, Maui, Hawaii, 1997