# SIN(ID:2502/sin001)

## Symbolic maths package

for Symbolic INtegration

Joel Moses, 1967, Project MAC, MIT


Related languages:
• SIN => CARPS (CARPS incorporated features of SIN)
• SIN => SCHATCHEN (SCHATCHEN is a subsystem of SIN)
• SIN => Scratchpad (SIN influenced Scratchpad)

References:
• Moses, J. "Expert Symbolic Integration" MAC-TR-47, Proj. MAC, MIT, Dec. 1967
• Charniak, Eugene "Computer Solution of Calculus Word Problems" Extract: Introduction
Introduction
The research described in this paper had as its goal the creation of a program which solves freshman calculus word problems.  The program, CARPS (CAlculus Rate Problem Solver), is restricted to rate problems.  It is described in greater detail in MAC-TR-51 (Thesis). CARPS was primarily motivated by Bobrow's work on STUDENT, a program which solves high school algebra word problems.  An understanding of STUDENT is sufficiently important to our work that we shall analyze Bobrow's program in the second section of the paper.
CARPS is written in two languages. The bulk of the coding is in LISP. There are, however, large sections which require a great deal of pattern matching, something in which LISP is not particularly powerful.  These sections were written in CONVERT, a language especially designed for pattern matching.  Since CONVERT is embedded in LISP it was an especially convenient choice because we could easily switch back and forth between the two languages.  Both of these languages were available on the Project MAC PDP-6 time sharing system which was used in this research. Also available were J. Moses' algebraic simplification and differentiation routines which are described in [Moses].  The PDP-6 system, which has a quarter million words of core storage, gave us a decided advantage over Bobrow: his program had to fit into a 32K 7094 LISP system, whereas ours wallows in the comparative luxury of 45K of memory.
Extract: STUDENT
STUDENT
As we mentioned earlier Bobrow's program, STUDENT, solves algebra word problems.  To really understand how STUDENT works we should go through a problem and see how the program solves it.
Extract: Conclusions
Conclusions
In section 2 we noted several aspects of STUDENT which we felt needed further work.  Let us compare CARPS and STUDENT in these areas.
1)  By storing information in terms of structures, CARPS is better able to recognize that two phrases describe the same object.
2)  Again because of the use of structures CARPS can gather information about an object in piecemeal fashion.  STUDENT was essentially required to generate one equation for each sentence in the problem description.  In calculus word problems it is not uncommon to have two or three sentences providing information for one equation.
3)  CARPS to a limited degree is able to use its knowledge to parse its input sentences.  For example we saw how ALTITUDE was interpreted as ALTITUDE OF THE FILTER because CARPS knew that since the filter was a cone and cones have altitudes, the filter had an altitude.
4)  Whereas STUDENT has only one solution method (i.e., solution of linear equations), CARPS has several and can decide which is appropriate for a given problem.  CARPS' machinery for solving its equations (e.g., differentiation, simplification) is also more complex than STUDENT's.
5)  CARPS utilizes a more sophisticated grammatical analysis of the sentences than STUDENT.  This is used both in breaking up sentences and in generating the internal structures.  Up to now 14 problems have been solved by CARPS. Most were taken from standard texts, sometimes with slight modifications.  (Variations in the statements of problems would, of course, increase the number of problems solvable by CARPS.)  The other problems solved by CARPS are in Appendix A.
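The knowledge-driven disambiguation described in point 3 can be sketched in a few lines. This is an illustrative miniature in Python; the property table and object registry are invented for the example and are not CARPS' actual internal structures:

```python
# Toy version of knowledge-driven disambiguation: a bare attribute such as
# ALTITUDE is attached to the one known object whose type is known to carry
# that attribute. The tables below are invented for illustration.

PROPERTIES = {
    'cone': {'altitude', 'radius', 'volume'},   # cones have altitudes
    'ladder': {'length', 'foot', 'top'},
}

objects = {'filter': 'cone'}    # the problem stated that the filter is a cone

def resolve(attribute):
    """Return the unique known object that can own this attribute, else None."""
    owners = [name for name, kind in objects.items()
              if attribute in PROPERTIES[kind]]
    return owners[0] if len(owners) == 1 else None
```

With these tables, a bare mention of ALTITUDE resolves to the filter, mirroring how ALTITUDE was interpreted as ALTITUDE OF THE FILTER.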
Many of the improvements that we claim for CARPS were necessitated by the increased
complexity of the problems that we expected it to solve.  However, many weaknesses of STUDENT are still present in some form in our design.
1)  An important weakness in the program is due to its dependence on key words to signify the type of problem (i.e., distance or volume) and the method of solution to be used. What one would like to have in a calculus problem solver is a program which would use the information presented in the problem to figure out relationships among the elements (e.g., similar triangles) and actually propose the method of solution. The answers which could be provided by such a program are currently built into CARPS' data base for several cases, but this scheme severely limits the power of the program.
2)  Another weakness of CARPS is its limited knowledge of English syntax.  It would not be too difficult for CARPS to learn new syntactic rules by adding these rules to its CONVERT subroutines. Actually what would be more satisfying would be a different method of parsing the sentences into components of structures.  Currently the CONVERT rules are attempted one at a time until one matches the sentence. A better approach would be an incremental left-to-right parse which, when finished with the sentence, would have translated it into the internal model.
3)  A very powerful calculus word problem solver will require a good deal of "common sense" knowledge.  Consider this problem which we gave to CARPS
(A LADDER 20.0 FT LONG LEANS AGAINST A HOUSE /.  FIND THE RATE AT WHICH THE TOP OF THE LADDER IS MOVING DOWNWARD IF ITS FOOT IS 12.0 FT FROM THE HOUSE AND MOVING AT THE RATE 2.0 FT PER SEC /.)
Much to our surprise CARPS was not able to solve it. A closer look at the problem shows why.  The last phrase mentions that the ladder is moving at the rate 2 ft per second.  CARPS has, as an internal check, the requirement that associated with each velocity must be the direction of the velocity.  The point is that the problem never gave this direction. Most people, however, would assume that it was moving horizontally away from the house.  The reason, of course, is that a familiarity with ladders or the law of gravity tells us that this is the most likely way for it to be moving.
This is not an isolated incident.  Consider the problem
(A BARGE WHOSE DECK IS 10 FT BELOW THE LEVEL OF A DOCK IS BEING DRAWN IN BY MEANS OF A CABLE ATTACHED TO THE DECK AND PASSING THROUGH A RING ON THE DOCK. WHEN THE BARGE IS 24 FT FROM AND APPROACHING THE DOCK AT 3/4 FT / SEC, HOW FAST IS THE CABLE BEING PULLED IN ?)
The ship is moving horizontally towards the dock. The problem does not mention this.
Nor is the need for a body of "real world" information unique to calculus problems.  Consider an example taken from a children's story.  "Mike's costume had big ruffles.  He refused to wear it." The question is "Why?". A program which will answer correctly must know that boys usually will not wear girl's clothes, and anything with ruffles on it must be for girls.
It is our belief that a program which is to understand second grade stories must have at its disposal a body of facts comparable to that of an average seven year old.  More generally, computer understanding of any body of literature will require a data base similar to that of a human reader.  The major problems facing computer understanding of natural language are, in our opinion, collecting, storing, and utilizing such large bodies of information.

in Donald E. Walker, Lewis M. Norton (Eds.): Proceedings of the 1st International Joint Conference on Artificial Intelligence IJCAI-69, Washington, DC, May 1969. William Kaufmann, 1969
• Moses, J. The integration of a class of special functions with the Risch algorithm. Memo MAC-M-421, Proj. MAC, MIT, Sept. 1969.
• Moses, Joel "Algebraic simplification: a guide for the perplexed"
in [ACM] CACM 14(08) August 1971 view details
• Moses, Joel "Symbolic integration: the stormy decade" Abstract: Three approaches to symbolic integration in the 1960's are described. The first, from artificial intelligence, led to Slagle's SAINT and to a large degree to Moses' SIN. The second, from algebraic manipulation, led to Manove's implementation and to Horowitz' and Tobey's reexamination of the Hermite algorithm for integrating rational functions. The third, from mathematics, led to Richardson's proof of the unsolvability of the problem for a class of functions and to Risch's decision procedure for the elementary functions. Generalizations of Risch's algorithm to a class of special functions and programs for solving differential equations and for finding the definite integral are also described.
in [ACM] CACM 14(08) August 1971 view details
• Engelman, C. "Algebraic Manipulation Languages" Extract: FORMAC
The best known, purely symbolic systems are, of course, Formac and its current version PL/I-FORMAC (Petrick, 1971; pp. 105-114). Formac was the first widely available general-purpose algebraic manipulation system and served for a period to define the field. Certainly, there was a time when one could have safely made the statement that the majority of all mechanical symbolic mathematical computations had been done within Formac. The practical success of these systems, in spite of their rigidity with respect to user modifications and their lack of any seminumerical facilities for rational function computations, is probably due to the overall intelligence of the facilities that were provided. Above all, they were certainly sufficient to support the dominant application area of truncated power series expansion. Current support is minimal. Extract: Symbolic systems
SYMBOLIC SYSTEMS. We should mention first a sequence of three early programs for the simplification of general symbolic mathematical expressions represented as prefix-notation tree structures. The first, at M.I.T., was due to Hart, and the other two were due to Wooldridge and Korsvold at Stanford. The latter has survived in current usage as a result of its incorporation, subject to modification, into the MATHLAB, MACSYMA, and SCRATCHPAD systems.

In the mid-1960s there appeared two systems, Formula Algol and FAMOUS, which, while dedicated to the symbolic manipulation of mathematical expressions, presented the user with almost no built-in automatic simplification facilities. This was due, at least in the case of FAMOUS, to a conscious decision that, since the "simplicity" of an expression is surely context-dependent, it should be reasonable to present the user with complete control over the simplification process. That is, the user should be compelled to define all transformations, rather than, as with most systems, be permitted simply to switch on and off the transformations supplied by the system architects. No system of this species has ever solved the inherent efficiency problems to the extent that it could serve more than didactic purposes. Probably neither Formula Algol nor FAMOUS could be revived today.

Another lost symbolic system of importance is the Symbolic Mathematical Laboratory of W. A. Martin. This system provided high-quality 2-D graphics on a DEC-340 display and was also the first to employ a light pen for subexpression selection. In some ways, it represented a degree of interaction that has not been duplicated by any subsequent system. Nor were its innovative internal programming techniques restricted to its graphics facilities. Of particular interest is the use of hash coding for subexpression matching (Petrick, 1971; pp. 305-310).
in Encyclopedia of Computer Science, Ralston, Anthony, and Meek, Chester L. (eds) New York, NY Petrocelli/Charter 1976
• Geddes, K.O.; Czapor, S.R.; Labahn, G. "Algorithms for Computer Algebra" Kluwer Academic Publishers, Boston, 1992 Extract: Extract from Chapter one
A BRIEF HISTORICAL SKETCH
-------------------------

The development of systems for symbolic mathematical computation first became
an active area of research and implementation during the decade 1961-1971.
. . .
. . .

To put the decade 1961-1971 into perspective, let us recall that FORTRAN
appeared about 1958 and ALGOL in 1960. These two languages were designed
primarily for numerical mathematical computation.
Then in 1960/1961 came the development of LISP, a language for list
processing. LISP was a major advancement on the road to languages for
symbolic computation. An operation such as symbolic differentiation which
is foreign to FORTRAN and ALGOL is relatively easy in LISP. (Indeed this
is one of the standard programming assignments for students first learning
LISP.) As will be noted later, several computer algebra systems were
written in LISP.
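The differentiation exercise alluded to above is small enough to sketch. The following illustration uses Python tuples in prefix form in place of LISP's prefix lists; it covers only sums and products and performs no simplification:

```python
# Minimal symbolic differentiation over prefix-form expressions, e.g.
# ('+', ('*', 'x', 'x'), 'x') stands for x*x + x. Constants differentiate
# to 0, the variable to 1; sums and products use the usual rules.

def diff(e, var):
    """Differentiate expression e with respect to the variable named var."""
    if isinstance(e, (int, float)):              # constant rule
        return 0
    if isinstance(e, str):                       # variable rule
        return 1 if e == var else 0
    op, a, b = e
    if op == '+':                                # sum rule
        return ('+', diff(a, var), diff(b, var))
    if op == '*':                                # product rule
        return ('+', ('*', diff(a, var), b), ('*', a, diff(b, var)))
    raise ValueError(f"unknown operator {op!r}")
```

For example, `diff(('*', 'x', 'x'), 'x')` yields `('+', ('*', 1, 'x'), ('*', 'x', 1))`, the unsimplified form of 2x; the separate simplification step is precisely what the early simplifier programs mentioned above addressed.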

1961-1966
---------

In 1961, James Slagle at M.I.T. wrote a LISP program called SAINT
for Symbolic Automatic INTegration.
This was one of the earliest applications of LISP to symbolic computation
and it was the first comprehensive attempt to program a computer to behave
like a freshman calculus student.
The program was based on a number of heuristics for indefinite integration
and it performed about as well as a good calculus student.

One of the first systems for symbolic computation was FORMAC, developed
by Jean Sammet, Robert Tobey, and others at IBM during the period 1962-1964.
It was a FORTRAN preprocessor (a PL/I version appeared later) and it was
designed for the manipulation of elementary functions including, of course,
polynomials and rational functions.
Another early system was ALPAK, a collection of FORTRAN-callable subroutines
written in assembly language for the manipulation of polynomials and rational
functions. It was designed by William S. Brown and others at Bell Laboratories
and was generally available about 1964.
A language now referred to as Early ALTRAN was designed at Bell Laboratories
during the period 1964-1966. It used ALPAK as its package of computational
procedures.

There were two other significant systems for symbolic computation developed
during this period. George Collins at IBM and the University of Wisconsin
(Madison) developed PM, a system for polynomial manipulation, an early
version of which was operational in 1961 with improvements added to the
system through 1966. The year 1965 marked the first appearance of MATHLAB,
a LISP-based system for the manipulation of polynomials and rational
functions, developed by Carl Engelman at M.I.T. It was the first interactive
system designed to be used as a symbolic calculator. Included among its
many firsts was the use of two-dimensional output to represent its
mathematical output.

The work of this period culminated in the first ACM Symposium on Symbolic
and Algebraic Manipulation held in March 1966 in Washington, D.C.
That conference was summarized in the August 1966 issue of the Communications
of the ACM.

1966-1971
---------

In 1966/1967, Joel Moses at M.I.T. wrote a LISP program called SIN
(for Symbolic Integrator). Unlike the earlier SAINT program, SIN was
algorithmic in approach and it was also much more efficient.
In 1968, Tony Hearn at Stanford University developed REDUCE, an
interactive LISP-based system for physics calculations. One of its
principal design goals was portability over a wide range of platforms,
and as such only a limited subset of LISP was actually used.
The year 1968 also marked the appearance of Engelman's MATHLAB-68,
an improved version of the earlier MATHLAB interactive system, and of
the system known as Symbolic Mathematical Laboratory developed by
William Martin at M.I.T. in 1967.
The latter was a linking of several computers to do symbolic manipulation
and to give good graphically formatted output on a CRT terminal.

The latter part of the decade saw the development of several important
general purpose systems for symbolic computation.
ALTRAN evolved from the earlier ALPAK and Early ALTRAN as a language and
system for the efficient manipulation of polynomials and rational functions.
George Collins developed SAC-1 (for Symbolic and Algebraic Calculations)
as the successor of PM for the manipulation of polynomials and rational
functions. CAMAL (CAMbridge Algebra system) was developed by David Barton,
Steve Bourne, and John Fitch at the University of Cambridge. It was
implemented in the BCPL language, and was particularly geared to
computations in celestial mechanics and general relativity.
REDUCE was redesigned by 1970 into REDUCE 2, a general purpose system
with special facilities for use in high-energy physics calculations.
It was written in an ALGOL-like dialect called RLISP, avoiding the
cumbersome parenthesized notation of LISP, while at the same time retaining
its original design goal of being easily portable.
SCRATCHPAD was developed by J. Griesmer and Richard Jenks at IBM Research
as an interactive LISP-based system which incorporated significant portions
of a number of previous systems and programs into its library, such as
MATHLAB-68, REDUCE 2, Symbolic Mathematical Library, and SIN.
Finally, the MACSYMA system first appeared about 1971.
Designed by Joel Moses, William Martin, and others at M.I.T., MACSYMA was
the most ambitious system of the decade.
Besides the standard capabilities for algebraic manipulation, it included
facilities to aid in such computations as limit calculations, symbolic
integration, and the solution of equations.

The decade from 1961 to 1971 concluded with the Second Symposium on
Symbolic and Algebraic Manipulation held in March 1971 in Los Angeles.
The proceedings of that conference constitute a remarkably comprehensive
account of the state of the art of symbolic mathematical computation in 1971.

1971-1981
---------

While all of the languages and systems of the sixties and seventies began
as experiments, some of them were eventually put into "production use"
by scientists, engineers, and applied mathematicians outside of the
original group of developers. REDUCE, because of its early emphasis on
portability, became one of the most widely available systems of this decade.
As a result it was instrumental in bringing computer algebra to the attention
of many new users. MACSYMA continued its strong development, especially
with regard to algorithm development. Indeed, many of the standard
techniques (e.g. integration of elementary functions, Hensel lifting,
sparse modular algorithms) in use today either came from, or were strongly
influenced by, the research group at M.I.T. It was by far the most powerful
of the existing computer algebra systems.

SAC/ALDES by G. Collins and R. Loos was the follow-up to Collins' SAC-1.
It was a non-interactive system consisting of modules written in the ALDES
(Algebraic DEScription) language, with a translator converting the results
to ANSI FORTRAN. One of its most notable distinctions was in being the only
major system to completely and carefully document its algorithms.
A fourth general purpose system which made a significant mark in the late
1970's was muMATH. Developed by David Stoutemyer and Albert Rich at the
University of Hawaii, it was written in a small subset of LISP and came
with its own programming language, muSIMP.
It was the first comprehensive computer algebra system which could actually
run on the IBM family of PC computers.
By being available on such small and widely accessible personal computers,
muMATH opened up the possibility of widespread use of computer algebra
systems for both research and teaching.

In addition to the systems mentioned above, a number of special purpose
systems also generated some interest during the 1970's. Examples of these
include: SHEEP, a system for tensor component manipulation designed by
Inge Frick and others at the University of Stockholm;
TRIGMAN, specially designed for computation of Poisson series and written
in FORTRAN by W. H. Jeffreys at University of Texas (Austin);
and SCHOONSCHIP by M. Veltman of the Netherlands for computations in
high-energy physics.
Although the systems already mentioned have all been developed in
North America and Europe, there were also a number of symbolic manipulation
programs written in the U.S.S.R. One of these is ANALITIK, a system
implemented in hardware by V. M. Glushkov and others at the Institute of
Cybernetics, Kiev.

1981-1991
---------

Due to the significant computer resource requirements of the major
computer algebra systems, their widespread use remained (with the exception
of muMATH) limited to researchers with access to substantial
computing resources. With the introduction of microprocessor-based
workstations, the possibility of relatively powerful desk-top computers
became a reality. The introduction of a large number of different computing
environments, coupled with the often nomadic life of researchers (at least
in terms of workplace locations) caused a renewed emphasis on portability
for the computer algebra systems of the 1980's.
More efficiency (particularly memory space efficiency) was needed in order
to run on the workstations that were becoming available at this time,
or equivalently, to service significant numbers of users on the
time-sharing environments of the day.
This resulted in a movement towards the development of computer algebra
systems based on newer "systems implementation" languages such as C,
which allowed developers more flexibility to control the use of
computer resources. The decade also marked a growth in the commercialization
of computer algebra systems. This had both positive and negative effects
on the field in general. On the negative side, users not only had to
pay for these systems but also they were subjected to unrealistic claims
as to what constituted the state of the art of these systems. However,
on the positive side, commercialization brought about a marked increase in
the usability of computer algebra systems, from major advances in user
interfaces to improvements to their range of functionality in such areas
as graphics and document preparation.

The beginning of the decade marked the origin of MAPLE.
Initiated by Gaston Gonnet and Keith Geddes at the University of Waterloo,
its primary motivation was to provide user accessibility to computer algebra.
MAPLE was designed with a modular structure: a small compiled kernel of
modest power, implemented completely in the systems implementation
language C (originally B, another language in the "BCPL family")
and a large mathematical library of routines written in the user-level
MAPLE language to be interpreted by the kernel. Besides the command
interpreter, the kernel also contained facilities such as integer and
rational arithmetic, simple polynomial manipulation, and an efficient
memory management system. The small size of the kernel allowed it to be
implemented on a number of smaller platforms and allowed multiple users
to access it on time-sharing systems.
Its large mathematical library, on the other hand, allowed it to
be powerful enough to meet the mathematical requirements of researchers.

Another system written in C was SMP (Symbolic Manipulation Program) by
Stephen Wolfram at Caltech. It was portable over a wide range of machines
and differed from existing systems by using a language interface that was
rule-based. It took the point of view that the rule-based approach was the
most natural language for humans to interface with a computer algebra
program. This allowed it to present the user with a consistent,
pattern-directed language for program development.
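The flavor of such a rule-based interface can be suggested with a toy rewriter. This is a hypothetical miniature in Python, not SMP's actual language or syntax: a rule is a (pattern, replacement) pair, and names beginning with '?' match any subexpression.

```python
# Toy pattern-directed rewriting: the first rule whose pattern matches the
# expression is applied. Expressions and patterns are nested tuples; a
# string beginning with '?' is a pattern variable binding any subexpression.

def match(pat, expr, bindings):
    """Return extended bindings if pat matches expr, else None."""
    if isinstance(pat, str) and pat.startswith('?'):
        if pat in bindings:                      # variable seen before
            return bindings if bindings[pat] == expr else None
        return {**bindings, pat: expr}           # bind the variable
    if isinstance(pat, tuple) and isinstance(expr, tuple) and len(pat) == len(expr):
        for p, e in zip(pat, expr):              # match element-wise
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pat == expr else None     # literal match

def subst(pat, bindings):
    """Instantiate a replacement pattern with the matched bindings."""
    if isinstance(pat, str) and pat.startswith('?'):
        return bindings[pat]
    if isinstance(pat, tuple):
        return tuple(subst(p, bindings) for p in pat)
    return pat

def rewrite(expr, rules):
    """Apply the first applicable rule once; return expr unchanged if none fits."""
    for pattern, replacement in rules:
        b = match(pattern, expr, {})
        if b is not None:
            return subst(replacement, b)
    return expr

# Two simplification rules: x*1 -> x and x+0 -> x.
rules = [(('*', '?x', 1), '?x'), (('+', '?x', 0), '?x')]
```

Here `rewrite(('*', 'y', 1), rules)` returns `'y'`. The appeal of this style, as the text notes, is that the user states mathematical transformations directly as rules rather than as procedures.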

Two further systems of the decade were MATHEMATICA and DERIVE.
MATHEMATICA is a second system written by Stephen Wolfram (and others). It
is best known as the first system to popularize an integrated environment
supporting symbolics, numerics, and graphics. Indeed when MATHEMATICA
first appeared in 1988, its graphical capabilities (2-D and 3-D plotting,
including animation) far surpassed any of the graphics available on
existing systems. MATHEMATICA was also one of the first systems to
successfully illustrate the advantages of combining a computer algebra
system with the easy-to-use editing features on machines designed to use
graphical user-interfaces (i.e. window environments). Based on C,
MATHEMATICA also comes with its own programming language which closely
follows the rule-based approach of its predecessor, SMP.

DERIVE, written by David Stoutemyer and Albert Rich, is the follow-up to
the successful muMATH system for personal computers. While lacking the
wide range of symbolic capabilities of some other systems, DERIVE has an
impressive range of applications considering the limitations of the 16-bit
PC machines for which it was designed.
It has a friendly user interface, with such added features as two-dimensional
input editing of mathematical expressions and 3-D plotting facilities.
It was designed to be used as an interactive system and not as a programming
environment.

Along with the development of newer systems, there were also a number of
changes to existing computer algebra systems. REDUCE 3 appeared in 1983,
this time with a number of new packages added by outside developers.
MACSYMA bifurcated into two versions, DOE-MACSYMA and one distributed by
SYMBOLICS, a private company best known for its LISP machines.
Both versions continued to develop, albeit in different directions.
AXIOM was developed during this decade by Richard Jenks, Barry Trager,
Stephen Watt and others at the IBM Thomas J. Watson Research Center.
A successor to the first SCRATCHPAD language, it is the only
"strongly typed" computer algebra system. Whereas other computer algebra
systems develop algorithms for a specific collection of algebraic domains
(such as, say, the field of rational numbers or the domain of polynomials
over the integers), AXIOM allows users to write algorithms over general
fields or domains.

As was the case in the previous decade, the eighties also found a number
of specialized systems becoming available for general use.
Probably the largest and most notable of these is the system CAYLEY,
developed by John Cannon and others at the University of Sydney, Australia.
CAYLEY can be thought of as a "MACSYMA for group theorists."
It runs in large computing environments and provides a wide range
of powerful commands for problems in computational group theory.
An important feature of CAYLEY is a design geared to answering questions not
only about individual elements of an algebraic structure, but more
importantly, questions about the structure as a whole. Thus, while one
could use a system such as MACSYMA or MAPLE to decide if an element in a
given domain (such as a polynomial domain) has a given property (such as
irreducibility), CAYLEY can be used to determine if a group structure is
finite or infinite, or to list all the elements in the center of the
structure (i.e. all elements which commute with all the elements of the
structure).

Another system developed in this decade and designed to solve problems
in computational group theory is GAP (Groups, Algorithms and Programming)
developed by J. Neubueser and others at the University of Aachen, Germany.
If CAYLEY can be considered to be the "MACSYMA of group theory," then GAP
can be viewed as the "MAPLE of group theory." GAP follows the general
design of MAPLE in implementing a small compiled kernel (in C) and a large
group theory mathematical library written in its own programming language.

Examples of some other special purpose systems which appeared during this
decade include FORM by J. Vermaseren, for high energy physics calculations,
LiE, by A.M. Cohen for Lie Algebra calculations,
MACAULAY, by Michael Stillman, a system specially built for computations
in Algebraic Geometry and Commutative Algebra,
and PARI by H. Cohen in France, a system oriented mainly for number theory
calculations. As with most of the new systems of the eighties, these last
two are also written in C for portability and efficiency.

-------------------------------------------

Research in computer algebra is a relatively young discipline, and the
research literature is scattered throughout various journals devoted to
mathematical computation. However, its state has advanced to the point where
there are two research journals primarily devoted to this subject area:
the "Journal of Symbolic Computation" and "Applicable Algebra in Engineering, Communication and Computing".
Other than these two journals, the primary source of recent research
advances and trends is a number of conference proceedings.
Until recently, there was a sequence of North American conferences and
a sequence of European conferences.
The North American conferences, primarily organized by ACM SIGSAM
(the ACM Special Interest Group on Symbolic and Algebraic Manipulation),
include SYMSAM '66 (Washington, D.C.), SYMSAM '71 (Los Angeles),
SYMSAC '76 (Yorktown Heights), SYMSAC '81 (Snowbird),
and SYMSAC '86 (Waterloo).
The European conferences, organized by SAME (Symbolic and Algebraic
Manipulation in Europe) and ACM SIGSAM, include the following whose
proceedings have appeared in the Springer-Verlag series
"Lecture Notes in Computer Science":
EUROSAM '79 (Marseilles), EUROCAM '82 (Marseilles),
EUROCAL '83 (London), EUROSAM '84 (Cambridge),
EUROCAL '85 (Linz), and EUROCAL '87 (Leipzig).
Starting in 1988, the two streams of conferences have been merged
and they are now organized under the name ISSAC (International Symposium
on Symbolic and Algebraic Computation),
including ISSAC '88 (Rome), ISSAC '89 (Portland, Oregon),
ISSAC '90 (Tokyo), ISSAC '91 (Bonn) and ISSAC '92 (Berkeley).

-----------------------------------------------
Professor Keith Geddes
Symbolic Computation Group
Department of Computer Science
University of Waterloo
Waterloo  ON  N2L 3G1