LISP-STAT

Statistical system in Lisp


A statistical system designed as an extensible dialect of Common Lisp, implemented on top of the XLISP dialect.


Related languages
LISP-STAT => XLISP-STAT   Implementation

References:
  • Tierney, L. "XLISP-STAT: A Statistical Environment Based on the XLISP Language (Version 2.0)," Technical Report No. 528, School of Statistics, University of Minnesota, 1989. Extract: Introduction
    XLISP-STAT is a statistical environment built on top of the XLISP programming language. This document is intended to be a tutorial introduction to the basics of XLISP-STAT. It is written primarily for the Apple Macintosh version, but most of the material applies to other versions as well; some points where other versions differ are outlined in an appendix. The first three sections contain the information you will need to do elementary statistical calculations and plotting. The fourth section introduces some additional methods for generating and modifying data. The fifth section describes some additional features of the Macintosh user interface that may be helpful. The remaining sections deal with more advanced topics, such as interactive plots, regression models, and writing your own functions. All sections are organized around examples, and most contain some suggested exercises for the reader.

    This document is not intended to be a complete manual. However, documentation for many of the commands that are available is given in the appendix. Brief help messages for these and other commands are also available through the interactive help facility described in Section 5.1 below.

    XLISP itself is a high-level programming language developed by David Betz and made available for unrestricted, non-commercial use. It is a dialect of Lisp, most closely related to the Common Lisp dialect. XLISP also contains some extensions to Lisp to support object-oriented programming. These facilities have been modified in XLISP-STAT to implement the screen menus, plots and regression models. Several excellent books on Common Lisp are available. One example is Winston and Horn [22]. A book on XLISP itself has recently been published. Unfortunately it is based on XLISP 1.7, which differs significantly from XLISP 2.0, the basis of XLISP-STAT 2.0.

    XLISP-STAT was originally developed for the Apple Macintosh. It is now also available for UNIX systems using the X11 window system, for Sun workstations under the SunView window system, and, with only rudimentary graphics, for generic 4.2/4.3BSD UNIX systems. The Macintosh version of XLISP-STAT was developed and compiled using the Lightspeed C compiler from Think Technologies, Inc. The Macintosh user interface is based on Paul DuBois' TransSkel and TransEdit libraries. Some of the linear algebra and probability functions are based on code given in Press, Flannery, Teukolsky and Vetterling [14]. Regression computations are carried out using the sweep algorithm as described in Weisberg [21].

    This tutorial has borrowed several ideas from Gary Oehlert's MacAnova user's Guide [13]. Many of the on-line help entries have been adopted directly or with minor modifications from the Kyoto Common Lisp System. Most of the examples used in this tutorial have been taken from Devore and Peck [11]. Many of the functions added to XLISP-STAT were motivated by similar functions in the S statistical environment [2,3].

    The present version of XLISP-STAT, Version 2.0, seems to run fairly comfortably on a Mac II or Mac Plus with 2MB of memory, but is a bit cramped with only 1MB. It will not run in less than 1MB of memory. The program will occasionally bomb with an ID=28 if it gets into a recursion that is too deep for the Macintosh stack to handle. On a 1MB Mac it may also bomb with an ID=15 if too much memory has been used for the segment loader to be able to bring in a required code segment.

    Development of XLISP-STAT was supported in part by grants of an Apple Macintosh Plus computer and hard disk and a Macintosh II computer from the MinneMac Project at the University of Minnesota, by a single quarter leave granted to the author by the University of Minnesota, by grant DMS-8705646 from the National Science Foundation, and by a research contract with Bell Communications Research. Extract: Why XLISP-STAT Exists
    Why XLISP-STAT Exists
    There are three primary reasons behind my decision to produce the XLISP-STAT environment. The first is to provide a vehicle for experimenting with dynamic graphics and for using dynamic graphics in instruction. Second, I wanted to be able to experiment with an environment supporting functional data, such as mean functions in nonlinear regression models and prior density and likelihood functions in Bayesian analyses. Finally, I was interested in exploring the use of object-oriented programming ideas for building and analyzing statistical models. I will discuss each of these points in a little more detail in the following paragraphs.

    The development of high resolution graphical computer displays has made it possible to consider the use of dynamic graphics for understanding higher-dimensional structure. One of the earliest examples is the real-time rotation of a three-dimensional point cloud on a screen -- an effort to use motion to recover a third dimension from a two-dimensional display. Other techniques that have been developed include brushing a scatterplot -- highlighting points in one plot and seeing where the corresponding points fall in other plots. A considerable amount of research has been done in this area; see, for example, the discussion in Becker and Cleveland [4] and the papers reproduced in Cleveland and McGill [8]. However, most of the software developed to date has been developed on specialized hardware, such as the TTY 5620 terminal or Lisp machines. As a result, very few statisticians have had an opportunity to experiment with dynamic graphics first hand, and still fewer have had access to an environment that would allow them to implement dynamic graphics ideas of their own. Several commercial packages for microcomputers now contain some form of dynamic graphics, but most do not allow users to customize their plots or develop functions for producing specialized plots, such as dynamic residual plots. XLISP-STAT provides at least a partial solution to these problems. It allows the user to modify a scatter plot with Lisp functions and provides means for modifying the way in which a plot responds to mouse actions. It is also possible to add functions written in C to the program. On the Macintosh this has to be done by adding to the source code. On some UNIX systems it is also possible to compile and dynamically load code written in C or FORTRAN.

    An integrated environment for statistical calculations and graphics is essential for developing an understanding of the uses of dynamic graphics in statistics and for developing new graphical techniques. Such an environment must essentially be a programming language. Its basic data types must include types that allow groups of numbers -- data sets -- to be manipulated as entire objects. But in model-based analyses numerical data are only part of the information being used. The remainder is the model itself. Sometimes a model is easily characterized by specifying a set of numbers. A normal linear regression model might be described by the number of covariates, the coefficients, and the error variance. On the other hand, in many cases it is easier to specify a model by specifying a function. To specify a normal nonlinear regression model, for example, one might specify the mean function. If our language is to allow us to specify this function within the language itself, then the language must support a functional data type with full rights: it has to be possible to define functions that manipulate functions, return functions, apply functions to arguments, etc. The choice I faced was to define a language from scratch or use an existing language. Because of the complexity of issues involved in functional programming I decided to use a dialect of a well understood functional language, Lisp. The syntax of Lisp is somewhat unfamiliar to most users of statistical packages, but it is easy to learn and several good tutorials are available in local bookstores. I considered the possibility of using Lisp to write a top level interface with a more "natural" syntax, but I did not see any way of doing this without complicating access to some of the more powerful features of Lisp or running into some of the pitfalls of functional programming. I therefore decided to retain the basic Lisp top level syntax.
To make the manipulation of numerical data sets easier I have redefined the arithmetic operators and basic numerical functions to work on lists and arrays of data.
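    As a concrete illustration, the redefined operators map element-wise over lists and vectors. This sketch follows the documented XLISP-STAT behavior; exact printed results depend on the version:

```lisp
;; In XLISP-STAT the standard arithmetic operators and elementary
;; functions are vectorized: they apply element-wise to compound data.
(+ '(1 2 3) 10)        ; => (11 12 13)   scalar recycled over the list
(* '(1 2 3) '(4 5 6))  ; => (4 10 18)    element-wise product
(sqrt '(4 9 16))       ; => (2 3 4)      elementary function mapped over data
(mean '(1 2 3 4))      ; => 2.5          basic statistical summary
```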

    Having decided to use Lisp as the basis for my environment, XLISP was a natural choice for several reasons. It has been made available for unrestricted, non-commercial use by its author, David Betz. It is small (for a Lisp system), its source code is available in C, and it is easily extensible. Finally, it includes support for object-oriented programming. Object-oriented programming has received considerable attention in recent years and is particularly natural for use in describing and manipulating graphical objects. It may also be useful for the analysis of statistical data and models. A collection of data and assumptions may be represented as an object. The model object can then be examined and modified by sending it messages. Many different kinds of models will answer similar questions, thus fitting naturally into an inheritance structure. XLISP-STAT's implementation of linear and nonlinear regression models as objects, with nonlinear regression inheriting many of its methods from linear regression, is a first, primitive attempt to exploit this programming technique in statistical analysis.
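    A minimal sketch of this message-passing style, using XLISP-STAT's regression-model constructor and send function (the data values here are invented for illustration):

```lisp
;; Build a linear regression model object, then query it by sending
;; messages. The x and y data are made up for this example.
(def m (regression-model '(1 2 3 4 5) '(2.0 4.1 5.9 8.2 9.8)))
(send m :coef-estimates)  ; intercept and slope estimates
(send m :residuals)       ; residuals from the fit
(send m :display)         ; print a summary of the fitted model
```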
  • Tierney, L. "LISP-STAT: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics," Wiley, New York, NY, 1990.
  • Tierney, Luke "Generalized Linear Models in Lisp-Stat," Technical Report No. 557, School of Statistics, University of Minnesota, 1991.
  • Tierney, Luke "Recent Developments and Future Directions in Lisp-Stat," Technical Report No. 608, School of Statistics, University of Minnesota, 1995. Extract: Introduction
    Introduction
    Lisp-Stat is an extensible statistical computing environment for data analysis, statistical instruction and research, with an emphasis on providing a framework for  exploring the use of dynamic graphical methods. Extensibility is achieved by basing Lisp-Stat on the Lisp language, in particular on a subset of Common Lisp. Lisp-Stat extends  standard Lisp arithmetic operations to perform element-wise operations on lists and vectors, and adds a variety of basic statistical and linear algebra functions. A portable window  system interface forms the basis of a dynamic graphics system that is designed to work  identically in a number of different graphical user interface environments, such as the Macintosh operating system, the X window system, and Microsoft Windows. A prototype-based  object-oriented programming system is used to implement the graphics system and to allow  it to be customized and adapted. The object-oriented programming system is also used as  the basis for statistical model representations, such as linear and nonlinear regression models  and generalized linear models.
    Lisp-Stat was first released in 1989. It has been used for data analysis, as a research tool, and for implementing several larger projects (e.g., Cook and Weisberg, 1994; Young, 1993). Based on experience gained from this use, the system is currently being redesigned. The redesign is evolutionary, with backward compatibility a major objective. The redesign project can be divided into six major segments: the basic Lisp system, data representation and operating system issues, the object system, the graphical system, the statistical component, and the user interface. They will be attacked in this order. The redesign of the basic Lisp system is nearly complete, and some of the changes are outlined in the next section. The third section describes some of the issues involved in the later stages of the revision; this section is more speculative in nature. The final section briefly discusses the importance of extensibility in a statistical software environment. Extract: New features
    New features
    Lisp-Stat was originally designed as a specification to be implemented on various Lisp systems. The requirement on the Lisp system base is that it support an appropriate subset of  the Common Lisp standard (Steele, 1990). The reason for using the Common Lisp specification as a base was that Common Lisp is a rich, high-level language with many features that  are already provided and do not need to be designed and documented from scratch. Even  though the XLISP language lacked some important Common Lisp features, it was useful as  an initial implementation base for Lisp-Stat since it was small and freely available in source  form. It was hoped that a transition to a full Common Lisp implementation could be made  in the future.
    Unfortunately this hope has not been fulfilled. There are a number of reasons, including the continued high cost of commercial Common Lisp implementations, the uncertain future  of free and of commercial implementations, and the lack of standardization in window system  and foreign function interfaces. Instead, XLISP has been brought closer to a full Common  Lisp implementation by adding many Common Lisp functions and some key missing features.  The most important added features are multiple values, packages, typed vectors, and a byte  code compiler. Other changes include a new garbage collector and new random number  generators. Many of these changes have been folded into the standard XLISP distribution.  Other features contributed to the standard distribution by Tom Almy and others have also  been or will shortly be incorporated into the XLISP-STAT base. In particular, Tom Almy's  unlimited precision integer arithmetic functions will be added in the near future.
    2.1 New Common Lisp Features in XLISP
    2.1.1 Multiple Values
    Multiple values are useful when a function needs to return one primary value and several secondary ones that may be but often are not of interest. Using multiple values avoids the  need to make and take apart a list. The hash table lookup function gethash, for example,  returns the item found as its first value or NIL if no item is found. A second value is t if an  item was found, NIL otherwise. This makes it possible to distinguish an item with value NIL  from an item not found.
    Several other high-level languages support multiple values. One example is MATLAB. For example, the eig function in MATLAB returns only the eigenvalues when a single answer is requested; if two values are asked for, it returns the eigenvectors and eigenvalues. Functions that are only called for their side effects can return no values.
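    The gethash idiom described above can be written out explicitly in standard Common Lisp; multiple-value-bind receives both the value and the found-flag:

```lisp
;; Distinguish "key present with value NIL" from "key absent" using
;; GETHASH's two return values.
(let ((table (make-hash-table)))
  (setf (gethash 'a table) nil)          ; store NIL under key A
  (multiple-value-bind (value present-p) (gethash 'a table)
    (list value present-p))              ; => (NIL T): found, value is NIL
  (multiple-value-bind (value present-p) (gethash 'b table)
    (list value present-p)))             ; => (NIL NIL): key not found
```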
    2.1.2 Packages
    If a language is to allow the development of substantial subsystems, then it is critical to provide some form of name space management to allow a system to export only its interface  and to hide and protect implementation details. Common Lisp manages name spaces by  organizing its symbols into collections called packages. Each package is divided into internal  and external (or exported) symbols. A package can use other packages, thus making their  symbols accessible within the package. Within a package, only symbols in the package and  external symbols of packages used by the package can be referenced directly using their  names. As a simple example, if a file contains the code
    (defpackage "MY-PACKAGE"
      (:use "COMMON-LISP")
      (:export "MY-FUNCTION"))
    (defun utility () ...)
    (defun my-function () ... (utility) ...)
    then all symbols in the "COMMON-LISP" package and all symbols like utility that are in "MY-PACKAGE" are accessible in "MY-PACKAGE", but only the exported symbol my-function will be available to other packages that use "MY-PACKAGE".
    Packages are not modules in the sense of Ada, Modula-2 or Modula-3, and they have many shortcomings: They do not allow separate exporting of variables and functions, only symbols; symbols cannot be imported under alternate names; there is no support for organizing separate compilation of system components. But they are a useful first step and can be used as the basis for more sophisticated module systems. Support for Common Lisp packages is now available in XLISP; a proper module system (e.g. Curtis and Rauen, 1990; Davis et al., 1994) may be added in the future.
    2.1.3 Pathnames
    The Common Lisp pathname functions allow the portable specification of hierarchical directory structures. For example, the expression
    (make-pathname :directory '(:relative "a") :name "b")
    produces "a/b" in UNIX, "a\b" in MS-DOS, and ":a:b" on the Macintosh. Using these functions it is possible to describe the directory structure of a system in a portable way.
    2.1.4 Typed Vectors
    Vectors and arrays can be restricted to contain only elements of certain specified types. This allows more efficient storage of floating point data and also facilitates the interface to  C code by allowing the address of the vector data to be passed directly to a C function.  The linear algebra subsystem of Lisp-Stat is being re-implemented to take advantage of this  ability. In particular, an interface to Level 1 BLAS and some Level 2 and Level 3 BLAS  routines (Anderson et al., 1992) will be provided to allow destructive modification of floating  point arrays. The details of the interface are still under development. Once they have been  completed, they will allow users to implement efficient linear algebra routines at the Lisp  level.
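    In standard Common Lisp notation, which XLISP's typed vectors follow, a specialized vector is created by giving an :element-type. This is a sketch; the element types XLISP accepts may differ from those of a full Common Lisp:

```lisp
;; A vector restricted to double floats: its elements can be stored
;; unboxed, so the address of the data block can be passed directly
;; to a C routine.
(let ((x (make-array 5 :element-type 'double-float
                       :initial-element 0.0d0)))
  (setf (aref x 2) 3.5d0)  ; destructive element update
  x)
```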
    2.2 The Byte Code Compiler
    The byte code compiler translates a Lisp function definition into a string of bytes that form an instruction sequence for a virtual machine (VM), a fast interpreter for the byte  code language. Interpreting byte code is not as fast as executing native machine code, but  with a good design the interpreter overhead can be minimized. Byte codes themselves are  usually machine-independent, thus making it possible to transfer byte compiled files from  one machine to another. In addition, the VM can be implemented in C, thus eliminating  hardware dependencies of a native code compiler.
    To illustrate what the compiler does, consider the function for adding up a list of numbers shown in Figure 1a. The dolist macro is expanded in Figure 1b to show the options for  local transfer of control in the loop body (the inner tagbody) and for nonlocal exit (the  enclosing block) that an interpreter has to consider. The compiler recognizes that neither  of these is needed and is able to simplify the code down to the set of instructions shown in  Figure 2.
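    The figures themselves are not reproduced in this extract; a summation function of the kind described as Figure 1a would look roughly like this (a reconstruction, not the original figure):

```lisp
(defun sum-list (x)
  (let ((sum 0))
    ;; DOLIST expands into a BLOCK wrapping a TAGBODY; when neither the
    ;; nonlocal exit nor the go tags are actually used, the compiler can
    ;; simplify them away, as the text describes.
    (dolist (item x sum)
      (setf sum (+ sum item)))))
```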
    Unlike many other byte-code VMs, the XLISP VM is not based on a stack model. Instead, the basic instructions are of a three-address-code nature (Aho et al., 1986, Chapter 8). Thus the instruction (add2 x y z) adds the values stored at offsets x and y and stores the result at offset z from the current frame base. This design seems to produce faster code than a stack-based design for benchmarks that should be representative of statistical applications. When the example function given here is applied to a list of 1000 integers, the byte-compiled code is approximately ten times faster than the interpreted version. Functions in which most iteration is already done in the vectorized code will experience a much smaller improvement.
    The use of byte codes has a long history, including, for example, the p-code of the UCSD Pascal system. Recent versions of the Microsoft C compiler have re-introduced the use of byte code as an option to take advantage of the fact that byte code is often more compact than native machine code. Another recent use of byte code is in the Java language (Gosling, 1995), where the machine-independence of byte code is used to allow transferring compiled small applications, or applets, for local use by the HotJava World Wide Web browser. The XLISP byte code compiler is based on the design of the ORBIT Scheme compiler (Kranz et al., 1986), which uses conversion to continuation passing style to support a variety of code transformation optimizations (Friedman et al., 1992). The code produced is properly tail recursive; thus iterative computations expressed using recursion will be compiled to iterative code.
    Even though the XLISP byte code compiler can already speed up computations considerably, there is still room for improvement. Additional code analysis and support for type  declarations will in some cases allow direct use of native machine data types for integers and  floating point numbers instead of boxed representations. Optimization strategies designed  to improve imperative code, such as static single assignment analysis, which can be related  to continuation passing representation (Kelsey, 1995), may also help. It is also possible to  replace byte code on a particular machine by threaded code, or to generate C code from the  intermediate assembly code and use a local C compiler to produce native code.
    The compiler developed up to now is a standard Lisp compiler with only very minimal adaptations to statistical applications. Future work will explore the possibility of incorporating support for vectorized arithmetic and graphical operations at the compiler level in  order to optimize performance in statistical applications.
    2.3 New Garbage Collection System
    The original XLISP memory management system used a mark-and-sweep garbage collector. This collector has the advantage of requiring only two bits of storage per node to implement,  but the disadvantage of scanning the entire heap on each collection. With a large heap  this can result in pauses long enough to degrade interactive performance. To address this  problem, the mark-and-sweep collector was replaced by a simple two-generation generational  collector in the spirit of Appel (1989). Generational collectors are based on the assumption  that most allocated objects are very short-lived. By distinguishing recently allocated objects  from older ones, the collector can usually reclaim adequate space from minor collections in  which only the newer nodes are examined. Only rarely is a full collection involving all nodes  required. Since the number of active new nodes in the system at any given time is usually  very small, the minor collections are very fast and hardly noticeable. Major collections take  about as long as mark-and-sweep collections, but occur much less frequently.
    Generational collectors are usually implemented as copying collectors, but the resulting data motion would make designing functions that call back to XLISP from C or FORTRAN  quite difficult. A treadmill-type in-place design (Baker, 1992; Wilson, 1992) was therefore  used. The nominal space overhead for this approach is considerably larger than for mark-and-sweep: six bytes per node on 32-bit hardware. However on many workstations alignment  requirements force enough free space into each node to accommodate this overhead, thus  eliminating the space cost on these systems. A compromise that may be worth exploring is  to have a first generation that is copied into a fixed second generation. This may provide the  advantages of fast allocation achieved by copying collectors without some of the drawbacks  that moving data has for call-backs (Doligez and Leroy, 1993).
    More work is needed to optimize tuning of the new memory management system to typical statistical activities. The use of adaptive tuning strategies may be explored as well.  Support for weak pointers and finalization will also be added.
    2.4 New Random Number Generators
    The Marsaglia lagged Fibonacci generator used in older versions of XLISP-STAT has been replaced as the default generator by L'Ecuyer's version of the Wichmann-Hill generator (L'Ecuyer, 1986; Bratley et al., 1987, Algorithm UNIFL). The original generator is still available, mainly to allow results produced with this generator to be reproduced. Two additional generators are available as well, Marsaglia's Super-Duper generator as used in S, and a combined Tausworthe generator of Tezuka and L'Ecuyer (1991). Random states now contain both generator and seed information. Having several very different generators available is useful for examining the possible sensitivity of simulation results to the generation mechanism. At present the set of available generators is fixed. In the future, a mechanism for adding new generators will be provided. Extract: Future Directions
    3 Future Directions
    3.1 Additional Data Representations
    Until recently data sets of floating point numbers could only be represented in Lisp-Stat as lists or as generic vectors. This requires storing each number in a separate node, and can be  quite wasteful. With the addition of typed arrays, it is now possible to use more compact  storage. Once typed arrays have been fully integrated, this should increase the size of data  sets that can be handled conveniently on standard memory configurations to the level of  hundreds of thousands of observations. For larger data sets in the range of several millions  of observations, more effective representations will be needed. One possibility is to allow the  contents of disk files to be treated as an array. Memory mapped file support may be useful  on operating systems where it is available. Since large data sets might only be accessible  over a network, remotely stored arrays should be supported as well. To reflect the fact that  files may be read-only, it will be necessary to allow arrays to be made read-only as well.  It will also be useful to be able to reference smaller subsets of larger arrays indirectly, to  support shared sub-arrays.
    Once adequate support for basic handling of larger data sets is available, algorithms for sparse array manipulation will need to be added, and other algorithms will need to be re-examined to ensure that they have adequate numerical properties even for large input arrays. To support adding new algorithms, the current minimal C and FORTRAN interface will need to be improved. Recent developments that have resulted in the inclusion of shared libraries in most operating systems will greatly facilitate this effort.
    3.2 Communication and Parallel Processing
    The ability to communicate with other applications running locally or remotely is becoming increasingly important. Several new languages have been proposed recently with a structure designed to allow them to take advantage of features of the World Wide Web. Two examples are Java (Gosling and McGilton, 1995) and Obliq (Cardelli, 1995). Lisp-Stat has already been used as a teaching tool in conjunction with the World Wide Web (Rossini and Rosenberger, 1994). Its use with the Web can be enhanced by adding some of the ideas found in Java as well as some lower level communication mechanisms. Security issues that have played a major role in the design of Java will also need to be examined to ensure that Lisp-Stat can be used safely with the Web.
    Adding basic interprocess communication mechanisms such as sockets and X properties for UNIX, Apple events for the Macintosh, and DDE and OLE for MS Windows, will allow Lisp-Stat to take advantage of other applications available in those environments. In  addition, in a networked environment these mechanisms can form the basis of a parallel processing environment. The PVM system under UNIX (Geist et al., 1994) is designed around  this approach. Either a similar system can be implemented, or an interface to PVM can be  provided to allow Lisp-Stat to take advantage of the multiple workstation environments that  are now quite common.
    Another form of parallelism worth exploring is the use of threads or light-weight processes with shared global memory. Allowing long-running computations to coexist with a graphical  user interface is accomplished much more naturally with a threads mechanism than the form  of manual implementation that is currently required. In addition, threads allow a system to  take advantage of shared memory multiprocessors which are also becoming more common.  The SR language (Andrews and Olsson, 1993) provides a useful framework for integrating  both separate processes and threads.
    One component of Lisp-Stat that is inherently parallel, though the current implementation is serial, is the vectorized arithmetic system. Recent advances in the understanding of nested parallel vector languages (NESL, Blelloch, 1994; Proteus, Goldberg et al., 1994) may be useful in redesigning this system to be more expressive, by making it easier to define vectorized functions at the user level, and more efficient, by allowing parallel architecture to be exploited when it is available. One possibility is to re-implement the Lisp-Stat vectorized arithmetic system using the CVL library (Blelloch et al., 1994), which provides implementations for workstations, the Connection Machines CM2 and CM5, the Cray Y-MP and the MasPar MP2.
    3.3 The Object System
    The Lisp-Stat object system is both unusual and conventional. It is unusual in being based on prototypes rather than classes, and it is conventional in using only single dispatching for  handling methods. The use of prototypes instead of classes seems to have been successful,  and a number of recent object-oriented languages with a similar emphasis on interactive use  have taken this route as well. Many Lisp-based object systems, such as CLOS (Steele, 1990),  the EuLisp object system (Padget et al., 1994), and Dylan (Apple Computer, 1994) use  multiple dispatching. Other languages that use multiple dispatching are Cecil (Chambers,  1993) and S. Multiple dispatching has more expressive power than single dispatching, but  also represents a more complex programming paradigm. Most work on object-oriented design  (e. g. Rumbaugh et al., 1991) is based on the single dispatch model. Only recently have  researchers begun to formulate a framework for understanding multiple dispatch (Chambers,  1992). If these efforts are successful, then it may be worth reconsidering the use of multiple  dispatching. For now, single dispatching appears adequate and better understood.
    There are situations where it would be useful to develop specialized object-oriented subsystems to support a particular project. This might be to provide increased efficiency or increased expressive power. Such a system can be built from scratch, but would be easier to construct if it could leverage off of the existing system. The need for customized object systems has led to the development of meta-object protocols (Kiczales et al., 1991; Padget et al., 1994). It may prove useful to design a meta-object protocol for Lisp-Stat as well and to implement the current protocol as a special case.
    An area of considerable current research and commercial interest is the development of standards for linking objects in separate applications and on remote systems. Some of the  projects with this objective are OpenDoc, OLE, SOM, CORBA, and ILU. Most approaches  seem to be working towards compliance with CORBA. ILU (Janssen et al., 1995), which  provides a CORBA interface, or Fresco (Linton and Price, 1993), which is based on CORBA,  may provide an effective means for integrating object linking into Lisp-Stat. Providing a  standard linking mechanism will allow Lisp-Stat to more easily communicate with other  programs, either using them as compute engines or serving them as a compute engine. It  will also allow Lisp-Stat sessions on separate workstations to communicate with one another  in a transparent fashion.
    The current Lisp-Stat object system does not provide a standardized broadcasting mechanism for efficiently distributing change notifications to interested objects. At present such  broadcasts have to be implemented by hand (Tierney, 1993). An efficient, standard mechanism is needed to adequately support the Model-View-Controller paradigm that has become  central to graphical user interface design. A mechanism similar to the one used in Smalltalk  will need to be incorporated.
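    A minimal broadcast mechanism of the kind described, in the style of Smalltalk's dependents/changed protocol, might look like the following Python sketch (all names hypothetical): a model keeps a list of dependents and notifies each of them whenever an aspect of its state changes.

```python
class Model:
    """Broadcasts change notifications to registered dependents."""
    def __init__(self):
        self._dependents = []

    def add_dependent(self, dependent):
        self._dependents.append(dependent)

    def changed(self, aspect):
        # Notify every dependent that 'aspect' of the model changed.
        for dependent in self._dependents:
            dependent.update(self, aspect)

class View:
    """Records the change notifications it receives."""
    def __init__(self):
        self.notifications = []

    def update(self, model, aspect):
        self.notifications.append(aspect)

model, view = Model(), View()
model.add_dependent(view)
model.changed("data")        # the view is told the data changed
```

    In a Model-View-Controller setting, each view of a data set would register as a dependent of the model; a standard mechanism like this removes the need to hand-code the notification loop in every model.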
    3.4 The Graphics System
    The current Lisp-Stat graphics system was designed as a compromise between flexibility, simplicity, and efficiency. The goal of redesigning the graphics system is to increase the  flexibility of the system while maintaining or improving on simplicity and efficiency. For  example, the original design identifies plots with their containing windows. This simplifies  the user model for dealing with plots, but prevents placing multiple plots in the same window.  Similarly, dialog items were considered part of special dialog windows, thus preventing the  integration of standard dialog items with plot windows.
    The new design will support a hierarchical window structure in which each top level window contains a nested hierarchy of widgets. Each widget can be an elementary item  such as a button or a slider, or another collection of widgets. Geometry managers will be  provided to facilitate display-independent layout management. The design of the Tk toolkit  (Ousterhout, 1994) may provide a useful model to explore. With increases in workstation  speed experienced in recent years it may also be possible to represent plots as collections  of widgets. This will again increase flexibility, but may need to be deferred if it is still too  costly in performance on current hardware.
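    The hierarchical window structure described above is essentially a composite: a container widget holds child widgets, each of which may itself be a container. A Python sketch with hypothetical names (no real toolkit is assumed):

```python
class Widget:
    """An elementary item such as a button or a slider."""
    def __init__(self, name):
        self.name = name

    def walk(self):
        yield self

class Container(Widget):
    """A widget holding a nested collection of other widgets."""
    def __init__(self, name, children=()):
        super().__init__(name)
        self.children = list(children)

    def walk(self):
        yield self
        for child in self.children:
            yield from child.walk()

# A top-level window containing a plot area and a row of controls.
window = Container("window", [
    Container("plot-area", [Widget("scatterplot")]),
    Container("controls", [Widget("slider"), Widget("button")]),
])

names = [w.name for w in window.walk()]
```

    A geometry manager would traverse such a tree to assign sizes and positions, which is what keeps layout management independent of the display.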
    In addition to supporting standard widgets and widgets defined in Lisp-Stat, the graphics system should also support externally defined widgets, such as OLE controls. The ability  to embed widgets related to other processes running locally or remotely also needs to be  explored. The Fresco toolkit (Linton and Price, 1993), which is based on the CORBA  standard for distributed objects, may provide a useful model or a possible basis for this  development.
    Within the statistical graphs themselves, it would be helpful to provide more programmability for layout features, such as the axes on a plot. It would also be useful to provide primitives for managing symbols or other glyphs that represent groups of points rather than just individual points. This would provide a useful superstructure for histograms as well as for binned scatterplots (Carr, 1991).
    Finally, it would be useful to allow closer adaptation to native GUI standards, but without sacrificing code portability. This is difficult to achieve, but is facilitated somewhat by the convergence of features across different GUIs, such as the Macintosh, MS Windows, and Motif, that has occurred over the last few years.
    3.5 Models and Data
    The current statistical model system has proven quite effective for code re-use, but it does have some design features that are now generally considered to be unfortunate from the point of view of object-oriented design (Rumbaugh et al., 1991). In particular, it would be better to design a nonlinear regression model to have a linear regression model component for code re-use (a has-a relationship) and delegate appropriate messages to this component, instead of having nonlinear regression models inherit from the linear regression model (an is-a relationship). Using inheritance means that even inappropriate methods are inherited; using containment and delegation provides more reasonable control. To assist with this change, the object system should be modified to provide direct support for delegating messages received by one object to another object, usually a slot value of the original receiver.
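    The has-a arrangement with controlled forwarding can be sketched as follows; this is a Python illustration with hypothetical model classes, not Lisp-Stat's actual prototypes. The nonlinear model owns a linear model component and forwards only the messages that remain appropriate.

```python
class LinearModel:
    def coefficients(self):
        return "least-squares coefficients"

    def linear_only_method(self):
        return "meaningful only for linear models"

class NonlinearModel:
    """Has-a: owns a LinearModel instead of inheriting from it."""
    DELEGATED = ("coefficients",)      # explicit list of forwarded messages

    def __init__(self):
        self._linear = LinearModel()   # component held in a slot

    def __getattr__(self, name):
        if name in self.DELEGATED:
            return getattr(self._linear, name)
        raise AttributeError(name)     # inappropriate methods are NOT inherited
```

    With inheritance, linear_only_method would leak into the nonlinear model's interface; with containment and delegation it simply is not there.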
    It will also be necessary to design a useful data set prototype that is capable of storing attribute information, such as whether the values of data are to be interpreted as numerical  values or factor levels. The lack of such a system has forced several users to develop variants  of their own.
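    One possible shape for such a data set prototype, sketched in Python with hypothetical names: each variable carries an attribute recording whether its values are to be read as numbers or as factor levels, and factor levels are derived from the data when not supplied.

```python
from dataclasses import dataclass, field

@dataclass
class Variable:
    name: str
    values: list
    kind: str = "numeric"     # "numeric" or "factor"
    levels: tuple = ()        # filled in automatically for factors

@dataclass
class Dataset:
    variables: dict = field(default_factory=dict)

    def add(self, variable):
        if variable.kind == "factor" and not variable.levels:
            variable.levels = tuple(sorted(set(variable.values)))
        self.variables[variable.name] = variable

ds = Dataset()
ds.add(Variable("dose", [0.5, 1.0, 2.0]))
ds.add(Variable("group", ["treated", "control", "treated"], kind="factor"))
```

    A model or plot receiving such a data set can then choose an appropriate treatment (dummy coding, grouped glyphs, and so on) by inspecting the attribute instead of guessing from the raw values.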
    3.6 Syntax and User Interface Issues
    Lisp syntax is often perceived as a bit of an impediment to the use of the language. There is considerable debate about the degree to which this impediment is real or perceived. The success of Lisp-Stat to date suggests it may be less of an issue than is sometimes claimed. Nevertheless, alternate syntaxes, at least for parts of the system, are worth exploring. For example, a simple infix parser may be useful for specifying mathematical formulas. It may also be useful to develop a simple vector subscripting language similar to the ones used in S or MATLAB.
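    A vector subscripting language of the S/MATLAB kind amounts to indexing by vectors of positions and by boolean masks. A toy Python sketch (the Vec class is hypothetical) shows the idea:

```python
class Vec:
    """Toy vector with S/MATLAB-style subscripting."""
    def __init__(self, data):
        self.data = list(data)

    def __gt__(self, scalar):
        # Elementwise comparison yields a boolean mask.
        return [x > scalar for x in self.data]

    def __getitem__(self, index):
        if isinstance(index, list):
            if all(isinstance(i, bool) for i in index):
                # Boolean mask: keep elements where the mask is true.
                return Vec([x for x, keep in zip(self.data, index) if keep])
            # Index vector: gather elements by position.
            return Vec([self.data[i] for i in index])
        return self.data[index]

x = Vec([3, 1, 4, 1, 5])
x[[0, 2]].data   # positional subscript, as in x[c(1, 3)] in S
x[x > 2].data    # logical subscript, as in x[x > 2] in S
```

    The point of the sketch is that very little machinery is needed to layer such a subscripting notation over an existing vector type, so it could be offered alongside, rather than instead of, the ordinary Lisp functions.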
    Developing a textual syntax that is more natural, in some sense, than Lisp's parenthesized prefix syntax while remaining as powerful is a difficult task. Even though discussions of syntax pros and cons often focus on a comparison of infix and prefix notation, probably a more significant aspect of Lisp syntax that can make it hard to follow at times is that there are no syntactic cues to help distinguish special forms, or syntactic keywords, from standard functions -- the programmer has to know which symbols refer to special forms. This is a weakness of Lisp syntax, but at the same time it is also a great strength: there are no syntactic impediments to the introduction of new special forms. This makes Lisp a programmable language that can be used to define new, problem-specific languages (Graham, 1994). Achieving this level of flexibility with an infix syntax is extremely difficult; this is reflected by the long delay introduced into the Dylan project by their decision to adopt an infix syntax (Apple Computer, 1994).
    A promising alternative to a textual syntax is a visual one. Research on visual languages has met with some successes (Cox et al., 1989; Khoros Rasure et al., 1990; Burnett et al., 1994) and has also seen some application in statistical computing (Oldford and Peters, 1988). Another interesting and related area of research is programming by example, or programming by demonstration (Cypher, 1993). It is still too early to tell whether there are visual paradigms that are sufficiently universal to be intuitive and easy to use, while at the same time retaining the expressive power of their textual counterparts. But even if these approaches cannot entirely replace a textual syntax, it may be possible to develop very useful and effective visual interfaces to significant portions of a statistical system. This could greatly enhance the ease of use for those portions, but it comes at a price: Unless all features are accessible using a visual interface, a barrier is established between those portions that are and those that are not. Learning to use simpler aspects of the system does not provide any assistance at reaching beyond this barrier. The result could be to discourage, rather than encourage, experimentation and development; this would be unfortunate.
    4 Discussion
    The major objective of Lisp-Stat is to provide a flexible system that can easily be extended both in its numerical and its graphical capabilities. The current system represents a first  step; the revisions currently in progress are designed to bring it closer towards this goal.  Extensibility is critical for a system to be able to adapt to new statistical problems and  ideas. Having an extensible system allows research to progress more rapidly, since new ideas  are easier to test and refine. But it also gives a data analyst more flexibility to adapt methods  to a problem instead of having to adapt problems to available methods. In short, having an  extensible computing environment helps to reduce the gap between statistical research and  practice, which is to the benefit of both.
    The Heidelberg Workshop where this paper was presented provided a nice opportunity to illustrate the advantages of extensibility. In an evening session Andreas Buja presented a new  idea for interactively controlling a tour of four-dimensional space (Buja and ???, 1996). The  idea was clearly excellent, but it was hard to appreciate fully without being able to try it out.  It was promised that the idea would be incorporated in a future release of XGobi, but it was  not clear when that might be available. Fortunately, by taking advantage of the extensible  nature of Lisp-Stat, I was able to put together a simple implementation in an hour or two  that evening, and could then begin to experiment with it the next day. The implementation  was pedestrian to be sure, but adequate as a prototype. Having the prototype to experiment  with helped to underscore the quality of the basic idea.