Thomas(ID:3453/tho001)


Dylan compiler implemented in Scheme at DEC


Related languages
Dylan => Thomas   Compiled by
Thomas => Bob   Augmentation of

Resources
  • FTP site for Thomas at DEC

    "
  • Info for Thomas
    i
  • usenet: "Thomas" system now available
    Thomas, a compiler written at Digital Equipment Corporation's
    Cambridge Research Laboratory, is now available to the public.  Thomas
    compiles a language compatible with the language described in the book
    "Dylan(TM) an object-oriented dynamic language" by Apple Computer
    Eastern Research and Technology, April 1992.

    The Thomas system is written in Scheme and is available to run under
    any one of three public implementations of Scheme: MIT's CScheme,
    DEC's Scheme->C, and Marc Feeley's Gambit.  It can run on a wide range
    of machines including the Macintosh, PC compatibles, Vax, MIPS, Alpha,
    and 680x0.  Thomas generates IEEE compatible Scheme code.  The entire
    system (including sources) is available by anonymous ftp from:

           crl.dec.com:pub/DEC/Thomas
           gatekeeper.pa.dec.com:pub/DEC/Thomas
           altdorf.ai.mit.edu:archive/Thomas

    In building Thomas, our goals (in order of priority) were:

    (1) To learn about the Dylan(TM) language, by building an
        implementation based solely on the description in the book.

    (2) To help others learn about the language by producing source code
        for an implementation that was well structured, easy to read, and
        was publically available.

    (3) To build a system we could use to actually write small Dylan(TM)
        programs, to get a feel for the language through using it.

    We feel we have met these three goals as well as can be expected in a
    four week project with three people.  It was never our intention to
    produce an implementation that performs well, and Thomas has no
    optimizations of any kind.  It does not perform well.  This reflects
    our goals and not necessarily the design of the language itself.

    Thomas is NOT Dylan(TM).  We have not received approval for the use of
    the trademark, and we have not received a copy of a test suite other
    than the examples from the book itself.  We may, at some future date,
    pursue these issues with Apple.  The Thomas system was built with no
    direct input, aid, assistance or discussion with Apple.  All design
    and implementation decisions in Thomas reflect choices by the Thomas
    implementors based on reading the book published by Apple.  These
    decisions must not be construed in any way as deriving from Apple
    Computer Corporation or its employees.

    We have made every effort to minimize the differences between Thomas
    and Dylan(TM), and to remove bugs, but help from others would be
    greatly appreciated.  The original development team consisted of:

           Matt Birkholz (Birkholz@crl.dec.com)
           Jim Miller (JMiller@crl.dec.com)
           Ron Weiss (RWeiss@crl.dec.com)

    In addition, Joel Bartlett (Bartlett@wrl.dec.com), Marc Feeley
    (Feeley@iro.umontreal.ca), Guillermo Rozas (Jinx@zurich.ai.mit.edu)
    and Ralph Swick (Swick@crl.dec.com) contributed time and energy to the
    initial release.
    external link
  • usenet: What I learned from building Dylan/Thomas


    I've been maintaining an unintentional "radio silence" for the last
    several weeks while working with Ron Weiss and Matt Birkholz to build
    the Thomas system.  Now that I'm out of that hole, I'd like to update
    others on some of what I learned in the process and respond to a few
    of the questions raised earlier on these mailing lists.

    I. Despite all of the hairy stuff involving keywords, specializers,
    multiple inheritance and the rest of it, the generic dispatch system
    of Dylan (maybe also CLOS?) was remarkably simple to build and
    understand.  The key procedure is in the file generic.scm, and is
    simplified here (I removed support for multiple values, error
    checking, and renamed some functions for clarity):

       (define (generic-dispatch original-args generic-function)
         (let ((applicable-methods
             (sorted-applicable-methods
              (generic-function.methods generic-function)
              original-args)))
           (let next-method-loop ((remaining-methods applicable-methods)
                         (current-args original-args))
          (apply (car remaining-methods)
              (if (null? (cdr remaining-methods))
                  #F
                  (lambda (next-method . these-args)
                    (next-method-loop (cdr remaining-methods)
                             (if (null? these-args)
                                 current-args
                                 these-args))))
              current-args))))

    This requires one non-trivial support routine:

      sorted-applicable-methods finds the applicable methods and sorts
        them from most- to least-specific.  We used a topological sort of
        all of the classes in the system.  This is used to provide an
        ordering of the classes for comparing with individual arguments to
        the method, which seems logical enough:

        (define (match-specializer? object specializer)
          ;; Computes a distance value from the object to the specializer.
          ;; This distance is either
          ;;   #F  -- doesn't match the specializer
          ;;    0  -- the specializer is a singleton for exactly this object
          ;;   <n> -- ordering of the class representing this specializer in
          ;;          the ordering of all classes (always >= 1)
          (if (singleton? specializer)
           (if (eq? object (singleton.object specializer)) 0 #F)
           (if (subclass? (get-type object) specializer)
               (class.generality specializer)
               #F)))

        We then used lexicographic ordering to break ties for multiple
        argument dispatch, which is specified in Dylan but seems basically
        arbitrary.  This latter is a place where I'd expect a Scheme
        system to say "unspecified".

      I won't reproduce the intervening code, but if you look for a while
      you'll see that sorted-applicable-methods requires two things:

      a) A way, given a method under consideration, to find the
         specializers that restrict the types of arguments to which it can
         be applied.  In response to one of the questions on the Scheme
         list, this is where the need for WEAK-CONS arises.  Since methods
         are first-class in Dylan (Dylan/method <=> Scheme/lambda) and can
         be attached at any time to a generic function, it is necessary at
         runtime to be able to locate the specializers for a method.  This
         requires building a data structure (an a-list or hash table, for
         example) that uses methods as keys to locate their specializers.
         But such a table would grow indefinitely in size unless the
         garbage collector removes entries that correspond to methods that
         have been eliminated from the system.  [There is a similar need
         to locate a description of a generic function at runtime, used
         for the error detection mechanism I eliminated from this
         simplified code.]

      b) A mechanism for determining the most specific type of an object
         at runtime.  The procedure "get-type" (in the file
         runtime-top.scm) performs the operation.  It is a kludge and very
         order sensitive.  I found this function rather unpleasant to
         maintain as the system grew:

         (define (get-type obj)
           (cond ((instance? obj) (instance.class obj))
              ((number? obj)               ; Might be wrong
               (if (real? obj)
                (if (exact? obj)
                    (if (integer? obj)
                     <integer>
                     <ratio>)
                    <float>)
                <complex>))
              ((class? obj) <class>)
              ((singleton? obj) <singleton>)
              ((null? obj) <empty-list>)
              ((slot? obj) <slot-descriptor>)
              ((pair? obj) <pair>)
              ((vector? obj) <simple-object-vector>)
              ((string? obj) <byte-string>)
              ((char? obj) <character>)
              ((procedure? obj)
               (cond ((dylan::generic-function? obj) <generic-function>)
                  ((dylan::method? obj) <method>)
                  (else <function>)))
              ((keyword? obj) <keyword>)
              ((symbol? obj) <symbol>)
              (else <object>)))

         This observation has lead me back to a conversation we had about
         the "portable record proposal" which went unresolved at our last
         meeting.  The issue revolved around the pair of requirements for
         disjointness of types and opacity of representation.  Several of
         us discussed this at some length at the LISP conference and I
         came away convinced of the need for a set of operations, separate
         from the record package, for manufacturing disjoint user types.
         At the Authors meeting, Jonathan Rees pointed out that he had
         made such a proposal some time ago but that it had not been acted
         upon.  I'd like to have us act on it: Jonathan, can you find it
         and resubmit it?

    II. On the question about teaching generic dispatch as opposed to
    object dispatch, I find myself for the first time disagreeing with
    Brian Harvey.  While I haven't actually tried to teach this material
    for two years now, I seem to recollect that I taught numbers very
    early in the course.  They are the one place in Scheme where generic
    operation is deeply built into the system, and it provides a simple
    and logical place to start when explaining generic dispatch.  The
    symbolic algebra system at the end of Chapter 2 of Structure and
    Interpretation takes this much farther, but perhaps at the expense of
    the gut-level understanding of the issue that comes from simply asking
    the question "How does Scheme do that with its numbers, anyway?".  The
    very simple example of adding integers to real numbers explains a lot
    to anyone familiar with representations.  For those unfamiliar with
    numerical representations, the introduction of generic sequence
    operations (like the ones in Dylan, but simplified) is another obvious
    starting point.  I find the INITIAL-STATE, NEXT-STATE, CURRENT-ELEMENT
    iteration protocol very elegant (although my taste would add
    CURRENT-KEY to all collections, not just explicit key sequences; even
    at the cost of an extra cons-cell for the state of a list).

    I, personally, have always found the object dispatch method both
    limiting as a programming style and consequently rather unpleasant to
    teach.  Its main advantage, from what I've seen so far, is one of
    modularity.  The class heterarchy is conceptually elegant and works
    well, but unfortunately leads to a number of questions about where an
    object gets its behavior from, which inevitably leads to questions
    about HOW the object was implemented; this is precisely the
    abstraction violation that I thought object systems were intending to
    avoid.

    III. I've written my first large, portable Scheme program.  It
    required exactly two extensions to the language (WEAK operations, a
    minimal error system), but is modularized in a way that makes the port
    easy.  I like this way of writing my programs, and I think it gives me
    a new way to think about the issue of portability.  The Scheme
    language has the flexibility to allow most reasonable programs to be
    divided into an implementation-independent part which calls procedures
    defined in a implementation-dependent part.  This isn't new; but this
    is the first time I've actually tried to make it work and it did!  The
    ease of porting this system from one Scheme to another contrasts
    rather sharply with my attempts to port MIT Scheme's implementation
    from one version of C to another, and to similar experiences in
    porting Pascal and BCPL programs in an earlier existence.

    IV. To the Authors:

    I'd really like to see some hooks into the garbage collector
    standardized.  This implementation experience, as well as a paper I
    wrote several years ago that touches on some of the other things one
    can do using a garbage collector, convinced me that it isn't going to
    be that hard to figure out "the right set".  I see two "must do" items
    and two optional ones:

       MUST DO 1: Have a way to specify a set of "precious" items which
         the garbage collector is not at liberty to remove, but which it
         will attempt to monitor and report when they are no longer
         required for any other reason.  The two optional sets are
         specific implementations of this idea which exist in several
         current Scheme implementations, but which may not be the best
         abstraction of the service.

       MUST DO 2: Have a way to explicitly invoke the equivalent of the
         garbage collector's structure-preserving walk given an arbitrary
         starting object.  One form this could take is a procedure that
         takes the root object and a vector; it stores into the vector all
         the items found from walking that object and returns a count of
         all of the objects encountered (this allows the operation to
         occur (conceptually, at least) without CONSing).  An alternate
         formulation is to provide a function which takes an object and
         returns a collection of the sub-components of that object
         sufficient to allow a full structure walk from source-level
         Scheme.

       OPTIONAL 1: Have a way to run a procedure at the end of every
         garbage collection before returning to user code.  This is how
         the semi-portable code in Thomas implements populations and one
         dimensional "weak" tables on top of WEAK-CONS.  It is one way to
         perform the report function in MUST DO 1, but not the only
         possible way.  All three of the Schemes I used for Thomas (MIT,
         Scheme->C, and Gambit) have some such hook, but all have slightly
         different names and signatures for the function that installs the
         GC daemon(s).

       OPTIONAL 2: The WEAK-CONS set of operations (WEAK-PAIR?, WEAK-CONS,
         WEAK-C{A,D}R, SET-WEAK-C{A,D}R!) are now also available in all
         three Scheme systems.  It took Joel Bartlett only one day to add
         them to his conservative generational collector; they are nice
         because of their simplicity.  They lack the generality of MUST DO
         1, and would require something like the finalization proposal
         that was circulated before our last meeting (but not discussed).

    I'd also like to see some people take a good, hard look at the Thomas
    code for generic operations and the class hierarchy.  It is very
    portable and provides some very good capabilities.  I'd like someone
    to take an independent look at that code and see if it can be packaged
    nicely and put into the (putative) Scheme library.  If we can do that,
    we should be able to add a lot of the Thomas runtime system to the
    same Scheme library and have a very good start at a powerful
    programming library -- one thing that (IMHO) has long held up the
    adoption of Scheme as a "real" language.

    --Jim

    external link