Thomas(ID:3453/tho001)
Dylan compiler implemented in Scheme at DEC
Related languages
Dylan |
=> |
Thomas | |
Compiled by |
Thomas |
=> |
Bob | |
Augmentation of |
Resources - FTP site for Thomas at DEC
"
- Info for Thomas
i
- usenet: "Thomas" system now available
Thomas, a compiler written at Digital Equipment Corporation's Cambridge Research Laboratory, is now available to the public. Thomas compiles a language compatible with the language described in the book "Dylan(TM) an object-oriented dynamic language" by Apple Computer Eastern Research and Technology, April 1992.
The Thomas system is written in Scheme and is available to run under any one of three public implementations of Scheme: MIT's CScheme, DEC's Scheme->C, and Marc Feeley's Gambit. It can run on a wide range of machines including the Macintosh, PC compatibles, Vax, MIPS, Alpha, and 680x0. Thomas generates IEEE compatible Scheme code. The entire system (including sources) is available by anonymous ftp from:
crl.dec.com:pub/DEC/Thomas gatekeeper.pa.dec.com:pub/DEC/Thomas altdorf.ai.mit.edu:archive/Thomas
In building Thomas, our goals (in order of priority) were:
(1) To learn about the Dylan(TM) language, by building an implementation based solely on the description in the book.
(2) To help others learn about the language by producing source code for an implementation that was well structured, easy to read, and was publically available.
(3) To build a system we could use to actually write small Dylan(TM) programs, to get a feel for the language through using it.
We feel we have met these three goals as well as can be expected in a four week project with three people. It was never our intention to produce an implementation that performs well, and Thomas has no optimizations of any kind. It does not perform well. This reflects our goals and not necessarily the design of the language itself.
Thomas is NOT Dylan(TM). We have not received approval for the use of the trademark, and we have not received a copy of a test suite other than the examples from the book itself. We may, at some future date, pursue these issues with Apple. The Thomas system was built with no direct input, aid, assistance or discussion with Apple. All design and implementation decisions in Thomas reflect choices by the Thomas implementors based on reading the book published by Apple. These decisions must not be construed in any way as deriving from Apple Computer Corporation or its employees.
We have made every effort to minimize the differences between Thomas and Dylan(TM), and to remove bugs, but help from others would be greatly appreciated. The original development team consisted of:
Matt Birkholz (Birkholz@crl.dec.com) Jim Miller (JMiller@crl.dec.com) Ron Weiss (RWeiss@crl.dec.com)
In addition, Joel Bartlett (Bartlett@wrl.dec.com), Marc Feeley (Feeley@iro.umontreal.ca), Guillermo Rozas (Jinx@zurich.ai.mit.edu) and Ralph Swick (Swick@crl.dec.com) contributed time and energy to the initial release.
external link
- usenet: What I learned from building Dylan/Thomas
I've been maintaining an unintentional "radio silence" for the last several weeks while working with Ron Weiss and Matt Birkholz to build the Thomas system. Now that I'm out of that hole, I'd like to update others on some of what I learned in the process and respond to a few of the questions raised earlier on these mailing lists.
I. Despite all of the hairy stuff involving keywords, specializers, multiple inheritance and the rest of it, the generic dispatch system of Dylan (maybe also CLOS?) was remarkably simple to build and understand. The key procedure is in the file generic.scm, and is simplified here (I removed support for multiple values, error checking, and renamed some functions for clarity):
(define (generic-dispatch original-args generic-function) (let ((applicable-methods (sorted-applicable-methods (generic-function.methods generic-function) original-args))) (let next-method-loop ((remaining-methods applicable-methods) (current-args original-args)) (apply (car remaining-methods) (if (null? (cdr remaining-methods)) #F (lambda (next-method . these-args) (next-method-loop (cdr remaining-methods) (if (null? these-args) current-args these-args)))) current-args))))
This requires one non-trivial support routine:
sorted-applicable-methods finds the applicable methods and sorts them from most- to least-specific. We used a topological sort of all of the classes in the system. This is used to provide an ordering of the classes for comparing with individual arguments to the method, which seems logical enough:
(define (match-specializer? object specializer) ;; Computes a distance value from the object to the specializer. ;; This distance is either ;; #F -- doesn't match the specializer ;; 0 -- the specializer is a singleton for exactly this object ;; <n> -- ordering of the class representing this specializer in ;; the ordering of all classes (always >= 1) (if (singleton? specializer) (if (eq? object (singleton.object specializer)) 0 #F) (if (subclass? (get-type object) specializer) (class.generality specializer) #F)))
We then used lexicographic ordering to break ties for multiple argument dispatch, which is specified in Dylan but seems basically arbitrary. This latter is a place where I'd expect a Scheme system to say "unspecified".
I won't reproduce the intervening code, but if you look for a while you'll see that sorted-applicable-methods requires two things:
a) A way, given a method under consideration, to find the specializers that restrict the types of arguments to which it can be applied. In response to one of the questions on the Scheme list, this is where the need for WEAK-CONS arises. Since methods are first-class in Dylan (Dylan/method <=> Scheme/lambda) and can be attached at any time to a generic function, it is necessary at runtime to be able to locate the specializers for a method. This requires building a data structure (an a-list or hash table, for example) that uses methods as keys to locate their specializers. But such a table would grow indefinitely in size unless the garbage collector removes entries that correspond to methods that have been eliminated from the system. [There is a similar need to locate a description of a generic function at runtime, used for the error detection mechanism I eliminated from this simplified code.]
b) A mechanism for determining the most specific type of an object at runtime. The procedure "get-type" (in the file runtime-top.scm) performs the operation. It is a kludge and very order sensitive. I found this function rather unpleasant to maintain as the system grew:
(define (get-type obj) (cond ((instance? obj) (instance.class obj)) ((number? obj) ; Might be wrong (if (real? obj) (if (exact? obj) (if (integer? obj) <integer> <ratio>) <float>) <complex>)) ((class? obj) <class>) ((singleton? obj) <singleton>) ((null? obj) <empty-list>) ((slot? obj) <slot-descriptor>) ((pair? obj) <pair>) ((vector? obj) <simple-object-vector>) ((string? obj) <byte-string>) ((char? obj) <character>) ((procedure? obj) (cond ((dylan::generic-function? obj) <generic-function>) ((dylan::method? obj) <method>) (else <function>))) ((keyword? obj) <keyword>) ((symbol? obj) <symbol>) (else <object>)))
This observation has lead me back to a conversation we had about the "portable record proposal" which went unresolved at our last meeting. The issue revolved around the pair of requirements for disjointness of types and opacity of representation. Several of us discussed this at some length at the LISP conference and I came away convinced of the need for a set of operations, separate from the record package, for manufacturing disjoint user types. At the Authors meeting, Jonathan Rees pointed out that he had made such a proposal some time ago but that it had not been acted upon. I'd like to have us act on it: Jonathan, can you find it and resubmit it?
II. On the question about teaching generic dispatch as opposed to object dispatch, I find myself for the first time disagreeing with Brian Harvey. While I haven't actually tried to teach this material for two years now, I seem to recollect that I taught numbers very early in the course. They are the one place in Scheme where generic operation is deeply built into the system, and it provides a simple and logical place to start when explaining generic dispatch. The symbolic algebra system at the end of Chapter 2 of Structure and Interpretation takes this much farther, but perhaps at the expense of the gut-level understanding of the issue that comes from simply asking the question "How does Scheme do that with its numbers, anyway?". The very simple example of adding integers to real numbers explains a lot to anyone familiar with representations. For those unfamiliar with numerical representations, the introduction of generic sequence operations (like the ones in Dylan, but simplified) is another obvious starting point. I find the INITIAL-STATE, NEXT-STATE, CURRENT-ELEMENT iteration protocol very elegant (although my taste would add CURRENT-KEY to all collections, not just explicit key sequences; even at the cost of an extra cons-cell for the state of a list).
I, personally, have always found the object dispatch method both limiting as a programming style and consequently rather unpleasant to teach. Its main advantage, from what I've seen so far, is one of modularity. The class heterarchy is conceptually elegant and works well, but unfortunately leads to a number of questions about where an object gets its behavior from, which inevitably leads to questions about HOW the object was implemented; this is precisely the abstraction violation that I thought object systems were intending to avoid.
III. I've written my first large, portable Scheme program. It required exactly two extensions to the language (WEAK operations, a minimal error system), but is modularized in a way that makes the port easy. I like this way of writing my programs, and I think it gives me a new way to think about the issue of portability. The Scheme language has the flexibility to allow most reasonable programs to be divided into an implementation-independent part which calls procedures defined in a implementation-dependent part. This isn't new; but this is the first time I've actually tried to make it work and it did! The ease of porting this system from one Scheme to another contrasts rather sharply with my attempts to port MIT Scheme's implementation from one version of C to another, and to similar experiences in porting Pascal and BCPL programs in an earlier existence.
IV. To the Authors:
I'd really like to see some hooks into the garbage collector standardized. This implementation experience, as well as a paper I wrote several years ago that touches on some of the other things one can do using a garbage collector, convinced me that it isn't going to be that hard to figure out "the right set". I see two "must do" items and two optional ones:
MUST DO 1: Have a way to specify a set of "precious" items which the garbage collector is not at liberty to remove, but which it will attempt to monitor and report when they are no longer required for any other reason. The two optional sets are specific implementations of this idea which exist in several current Scheme implementations, but which may not be the best abstraction of the service.
MUST DO 2: Have a way to explicitly invoke the equivalent of the garbage collector's structure-preserving walk given an arbitrary starting object. One form this could take is a procedure that takes the root object and a vector; it stores into the vector all the items found from walking that object and returns a count of all of the objects encountered (this allows the operation to occur (conceptually, at least) without CONSing). An alternate formulation is to provide a function which takes an object and returns a collection of the sub-components of that object sufficient to allow a full structure walk from source-level Scheme.
OPTIONAL 1: Have a way to run a procedure at the end of every garbage collection before returning to user code. This is how the semi-portable code in Thomas implements populations and one dimensional "weak" tables on top of WEAK-CONS. It is one way to perform the report function in MUST DO 1, but not the only possible way. All three of the Schemes I used for Thomas (MIT, Scheme->C, and Gambit) have some such hook, but all have slightly different names and signatures for the function that installs the GC daemon(s).
OPTIONAL 2: The WEAK-CONS set of operations (WEAK-PAIR?, WEAK-CONS, WEAK-C{A,D}R, SET-WEAK-C{A,D}R!) are now also available in all three Scheme systems. It took Joel Bartlett only one day to add them to his conservative generational collector; they are nice because of their simplicity. They lack the generality of MUST DO 1, and would require something like the finalization proposal that was circulated before our last meeting (but not discussed).
I'd also like to see some people take a good, hard look at the Thomas code for generic operations and the class hierarchy. It is very portable and provides some very good capabilities. I'd like someone to take an independent look at that code and see if it can be packaged nicely and put into the (putative) Scheme library. If we can do that, we should be able to add a lot of the Thomas runtime system to the same Scheme library and have a very good start at a powerful programming library -- one thing that (IMHO) has long held up the adoption of Scheme as a "real" language.
--Jim
external link
|