BLISS/10(ID:3373/bli006)

Country: us
- languages for us
- us/1974
Began: 1974
Sammet:MLT
- MLT
- MLT/1974
- MLT/us

Bliss for PDP-10

BLISS/10

Implementation

References:

BLISS-10 Programmer's Reference Manual, Digital Equipment Corporation, Maynard, Mass., 1974 view details

Wichmann, B. A. "Ackermann's function: a study in the efficiency of calling procedures" BIT 16 (1976), pp103-110 view details Abstract: A six line recursive procedure is used to assess the efficiency of the procedure calling mechanism in ALGOL-like languages. The results from some 40 systems varying from ALGOL 68 and PL/I to System Implementation Languages for minicomputers are presented and compared. A hundred to one variation in performance occurs with this test, the major reasons for which are given.
Extract: Introduction
Introduction
There are several areas where traditional high-level languages are notably less efficient than hand-coding. Two of these areas are accessing array elements and calling procedures. Although in the former case much research has been done to improve the efficiency (resulting in the ALCOR and FORTRAN H2 compilers), little work has been published on calling procedures. In many scientific computations procedures are not called very frequently but this is not true in non-numeric work. Since modularisation by means of procedures is one of the most powerful programming tools in structuring complex problems, an inefficient calling mechanism can be very detrimental. In recent years there has been an increased use of high-level languages as assembler replacements ? the so-called 'System Implementation Languages'. Such languages must achieve a high degree of object code efficiency if their use is to become more widespread. The purpose of this paper is to compare the procedure calling mechanism of traditional high-level languages with both system implementation languages and machine-code.
Extract: Ackermann's function
Ackermann's function
In view of the paper by Sundblad [1], a reasonable test case for study seemed to be the evaluation of Ackermann's function. This function has the advantage of being very short and not requiring large integer values for typical cases, yet being complex enough to be a significant test. The figures given in Sundblad's paper also provided a basis for comparison before many values had been obtained.
Following Sundblad, we calculate Ackermann (3,n) for increasing n, in accordance with the complete program listing below. Three figures are determined from the test, the average execution time per call, the number of instructions executed per call and the amount of stack space required for each call. Although the average time per call can be determined from running the program, the other two figures cannot be found so easily and indeed, in many cases the figures are not available.

[1]Y. Sundblad, The Ackermann function. A theoretical, computational and formula manipulative study. BIT 11 (1971), 107119.

Extract: Acknowledgements
Acknowledgements
This work has been inspired by the International Federation for Information Processing Working Group 2.4. The desire of the group to obtain information on the performance of system implementation languages has led to tests such as this. It would not have been possible to write this paper without the active help of the following persons in running the test: ? Mr. L. Ammeraal (Mini-ALGOL 68), Dr R. Backhouse (B5500), Dr J. G. P. Barnes (RTL/2), Dr D. A. Bell (PL516), Dr H. Boom (Cyber 73 ALGOL 68), Mr P. Klint (C-UNIX), Mr R. Conradi (MARY, CYBER 74, 1108, CDC3300), Mr W. Findlay (PASCAL 1906A & machine code), Mr W. B. Foulkes (PASCAL 370/158). Professor G. Goos (B6700), Mr. V. Hath-way (BCPL MOD 1), Mr M. Healey (PL/I OPT), Professor J. J. Horning (Sue 11), Mr B. Jones (ALGOL 60, Dclft 370/165), Dr M. MeKeag (PASCAL & ALGOL 60, 1906S), Mr Z. Mocsi (R10), Dr J. Palme (DEC10 SIMULA & machine code), Mr M.J. Parsons (PALGOL), Dr M. Richards (BCPL), Professor J. S. Rohl (Manchester 1900 ALGOL), Mr P. D. Stephens (IMP), Dr P. Wetherall (ALGOL 68-R), Professor N. Wirth (PASCAL), Professor D. B. Wortman (370/165 machine code) and Professor W. A. Wulf (Bliss 10, Bliss 11 & machine code).
Extract: Results of tests (altered to include authors where known)

Language	Computer	Time per call (ms)	Instructions per call	Words per call	Characteristic (see below)	Written by (if stated)
ALGOL 60	B6700	41.2	16	13	ND2VO	Professor G. Goos
ALGOL 60	B5500 Mk XV.l.01	135	19.5	7	NL2VO	Dr R. Backhouse
ALGOL 60	EMAS 4/75	46.7	21	18	ND2VO
ALGOL 60	1906A Manchester	29.2	33.5	30	ND2VR	Professor J. S. Rohl
ALGOL 60	KDF9 Mk2	532	68.5	10	CD2VR
ALGOL 60	1906S XALV	70.9	(120)	13	CD2TR	Dr M. McKeag
ALGOL 60	370/165 Delft	43.8	(142)	?	CD2TR	Mr B. Jones
ALGOL 60	NU 1108	175	(175)	9	CD2TR	Mr R. Conradi
ALGOL W	360/67 Mk2	121	(74)	(16-45)	HD2TR
IMP	ICL 4/75	46	17.5	18	ND2VO	Mr P. D. Stephens
SIMULA	1108	120	(120)	9	HD2TR	Mr R. Conradi
SIMULA	DEC1O(KI1O)	317	(158)	?	HD2TR	Dr J. Palme
SIMULA	CYSEN 74	380	(800)	(15)	HD2TR	Mr R. Conradi
ALGOL 68	1907F (no heap)	134	28	15	NO2VO	Mr. L. Ammeraal
ALGOL 68	1907F (heap)	167	34	15	HD2VR	Mr. L. Ammeraal
ALGOL 68	CDC 6400(subset)	45.3	51	?	HD2VO	Dr P. Wetherall
ALGOL 68	Cyber 73 vl.0.8	67.8	(60)	7	HD2VR	Dr H. Boom
Bliss 10	DEClO(KAlO)	53.15	15	5	NLWVR	Professor W. A. Wulf /Mr Z. Mocsi
PL/I	360/65 OPT v1.2.2	101	(61)	68	HD2AO	Mr M. Healey
PL/I	360/65 F v5.4	351	(212)	(70)	HD2AR
PASCAL	1906S	19.1	32.5	11	HDWVR	Professor N. Wirth/Mr W. Findlay/Dr M. McKeag
PASCAL	CDC 6400	34	38.5	6	HDWVO	Professor N. Wirth
PASCAL	370/158	39	42.5	30	HDWVE	Professor N. Wirth/Mr W. B. Foulkes
RTL/2	4/70	46	14.5	14	NLWVO	Dr J. G. P. Barnes
RTL/2	PDP11/20	(107)	30.5	?	CLWVH	Dr J. G. P. Barnes
PALGOL	PDP11/20	(46)	13	3	NLWVO	Mr M.J. Parsons
Bliss/11	PDP11/20 OPT	31	8	2	NLWVO	Professor W. A. Wulf /Professor J. J. Horning
MARY	SM4	105	30.5	9	COWVR	Mr R. Conradi
CORAL 66	MOD 1	(21)	15.5	3	NLWVO
PL516	DDP516	84.5	37	2	CLWVH	Dr D. A. Bell
C (UNIX)	PDP 11/45	50.4	26	?	NLWVR	Mr P. Klint
BCPL	370/165	5.9	19	9	NLWVR	Dr M. Richards
BCPL	PDP 11/45	48	20.5	7	NLWVO	Dr M. Richards
BCPL	MOD 1	(35)	25	7	NLWVR	Dr M. Richards/Mr. V. Hathaway
Machine code	Most machines	?	5-14	1-2	NLWVO	Dr J. Palme/Professor D. B. Wortman /Professor W. A. Wulf

Extract: Program listing
Program listing
begin
integer i, j, k, k1; real t1, t2;
integer procedure Ackermann(m, n); value m, n; integer m, n; Ackermann := if m = 0 then n + l
else if n = 0 then Ackermann(m -1,1) else Ackermann(m - 1, Ackermann(m, n - 1));
k:= 16; k1 := 1; for i := 1 step 1 until 6 do begin
t1 := cputime;
j : = Ackermann(3, i);
t2 := cputime;
if j = A;-3 then
outtext(1, 'WRONG VALUE'); outreal(1, t2 ? t1);
comment Now output time per call;
outreal(1, 3 x (t2 - t1)/(512 x k1 - 15 x k + 9x i + 37))); k1 := 4 x k1; k := 2 x k end end
Extract: Properties of the algorithm
Properties of the algorithm
To determine the execution characteristics of the test one needs the following properties of the algorithm (which can be easily proved by induction). The number of executions of each leg of the algorithm in calculating Ackermann (3, n) is:
first leg:          (64x4|n-    72x2|n + 6xn + 26)/3
second leg:       (                     24 x 2 | n - 3 x n - 12)/3
third leg:         (64x4|n-    72x2|n + 6xn + 23)/3
total
no. of calls =   (128 x4|n-   120x2|n + 9xn + 37)/3
Hence for large n, the average number of instructions per call is half the number of instructions in the first and third legs. Both legs contain the procedure entry and exit overheads and the third leg includes two unsuccessful tests for equality with zero. So the number of instructions executed per call can be found by examining the machine code produced by the compiler. Unfortunately this is not always easy and sometimes almost impossible. To provide an approximate comparative value, an estimate has been made from the time using a Gibson mix value [2] to extrapolate from a similar machine. In a few cases, the instructions executed arc known but the time has not been measured. In these cases, an estimate for the time is given based upon published instruction times.
The amount of stack space required by the program is determined by the maximum depth of procedure calls. This is given by Sundblad as
Ackermann(3, n) - 1 = 2 | (n + 3) - 4
and occurs when the third leg is being evaluated at all but the last of those levels. Hence the amount of stack space required doubles when n is increased by one. For a more detailed discussion on the storage requirements and the definition of the 'maximum depth of recursion' see [5]. To measure this space, Sundblad observed the maximum value of 11 for which Ackermann (3, n) could be calculated using 26K words of store (which he called the 'capability'). Although this measurement is easy to take, it only provides a rough estimate of the space needed per call. Examination of the machine code of the third leg of the algorithm gives the precise value. In fact, the number of words per call is the difference in the address of 4m in the inner call to the address of the parameter in the routine. It is not easy to calculate this value from the code, and so a post-mortem print may be the easiest way to find the value. Rather than give the capability, the number of words per call is listed with the results. If only the capability is known, bounds can be given for the number of words per call by assuming that between 3K and 10K words are needed out of the 26K store for the program and run-time system.
Extract: Notes on the results and Factors influencing the execution speed
Notes on the results
Estimates of missing values are included in the table in brackets and have been calculated in the manner described above. The program listing in all cases has followed the coding given very closely. The only exceptions are 1) the machine code examples, 2) the PASCAL and SUE systems which have no conditional expressions, and 3) PL516 which follows the hardware of the machine in not supporting recursion (stacking is performed by subroutines).

Factors influencing the execution speed
Factors influencing the call of a recursive procedure vary from inherent problems with the architecture of the machine to purely software problems on the design of the procedure calling mechanism. Against each of the results above is a sequence of letters and digits which describes the principle characteristics governing the execution performance.
Recursion. On most modern machines with a few base registers and indexing facilities, the basic overhead of recursion is very small indeed. A few minicomputers do not have adequate base registers or addressing facilities to support recursion. The Honeywell DDP5l6 is of this type, hence with the high-level assembler PL516 stacking must be performed explicitly. On some machines, although the addressing logic is available an additional time penalty occurs by addressing via pointers. On the Modular 1 implementation of CORAL 66, recursion is an option. In fact the code produced with recursion is more efficient than that without recursion. The reason for this is that the short address length of the Modular 1 means that base registers must be loaded to access the variables of another procedure. This is a larger overhead than that incurred using a single stack register which only needs to he incremented and decremented by a small amount. Without recursion, the number of instructions is increased to 19.5 (30% more).
Storage allocation. In order to execute this program, a stack storage scheme is necessary. It is sometimes possible on conventional hardware to use an area in the store as stack without performing an explicit software check for stack overflow. One can then rely upon a hardware address violation which should permit the output of a suitable diagnostic. Such systems are marked by an N, whereas C denotes a software stack overflow check. Languages such as ALGOL 68, PL/I and SIMULA require more than a simple stack and hence must perform an overflow check in a manner which allows a recovery to perform some garbage collection. Systems like this arc marked with an H. PASCAL permits additional storage apart from the stack but without garbage collection. Although no garbage collection is involved, PASCAL is marked with an H. Only SIMULA requires an actual garbage collection during the execution of this test. The method used to administer the stack is partly a matter of language design and partly a matter of implementation. For instance, although ALGOL 68 requires a heap, the ALGOL 68-R implementation will run a stack only if the program does not require the heap. The difference, shown above, is that an additional subroutine is called on procedure entry to check that adequate stack space is available.
Complete display. Block structured languages in general require a base pointer for every block whose variables can be accessed at the relevant part of the program. This facility is marked with a D. Languages such as BCPL and Burroughs B5500 ALGOL restrict access to local and global identifiers only, permitting a significant economy in pointers. These languages are marked with an L. The actual method of implementing the display can vary independently of the language. For instance, ALGOL W and IMP have every base pointer in a register, the PASCAL implementations above backchain the display (from the local procedure) and ALGOL 68-R keeps a copy in core of the full display in the local block. In almost all cases, there will be small variations in the calling overhead depending upon the relevant positions (in the display) of the call and the procedure.

Dynamic stack storage. In ALGOL 60, the sizes of arrays are not, in general, known until program execution. This facility, implemented with second order working store, requires a small overhead on the administration of stack storage even when no arrays are being used, as in this test. Languages requiring such storage are marked with a 2 whereas systems allowing only storage whose size can be determined at compile time are marked with a W. PALGOL and RTL/2 are LW languages and hence require only one stack pointer (and one pointer for static storage depending upon the addressing structure).
Parameter handling. The most expensive mechanism is marked with a T which denotes a 'thunk' that is required to implement the ALGOL 60 call by name [4]. A thunk can be avoided with this test by performing all parameter checking at compile time and using only the value mechanism (V). The KDF9, B5500 and Atlas ALGOL compilers all use the value mechanism. The use of the thunk mechanism by the other ALGOL 60 compilers is caused by the problem of handling the call of formal procedures whose parameters cannot be specified [3]. With value parameters, the ICL 1900 ALGOL compiler uses a simplified thunk mechanism but the Manchester ALGOL compiler uses the value mechanism. The PL/I systems use call by address which is marked with an A. With recursion, the addresses of parameters are dynamic and in consequence this method is less efficient than call by value.
Open code or subroutines. Systems which handle this test entirely by open code are marked with an O, whereas the use of subroutines is marked with an R. The PL/I optimising compiler generates a subroutine call, but it is not invoked unless insufficient stack space is given (which did not happen in this
case).
Extract: Conclusion
Conclusion.
Ackermann's function provides a simple method of comparing the efficiency of the procedure calling mechanism of a language permitting recursion. The results show a very wide variation in performance even for languages containing no inherent complications. Additional instructions required in ALGOL 68, PL/I and PASCAL to check for stack overflow are quite insignificant compared to the hundreds of extra instructions executed by the inefficient implementations of ALGOL 60. There is no doubt that 'System Implementation Languages' give very much better results on this test without reducing the facilities to the programmer. Machine independence seems to be realised in this case without any measurable cost as BCPL shows.
Does Ackermann s function represent a good test for a system implementation language? Unfortunately no statistical information is available to the author on the use of procedures in operating systems and compilers etc. Hence it is not known if, for instance, two parameters is typical. The large amount of stack space used is certainly not typical and can result in pathological situations. For instance, stack space is claimed in 64 word units under George 3 on a 1906A, but is not released. Hence during the first part of the algorithm when the stack is being increased a large operating system overhead occurs. During the second part when the stack is rarely increased beyond its previous maximum, there is no significant overhead. The computational part of testing for the equality with zero, jumping and adding or subtracting one seems very typical of non-numeric work. On the 360 computer, the fact that the the algorithm is very short ( <4K bytes) results in a small saving, but on the IMP compiler which can take advantage of this the speed was only the increased by 3%.

From the better figures produced by system implementation languages, the code for Ackermann is roughly divided as follows:
instructions
subroutine entry and exit      2 stacking return address        1 setting up environment        3 checking for stack overflow     2 (if check made) restoring old environment      3 (on procedure exit) setting parameters          2x1=2
body of Ackermann         8
Total                   21