Language for Protocol Compilation
Language for Protocol Compilation
Eddie Kohler, Frans Kaashoek, David Montgomery; LCS at MIT (PDOS group) 1998
from PROLAC home page:
"Prolac is a programming language designed for writing readable, modular, extensible, and efficient network protocol implementations. It was designed pragmatically, for implementation, rather than as a prescriptive specification language with interesting theoretical properties. Object-oriented languages, functional languages, C, and Yacc are the strongest influences on its design. This combination of influences -- and a focus on minimal, elegant syntax -- led to some novel features (module operators, for instance), despite a general preference for time-tested techniques."
The International Organization for Standards (ISO) has defined two formal description techniques originally intended for developing the ISO OSI protocol suite. These techniques are LOTOS and Estelle. LOTOS, based on Milner’s Calculus of Communicating Systems, is an algebraic technique with functional properties. Like many functional languages, it is very effective for describing its fundamental abstraction, processes. Unfortunately, also like many functional languages, these abstractions are rigid and may not fit existing protocols without some pain - exactly the pain that protocol languages are supposed to avoid. LOTOS users report some problems with using the language for specification, and LOTOS presents very serious challenges to the compiler writer, including valid programs which potentially generate an infinite number of processes. It seems doubtful that LOTOS specifications can be made to run efficiently.
Estelle is one of many protocol languages based on a finite state machine model much like that often used in parsers. Estelle, like LOTOS, includes asynchronous parallelism; it is based on Pascal, and includes a module system (bound up with the parallelism structure) where processes (modules) communicate through broadcast signals. Estelle can be used to create semi-automatic implementations of reasonable performance. However, experience with large protocols is not reported, and high performance is not discussed. Furthermore, the state machine model and its cousin the Petri machine model have intrinsic problems for modeling protocols: the division into states often does not correspond to anything real in the protocol, and relationships between states can become very complicated and difficult to change, even in carefully layered protocols.
Esterel is a version of Estelle without asynchronous parallelism: an Esterel specification has a defined sequential execution. High performance Esterel compilers are being developed; however, full implementation of a large protocol is not reported. In terms of language, Esterel suffers from many of the same problems as Estelle due to their common extended finite state machine model.
RTAG is based on a context-free attribute grammar. RTAG provides a relatively natural syntax with equivalent modeling power to extended finite state machines. Efficient compilers are reported in the literature, although efficient turns out to mean arguably efficient enough for research use, i.e., simple protocols suffer a factor of 2 slowdown. This slowdown comes partially from parallelism in the language. RTAG is not always suitable for existing protocols, just as yacc is not always suitable for describing computer languages.
The x-kernel provides an infrastructure and various tools for creating protocol stacks. Prolac is complementary with the x-kernel, which focuses on organizing protocol stacks; Prolac is primarily designed for implementing a single protocol.
Morpheus, an object-oriented protocol language, enforces a large number of constraints on the protocol programmer. These constraints are restrictive enough that existing protocol specification[s] may not be implementable in Morpheus. While some of the constraints are meant to increase the knowledge available to the compiler to enable domain-specific optimizations, others seem to exist solely to prevent the programmer from making bad design choices. Impressive results are reported for UDP speedup, but we regard Morpheus’s inflexibility and inability to implement real protocols as definitive. Abstract: Prolac is a new statically-typed object-oriented programming language designed for implementing network protocols. Prolac is designed to make protocol specifications readable to human beings, and thus more likely to be correct; easily extensible to accommodate protocol enhancements; and efficient when compiled.
We present an overview of the Prolac language and a discussion of issues and principles in its design, as well as a preliminary language reference manual. The prolacc optimizing protocol compiler is also described. A prototype TCP specification is presented that is both readable and extensible; experience with the specification suggests that, even untuned, Prolac overhead is negligible on normal networks.
DOI Extract: Introduction
Most familiar programming idioms handle network protocols badly - even modern languages are stressed by common protocol characteristics like complicated control flow, soft modularity boundaries, and stringent efficiency requirements. This makes protocol code hard to read, verify and maintain. Specialized languages are a promising area for solutions to this problem, and network protocol languages and compilers have been an active research area for decades.
Most existing protocol languages focus on verification. Their underlying theoretical models are designed for testing and provability, often making pragmatic goals like real-world implementation difficult to achieve. Even languages designed with pragmatism in mind can have theoretical models that are difficult to program.
In this paper, we describe a language that takes a different approach. Prolac is a lightweight object-oriented language tailored for network protocol implementation. It is focused on readability rather than provability, and on the human programmer rather than a machine verifier; protocol implementation requirements inspired its design. No part of Prolac is difficult to compile into efficient low-level code, as we demonstrate with our TCP implementation. Section 3 describes the Prolac language and its compiler in more detail. Section 4 presents the reimplementation of most of TCP in Prolac. Our TCP is modular, readable, and extensible compared to other implementations in 4.4BSD and Linux 2.0. Modules and methods at-e used to break complex functionality into focused parts, and the protocol’s top-down design remains visible in the final implementation; all this has no significant performance overhead. Four TCP extensions (delayed acknowledgements, slow start, fast retransmit, and header prediction) are implemented through subclassing as add-ons to a clean base. These extensions are simple-each one fits in a single source file with less than 60 lines of Prolac-and can be independently turned on with no changes to the base protocol. The Prolac TCP runs inside a Linux 2.0 kernel, interfaces with the networking subsystem, and is able to exchange packets with other, unmodified TCPs with roughly the same end-to-end performance as unmodified Linux, as discussed in Section 5.
The contributions of this work are the Prolac language, including several novel language features; a new way of structuring a TCP implementation in Prolac, giving superior readability and extensibility; and a preliminary performance analysis of this Prolac TCP.
Extract: Related work
Many previous protocol languages have been designed for verification, not readability or implementation. Prolac uses ideas from some of these languages, but we found that specific language features designed with protocols in mind-for example, parallelism to model both sides of a connection - often worked against readability, implementability, extensibility, or all three. Prolac’s final design is more conventional and less domain-specific than these languages; the protocol domain generally affected the details of our versions of common concepts, not specific language features.
Two protocol languages, or formal description techniques, were originally designed for developing the OS1 protocol suite: LOTOS and Estelle. Estelle, the language intended for implementation, is Pascal-like; it structures a protocol as a set of finite state machines running in parallel and communicating via broadcast signals. We find Estelle specifications difficult to read because of this, although it is well suited for state analysis and test generation. Semi-automatic implementations of Estelle specifications have been built, but finite state machines make specifications complicated and difficult to change, even for carefully layered protocols.
Esterel addressed some of Estelle’s implementation difficulties by removing its asynchronous parallelism, leaving a completely sequential language. This worked. Impressive performance results are reported for a restricted Esterel version of TCP, better than a similarly restricted BSD TCP; this convinced us to leave parallelism out of Prolac. However, Esterel still shares Estelle’s formal model, interlocking finite state machines, and the problems this causes: complexity, unfamiliarity, unreadability, and difficulty of modification or extension. The Esterel TCP did not include connection es- tablishment, and appears not to include important extensions like congestion avoidance.
RTAG is based on a different formal model: context-free attribute grammars. RTAG is more readable than LOTOS and Estelle, but large RTAG specifications, like large attribute grammars generally, become hard to read since the name-space is flat. An early version of Prolac resembled RTAG, but readability and other issues have pushed it in the direction of conventional programming languages. RTAG’s performance is problematic, again due to parallelism in the language. The x-kernel [ 121, which introduces an explicit architecture for constructing and composing protocols, is orthogonal to Prolac. WC focus on making a single protocol implementation readable; the x-kernel provides a uniform interface between protocols and aims to improve the structure and performance of protocol layering.
Morpheus, another object-oriented language for protocol implementation, is based on x-kernel ideas. To force clean protocol designs and enable domain-specific optimizations, it puts many constraints on the programmer. As a result, existing protocol specifications may not be implementable in Morpheus. Its compiler has not been written.