Language for Protocol Compilation 

Language for Protocol Compilation

Eddie Kohler, Frans Kaashoek, David Montgomery; LCS at MIT (PDOS group) 1998

from PROLAC home page:
"Prolac is a programming language designed for writing readable, modular, extensible, and efficient network protocol implementations. It was designed pragmatically, for implementation, rather than as a prescriptive specification language with interesting theoretical properties. Object-oriented languages, functional languages, C, and Yacc are the strongest influences on its design. This combination of influences -- and a focus on minimal, elegant syntax -- led to some novel features (module operators, for instance), despite a general preference for time-tested techniques."

  • Kohler, Eddie "Prolac: A Language For Protocol Compilation" MSc MIT September, 1997 view details Extract: Related work
    The International Organization for Standards (ISO) has defined two formal description techniques originally intended for developing the ISO OSI protocol suite. These techniques are LOTOS and Estelle. LOTOS, based on Milner’s Calculus of Communicating Systems, is an algebraic technique with functional properties. Like many functional languages, it is very effective for describing its fundamental abstraction, processes. Unfortunately, also like many functional languages, these abstractions are rigid and may not fit existing protocols without some pain - exactly the pain that protocol languages are supposed to avoid. LOTOS users report some problems with using the language for specification, and LOTOS presents very serious challenges to the compiler writer, including valid programs which potentially generate an infinite number of processes. It seems doubtful that LOTOS specifications can be made to run efficiently.

    Estelle is one of many protocol languages based on a finite state machine model much like that often used in parsers. Estelle, like LOTOS, includes asynchronous parallelism; it is based on Pascal, and includes a module system (bound up with the parallelism structure) where processes (modules) communicate through broadcast signals. Estelle can be used to create semi-automatic implementations of reasonable performance. However, experience with large protocols is not reported, and high performance is not discussed. Furthermore, the state machine model and its cousin the Petri machine model have intrinsic problems for modeling protocols: the division into states often does not correspond to anything real in the protocol, and relationships between states can become very complicated and difficult to change, even in carefully layered protocols.

    Esterel is a version of Estelle without asynchronous parallelism: an Esterel specification has a defined sequential execution. High performance Esterel compilers are being developed; however, full implementation of a large protocol is not reported. In terms of language, Esterel suffers from many of the same problems as Estelle due to their common extended finite state machine model.

    RTAG is based on a context-free attribute grammar. RTAG provides a relatively natural syntax with equivalent modeling power to extended finite state machines. Efficient compilers are reported in the literature, although efficient turns out to mean arguably efficient enough for research use, i.e., simple protocols suffer a factor of 2 slowdown. This slowdown comes partially from parallelism in the language. RTAG is not always suitable for existing protocols, just as yacc is not always suitable for describing computer languages.

    The x-kernel provides an infrastructure and various tools for creating protocol stacks. Prolac is complementary with the x-kernel, which focuses on organizing protocol stacks; Prolac is primarily designed for implementing a single protocol.

    Morpheus, an object-oriented protocol language, enforces a large number of constraints on the protocol programmer. These constraints are restrictive enough that existing protocol specification[s] may not be implementable in Morpheus. While some of the constraints are meant to increase the knowledge available to the compiler to enable domain-specific optimizations, others seem to exist solely to prevent the programmer from making bad design choices. Impressive results are reported for UDP speedup, but we regard Morpheus’s inflexibility and inability to implement real protocols as definitive. Abstract: Prolac is a new statically-typed object-oriented programming language designed for implementing network protocols. Prolac is designed to make protocol specifications readable to human beings, and thus more likely to be correct; easily extensible to accommodate protocol enhancements; and efficient when compiled.

    We present an overview of the Prolac language and a discussion of issues and principles in its design, as well as a preliminary language reference manual. The prolacc optimizing protocol compiler is also described. A prototype TCP specification is presented that is both readable and extensible; experience with the specification suggests that, even untuned, Prolac overhead is negligible on normal networks.
  • Kohler, Eddie "Prolac language reference manual", January 1999. view details External link: Online copy
  • Kohler, Eddie; Kaashoek, M. Frans; Montgomery, David R. "A readable TCP in the Prolac protocol language" Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, 1999, Cambridge, Massachusetts, United States view details Abstract: Prolac is a new statically-typed, object-oriented language for network protocol implementation. It is designed for readability, extensibility, and real-world implementation; most previous protocol languages, in contrast, have been based on hard-to-implement theoretical models and have focused on verification. We present a working Prolac TCP implementation directly derived from 4.4BSD. Our implementation is modular - protocol processing is logically divided into minimally-interacting pieces; readable - Prolac encourages top-down structure and naming intermediate computations; and extensible - subclassing cleanly separates protocol extensions like delayed acknowledgements and slow start. The Prolac compiler uses simple global analysis to remove expensive language features like dynamic  dispatch, resulting in end-to-end performance comparable to an unmodified Linux 2.0 TCP.

    DOI Extract: Introduction

    Most familiar programming idioms handle network protocols badly - even modern languages are stressed by common protocol  characteristics like complicated control flow, soft modularity  boundaries, and stringent efficiency requirements. This makes protocol code hard to read, verify and maintain. Specialized  languages are a promising area for solutions to this problem, and network protocol languages and compilers have been an active research area for decades.

    Most existing protocol languages focus on verification. Their underlying theoretical models are designed for testing and provability, often making pragmatic goals like real-world implementation difficult to achieve. Even languages designed with pragmatism in mind can have theoretical models that are difficult to program.

    In this paper, we describe a language that takes a different approach. Prolac is a lightweight object-oriented language  tailored for network protocol implementation. It is focused on readability rather than provability, and on the human programmer rather than a machine verifier; protocol implementation requirements inspired its design. No part of Prolac is difficult to compile into efficient low-level code, as we demonstrate with our TCP implementation. Section 3 describes the Prolac language and its compiler in more detail. Section 4 presents the reimplementation of most of TCP in Prolac. Our TCP is modular, readable, and extensible compared to other implementations in 4.4BSD and Linux 2.0. Modules and methods at-e used to break complex  functionality into focused parts, and the protocol’s top-down  design remains visible in the final implementation; all this has no significant performance overhead. Four TCP extensions (delayed acknowledgements, slow start, fast retransmit, and header prediction) are implemented through subclassing as add-ons to a clean base. These extensions are simple-each one fits in a single source file with less than 60 lines of Prolac-and can be independently turned on with no changes to the base protocol. The Prolac TCP runs inside a Linux 2.0 kernel, interfaces with the networking subsystem, and is able to exchange packets with other, unmodified TCPs with roughly the same end-to-end performance as  unmodified Linux, as discussed in Section 5.

    The contributions of this work are the Prolac language, including several novel language features; a new way of structuring a TCP implementation in Prolac, giving superior readability and extensibility; and a preliminary performance analysis of this Prolac TCP.
    Extract: Related work
    Many previous protocol languages have been designed for verification, not readability or implementation. Prolac uses ideas from some of these languages, but we found that  specific language features designed with protocols in mind-for example, parallelism to model both sides of a connection - often worked against readability, implementability,  extensibility, or all three. Prolac’s final design is more conventional and less domain-specific than these languages; the protocol domain generally affected the details of our versions of  common concepts, not specific language features.

    Two protocol languages, or formal description techniques, were originally designed for developing the OS1 protocol suite: LOTOS and Estelle. Estelle, the  language intended for implementation, is Pascal-like; it  structures a protocol as a set of finite state machines running in parallel and communicating via broadcast signals. We find Estelle specifications difficult to read because of this,  although it is well suited for state analysis and test  generation. Semi-automatic implementations of Estelle  specifications have been built, but finite state machines make specifications complicated and difficult to change, even for carefully layered protocols.

    Esterel addressed some of Estelle’s implementation difficulties by removing its asynchronous parallelism, leaving a completely sequential language. This worked. Impressive performance results are reported for a restricted Esterel  version of TCP, better than a similarly restricted BSD TCP; this convinced us to leave parallelism out of Prolac. However, Esterel still shares Estelle’s formal model, interlocking finite state machines, and the problems this causes: complexity, unfamiliarity, unreadability, and difficulty of modification or extension. The Esterel TCP did not include connection es- tablishment, and appears not to include important extensions like congestion avoidance.

    RTAG is based on a different formal model: context-free attribute grammars. RTAG is more readable than LOTOS and Estelle, but large RTAG specifications, like large attribute grammars generally, become hard to read since the name-space  is flat. An early version of Prolac resembled RTAG, but readability and other issues have pushed it in the direction of conventional programming languages. RTAG’s performance is problematic, again due to parallelism in the language. The x-kernel [ 121, which introduces an explicit  architecture for constructing and composing protocols, is orthogonal to Prolac. WC focus on making a single protocol  implementation readable; the x-kernel provides a uniform interface between protocols and aims to improve the structure and performance of protocol layering.

    Morpheus, another object-oriented language for protocol implementation, is based on x-kernel ideas. To force clean protocol designs and enable domain-specific  optimizations, it puts many constraints on the programmer. As a result, existing protocol specifications may not be implementable in Morpheus. Its compiler has not been written.