View on GitHub

Deterministic Workflows

A collaboration between UPenn & Indiana University.

Background

Achieving determinism on real software systems remains difficult. Even a batch-processing job, whose task is to map input bits to output bits, risks nondeterminism from thread scheduling, system calls, CPU instructions, and leakage of environmental information such as date or CPU model.

Research Project

In 2015 we asked ourselves why there was no available, deployable way to run real Linux software deterministically. We thus set out to create user-space deterministic execution sandboxes. This project follows from our prior work on deterministic parallel programming, which we conducted at the language- and library-level as well as by modifying the operating system and computer architecture.

Detflow

Our first prototype, called DetFlow was described in a paper at OOPSLA’17 [1]. DetFlow uses a mix of language-support and runtime sandboxing to achieve an end-to-end determinism guarantee.

In DetFlow, process-parallel programs are allowed if they are controlled by a parallel coordinator process written with special language support (a Haskell monad). Subprocess are determinized using a runtime sandbox, but are sequentialized to keep overheads low. DetFlow has well below 5\% overhead in all our experiments. However, this prototype uses LD_PRELOAD libc interception which is not sufficiently general or robust for a production quality implementation.

DetTrace

Our second prototype of a runtime determinism sandbox is called, DetTrace. DetTrace uses ptrace and is much more general than DetFlow, but is also higher-overhead. In case studies, we have run the build and test code for 12,130 Debian packages inside DetTrace, and have run large applications like blender and pdflatex inside the deterministic sandbox as well.

A preprint for DetTrace will be posted here soon.

Commercialization

Our end-to-end userspace determinism approach is being commercialized by Cloudseal Inc.
More discussion of determinism and reproducibility problems can be found on the blog. Cloudseal is building low-overhead record-and-replay-as-a-service (for bug and crash repro). The core is a deterministic execution capability that minimizes the amount of recording needed, and eliminates unnecessary nondeterminism – nondeterminism which creates headaches like flaky tests.

(In this third generation, production implementation of the approach, binary instrumentation is used to avoid ptrace and achieve both the low overhead of DetFlow and the generality of DetTrace.)

Binary Instrumentation

Believe it or not, in spite of decades of work on binary instrumentation and mechanisms for hooking functions or syscalls, we’ve had to develop new techniques for low-overhead interposition on Linux system calls and nondeterministic instructions. Part of the reason for this is needing an all-userspace approach.

The basic idea of our approach is to avoid both the high startup overhead of full binary translation approaches (Pin, DynamoRIO) and avoid the high cost of trap instructions in breakpoint-based tracing frameorks (DTrace, SystemTap, etc). This is done by patching guest process code in-place and injecting new code in the guest, but a fully general and performant solution requires solving a number of challenges (see the PLDI papers linked below).

Deterministic Libraries and Languages

The deterministic workflows project follows from several years of research on deterministic parallel programming, mainly using Haskell as a vehicle (though other projects target other languages, like Deterministic Parallel Java). An example system our work in this area is the LVish system for programming with monotonic concurrent data structures [4,5].

Deterministic Operating Systems and Architectures

In this work, we looked at enforcing determinism without requiring source-level changes to programs, instead allowing ourselves to freely change the operating system, compiler, and even computer architecture. Representive papers include Eurosys’15, and ASPLOS’10,11 [6,7,8]. Research of this kind serves to show what is possible, while the emphasis of the deterministic workflows project is to emphasize deployable user-space solutions that achieve end-to-end determinism on existing computers and operating systems.

People

Baojun Wang Kelly Shiptoski Joe Devietti
Omar Navarro Leija Ryan Newton Ryan Scott

References

  1. (OOPSLA’17) “Monadic Composition for Deterministic, Parallel Batch Processing”, R Scott, O Navarro Leija, J Devietti, and R Newton, ACM SIGPLAN conference on Object-oriented Programming, Systems, Languages and Applications.

  2. (PLDI’16) “Living on the edge: Rapid-toggling probes with cross modification on x86”, B Chamith, B Svensson, L Dalessandro, R Newton. ACM SIGPLAN conference on Programming Language Design and Implementation.

  3. (PLDI’17) “Instruction Punning: Lightweight Instrumentation for x86-64”, B Chamith, B Svensson, L Dalessandro, R Newton. ACM SIGPLAN conference on Programming Language Design and Implementation.

  4. (POPL’14) “Freeze After Writing: Quasi-Deterministic Parallel Programming with LVars and Handlers”, L Kuper, A Turon, N Krishnaswami, R Newton. ACM SIGPLAN Principals of Programming Languages.

  5. (PLDI’14) “Taming the Parallel Effect Zoo: Extensible Deterministic Parallelism with LVish”, L Kuper, A Todd, S Tobin-Hochstadt, R Newton. ACM SIGPLAN Programming Languages Design and Implementation.

  6. (Eurosys’15) “High-Performance Determinism with Total Store Order Consistency”, T Merrifield, J Devietti and J Eriksson. European Conference on Computer Systems.

  7. (ASPLOS’11) “RCDC: A Relaxed-Consistency Deterministic Computer”, J Devietti, J Nelson, T Bergan, L Ceze and D Grossman. International Conference on Architectural Support for Programming Languages & Operating Systems.

  8. (ASPLOS’10) “CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution”. T Bergan, O Anderson, J Devietti, L Ceze and D Grossman. International Conference on Architectural Support for Programming Languages & Operating Systems.