Partially redundant fence elimination for x86, ARM and Power processors (CC 2017)

Robin Morisset, Francesco Zappa Nardelli

We show how partial redundancy elimination (PRE) can be instantiated to perform provably correct fence elimination for multi-threaded programs running on top of the x86, ARM and IBM Power relaxed memory models. We have implemented our algorithm in the backends of the LLVM compiler infrastructure. The optimisation does not induce an observable overhead at compile-time and can result in up-to 10% speedup on some benchmarks.

Common compiler optimisations are invalid in the C11 memory model and what we can do about it (POPL 2015)

Viktor Vafeiadis, Thibaut Balabonski, Soham Chakraborty, Robin Morisset, Francesco Zappa Nardelli

We show that the weak memory models introduced by the 2011 C and C++ standards does not permit many of common source-to-source program transformations (such as expression linearisation and "roach motel" reordering) that modern compilers perform and that are deemed correct. As such it cannot be used to define the semantics of intermediate languages of compilers, as, for instance, LLVM aimed to. We consider a number of possible local fixes, some strenghtening and some weakening the model. We evaluate the proposed fixes by determining which program transformations are valid with respect to each of the patched models. We provide formal Coq proofs of their correctness or counterexamples as appropriate.

Compiler testing via a theory of sound optimisations in the C11/C++11 memory model (PLDI 2013) (software)

Robin Morisset, Pankaj Pawan, Francesco Zappa Nardelli;
Work done as part of my end-of-master's degree internship at INRIA, under the supervision of Francesco Zappa Nardelli.

Compilers sometimes generate correct sequential code but break the concurrency memory model of the programming language: these subtle compiler bugs are observable only when the miscompiled functions interact with concurrent contexts, making them par- ticularly hard to detect. In this work we design a strategy to reduce the hard problem of hunting concurrency compiler bugs to differen- tial testing of sequential code and build a tool that puts this strategy to work. Our first contribution is a theory of sound optimisations in the C11/C++11 memory model, covering most of the optimisations we have observed in real compilers and validating the claim that common compiler optimisations are sound in the C11/C++11 memory model. Our second contribution is to show how, building on this theory, concurrency compiler bugs can be identified by comparing the memory trace of compiled code against a reference memory trace for the source code. Our tool identified several mistaken write introductions and other unexpected behaviours in the latest release of the gcc compiler.

Preliminary Design of the SAFE Platform (PLOS 2011)

André DeHon, Ben Karel, Thomas F.Knight, Jr., Gregory Malecha, Benoît Montagu, Robin Morisset, Greg Morrisett, Benjamin C. Pierce, Randy Pollack, Sumit Ray, Olin Shivers, Jonathan M. Smith, Gregory Sullivan;
Work done as part of a 5 month internship at UPenn under the supervision of Benjamin C. Pierce

Safe is a clean-slate design for a secure host architecture, coupling advances in programming languages, operating systems, and hardware, and incorporating formal methods at every step. The project is still at an early stage, but we have identified a set of fundamental architectural choices that we believe will work together to yield a high-assurance system. We sketch the current state of the design and discuss several of these choices.

Copilot: A Hard Real-Time Runtime Monitor (International Conference on Runtime Verification 2010)

Lee Pike, Alwyn Goodloe, Robin Morisset, Sebastian Niller;
Work done as part of a 3 month internship at NIA (Virigina, US) under the supervision of Alwyn Goodloe and Lee Pike (working at Galois).

We address the problem of runtime monitoring for hard realtime programs--a domain in which correctness is critical yet has largely been overlooked in the runtime monitoring community. We describe the challenges to runtime monitoring for this domain as well as an approach to satisfy the challenges. The core of our approach is a language and compiler called Copilot. Copilot is a stream-based dataflow language that generates small constant-time and constant-space C programs, implementing embedded monitors. Copilot also generates its own scheduler, obviating the need for an underlying real-time operating system.

PhD thesis

Compiler optimisations and relaxed memory models

Robin Morisset, under the supervision of Francesco Zappa Nardelli, with funding from a Google European PhD Fellowship

Modern multiprocessors architectures and programming languages exhibit weakly consistent memories. Their behaviour is formalised by the memory model of the architecture or programming language; it precisely defines which write operation can be returned by each shared memory read. This is not always the latest store to the same variable, because of optimisations in the processors such as speculative execution of instructions, the complex effects of caches, and optimisations in the compilers.
In this thesis we focus on the C11 memory model that is defined by the 2011 edition of the C standard. Our contributions are threefold.
First, we focused on the theory surrounding the C11 model, formally studying which compiler optimisations it enables. We show that many common compiler optimisations are allowed, but, surprisingly, some important ones are forbidden.
Secondly, building on our results, we developed a random testing methodology for detecting when mainstream compilers such as GCC or Clang perform an incorrect optimisation with respect to the memory model. We found several bugs in GCC, all promptly fixed. We also implemented a novel optimisation pass in LLVM, that looks for special instructions that restrict processor optimisations - called fence instructions - and eliminates the redundant ones.
Finally, we developed a user-level scheduler for lightweight threads communicating through first-in first-out single-producer single-consumer queues. This programming model is known as Kahn process networks, and we show how to efficiently implement it, using C11 synchronisation primitives. This shows that despite its flaws, C11 can be usable in practice.