Building and Testing Distributed Systems

Dr. Charles Killian
Purdue University, Computer Science

1:00pm Friday, 2 March 2012, ITE325 UMBC

Building distributed systems is particularly difficult because of the asynchronous, heterogeneous, and failure-prone environment in which these systems must run. This asynchrony makes verifying the correctness of system implementations even more challenging. Tools for building distributed systems must often strike a compromise between reducing programmer effort and increasing system efficiency. In my research, we impose a limited amount of structure and a small number of restrictions on implementations to enable a wide range of analysis and development assistance. Most prominently, we have built the Mace language and runtime, which translates a concise, expressive distributed system specification into a C++ implementation. Importantly, the Mace specification exposes three key pieces of structure: atomic events, explicit state, and explicit messaging.

With a few additional contextual annotations, we show how we can support intra-node parallel event processing of these atomic events while still preserving sequential event consistency, even when using variably available computing resources distributed across a cluster. By leveraging these three structural elements, we have further built tools such as a model checker capable of detecting liveness violations in systems code, a performance tester, and an automated malicious protocol tester. Recent research has also explored applications of these key structures in legacy software, which has produced a log analysis tool that can detect performance problems and a malicious fault injector that can discover successful performance attacks. Mace has been in development since 2004 and has been used to build a wide variety of Internet-ready distributed systems both by myself and by researchers at places such as Cornell University, Microsoft Research (Redmond, Silicon Valley, and Beijing), HP Labs, UCLA, EPFL, and UCSD. This talk will give an overview of my research, presenting the execution model and its checker, support for event parallelization, and our more recent testing tools.

Charles Killian is an Assistant Professor in the Department of Computer Science at Purdue University. He received an NSF CAREER award in 2011, as well as an HP Open Innovation award. In 2008 he completed his Ph.D. in Computer Science at the University of California, San Diego under the supervision of Amin Vahdat. Before transferring to UCSD in August 2004, he completed his Master's in Computer Science at Duke University with Amin Vahdat. His systems and networking research focuses on building and testing distributed systems, and bridges this research with software engineering, security, data mining, and programming languages. Since 2004 he has implemented the Mace programming language and runtime, built numerous distributed systems, and designed MaceMC, the first model checker capable of finding liveness violations in unmodified systems code and the winner of a best paper award at NSDI in 2007. Chip has built many additional tools and enhancements since then, including tools for performance testing, parallel event processing, automated attack discovery, and mining logs to discover performance problems.