Using Influence to Understand Complex Systems
Author: Adam Jamison Oliner
Publisher: Stanford University
Total Pages: 153
Release: 2011
ISBN-10: STANFORD:bp425gr1329
Book excerpt: This thesis is concerned with understanding the behavior of complex systems, particularly in the common case where instrumentation data is noisy or incomplete. We begin with an empirical study of logs from production systems, which characterizes the content of those logs and the challenges associated with analyzing them automatically, and present an algorithm for identifying surprising messages in such logs. The principal contribution is a method, called influence, that identifies relationships among components, even when the underlying mechanism of interaction is unknown, by looking for correlated surprise. Two components are said to share an influence if they tend to exhibit surprising behavior that is correlated in time. We represent the behavior of components as surprise (deviation from typical or expected behavior) over time and use signal-processing techniques to find correlations. The method makes few assumptions about the underlying systems or the data they generate, so it is applicable to a variety of unmodified production systems, including supercomputers, clusters, and autonomous vehicles. We then extend the idea of influence by presenting a query language and online implementation, which allow the method to scale to systems with hundreds of thousands of signals. In collaboration with system administrators, we applied these tools to real systems and discovered correlated problems, failure cascades, skewed clocks, and performance bugs. According to the administrators, the tools also generated information useful for diagnosing and fixing these issues.
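The idea of "correlated surprise" described in the excerpt can be illustrated with a small sketch. This is not the thesis's algorithm: the excerpt does not say how surprise is computed or which signal-processing techniques are used, so the code below assumes an absolute z-score as a stand-in for surprise and a lagged Pearson correlation as a stand-in for the correlation step. The function names (`surprise`, `pearson`, `lagged_correlation`) and the sample readings are hypothetical.

```python
from statistics import mean, pstdev


def surprise(values):
    """Assumed surprise measure: absolute z-score of each sample."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [abs(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def lagged_correlation(a, b, max_lag):
    """Return (lag, r) with the largest |r|; a positive lag means b trails a.

    Assumes a and b have the same length.
    """
    best_lag, best_r = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = a[:len(a) - lag], b[lag:]
        else:
            xs, ys = a[-lag:], b[:len(b) + lag]
        if len(xs) < 2:
            continue
        r = pearson(xs, ys)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r


# Made-up readings from two hypothetical components: A spikes at t=5, B at t=7.
component_a = [10, 10, 11, 10, 10, 50, 10, 11, 10, 10, 10, 10]
component_b = [5, 5, 5, 6, 5, 5, 5, 40, 5, 5, 6, 5]

lag, r = lagged_correlation(surprise(component_a), surprise(component_b), max_lag=3)
print(f"best lag = {lag}, correlation = {r:.3f}")
```

In this toy data the two surprise signals line up best when B is shifted two steps back, which is the kind of time-correlated anomaly the excerpt describes as a shared influence. Working on surprise rather than raw values means the two components do not need comparable units or scales.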