Adam Oliner, Stanford University

A Scientific Approach to Systems Reliability

To build reliable systems, we must measure and analyze existing systems. Specifically, if we hope to prevent, detect, and predict failures, we must first understand the effects of interactions among components. My work infers dependencies by finding anomalies that are correlated in time. I use data from complex production systems, including supercomputers, autonomous vehicles, and software application communities. In this talk, I will discuss recent work aimed at detecting alerts in supercomputer system logs. Using an algorithm that looks for high-entropy log regions (in the information-theoretic sense), we automatically detect 92% of alerts with 90% precision on one of our data sets. This yields a maximal F1 score that is nearly ten times that of current practice. We believe this is both the largest-ever study of system logs and the first reproducible performance baseline for the task of alert detection.
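
To make the two quantities in the abstract concrete, here is a minimal Python sketch. It is not the algorithm presented in the talk: `shannon_entropy` is a hypothetical helper that scores a window of log tokens by plain Shannon entropy, and `f1_score` is the standard harmonic mean of precision and recall, applied to the 90% precision and 92% recall quoted above. The sample token windows are invented for illustration.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the token distribution in one log window.

    Hypothetical helper for illustration only; the talk's algorithm may
    weight terms or define regions differently.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def f1_score(precision, recall):
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example windows: a repetitive one and a more varied one.
routine_window = ["heartbeat", "ok", "heartbeat", "ok"]
unusual_window = ["kernel", "panic", "ecc", "error", "node", "down"]

routine_entropy = shannon_entropy(routine_window)
unusual_entropy = shannon_entropy(unusual_window)

# F1 for the precision and recall figures quoted in the abstract.
reported_f1 = f1_score(precision=0.90, recall=0.92)
```

With the figures from the abstract, the F1 score works out to roughly 0.91. The more varied window scores higher entropy than the repetitive one, which is the intuition behind flagging high-entropy regions as candidate alerts.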