Always Measure One Level Deeper

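The author is John Ousterhout (homepage: https://web.stanford.edu/~ouster/cgi-bin/home.php), who has built quite a few systems; his best-known recent work includes the Raft consensus algorithm and RAMCloud. The article's main complaint is that performance analysis of systems is often coarse and not detailed enough. As a result, it misses many opportunities to optimize systems and to understand them more deeply.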

Key Insights:

A good performance evaluation deepens your understanding of a system and also sharpens the researcher's intuition.

A good performance evaluation provides a deep understanding of a system’s behavior, quantifying not only the overall behavior but also its internal mechanisms and policies. It explains why a system behaves the way it does, what limits that behavior, and what problems must be addressed in order to improve the system. Done well, performance evaluation exposes interesting system properties that were not obvious previously. It not only improves the quality of the system being measured but the developer’s intuition, resulting in better systems in the future.

The author then describes five common mistakes.

He then offers four suggestions for improvement:

If you want to understand the performance of a system at a particular level, you must measure not just that level but also the next level deeper. That is, measure the underlying factors that contribute to the performance at the higher level. If you are measuring overall latency for remote procedure calls, you could measure deeper by breaking down that latency, determining how much time is spent in the client machine, how much time is spent in the network, and how much time is spent on the server. You could also measure where time is spent on the client and server. If you are measuring the overall throughput of a system, the system probably consists of a pipeline containing several components. Measure the utilization of each component (the fraction of time that component is busy). At least one component should be 100% utilized; if not, it should be possible to achieve a higher throughput.
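A minimal Python sketch of the utilization check described above, assuming you have already recorded how long each pipeline stage was busy during a fixed measurement window. The `StageSample` type, the function names, and the 95% "saturated" threshold are hypothetical choices for illustration, not something taken from the article.

```python
from dataclasses import dataclass


@dataclass
class StageSample:
    """One pipeline stage's measurement over a test window (hypothetical type)."""
    name: str
    busy_seconds: float  # time the stage spent doing work during the window


def utilizations(samples, window_seconds):
    """Map each stage name to the fraction of the window it was busy."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return {s.name: s.busy_seconds / window_seconds for s in samples}


def bottleneck(samples, window_seconds, saturated=0.95):
    """Return (busiest stage, its utilization, whether it counts as saturated).

    `saturated` is an assumed threshold: real measurements rarely read exactly
    100%, so anything at or above it is treated as fully utilized.
    """
    util = utilizations(samples, window_seconds)
    name = max(util, key=util.get)
    return name, util[name], util[name] >= saturated
```

For example, with stages `parse`, `storage`, and `reply` busy for 3.1 s, 9.9 s, and 2.0 s of a 10 s window, `bottleneck` reports `storage` as the busiest stage and flags it as saturated. If no stage crosses the threshold, that is itself a finding: by the reasoning above, something not on the list is probably limiting throughput.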

Measuring deeper is the best way to validate top-level measurements and uncover bugs. Once you have collected some deeper measurements, ask yourself whether they seem consistent with the top-level measurements and with each other. You will almost certainly discover things that do not make sense; make additional measurements to resolve the contradictions.
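One way to make the RPC latency breakdown concrete, and to apply the consistency check recommended here, is sketched below. It assumes each RPC is instrumented with six hypothetical timestamps (four on the client, two on the server). Because the two machines' clocks are not synchronized, it only subtracts timestamps taken on the same machine. The "network" share is therefore whatever the client waited that the server's own timer cannot account for, which includes NIC and kernel time on both ends as well as wire time. The 5% tolerance is an arbitrary assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RpcTrace:
    """Hypothetical per-RPC timestamps in microseconds.

    client_* fields use the client's clock and server_* fields use the
    server's clock, so only same-machine differences are meaningful.
    """
    client_start: float  # application issues the RPC
    client_send: float   # request leaves the client
    client_recv: float   # response arrives at the client
    client_done: float   # result handed back to the application
    server_recv: float   # request arrives at the server
    server_send: float   # response leaves the server


def breakdown(t):
    """Split one RPC's latency into client, network, and server time."""
    server = t.server_send - t.server_recv
    client = (t.client_send - t.client_start) + (t.client_done - t.client_recv)
    # Client-observed wait that the server's timer does not cover.
    network = (t.client_recv - t.client_send) - server
    total = t.client_done - t.client_start
    return {"client": client, "network": network, "server": server, "total": total}


def check_consistency(traces, top_level_mean_us, tolerance=0.05):
    """Return a list of findings that 'do not make sense'."""
    problems = []
    parts = [breakdown(t) for t in traces]
    for i, p in enumerate(parts):
        if p["network"] < 0:
            problems.append(f"trace {i}: negative network time {p['network']:.1f} us")
    deep_mean = mean(p["total"] for p in parts)
    if abs(deep_mean - top_level_mean_us) > tolerance * top_level_mean_us:
        problems.append(
            f"per-RPC mean {deep_mean:.1f} us disagrees with "
            f"top-level {top_level_mean_us:.1f} us"
        )
    return problems
```

A negative network time, or a per-RPC mean that disagrees with the benchmark's own top-level latency, is exactly the kind of contradiction worth tracking down before trusting either number.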

Measuring deeper will also indicate whether you are getting the best possible performance and, if not, how to improve it. Use deeper measurements to find out what is limiting performance. Try to identify the smallest elements that have the greatest impact on overall performance. For example, if the overall metric is latency, measure the individual latencies of components along the critical path; typically, there will be a few components that account for most of the overall latency. You can then focus on optimizing those components.
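The advice to find the few components that account for most of the latency can be sketched as a greedy selection: sort the critical-path components by latency and take the largest until they cover a chosen share of the end-to-end total. This assumes the input contains only components on the critical path, so their latencies add up to the end-to-end latency. The 80% default share and the `dominant_components` name are illustrative assumptions.

```python
def dominant_components(latencies_us, share=0.8):
    """Return the fewest critical-path components whose latencies sum to at
    least `share` of the end-to-end total, as (name, latency, fraction) tuples,
    largest first."""
    total = sum(latencies_us.values())
    if total <= 0:
        raise ValueError("latencies must sum to a positive value")
    chosen, covered = [], 0.0
    # Largest-first guarantees the smallest set that reaches the target share.
    for name, lat in sorted(latencies_us.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        chosen.append((name, lat, lat / total))
        covered += lat
    return chosen
```

Taking components largest-first is what keeps the returned set as small as possible for the chosen share, which matches the goal of focusing optimization effort where it pays off most.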

Do not spend a lot of time agonizing over which deeper measurements to make. If the top-level measurements contain contradictions or things that are surprising, start with measurements that could help resolve them. Or pick measurements that will identify performance bottlenecks. If nothing else, choose a few metrics that are most obvious and easiest to collect, even if you are not sure they will be particularly illuminating. Once you look at the results, you will almost certainly find things that do not make sense; from this point on, track down and resolve everything that does not make perfect sense. Along the way you will discover other surprises; track them down as well. Over time, you will develop intuition about what kinds of deeper measurements are most likely to be fruitful.

Measuring deeper is the single most important ingredient for high-quality performance measurement. Focusing on this one rule will prevent most of the mistakes anyone could potentially make. For example, in order to make deeper measurements you will have to allocate extra time. Measuring deeper will expose bugs and inconsistencies, so you will not accidentally trust bogus data. Most of the suggestions under Rule 2 (Never trust a number generated by a computer) are actually examples of measuring deeper. You will never need to guess the reasons for performance, since you will have actual data. Your measurements will not be superficial. Finally, you are less likely to be derailed by subconscious bias, since the deeper measurements will expose weaknesses, as well as strengths.