Laws of Performant Software
http://tagide.com/blog/advice/laws-of-peformant-software/
Programming language << Programmers’ awareness of performance. The programming language doesn’t matter as much as the programmers’ awareness of the implementation of that language and its libraries. These days, all mainstream programming languages and their standard libraries are pretty well optimized, and can be used to write performant code in a large range of application domains. They can also be used to write horribly performing code. For better or for worse, high-level languages provide a large surface area of candy features and libraries that are really awesome to use… until you realize they require huge amounts of memory, or behave super-linearly with the size of the input. It is critical that people ask “how does this magic actually work?”, go search for the answer, and figure out the best way of scaling things if the convenient candy is not as good as needed. There is usually another, better-performing way of doing it even in high-level programming languages. (The main reason C/C++ programmers don’t run into this as often is that there is an appalling lack of candy in the C/C++ ecosystem… performance isn’t hidden – nothing is!)
A programmer’s awareness of performance matters far more than the choice of programming language. Mainstream languages and their libraries are already well optimized, but high-level languages keep adding syntactic sugar and fancy features, and if you don’t know what that sugar costs, it is very easy to write poorly performing code. So whenever you reach for such a convenience, make sure you understand how it works under the hood.
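To make the “candy” point concrete, here is a minimal Java sketch (the method names are illustrative, not from the article): `+=` on strings in a loop looks innocent, but each iteration copies the entire accumulated string, so the loop is quadratic in the output size. Asking “how does this magic actually work?” leads straight to the linear StringBuilder version.

```java
import java.util.List;

public class CandyCost {
    // Quadratic: each += allocates a new String and copies everything so far.
    static String joinNaive(List<String> lines) {
        String out = "";
        for (String line : lines) {
            out += line + "\n";
        }
        return out;
    }

    // Linear: StringBuilder appends into a single growable buffer.
    static String joinAware(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }
}
```

The two versions are nearly the same code, yet on large inputs the difference in running time is orders of magnitude.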
Small design details matter: d(fτ(x), fτ(y)) > e^(λτ)·d(x, y), for some λ > 0. Here, x and y denote two versions of the same code, and d(x, y) is the difference between them; fτ(x) and fτ(y) are the effects of running that code after a certain time τ, and d(fτ(x), fτ(y)) is the difference between those effects. What this law says is that small code differences can have huge effect differences. Some may recognize this law from chaotic systems (or just follow the link!). Are you producing too much garbage unnecessarily? Are you caching things properly? Are you getting rid of useless network packets/requests as early as possible? Are you overusing serialization, or using a slow serializer? Are your dictionaries so big that they have become inefficient? In high-level languages, unless your application design is seriously broken, you don’t need big code changes to make your program perform better; very small code changes can have huge consequences for performance. For example, a cache can be added in 10 LOCs, string concats can be replaced by string builders in about the same number of lines, filters can be added in a couple of lines, etc. Important design decisions, small code changes, big effects!
Details make or break performance. Fortunately, most performance problems are concentrated in a small fraction of the code, and high-level languages make it easy to refactor that code away.
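As one concrete reading of the “a cache can be added in 10 LOCs” claim, here is a bounded LRU cache in roughly that many lines of Java (the LinkedHashMap approach and the capacity parameter are my choices, offered as a sketch):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A bounded LRU cache in about ten lines of code.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true);            // true = access order, i.e. LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;        // evict the least recently used
    }
}
```

Wrapping an expensive lookup with it is two more lines (check the cache; on a miss, compute and put) — exactly the kind of small change with a big effect this law describes.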
corr(performance degradation, unbounded resource usage) > 0.9. This law says that there is a very high correlation between performance degradation, including crashes, and the unbounded use of resources. The slope from writing small apps to writing robust programs that survive all sorts of abuse from uncontrolled input is very steep, and requires a mindset focused on operation rather than on function. When you don’t limit the use of resources, chances are they will be exhausted. Does your program start ad-hoc threads? Use a threadpool with fixed size. Do you need in-memory dictionaries for fast lookups? Use a bounded cache. Do threads pass data among themselves? Use blocking queues whose capacity is a function of the max amount of data that can potentially be waiting without exhausting the memory. Does your program read text data from the file system? Unless you are 100% sure the lines are always of reasonable size, do not use readline. Limit, limit, limit! Don’t use resources in an unbounded manner, or the operation of your program will degrade very quickly after a certain threshold.
Bound your resource usage.
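A sketch of what “limit, limit, limit” can look like in Java (the pool size and queue capacity below are illustrative assumptions; in practice they should come from measuring your workload):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoundedResources {
    // A fixed-size pool instead of ad-hoc thread creation.
    static final ExecutorService POOL = Executors.newFixedThreadPool(8);

    // A bounded queue between threads: capacity sized to the most data
    // that can reasonably be in flight without exhausting memory.
    static final BlockingQueue<String> QUEUE = new ArrayBlockingQueue<>(10_000);

    static void startConsumer() {
        POOL.submit(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String item = QUEUE.take();   // blocks while the queue is empty
                    process(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    static void produce(String item) throws InterruptedException {
        QUEUE.put(item);   // blocks when full: back-pressure, not OOM
    }

    static void process(String item) { /* application logic */ }
}
```

The important design choice is that put() blocks when the queue is full: a fast producer gets back-pressured instead of growing the heap until the process falls over.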
Performance improvements = log (controlled experiments). It seems pretty obvious, but it’s amazing how many times I’ve seen people not being able to answer simple questions about their programs such as “how long does this function actually take to run?” or “how much memory does this data structure entail?” Not knowing is not a problem; being oblivious of the need to find out is a big problem! It is imperative to build small, controlled benchmarks for the purpose of measuring things, to make estimates from there, and to identify and fix performance bottlenecks. This can be done with the help of profilers, too, if you don’t know where the bottlenecks are. But there’s a law of diminishing returns on these experiments: a few parts of the code are bottlenecks, and need to be tuned carefully; most parts don’t contribute much to performance, so there’s not much value in measuring those. A few, meaningfully placed performance measurements can get you 90% of the value of measuring things.
To improve performance, run many controlled experiments and do plenty of profiling.
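A controlled experiment can be very small. Below is a hypothetical Java sketch, where expensiveCall() stands in for whatever you are asking “how long does this actually take?” about:

```java
public class MicroBenchmark {
    static long sink;   // consume results so the JIT cannot discard the work

    public static void main(String[] args) {
        // Warm up so the JIT has compiled the code under test.
        for (int i = 0; i < 10_000; i++) sink += expensiveCall();

        int runs = 1_000;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) sink += expensiveCall();
        long elapsed = System.nanoTime() - start;

        System.out.printf("avg %.2f µs per call (sink=%d)%n",
                elapsed / 1_000.0 / runs, sink);
    }

    static long expensiveCall() {          // hypothetical workload
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;
        return sum;
    }
}
```

For measurements you plan to rely on, a harness such as JMH automates the warm-up, dead-code-elimination, and statistics concerns that this sketch only gestures at.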
N*bad ≠ good. What this says is that, with sufficiently large data, no amount of nodes, cores, or memory will save you from code written with performance-related design flaws. This law comes from seeing people (students and professionals) using multi-core servers, map-reduce clusters, HPCs, and servers with large RAMs to make their software run faster… without first doing a proper job of making the sequential version run as well as it can on a small memory footprint. I have seen systems that took days to finish on a multi-core server with a large RAM allocation be reduced to hours on just one core and less memory after the code was rewritten with performance in mind. Code that doesn’t perform, or that crashes, on a single core with just a few GB of RAM will be equally bad on more powerful hardware.
Piling several bad things together does not make a good thing, and a distributed system does not necessarily solve a problem better than a single machine. See the paper Scalability! But at what COST?
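As a small illustration of “make the sequential version run well on a small footprint first” (the file name and filter below are hypothetical): streaming the input keeps memory roughly constant no matter how large the file grows, whereas reading it all in scales memory with input size.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class SequentialFirst {
    public static void main(String[] args) throws IOException {
        // Files.readAllLines(...) would pull the entire file into RAM;
        // Files.lines(...) reads lazily, one line at a time.
        try (Stream<String> lines = Files.lines(Path.of("access.log"))) {
            long errors = lines.filter(l -> l.contains("ERROR")).count();
            System.out.println("errors: " + errors);
        }
    }
}
```

Only once the lean sequential version is still too slow does scaling out become a multiplier on good code rather than a bandage over bad code.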