Notes on Distributed Systems for Young Bloods
http://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/
The worst characteristic of this list is that it focuses on technical problems with little discussion of social problems an engineer may run into. Since distributed systems require more machines and more capital, their engineers tend to work with more teams and larger organizations. The social stuff is usually the hardest part of any software developer’s job, and, perhaps, especially so with distributed systems development. # 分布式系统通常是由很多团队来合作开发部署的,所以工程师之间的协作就显得更加重要。
- Distributed systems are different because they fail often. Design for failure.
- Writing robust distributed systems costs more than writing robust single-machine systems. distributed systems tend to need actual, not simulated, distribution to flush out their bugs.
- Robust, open source distributed systems are much less common than robust, single-machine systems. # 分布式系统需要真实的环境(成百上千机器的集群)的考验,所以普通工程师很难给出稳定的实现,所以社区的工程师通常是来自大公司的开发者。但是大公司优先级可能和你的公司的优先级不同,所以导致即使软件出现某些问题并且社区意识到了,这个问题也不一定会被修复。
- Coordination is very hard.
- If you can fit your problem in memory, it’s probably trivial.
- “It’s slow” is the hardest problem you’ll ever debug. # 对于性能问题难以分析的原因主要是,我们很难确定整个pipeline中每个部分执行时间。Dapper and Zipkin就是用来解决这类问题的。
- Implement backpressure throughout your system. # 如果没有过载保护,可能会导致级联故障。
- Find ways to be partially available.
- Metrics are the only way to get your job done. # 观察分布式系统,最直接有效的方式就是先观察各个metrics, 而调试分布式系统,最直接有效的办法则是分析log, 但是需要各种metrics来做支持。
- Use percentiles, not averages.
- Learn to estimate your capacity. # 容量规划
- Feature flags are how infrastructure is rolled out. # 使用feature flags(特性开关)来不断迭代整个系统
- Choose id spaces wisely.
- Exploit data-locality.
- Writing cached data back to persistent storage is bad.
- Computers can do more than you think they can. # 2012年底,一个轻量级的webserver,6 processors, 24GB, 承载相对比较复杂的CRUD应用,完全可以做到>1k QPS(< 100ms).
- Use the CAP theorem to critique systems. # 你只能在CA之间做选择
(但是并不意味着整个系统都必须在CA之间选择,我们可以限定到单个请求上) - Extract services.