Fallacies of Distributed Computing Explained

设计分布式计算/系统的几种误区

  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn't change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogeneous.

我的理解是,突然面对这么多计算/存储资源,很容易产生容量大可以浪费的错觉。 但是资源并不是简单叠加的,想要线性地扩展使用这些资源,需要非常小心和精巧的设计。 如果只是想着把一群高性能机器通过高性能网络连接就是分布式系统,这种想法是错误的。


Bandwidth is infinite

网络上传输大量数据不是新鲜的事情,所以带宽并不是无限的。

However, there are two forces at work to keep this assumption a fallacy. One is that while the bandwidth grows, so does the amount of information we try to squeeze through it. VoIP, videos, and IPTV are some of the newer applications that take up bandwidth. Downloads, richer UIs, and reliance on verbose formats (XML) are also at work– especially if you are using T1 or lower lines. However, even when you think that this 10Gbit Ethernet would be more than enough, you may be hit with more than 3 Terabytes of new data per day (numbers from an actual project).

The main implication then is to consider that in the production environment of our application there may be bandwidth problems which are beyond our control. And we should bear in mind how much data is expected to travel over the wise.


The network is secure

安全需要一个多层次的解决方案,在网络,基础设施以及应用层面上都需要应对。可能会遇到什么安全风险,有什么方法可以缓解和规避。

The implications of network (in) security are obvious–you need to build security into your solutions from Day 1. I mentioned in a previous blog post that security is a system quality attribute that needs to be taken into consideration starting from the architectural level. There are dozens of books that talk about security and I cannot begin to delve into all the details in a short blog post.

In essence you need to perform threat modeling to evaluate the security risks. Then following further analyses decide which risk are should be mitigated by what measures (a tradeoff between costs, risks and their probability). Security is usually a multi-layered solution that is handled on the network, infrastructure, and application levels.

As an architect you might not be a security expert–but you still need to be aware that security is needed and the implications it may have (for instance, you might not be able to use multicast, user accounts with limited privileges might not be able to access some networked resource etc.)


Topology doesn't change

拓扑结构会发生变化,所以需要服务注册和发现机制。

When you're talking about clients, the situation is even worse. There are laptops coming and going, wireless ad-hoc networks , new mobile devices. In short, topology is changing constantly.

What does this mean for the applications we write? Simple. Try not to depend on specific endpoints or routes, if you can't be prepared to renegotiate endpoints. Another implication is that you would want to either provide location transparency (e.g. using an ESB, multicast) or provide discovery services (e.g. a Active Directory/JNDI/LDAP).


Transport cost is zero

即便你可以使用最先进的技术将这个服务搭建起来,你还需要考虑使用这些组件的成本如何,是否cost-effective.

Imagine you have successfully built Dilbert's Google-killer search engine [Adams] (maybe using latest Web 2.0 bells-and-whistles on the UI) but you will fail if you neglect to take into account the costs that are needed to keep your service up, running, and responsive (E3 Lines, datacenters with switches, SANs etc.). The takeaway is that even in situations you think the other fallacies are not relevant to your situation because you rely on existing solutions ("yeah, we'll just deploy Cisco's HSRP protocol and get rid of the network reliability problem") you may still be bounded by the costs of the solution and you'd need to solve your problems using more cost-effective solutions.