Why not RAID-0? It’s about Time and Snowflakes

http://hortonworks.com/blog/why-not-raid-0-its-about-time-and-snowflakes/

Reliability

Before panicking – disk failures are rare. Google’s 2007 paper, Failure Trends in a Large Disk Drive Population, reported that in their datacenters, 1.7% of disks failed in the first year of their life, while three-year-old disks were failing at a rate of 8.6%. About 9% isn’t a good number.（超过三年的硬盘发生问题的概率在9%） 8块超过3年的磁盘同时使用出现问题的概率在1-（1-0.086）^8 = 0.513，这个几率还是相当高的。这个还不是主要的问题，因为JBOD: Just a Box of Disks也会遇到这个问题。

主要问题是，如果一旦一块磁盘出现问题的话，那么所有的磁盘上的数据都需要进行replication.因为RAID0是strip存储的，每个disk上面可能存储一个small block（64KB），而HDFS使用64MB作为block。这就意味着1个HDFS block在10 RAID0 disks上面的话会分摊在10个disk上面，如果一个disk出现问题的话，那么所有的HDFS block都发生损坏就都要进行replication

Every Disk is a Unique Snowflake

On RAID-0 Storage the disk accesses go at the rate of the slowest disk. RAID0带宽瓶颈限制在slowest disk上面
The 2011 paper, Disks Are Like Snowflakes: No Two Are Alike, measured the performance of modern disk drives, and discovered that they can vary in data IO rates by 20%, even when they are all writing to same part of the hard disk.
if you have eight disks, some will be faster than the others, right from day one. And your RAID-0 storage will deliver the performance of the slowest disk right from the day you unpack it from its box and switch it on.