MapReduce: A major step backwards(MapReduce:一个巨大的倒退)- II

Table of Contents

http://www.pgsqldb.org:8079/mwiki/index.php/MapReduce_II

先暂停抱怨几分钟,我们将就上一篇文章的回复中反复出现的特定专题进行一些回答:

1. MapReduce is not a database system, so don't judge it as one

作者给MapReduce实现上提出了4个建议点,认为可以极大提提升性能,而这些特性在PDBMS里面都有:

  • Indexing.
  • Data movement. computation和data尽量在相同的node上面提高data locality
  • Column representation. 减少IO
  • Push, not pull. 除去不必要的物化减少IO

2. MapReduce has excellent scalability; the proof is Google's use

  • Linear scaleup is the gold standard for measuring the scalability of data intensive applications. As far as we are aware there are no published papers that study the scalability of MapReduce in a controlled scientific fashion. MapReduce may indeed scale linearly, but we have not seen published evidence of this. mapreduce依然没有达到linear scaleup水平

3. MapReduce is cheap and databases are expensive

4. We are the old guard trying to defend our turf/legacy from the young turks

Since both of us are among the "gray beards" and have been on this earth about 2 Giga-seconds, we have seen a lot of ideas come and go. We are constantly struck by the following two observations: 因为我俩都是“白胡子”老头了,已经在这个地球上呆了超过2G秒了,我们看到过很多主意的产生和消失。而且我们经常被下面两个现象所烦恼:

  • How insular computer science is. The propagation of ideas from sub-discipline to sub-discipline is very slow and sketchy. Most of us are content to do our own thing, rather than learn what other sub-disciplines have to offer. 计算机科学是多么地孤立。观念从一个子学科传播到另外一个子学科是非常缓慢且残缺的。我们中大多数人都只想做自己的事情,而不是从其它子学科学习已经具备的东西。
  • How little knowledge is passed from generation to generation. In a recent paper entitled "What goes around comes around," (M. Stonebraker/J. Hellerstein, Readings in Database Systems 4th edition, MIT Press, 2004) one of us noted that many current database ideas were tried a quarter of a century ago and discarded. However, such pragma does not seem to be passed down from the "gray beards" to the "young turks." The turks and gray beards aren't usually and shouldn't be adversaries. 代与代之间相传的知识是如此之少。在最近的一篇题为“似水流年“("What goes around comes around," (M. Stonebraker/J. Hellerstein, Readings in Database Systems 4th edition, MIT Press, 2004) )的文章中,我们中的一个提到了很多现代数据库的观点都曾经在四分之一世纪之前尝试过并被抛弃掉。但是,这些试验仿佛并没有从”白胡子“老头传授给”年轻一辈“。年轻或者年长通常不是也不应该成为对立面。