7 Links To Convince You That Big Data Isn't Your Problem

pdf

And yes, big data technologies still have a place in today's world: buzzwords have always been a part of business, and somebody has to purchase all of these disks and RAM and CPU (packed nicely into "commodity servers"). But if you are a penniless startup worrying about how will you close the gap and get in the game, rest assured. The game is mostly imaginary as always!

The first one is that big data is not always necessary to develop a good datadriven application. Let's say we are trying to create personalized messages for ecommerce websites. Is the entire history of all users necessary? Maybe just the last 3 months? How much do they weight? Is a sample of the data enough for building a good model (rather than the entire corpus)? Is the entire corpus tagged (labelled) so that we can use it for building our model? This is not always the situation. Quantity can not always cover for quality.

https://www.kdnuggets.com/2015/08/largest-dataset-analyzed-more-gigabytes-petabytes.html

The second reason is that most big data systems are aimed at being scalable. That means that such a system deployed on two machines (working together on the computations) will perform better than the same system deployed on a single machine (and it goes on: 3 machines are better than 2, 4 are better than 3…etc). The more machines also allow for robustness if one of the machines die in a Hadoop cluster, the others can pick up from where it left and finish the job. So big data systems are usually scalability-oriented. Nice! But do they perform better than a non-scalability-aimed, single-thread based system?

https://blog.acolyer.org/2015/06/05/scalability-but-at-what-cost/

http://www.frankmcsherry.org/graph/scalability/cost/2015/02/04/COST2.html

Here is the (probably copyrighted - sorry about that!) most important part in my opinion: The company has an advantage in deeplearning algorithms for speech recognition in that most video and audio in China is accompanied by text - nearly all news clips, television shows and films are close-captioned and almost all are available to Baidu and Iqiyi, its video affiliate.

So in this domain (speech recognition), using this algorithm (deep learning), big data of this high quality (closed captions) is successfully used. This also works well with image recognition done in neural networks over large corpora of tagged images. Google and Facebook have such datasets and they do it successfully. But not every company is Google or Facebook. Are you?