Thoughts on Systems for Large Datasets: Problems and Opportunities

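Since around 2012, Jeff's work seems to have shifted from building distributed database systems toward building distributed systems for machine learning and deep learning. The second half of this presentation discusses how to find valuable information (through deep learning) in large volumes of data, especially the vast amount of unstructured data on the Internet.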

Areas I Wish New Grads Knew More About

Roughly in two main areas:
– issues that arise in building systems that store and manipulate large datasets
– automatically extracting higher-level information from raw data


Some issues in distributed systems that store and use large datasets:


Handling semi-structured data.

Plenty of Data