Building a Production Machine Learning Infrastructure
http://machinelearningmastery.com/building-a-production-machine-learning-infrastructure/
He says that there are two types of data scientist: the first is a statistician who got good at programming, and the second is a smart software engineer who got put on interesting projects. He counts himself as the second type.
He comments that academic machine learning is basically applied mathematics, specifically applied optimization theory, and that this is how it is taught in academic settings and in textbooks. Industrial machine learning is quite different:
- Systems come before algorithms. In academic machine learning, accuracy takes priority, even at the cost of long run times. In industry, faster is always better and slower has to be justified, so running speed often matters more than accuracy.
- Objective functions are messy. Academic machine learning is all about optimizing an objective function. In industry, clean objective functions do not exist; typically there are many conflicting objectives, which calls for a Pareto (multi-objective) approach: improve one objective without making any of the others worse.
- Everything is changing. The systems are complex, and no single person understands all of them.
- Understanding-optimization trade-off. The work is a cycle of coming up with hypotheses, testing them, and improving the system. Understanding why the system behaves as it does is often more important than getting better results, and experiments are what drive that understanding.
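The Pareto idea from the point on messy objective functions can be made concrete with a small sketch. It is not from the talk: it assumes every objective is scored so that higher is better (latency is negated to fit that convention), and the candidate names and scores are hypothetical, chosen only to illustrate the dominance check.

```python
from typing import Sequence

# Each candidate is a tuple of objective scores; higher is better for every objective.
Scores = Sequence[float]

def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def is_pareto_improvement(current: Scores, proposed: Scores) -> bool:
    """A change is a Pareto improvement if it helps some objective and hurts none."""
    return dominates(proposed, current)

def pareto_front(candidates: dict[str, Scores]) -> list[str]:
    """Names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
    ]

# Hypothetical objectives: (accuracy, revenue per user, negated latency in ms).
models = {
    "baseline": (0.80, 1.00, -120.0),
    "bigger":   (0.83, 1.02, -300.0),
    "tuned":    (0.81, 1.00, -110.0),
    "worse":    (0.79, 0.98, -130.0),
}

print(pareto_front(models))                                       # ['bigger', 'tuned']
print(is_pareto_improvement(models["baseline"], models["tuned"]))  # True
print(is_pareto_improvement(models["baseline"], models["bigger"])) # False
```

Note that "bigger" stays on the front even though it is much slower: dominance alone cannot decide whether extra accuracy is worth extra latency, which is where the systems-first preference comes back in.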