Use Random Forest: Testing 179 Classifiers on 121 Datasets
http://machinelearningmastery.com/use-random-forest-testing-179-classifiers-121-datasets/
对于中小规模数据分类问题(实际上这就是大部分我们遇到的场景),RF和Gaussian-SVM应该是首先应该尝试的模型/算法。
最后作者给出了选择模型/算法一个比较实际的建议,那就是choose a middle ground. 这里middle ground是相对于其他两种办法而言的:
- Pick your favorite algorithm. Fast but limited to whatever your favorite algorithm or library happens to be. # 只选择自己喜欢算法
- Spot check a dozen algorithms. A balanced approach that allows better performing algorithms to rise to the top for you to focus on. # 随机抽查其他算法决定是否要继续深入下去
- Test all known/implemented algorithms. Time consuming exhaustive approach that can sometimes deliver surprising results. # 尝试所有算法