AutoML
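The successful application of machine learning across many fields has set off a wave of enthusiasm for artificial intelligence, but the barrier to entry for machine learning remains high. AutoML emerged so that people who are not machine learning specialists can still apply it to large-scale data problems.
A data mining project or business task typically involves the following steps: data preprocessing, feature selection, model/algorithm selection, hyperparameter tuning, re-optimization of the model after deployment, and evaluation of results. To make designing machine learning models simpler, we would like to automate this process. AutoML mainly performs two functions: model/algorithm selection, and model hyperparameter optimization.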
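The two functions named above, algorithm selection and hyperparameter optimization, can be treated as one joint search. The following is a minimal sketch under stated assumptions, not how any particular AutoML library is implemented: the toy data, the two model families (`threshold_model`, `knn_model`), and the `SEARCH_SPACE` dictionary are all hypothetical names introduced for illustration, and the search is a plain exhaustive loop.

```python
import random

def make_data(n, seed):
    """Toy 1-D binary data: label is 1 when x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:  # flip roughly 10% of the labels
            y = 1 - y
        data.append((x, y))
    return data

def threshold_model(train, threshold):
    """Hypothetical family 1: predict 1 when x exceeds a fixed threshold."""
    return lambda x: int(x > threshold)

def knn_model(train, k):
    """Hypothetical family 2: majority vote among the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(y for _, y in nearest)
        return int(2 * votes > k)
    return predict

def accuracy(model, data):
    """Fraction of points whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in data) / len(data)

# Each entry maps an algorithm name to (builder, candidate hyperparameter values).
SEARCH_SPACE = {
    "threshold": (threshold_model, [0.1, 0.3, 0.5, 0.7, 0.9]),
    "knn": (knn_model, [1, 3, 5, 9]),
}

def select_model(train, valid):
    """Try every (algorithm, hyperparameter) pair; keep the best validation accuracy."""
    best = None
    for name, (builder, values) in SEARCH_SPACE.items():
        for value in values:
            score = accuracy(builder(train, value), valid)
            if best is None or score > best[0]:
                best = (score, name, value)
    return best

train = make_data(200, seed=0)
valid = make_data(100, seed=1)
best_score, best_algo, best_param = select_model(train, valid)
print(best_algo, best_param, round(best_score, 3))
```

An exhaustive loop is only practical for a search space this small. Real AutoML systems replace it with smarter search strategies, but the structure stays the same: a space of (algorithm, hyperparameter) pairs scored on held-out data.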
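The overall process includes:
- Meta-learning warm start: find algorithms that perform well within the machine learning framework, and compute the similarity between datasets, since similar datasets can reuse similar hyperparameters.
- Hyperparameter optimization: available tools include Hyperopt (TPE algorithm), SMAC (based on random forests), and Spearmint (based on Gaussian processes). The tuner takes different hyperparameter configurations as input and uses a loss based on accuracy as its objective. Starting from a few randomly chosen values, it then searches greedily for better ones.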
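The two steps in the list above can be sketched together. This is a minimal illustration, assuming a single numeric hyperparameter and a toy objective. `PAST_RUNS`, the meta-features, and the `loss` function are hypothetical stand-ins. The search is a plain greedy local search seeded by random points and a warm-start point, not the TPE, SMAC, or Spearmint algorithms themselves.

```python
import math
import random

# Hypothetical record of earlier experiments: meta-features of each dataset
# (number of samples, number of features, minority-class ratio) and the best
# hyperparameter value that was found for it.
PAST_RUNS = [
    {"meta": (1000, 10, 0.5), "best_param": 0.8},
    {"meta": (50000, 300, 0.1), "best_param": 0.2},
    {"meta": (200, 5, 0.4), "best_param": 0.9},
]

def distance(meta_a, meta_b):
    """Euclidean distance on log-scaled sizes plus the raw class ratio."""
    a = (math.log10(meta_a[0]), math.log10(meta_a[1]), meta_a[2])
    b = (math.log10(meta_b[0]), math.log10(meta_b[1]), meta_b[2])
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def warm_start(meta):
    """Return the best hyperparameter of the most similar past dataset."""
    nearest = min(PAST_RUNS, key=lambda run: distance(run["meta"], meta))
    return nearest["best_param"]

def loss(param):
    """Toy stand-in for (1 - accuracy); assumed to be minimized at param = 0.7."""
    return (param - 0.7) ** 2

def greedy_search(start_points, step=0.1, min_step=1e-3):
    """Start from the best initial point, then greedily move to a better
    neighbor; halve the step whenever neither neighbor improves the loss."""
    current = min(start_points, key=loss)
    while step > min_step:
        candidate = min((current - step, current + step), key=loss)
        if loss(candidate) < loss(current):
            current = candidate
        else:
            step /= 2
    return current

rng = random.Random(0)
new_dataset_meta = (800, 12, 0.45)
# One warm-start point from meta-learning plus a few random starting values.
starts = [warm_start(new_dataset_meta)] + [rng.random() for _ in range(3)]
best = greedy_search(starts)
print(round(best, 3), round(loss(best), 6))
```

The warm start only changes where the search begins. When the new dataset resembles an earlier one, the first candidate is already close to a good value, so fewer evaluations are spent on poor configurations.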
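For example, the currently popular Auto-sklearn is built on the scikit-learn library and includes 15 classification algorithms, 4 data preprocessing operations, and 14 feature preprocessing methods.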
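auto_ml is one of the few projects on GitHub designed for production use. It integrates three well-regarded open-source frameworks: XGBoost, TensorFlow, and LightGBM. It can be used for classification and regression tasks, a single prediction takes about 1 millisecond, and trained models can be serialized and saved to local disk. A regression model can be built in just a few lines of code: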
from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
from auto_ml.utils_models import load_ml_model

# Load data
df_train, df_test = get_boston_dataset()

# Tell auto_ml which column is 'output'
# Also note columns that aren't purely numerical
# Examples include ['nlp', 'date', 'categorical', 'ignore']
column_descriptions = {
    'MEDV': 'output',
    'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
ml_predictor.train(df_train)

# Score the model on test data
test_score = ml_predictor.score(df_test, df_test.MEDV)

# auto_ml is specifically tuned for running in production
# It can get predictions on an individual row (passed in as a dictionary)
# A single prediction like this takes ~1 millisecond
# Here we will demonstrate saving the trained model, and loading it again
file_name = ml_predictor.save()
trained_model = load_ml_model(file_name)

# .predict and .predict_proba take in either:
# A pandas DataFrame
# A list of dictionaries
# A single dictionary (optimized for speed in production environments)
predictions = trained_model.predict(df_test)
print(predictions)