AutoML

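The success of machine learning across many fields has set off a wave of enthusiasm for artificial intelligence, but the barrier to entry for machine learning is still high. AutoML emerged so that people without specialist knowledge can also use machine learning to tackle big-data problems.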

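A data-mining project typically involves the following steps: data preprocessing, feature selection, model/algorithm selection, hyperparameter tuning, re-optimizing the model after deployment, and evaluating its performance. To make designing machine learning models simpler, we would like to automate this process. AutoML mainly provides two functions: model/algorithm selection and hyperparameter optimization.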
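Model/algorithm selection and hyperparameter optimization are often treated as one joint search over (algorithm, hyperparameters) pairs. Below is a minimal sketch of that idea using plain random search. It makes several assumptions for illustration: a toy 1-D regression dataset, and three hypothetical candidates (`mean`, `knn`, `ridge`) with made-up hyperparameter ranges. It is written in pure Python so it runs without any ML library. It is not how any particular AutoML tool is implemented.

```python
import random

def make_data(n, seed):
    # Hypothetical toy task: y = x^2 plus Gaussian noise, x uniform on [-3, 3].
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        data.append((x, x * x + rng.gauss(0, 0.5)))
    return data

def fit_predict(algorithm, params, train, xs):
    """Train one candidate model on `train` and predict targets for `xs`."""
    if algorithm == "mean":  # baseline: always predict the training mean
        m = sum(y for _, y in train) / len(train)
        return [m for _ in xs]
    if algorithm == "knn":  # k-nearest-neighbour regression
        k = params["k"]
        preds = []
        for x in xs:
            nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
            preds.append(sum(y for _, y in nearest) / k)
        return preds
    if algorithm == "ridge":  # 1-D ridge regression in closed form
        n = len(train)
        mx = sum(x for x, _ in train) / n
        my = sum(y for _, y in train) / n
        sxy = sum((x - mx) * (y - my) for x, y in train)
        sxx = sum((x - mx) ** 2 for x, _ in train)
        b = sxy / (sxx + params["lam"])
        return [my - b * mx + b * x for x in xs]
    raise ValueError(f"unknown algorithm {algorithm}")

# Joint search space: pick an algorithm first, then sample only its hyperparameters.
SPACE = {
    "mean": lambda rng: {},
    "knn": lambda rng: {"k": rng.randint(1, 20)},
    "ridge": lambda rng: {"lam": 10 ** rng.uniform(-3, 3)},
}

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def random_search_cash(train, valid, n_trials=30, seed=0):
    """Random search over (algorithm, hyperparameters); returns best trial and history."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        algorithm = rng.choice(sorted(SPACE))
        params = SPACE[algorithm](rng)
        preds = fit_predict(algorithm, params, train, [x for x, _ in valid])
        history.append((algorithm, params, mse(preds, [y for _, y in valid])))
    best = min(history, key=lambda h: h[2])
    return best, history

train, valid = make_data(80, seed=1), make_data(40, seed=2)
best, history = random_search_cash(train, valid)
print(best)
```

Replacing random search with a model-based optimizer such as TPE or SMAC would change only how the next (algorithm, hyperparameters) pair is proposed. The evaluation loop stays the same.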

The overall process consists of:

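  1. Meta-learning warm start: find algorithms that perform well within the machine learning framework, and compute the similarity between datasets; similar datasets can start from similar hyperparameters.
  2. Hyperparameter optimization, with algorithms such as Hyperopt (the TPE algorithm), SMAC (based on random forests), and Spearmint (based on Gaussian processes). The optimizer takes the candidate hyperparameter values as input and uses a metric such as accuracy as its objective. It first tries a few randomly chosen configurations, then iteratively refines the search, at each step choosing the configuration it currently rates as most promising.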
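The two steps in this list can be sketched together. First, the warm start finds the past datasets whose meta-features are closest to the new one and reuses their best hyperparameter values as initial points. Then the search continues with a simplified one-dimensional Tree-structured Parzen Estimator (TPE), the algorithm used by Hyperopt.

Everything specific here is a hypothetical assumption for illustration:

- the meta-feature vector (log sample count, log feature count, class count, left unnormalized for brevity);
- the tiny `META_DB` of past results;
- the fixed kernel bandwidth;
- the `toy_loss` objective, which stands in for a real validation loss.

This is not Hyperopt's implementation, which uses adaptive bandwidths, priors, and tree-structured spaces.

```python
import math
import random

# Hypothetical meta-database: meta-features of previously seen datasets
# (log #samples, log #features, #classes) and the best value found there for
# one hyperparameter, e.g. log10 of a learning rate.
META_DB = [
    ((math.log(1000), math.log(20), 2), -1.0),
    ((math.log(50000), math.log(300), 10), -2.5),
    ((math.log(2000), math.log(25), 2), -1.2),
    ((math.log(800), math.log(5), 3), -0.5),
]

def warm_start(meta_features, k=2):
    """Meta-learning step: best settings of the k most similar past datasets."""
    ranked = sorted(META_DB, key=lambda entry: math.dist(entry[0], meta_features))
    return [best for _, best in ranked[:k]]

def parzen(x, centers, sigma):
    """Parzen-window density: average of Gaussian kernels centred on observed points."""
    if not centers:
        return 0.0
    norm = sigma * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers) / (norm * len(centers))

def tpe_minimize(objective, low, high, init=(), n_random=5, n_iter=25,
                 gamma=0.25, n_candidates=24, sigma=0.3, seed=0):
    """Simplified 1-D TPE: warm-start and random points first, then model-based steps."""
    rng = random.Random(seed)
    xs = list(init) + [rng.uniform(low, high) for _ in range(n_random)]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        # Split past trials into "good" (lowest gamma fraction of losses) and "bad".
        order = sorted(range(len(xs)), key=lambda i: ys[i])
        n_good = max(1, int(gamma * len(xs)))
        good = [xs[i] for i in order[:n_good]]
        bad = [xs[i] for i in order[n_good:]]
        # Sample candidates from l(x), the density of good points, clipped to the range...
        candidates = [min(high, max(low, rng.gauss(rng.choice(good), sigma)))
                      for _ in range(n_candidates)]
        # ...and evaluate the candidate that maximises l(x) / g(x).
        x_next = max(candidates,
                     key=lambda x: parzen(x, good, sigma) / (parzen(x, bad, sigma) + 1e-12))
        xs.append(x_next)
        ys.append(objective(x_next))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

def toy_loss(log_lr):
    # Hypothetical stand-in for a validation loss, minimised at log_lr = -1.5.
    return (log_lr + 1.5) ** 2

init = warm_start((math.log(1500), math.log(22), 2))
best_x, best_loss = tpe_minimize(toy_loss, low=-4.0, high=0.0, init=init)
print(init, best_x, best_loss)
```

The warm-start points are evaluated before any random ones. Because of this, a good match in the meta-database gives the optimizer a strong baseline from its very first trials.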


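For example, the currently popular Auto-sklearn is built on the scikit-learn library and includes 15 classification algorithms, 4 data preprocessing methods, and 14 feature preprocessing (feature extraction) methods.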
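Auto-sklearn searches a structured, conditional space. Each pipeline picks one component per stage, and a component's hyperparameters are active only when that component is chosen. Auto-sklearn defines its real space with the ConfigSpace library.

Below is a minimal plain-Python sketch of sampling from such a space. The stages, component names, and ranges are a small hypothetical subset chosen for illustration, not Auto-sklearn's actual configuration space.

```python
import random

# Hypothetical, heavily reduced pipeline space: stage -> component -> hyperparameter samplers.
PIPELINE_SPACE = {
    "feature_preprocessor": {
        "no_preprocessing": {},
        "pca": {"keep_variance": lambda rng: rng.uniform(0.5, 0.9999)},
        "select_percentile": {"percentile": lambda rng: rng.randint(1, 99)},
    },
    "classifier": {
        "random_forest": {
            "n_estimators": lambda rng: rng.randint(10, 500),
            "max_features": lambda rng: rng.uniform(0.1, 1.0),
        },
        "k_nearest_neighbors": {"n_neighbors": lambda rng: rng.randint(1, 100)},
        "gaussian_nb": {},
    },
}

def sample_pipeline(rng):
    """Pick one component per stage, then sample only that component's hyperparameters."""
    config = {}
    for stage, components in PIPELINE_SPACE.items():
        name = rng.choice(sorted(components))
        config[stage] = name
        for hp, sampler in components[name].items():
            config[f"{stage}:{name}:{hp}"] = sampler(rng)
    return config

def count_structures(space):
    """Number of distinct pipeline structures: product of the per-stage choices."""
    n = 1
    for components in space.values():
        n *= len(components)
    return n

rng = random.Random(42)
print(count_structures(PIPELINE_SPACE))  # 3 * 3 = 9
for _ in range(3):
    print(sample_pipeline(rng))
```

If every classifier could be paired with every feature preprocessing method, the counts quoted above would already give 15 × 14 = 210 pipeline structures, before any hyperparameter values are chosen. This is why AutoML systems rely on guided search rather than enumeration.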


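auto_ml is one of the few projects on GitHub designed for production use. It builds on three excellent open-source frameworks: XGBoost, TensorFlow, and LightGBM. It handles classification and regression tasks, a single prediction takes only about 1 millisecond, and trained models can be serialized and saved to disk. A regression model takes only a few lines of code: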

from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
from auto_ml.utils_models import load_ml_model

# Load data
df_train, df_test = get_boston_dataset()

# Tell auto_ml which column is 'output'
# Also note columns that aren't purely numerical
# Examples include ['nlp', 'date', 'categorical', 'ignore']

column_descriptions = {
    'MEDV': 'output',
    'CHAS': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

ml_predictor.train(df_train)

# Score the model on test data

test_score = ml_predictor.score(df_test, df_test.MEDV)

# auto_ml is specifically tuned for running in production
# It can get predictions on an individual row (passed in as a dictionary)
# A single prediction like this takes ~1 millisecond
# Here we will demonstrate saving the trained model, and loading it again

file_name = ml_predictor.save()

trained_model = load_ml_model(file_name)

# .predict and .predict_proba take in either:
# A pandas DataFrame
# A list of dictionaries
# A single dictionary (optimized for speed in production environments)

predictions = trained_model.predict(df_test)

print(predictions)

