Dagli: A Machine Learning Library for Java
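Dagli is an open-source machine learning library from LinkedIn for Java (and other JVM languages). According to its development team, it makes it easy to write model pipelines that are bug-resistant, readable, modifiable, maintainable, and easy to deploy, without incurring technical debt. Dagli takes full advantage of modern multicore CPUs and increasingly powerful GPUs for effective single-machine training of real-world models.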
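The following introductory example shows a text classifier implemented as a pipeline. It uses the active leaves of a gradient boosted decision tree model (XGBoost), together with a high-dimensional set of n-grams, as features in a logistic regression classifier: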
// Placeholders stand in for the inputs supplied at training and inference time.
// LabelType, textList and labelList are assumed to be defined elsewhere.
Placeholder<String> text = new Placeholder<>();
Placeholder<LabelType> label = new Placeholder<>();
Tokens tokens = new Tokens().withInput(text);

// Stage 1: train XGBoost on unigrams and expose its active leaves as features.
NgramVector unigramFeatures = new NgramVector().withMaxSize(1).withInput(tokens);
Producer<Vector> leafFeatures = new XGBoostClassification<>()
    .withFeaturesInput(unigramFeatures)
    .withLabelInput(label)
    .asLeafFeatures();

// Stage 2: logistic regression over n-grams (up to trigrams) plus the leaf features.
NgramVector ngramFeatures = new NgramVector().withMaxSize(3).withInput(tokens);
LiblinearClassification<LabelType> prediction = new LiblinearClassification<LabelType>()
    .withFeaturesInput().fromVectors(ngramFeatures, leafFeatures)
    .withLabelInput(label);

// Assemble the DAG and train it on the example texts and labels.
DAG2x1.Prepared<String, LabelType, DiscreteDistribution<LabelType>> trainedModel =
    DAG.withPlaceholders(text, label).withOutput(prediction).prepare(textList, labelList);

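// The prepared DAG's declared output type is a distribution over labels, not a single label.
DiscreteDistribution<LabelType> predictedDistribution =
    trainedModel.apply("Some text for which to predict a label", null);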
// trainedModel now can be serialized and later loaded on a server, in a CLI app, in a Hive UDF...
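
The closing comment says the trained model can be serialized and loaded elsewhere, but the source does not show how. The sketch below illustrates one possible approach using standard Java object serialization, under the assumption that the trained model implements java.io.Serializable. The class ModelSerializationSketch and its methods toBytes and fromBytes are hypothetical helpers and are not part of Dagli. An ArrayList stands in for trainedModel so the example runs without Dagli on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ModelSerializationSketch {
  /** Writes any Serializable object (assumed here: a trained model) to a byte array. */
  public static byte[] toBytes(Serializable model) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    // Closing the ObjectOutputStream flushes all pending data into the buffer.
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(model);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  /** Restores an object previously written by toBytes. */
  @SuppressWarnings("unchecked")
  public static <T> T fromBytes(byte[] bytes) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (T) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      // The serialized class is missing from the classpath of the loading process.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Stand-in for trainedModel: any Serializable object round-trips the same way.
    ArrayList<String> standIn = new ArrayList<>(List.of("a", "b"));
    byte[] bytes = toBytes(standIn);
    ArrayList<String> restored = fromBytes(bytes);
    System.out.println(restored.equals(standIn));
  }
}
```

In a real deployment the bytes would typically be written to a file or artifact store instead of an in-memory buffer, and the loading process (a server, CLI app, or Hive UDF) would need the same model classes on its classpath.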