Dagli: A Machine Learning Library for Java

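Dagli is an open-source machine learning library from LinkedIn for Java (and other JVM languages). According to its development team, it makes it easy to write model pipelines that are bug-resistant, readable, modifiable, maintainable, and easy to deploy, without accumulating technical debt. Dagli takes advantage of modern multicore CPUs and increasingly powerful GPUs to train real-world models efficiently on a single machine.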

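Below is an introductory example of a text classifier implemented as a pipeline. It uses the active leaves of a gradient boosted decision tree model (XGBoost), together with a high-dimensional set of n-grams, as features in a logistic regression classifier: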

// LabelType stands for the type of your labels (e.g. String or an enum)
Placeholder<String> text = new Placeholder<>();
Placeholder<LabelType> label = new Placeholder<>();
Tokens tokens = new Tokens().withInput(text);

NgramVector unigramFeatures = new NgramVector().withMaxSize(1).withInput(tokens);
Producer<Vector> leafFeatures = new XGBoostClassification<>()
    .withFeaturesInput(unigramFeatures)
    .withLabelInput(label)
    .asLeafFeatures();

NgramVector ngramFeatures = new NgramVector().withMaxSize(3).withInput(tokens);
LiblinearClassification<LabelType> prediction = new LiblinearClassification<LabelType>()
    .withFeaturesInput().fromVectors(ngramFeatures, leafFeatures)
    .withLabelInput(label);

// textList and labelList hold the training examples: the texts and their corresponding labels
DAG2x1.Prepared<String, LabelType, DiscreteDistribution<LabelType>> trainedModel =
    DAG.withPlaceholders(text, label).withOutput(prediction).prepare(textList, labelList);

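// The output is a distribution over labels; no label is needed at inference time, so null is passed in its place
DiscreteDistribution<LabelType> predictedLabels = trainedModel.apply("Some text for which to predict a label", null);
// trainedModel can now be serialized and later loaded on a server, in a CLI app, in a Hive UDF...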
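The pipeline above stacks two kinds of features: n-grams of up to three tokens, and the leaves that a trained tree model activates for each example. As a rough, Dagli-independent illustration of what such a combined feature vector looks like, the plain-Java sketch below counts 1- to 3-grams and adds a one-hot "active leaf" indicator per tree. The `Tree` class, its depth-2 word-presence splits, the leaf numbering, and the `ngram:`/`leaf:` feature names are all hypothetical; real XGBoost trees split on numeric feature thresholds, and this is not how Dagli's `NgramVector` or `asLeafFeatures()` are actually implemented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeafAndNgramFeatures {

    // Counts every n-gram of length 1..maxSize in the token list
    // (loosely analogous to NgramVector.withMaxSize(maxSize); the real weighting may differ).
    static Map<String, Integer> ngramCounts(List<String> tokens, int maxSize) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int n = 1; n <= maxSize; n++) {
            for (int start = 0; start + n <= tokens.size(); start++) {
                String gram = String.join(" ", tokens.subList(start, start + n));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Hypothetical depth-2 tree: the root and each child test whether a word is present.
    // Leaves are numbered 0..3 from left to right.
    static final class Tree {
        final String rootWord, leftWord, rightWord;

        Tree(String rootWord, String leftWord, String rightWord) {
            this.rootWord = rootWord;
            this.leftWord = leftWord;
            this.rightWord = rightWord;
        }

        // Returns the index of the leaf this example ends up in.
        int activeLeaf(Map<String, Integer> unigrams) {
            boolean goRight = unigrams.containsKey(rootWord);
            String childWord = goRight ? rightWord : leftWord;
            boolean childRight = unigrams.containsKey(childWord);
            return (goRight ? 2 : 0) + (childRight ? 1 : 0);
        }
    }

    // Builds one sparse feature vector: n-gram counts (up to trigrams) plus a
    // one-hot indicator for the active leaf of every tree.
    static Map<String, Double> features(List<String> tokens, List<Tree> trees) {
        Map<String, Double> vector = new LinkedHashMap<>();
        Map<String, Integer> unigrams = ngramCounts(tokens, 1); // the trees see unigrams only
        for (Map.Entry<String, Integer> e : ngramCounts(tokens, 3).entrySet()) {
            vector.put("ngram:" + e.getKey(), e.getValue().doubleValue());
        }
        for (int t = 0; t < trees.size(); t++) {
            vector.put("leaf:tree" + t + "_" + trees.get(t).activeLeaf(unigrams), 1.0);
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "movie", "was", "great");
        List<Tree> trees = List.of(
            new Tree("great", "bad", "movie"),
            new Tree("boring", "the", "plot"));
        Map<String, Double> vector = features(tokens, trees);
        System.out.println(vector.size()); // 9 n-grams + 2 leaf indicators = 11
        System.out.println(vector.keySet());
    }
}
```

The leaf indicators let a linear model such as logistic regression reuse the feature interactions the trees have learned, while the raw n-grams keep the fine-grained lexical signal; this stacking is the general idea behind feeding `leafFeatures` and `ngramFeatures` into the same classifier.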