Intro
ML.NET is not yet well equipped for many natural-language processing (NLP) workloads. It already supports basic processing steps (tokenization, stop-word removal, ...) and sentiment analysis, but other, higher-level workloads are not yet supported.
Feature request
- Pre-trained document embeddings, such as those produced by BERT or GloVe, are useful for downstream tasks.
- It would be even better if there were an easy-to-use API to fine-tune or train custom models on custom datasets.
Use case
In our specific use case, we develop document classifiers. We only have a limited set of labeled documents to train with. Our plan is to use pretrained or custom-trained document embeddings and to train a simple classifier on top of them, using the labeled documents.
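The plan above can be illustrated with a minimal, language-agnostic sketch. It is written in Python rather than against ML.NET, since the requested API does not exist yet. Everything in it is an assumption made for illustration: `TOY_VECTORS` is a hypothetical stand-in for real pretrained embeddings (GloVe vectors or a BERT encoder), the documents and labels are invented, and a nearest-centroid rule stands in for whatever "simple classifier" one would actually choose.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy word vectors standing in for pretrained embeddings.
# A real setup would load GloVe vectors or call a BERT encoder instead.
TOY_VECTORS = {
    "invoice": [1.0, 0.0],
    "payment": [0.9, 0.1],
    "amount": [0.8, 0.2],
    "contract": [0.0, 1.0],
    "clause": [0.1, 0.9],
    "party": [0.2, 0.8],
}
DIM = 2


def embed_document(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def train_centroids(labeled_docs):
    """Nearest-centroid classifier: store one mean embedding per label."""
    by_label = defaultdict(list)
    for text, label in labeled_docs:
        by_label[label].append(embed_document(text))
    return {
        label: [sum(e[i] for e in embs) / len(embs) for i in range(DIM)]
        for label, embs in by_label.items()
    }


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def classify(text, centroids):
    """Pick the label whose centroid is most similar to the document."""
    emb = embed_document(text)
    return max(centroids, key=lambda label: cosine(emb, centroids[label]))


# A deliberately small labeled set, mirroring the limited-data use case.
train = [
    ("invoice payment amount", "finance"),
    ("payment amount", "finance"),
    ("contract clause party", "legal"),
    ("clause party", "legal"),
]
centroids = train_centroids(train)
```

With this setup, `classify("invoice amount due", centroids)` selects among the labels seen during training. The point of the design is that the embedding step carries most of the knowledge, so the classifier on top can stay small and be trained from few labeled documents.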
Workarounds
There is already a project that runs BERT as an ONNX model on top of ML.NET; see https://github.com/GerjanVlot/BERT-ML.NET. I'd like to see this become an official part of ML.NET, with a good API that is properly maintained and updated.
Outlook
These models are building blocks for other features, like entity recognition (#630). Ideally, ML.NET would support many more NLP tasks, such as those listed in https://github.com/microsoft/nlp-recipes#content. More generally, we are seeing an uptick in NLP-related project inquiries.