Scratch wiki for Machine Learning Ideas
- pipelines
- data manipulation
- versioning
- experiments
I can think of three major workflows, all of which we could readily support with minimal effort:
- Natively calling the MLops management library to do any necessary actions
- Calling an RS script that exposes a limited API over the underlying MLops management library, usable from native scripts and from RS scripting
- Automatic notebook generation using a CLI or web-based tool that generates a notebook with the necessary MLops framework
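As a rough illustration of the third workflow, a generator only needs to emit notebook JSON with the tracking boilerplate already in place. The sketch below uses only the standard library and assumes MLflow as the tracking tool (one of the candidates on this page, not a decision); `build_notebook` and `write_notebook` are hypothetical names.

```python
import json


def build_notebook(experiment_name: str) -> dict:
    """Return a minimal nbformat-4 notebook pre-filled with tracking boilerplate."""

    def code_cell(source: str) -> dict:
        return {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": source.splitlines(keepends=True),
        }

    def markdown_cell(source: str) -> dict:
        return {
            "cell_type": "markdown",
            "metadata": {},
            "source": source.splitlines(keepends=True),
        }

    cells = [
        markdown_cell(f"# Experiment: {experiment_name}"),
        # Assumption: MLflow is the tracking library; swap these strings for another tool.
        code_cell(f"import mlflow\n\nmlflow.set_experiment({experiment_name!r})"),
        code_cell(
            "with mlflow.start_run():\n"
            "    mlflow.log_params({})  # fill in hyperparameters\n"
            "    # training code goes here\n"
        ),
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_notebook(path: str, experiment_name: str) -> None:
    """Write the generated notebook to disk as .ipynb JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_notebook(experiment_name), handle, indent=1)
```

A CLI or web front end would then just call something like `write_notebook("demo.ipynb", "demo")` and hand the file to the scientist.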
The issue is that most tools are built around the concept of an experiment: a network design plus hyperparameters, usually configured in a YAML file that often also includes conda and docker configs and the dataset generation or selection
Obviously this doesn't work well when things are highly in flux. For much of what we do, it is important to separate dataset generation and management from the actual network design
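One way to picture that separation is to keep the dataset description and the model description as independent records that an experiment merely references. This is a minimal sketch, not a proposal for a specific tool; `DatasetSpec`, `ModelSpec`, and `Experiment` are hypothetical names and the fields are placeholders.

```python
from dataclasses import dataclass, field, asdict
import hashlib
import json


@dataclass(frozen=True)
class DatasetSpec:
    """Describes how a dataset is generated or selected, independent of any model."""
    name: str
    version: str
    params: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        # Stable short hash of the spec, so two experiments can prove they share data.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class ModelSpec:
    """Network design plus hyperparameters, with no knowledge of the dataset."""
    architecture: str
    hyperparameters: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Experiment:
    """An experiment only pairs a dataset spec with a model spec."""
    dataset: DatasetSpec
    model: ModelSpec
```

With this layout the network design can change freely while the dataset record stays fixed, for example two `Experiment` objects built from the same `DatasetSpec` report the same `dataset.fingerprint()`.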
- Don't tie a scientist down to one specific tool; at least have highly up-to-date Keras/TF and PyTorch frameworks available
- For deployment, have the ability to script a workflow and test it against a locked-in version of each package
- Have a model library that is easily accessible to store models for both training and inference
- Have the ability to convert anything to ONNX
- Have a solid API between the training layer and the deployment layer to be able to
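For the locked-in versions point in the list above, a deployment script could refuse to run when the environment drifts from its pins. The sketch below uses the standard library's `importlib.metadata`; `check_pins` is a hypothetical helper and the package names and versions in the usage note are made up for illustration.

```python
from importlib import metadata


def check_pins(pins: dict) -> dict:
    """Return {package: (expected, installed_or_None)} for every pin that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches
```

A deployment test might call `check_pins({"torch": "2.3.0", "onnx": "1.16.0"})` (illustrative pins only) and fail if the returned dictionary is non-empty.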
We "need" a more solid framework for data ingest, manipulation, and versioning, so that multiple scientists can work on the same datasets with the same expectations
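A very small version of "same datasets, same expectations" is a content fingerprint that every scientist can compute locally and compare. This is only a sketch under the assumption that a dataset is a directory of files; `dataset_fingerprint` is a hypothetical helper, and a real setup would more likely lean on a dedicated data-versioning tool.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(root: str) -> str:
    """Hash relative file names and contents under root into one hex digest."""
    digest = hashlib.sha256()
    base = Path(root)
    # Sort so the result does not depend on filesystem listing order.
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        digest.update(path.relative_to(base).as_posix().encode("utf-8"))
        digest.update(b"\0")
        # Reads each file fully into memory; fine for a sketch, not for huge files.
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Two people whose directories produce the same digest are working on byte-identical data; any added, removed, renamed, or edited file changes it.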
- MLflow
- Kubeflow
- k3ai - auto deployment for Kubeflow and MLflow
- NVIDIA Triton Inference Server (Excellent Presentation)
- <fill me in>
Workflow Solutions
- Apache Airflow
- Kubeflow
- CWL
Several non-free, not fully open-source, or commercial solutions are also available:
- OpenVINO model zoo and Intel tooling
- Xilinx model store
- Custom backends galore (e.g. AWS SageMaker)