Scratch wiki for Machine Learning Ideas

Description

pipelines
data manipulation
versioning
experiments

Project Flow

I can think of three major workflows, all of which we could readily support with minimal effort:

Natively calling the MLops management library to do any necessary actions
Calling an RS script that wraps the underlying MLops management library in a limited API, usable from native scripts and from RS scripting
Automatic notebook generation using a CLI or web-based tool that generates a notebook with the necessary MLops framework
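
As a rough sketch of the third workflow, a notebook can be generated with nothing more than the standard library, since a notebook file is plain JSON. Everything specific here is an assumption for illustration: `mlops_lib` and `start_run` are hypothetical stand-ins for whatever management library we end up with, and the cell layout is just one possible template.

```python
import json


def build_notebook(experiment_name: str, framework: str = "pytorch") -> dict:
    """Return a minimal Jupyter notebook (nbformat 4.4) as a plain dict."""

    def code_cell(source: str) -> dict:
        return {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": source.splitlines(keepends=True),
        }

    cells = [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [f"# Experiment: {experiment_name}\n"],
        },
        # `mlops_lib` and `start_run` are hypothetical names, not a real API.
        code_cell(
            "import mlops_lib  # hypothetical management library\n"
            f"run = mlops_lib.start_run({experiment_name!r}, framework={framework!r})\n"
        ),
        code_cell("# model definition and training go here\n"),
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_notebook(path: str, experiment_name: str, framework: str = "pytorch") -> None:
    """Write the generated notebook to `path` as an .ipynb file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_notebook(experiment_name, framework), fh, indent=1)
```

A CLI or web front end would only need to collect the experiment name and framework and call `write_notebook`.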

The issue is that most tools are built around the concept of an experiment: a network design plus hyperparameters, usually configured in a YAML file that often also carries conda and docker configs and the dataset generation or selection

This doesn't work well when things are highly in flux. Separating dataset generation and management from the actual network design is highly important for much of what we do
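
One way to picture that separation is an experiment record that only references a dataset by name and version instead of embedding how it was generated. This is a minimal sketch, not a proposal for the final schema; the class names, field names, and example values are all made up for illustration.

```python
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class DatasetSpec:
    """Identifies a dataset independently of any network design."""
    name: str
    version: str          # e.g. a tag or a content hash
    generator: str = ""   # script that produced it, if any


@dataclass(frozen=True)
class ModelSpec:
    """Network design plus hyperparameters."""
    architecture: str
    hyperparameters: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Experiment:
    name: str
    dataset: DatasetSpec
    model: ModelSpec

    def with_model(self, model: ModelSpec) -> "Experiment":
        """Swap the design while keeping the exact same dataset reference."""
        return replace(self, model=model)


# Hypothetical example values.
dataset = DatasetSpec(name="example-dataset", version="v3")
baseline = Experiment("baseline", dataset, ModelSpec("cnn", {"lr": 1e-3}))
variant = baseline.with_model(ModelSpec("resnet", {"lr": 1e-4}))
```

The design can then churn as much as it likes while every variant still points at the same dataset version, and `asdict(baseline)` gives something that could be dumped to YAML if a tool expects it.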

Requirements

Don't tie a scientist down to one specific tool; at least have a highly up-to-date Keras/TF and PyTorch framework available
For deployment, have the ability to script a workflow and test it using a locked-in version of any of the packages
Have a model library that is easily accessible to store models for both training and inference
Have the ability to convert anything to ONNX
Have a solid API between the training layer and the deployment layer to be able to
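
For the locked-in versions requirement, a deployment test could start by checking that the installed packages match the lock. A minimal sketch, assuming the lock is available as a plain name-to-version mapping (how it gets produced is left open); `find_version_drift` is a hypothetical helper name, and the versions in the usage comment are placeholders.

```python
from importlib import metadata


def find_version_drift(locked, lookup=metadata.version):
    """Return {package: (locked_version, installed_version_or_None)} for mismatches.

    `lookup` defaults to importlib.metadata.version and can be swapped
    out, for example to check against a recorded environment.
    """
    drift = {}
    for name, wanted in locked.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            drift[name] = (wanted, installed)
    return drift


# Usage (placeholder versions): fail the deployment test if anything drifted.
# drift = find_version_drift({"torch": "2.1.0", "onnx": "1.15.0"})
# assert not drift, f"environment does not match the lock: {drift}"
```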

Data Ingest

We "need" a more solid framework for data ingest and for versioning data manipulation, so that multiple scientists can work on the same datasets with the same expectations
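
A simple starting point for "same datasets, same expectations" is a content fingerprint: two scientists hold the same dataset version exactly when the fingerprints match. The sketch below is an assumption about how that could look, not an existing tool; it hashes every file under a directory with SHA-256, and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def dataset_manifest(root):
    """Map each file's path (relative to `root`) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def dataset_fingerprint(root):
    """Collapse the manifest into one hex digest that identifies the dataset version."""
    payload = json.dumps(dataset_manifest(root), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

The manifest also shows which files changed between two versions. Note that this reads each file fully into memory, so very large files would want chunked hashing instead.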

Current MLops toolchains

MLflow
Kubeflow
k3ai - auto deployment for Kubeflow and MLflow
NVIDIA Triton Inference Server (Excellent Presentation)
<fill me in>

Workflow Solutions

Apache Airflow
Kubeflow
CWL

Several non-free, not fully open-source, or commercial solutions are also available

OpenVINO model zoo and Intel tooling
Xilinx model store
Custom backends galore (AWS SageMaker)
