Updates to why and use cases #580

Merged: 2 commits, Oct 31, 2024

12 changes: 5 additions & 7 deletions docs/src/docs/use-cases.md
@@ -2,17 +2,15 @@

KitOps is the market's only open source, standards-based packaging and versioning system designed for AI/ML projects. Using the OCI standard allows KitOps to be painlessly adopted by any organization using containers and enterprise registries today (see a partial list of [compatible tools](./modelkit/compatibility.md)).

Today AI/ML development in enterprises relies on artifacts that are tightly coupled, but versioned and stored separately:
* Models in Jupyter notebooks or MLOps tools
* Datasets in data lakes, databases, or file systems
* Code in git repositories
* Metadata (hyperparameters, features, weights, etc...) in various locations based on their type
Organizations around the world are using KitOps as a "gate" in the [handoff between development and production](#level-1-handoff-from-development-to-production-).

For this reason, organizations around the world are using KitOps as a "gate" between development and production. Those who are concerned about end-to-end auditing of their model development (like those in regulated industries, or under the jurisdiction of the [EU AI Act](https://artificialintelligenceact.eu/) extend KitOps usage to security and development use cases (see [Level 2](#level-2-adding-security-️) and [Level 3](#level-3-storage-for-all-ai-project-versions-) use cases below.)
Those who are concerned about end-to-end auditing of their model development - like those in regulated industries, or under the jurisdiction of the [EU AI Act](https://artificialintelligenceact.eu/) - extend KitOps usage to security and development use cases (see [Level 2](#level-2-adding-security-️) and [Level 3](#level-3-storage-for-all-ai-project-versions-) use cases below).

## Level 1: Handoff From Development to Production 🤝

Organizations are having AI teams build a [ModelKit](./modelkit/intro.md) for each version of the AI project that is going to staging, user acceptance testing (UAT), or production. KitOps is ideally suited to CI/CD pipelines (e.g., using [KitOps in a GitHub Action](https://dev.to/kitops/introducing-the-new-github-action-for-using-kit-cli-on-mlops-pipelines-21ia)) either triggered manually by the model development team when they're ready to send the model to production, or automatically when a model or its artifacts are updated in their respective repositories.
Organizations are having AI teams build a [ModelKit](./modelkit/intro.md) for each version of the AI project that is going to staging, user acceptance testing (UAT), or production.

KitOps is ideally suited to CI/CD pipelines (e.g., using [KitOps in a GitHub Action](https://dev.to/kitops/introducing-the-new-github-action-for-using-kit-cli-on-mlops-pipelines-21ia)) either triggered manually by the model development team when they're ready to send the model to production, or automatically when a model or its artifacts are updated in their respective repositories.
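
As a rough sketch of what such a pipeline step could look like with the Kit CLI (the registry, repository, tag, and credential variables below are placeholders; check `kit login --help` for the exact authentication flags):

```sh
# Hypothetical CI step: package and publish a ModelKit for the version that is
# headed to staging, UAT, or production. All names and variables are examples.
set -euo pipefail

REGISTRY="registry.example.com"
REPO="ml-team/churn-model"
TAG="${PIPELINE_RUN_ID:-manual}"   # tie the ModelKit tag to the pipeline run

# Authenticate against the OCI registry the organization already uses
echo "$REGISTRY_TOKEN" | kit login "$REGISTRY" -u "$REGISTRY_USER" --password-stdin

# Pack the project directory (which contains a Kitfile), then push the result
kit pack . -t "$REGISTRY/$REPO:$TAG"
kit push "$REGISTRY/$REPO:$TAG"
```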

This ensures that:
* __Operations teams have all the assets and information they need__ in order to determine how to test, deploy, audit, and manage these new workloads
17 changes: 8 additions & 9 deletions docs/src/docs/versus.md
@@ -6,28 +6,27 @@ When people first come across KitOps they sometimes wonder, "how is this better

Most teams working on AI projects store, track, and version their assets in one of two ways.

1. Using an MLOps tool
1. Using a combination of git, containers, cloud storage, and Jupyter notebooks
1. Using an [MLOps tool](#kitops-vs-mlops-tools)
1. Using a combination of git, containers, cloud storage, and [Jupyter notebooks](#kitops-vs-jupyter-containers-dataset-storage-and-git)

Neither solution is well suited to tracking and sharing AI project updates across data science, application development, infrastructure, and management teams... and neither is able to work seamlessly with the security, compliance, and efficiency processes organizations have spent decades perfecting.

Let's look at each option in a little more depth.

## KitOps vs. MLOps Tools

First off, it's important to understand that KitOps and its ModelKits don't completely replace the need for MLOps training and experimentation tools like Weights & Biases, MLFlow, or others.
First off, it's important to understand that KitOps and its ModelKits don't replace the need for MLOps training and experimentation tools like Weights & Biases, MLFlow, or others.

However, [ModelKits](./modelkit/intro.md) are a better way to package, version, and share AI project assets outside of the data science team who use MLOps tools everyday.
However, [ModelKits](./modelkit/intro.md) are a more secure and flexible way to package, version, and share AI project assets outside of the data science teams who use MLOps tools every day.

Unlike MLOps tools, KitOps:

* Can be stored in the [container registry](https://kitops.ml/docs/modelkit/compatibility.html#compliant-oci-registries) every team already uses
* Fits naturally (and without any changes) into organizations' existing deployment, security, and compliance processes
* Can already be used with *every* software, DevOps, and data science tool
* Uses existing, proven, and compliant registries organizations already depend on for their critical software assets
* Is simple enough for anyone to use, not just data science teams
* Leverages the same structure and syntax engineering teams are familiar with from containers and Kubernetes (see the short example after this list)
* Can already be [used with *every* software, DevOps, and data science tool](./modelkit/compatibility.md)
* Is available as free open source, and openly governed so it protects users and organizations from vendor lock-in
* Is [simple enough](./get-started.md) for anyone to use, not just data science teams
* Is based on vendor-agnostic standards like OCI
* Is open source, and openly governed so it protects users and organizations from vendor lock-in
* Is built by a community with decades of production operations and compliance experience
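
For instance, a ModelKit reference has the same `registry/repository:tag` shape engineers already use for container images (the names below are examples only):

```sh
# Pulling a container image and pulling a ModelKit look nearly identical
docker pull registry.example.com/ml-team/churn-model:v1.2.0
kit pull registry.example.com/ml-team/churn-model:v1.2.0
```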

## KitOps vs. Jupyter, Containers, Dataset Storage, and Git
39 changes: 23 additions & 16 deletions docs/src/docs/why-kitops.md
@@ -1,5 +1,17 @@
# Why Use KitOps?

KitOps is the market's only open source, standards-based packaging and versioning system designed for AI/ML projects. Using the OCI standard allows KitOps to be painlessly adopted by any organization using containers and enterprise registries today (see a partial list of [compatible tools](./modelkit/compatibility.md)).

KitOps has been downloaded over 20,000 times in just the last three months. Teams typically use it as a:

* [Secure and immutable packaging and versioning standard](./modelkit/intro.md) that is [compatible with their existing container registry](https://kitops.ml/docs/modelkit/compatibility.html#compliant-oci-registries)
* Point-of-control between development and production to [enforce consistency in packaging and documentation](./kitfile/kf-overview.md)
* Catalogue of meaningful AI/ML project versions for regulatory compliance or change tracking
* Mechanism to simplify and unify the [creation of containers or Kubernetes deployment YAML](./deploy.md)

> [!NOTE]
> The goal of KitOps is to be a library of versioned packages for your AI project, stored in an enterprise registry you already use.

## The Problem

There is no standard and versioned packaging system for AI/ML projects. Today each part of the project is kept somewhere different:
@@ -8,26 +8,20 @@ There is no standard and versioned packaging system for AI/ML projects. Today ea
* Configuration in Jupyter notebooks, feature stores, MLOps tools, or ...
* Pipeline definitions in proprietary tools

Jupyter notebooks are great, but extracting the model, datasets, and metadata from one is tricky. Similarly, ML-specific experimentation tools like MLFlow or Weights & Biases are excellent at training, but they save everything in proprietary formats that are confusing for software engineers and SREs.
This makes it difficult to track which versions of code, model, and datasets go together. It makes building containers harder and managing in-production AI/ML projects riskier.

When the only people using AI were data scientists this was annoying but workable. Now there are application teams trying to integrate model versions with their application, testing teams trying to validate models, and DevOps teams trying to deploy and maintain models in production.
Teams that use ModelKits report saving between 12 and 100 hours per AI/ML project iteration, while security and compliance teams appreciate that all AI/ML project assets are packaged together for each version and stored in an already secured and auditable enterprise container registry.

Without unified packaging teams take on risk and give up speed:
* Which dataset version was used to train and validate this model version?
* When did the dataset change? Did that affect my test run?
* Where are the best configuration parameters for the model we're running in production?
* Where did the model come from? Can we trust the source?
* What changes did we make to the model?
Suddenly tough questions like these are easy to answer:

...and if you have to roll back a model deployment in production...good luck. With leaders demanding teams "add AI/ML" to their portfolios, many have fallen into a "throw it over the wall and hope it works" process that adds risk, delay, and frustration to self-hosting models.

This problem is only getting worse and the stakes are rising each day as more and more teams start deploying models to production without proper operational safeguards.
* Where did the model come from? Can we trust the source?
* When did the dataset change? Which models were trained on it?
* Who built and signed off on the model?
* Which model is in production, which is coming, and which has been retired?

## The Solution

> [!NOTE]
> The goal of KitOps is to be a library of versioned packages for your AI project, stored in an enterprise registry you already use.

Kit's ModelKits are the better solution:
* Combine models, datasets, code and all the context teams need to integrate, test, or deploy:
* Training code
@@ -44,6 +44,8 @@ Use `kit pack` to package up your Jupyter notebook, serialized model, and datase

Then `kit push` it to any OCI-compliant registry, even a private one.
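
A minimal sketch of that flow, assuming an illustrative project layout (the Kitfile below is trimmed to a few common fields; see the [Kitfile overview](./kitfile/kf-overview.md) for the full schema):

```sh
# Describe the project's assets in a Kitfile (paths and names are examples)
cat > Kitfile <<'EOF'
manifestVersion: "1.0"
package:
  name: churn-model
  version: 1.2.0
  description: Customer churn classifier with training data and notebooks
model:
  name: churn-model
  path: ./models/churn.onnx
datasets:
  - name: training-data
    path: ./data/train.csv
code:
  - path: ./notebooks
EOF

# Pack everything the Kitfile lists into a ModelKit, then push it
kit pack . -t registry.example.com/ml-team/churn-model:v1.2.0
kit push registry.example.com/ml-team/churn-model:v1.2.0
```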

Most people won't need everything, so just `kit unpack` from the remote registry to get just the model, only the datasets, or just the notebook. Or, if you need everything then a `kit pull` will grab everything.
Most people won't need everything, so use `kit unpack` to pull only the layers you need (e.g., just the model and datasets, or just the code and docs) from the remote registry. Or, if you do need everything, `kit pull` will grab it all.
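
A rough example of that selective unpack (the reference is illustrative, and the filter flag names are assumptions; check `kit unpack --help` or the [CLI reference](./cli/cli-reference.md) for the exact options):

```sh
REF="registry.example.com/ml-team/churn-model:v1.2.0"   # example reference

# Grab only the layers you need, e.g. just the model and datasets...
kit unpack "$REF" --model --datasets -d ./serving

# ...or use kit pull when you really do want everything in the ModelKit
kit pull "$REF"
```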

Finally [package it all up as a container or Kubernetes deployment](./deploy.md).

Check out our [getting started doc](./get-started.md), see the power and flexibility of our [CLI commands](./cli/cli-reference.md), or learn more about packaging your AI/ML project with [ModelKits](./modelkit/intro.md).
Check out our [getting started doc](./get-started.md), see the power and flexibility of our [CLI commands](./cli/cli-reference.md), or learn more about packaging your AI/ML project with [ModelKits](./modelkit/intro.md) and even making them [deployable](./deploy.md).