Update documentation to include references to docs layers.
amisevsk committed Jul 11, 2024
1 parent e555301 commit 1653de5
Showing 4 changed files with 52 additions and 53 deletions.
5 changes: 4 additions & 1 deletion docs/src/docs/kitfile/kf-overview.md
@@ -16,6 +16,8 @@ Crafted with simplicity and efficiency in mind, the Kitfile organizes project de

**Model Specifications:** Insights into the models themselves, including framework details, training parameters, and validation metrics, to foster understanding and further development.

**Documentation:** Conveniently separated documentation files, to make getting started faster.

## Designed for Collaboration

By encapsulating the essence of your AI/ML project into a singular, version-controlled document, the Kitfile not only simplifies the packaging process but also enhances collaborative efforts. Whether you're sharing projects within your team or with the global AI/ML community, the Kitfile ensures that every artifact, from datasets to models, is accurately represented and easily accessible.
@@ -35,6 +37,7 @@ There are four main parts to a Kitfile:
1. Path to the Jupyter notebook folder in the `code` section
1. Path to the serialized model in the `model` section
1. Path to the datasets in the `datasets` section (you can have multiple datasets in the same Kitfile)
1. Paths to documentation in the `docs` section

Here's an example Kitfile:

@@ -71,7 +74,7 @@ datasets:
The only mandatory parts of the Kitfile are:
* `manifestVersion`
* At least one of `code`, `model`, `or datasets` sections
* At least one of `code`, `model`, `docs` or `datasets` sections

A ModelKit can only contain one model, but multiple datasets or code bases are allowed. Also note that you can only use relative paths (no absolute paths) in your Kitfile. Right now you can only build ModelKits from files on your local system, but we're already working towards allowing you to reference remote files, for example building a ModelKit from a local notebook and model plus a dataset hosted on DVC, S3, or anywhere else.
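
The rules above can be checked mechanically. Below is a minimal Python sketch, assuming the Kitfile has already been parsed into a plain dictionary (for example by a YAML library). `check_kitfile` is a hypothetical helper, not part of the KitOps tooling, and it covers only the rules stated in this section.

```python
from pathlib import PurePosixPath

# Sections of which at least one must be present, per the list above.
CONTENT_SECTIONS = ("code", "model", "docs", "datasets")


def check_kitfile(kitfile: dict) -> list[str]:
    """Return a list of problems found in a parsed Kitfile (hypothetical helper)."""
    problems = []

    if "manifestVersion" not in kitfile:
        problems.append("missing manifestVersion")

    if not any(kitfile.get(name) for name in CONTENT_SECTIONS):
        problems.append("need at least one of: " + ", ".join(CONTENT_SECTIONS))

    # `model` is a single mapping; the other sections are lists of mappings.
    entries = []
    if isinstance(kitfile.get("model"), dict):
        entries.append(kitfile["model"])
    for name in ("code", "docs", "datasets"):
        section = kitfile.get(name)
        if isinstance(section, list):
            entries.extend(e for e in section if isinstance(e, dict))

    # Only relative paths are allowed.
    for entry in entries:
        path = entry.get("path")
        if path and PurePosixPath(path).is_absolute():
            problems.append(f"absolute path not allowed: {path}")

    return problems
```

An empty result means none of these particular rules were violated; it does not guarantee that `kit` itself will accept the file.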

59 changes: 33 additions & 26 deletions docs/src/docs/next-steps.md
@@ -16,18 +16,19 @@ If you need a quick way to sign a ModelKit you can follow the same instructions

## Making your own Kitfile

A Kitfile is the configuration document for your ModelKit. It's written in YAML so it's easy to read. There are four main parts to a Kitfile:
A Kitfile is the configuration document for your ModelKit. It's written in YAML so it's easy to read. There are five main parts to a Kitfile:

1. The `package` section: Metadata about the ModelKit, including the author, description, and license
1. The `code` section: Path and information about codebases related to the project, including Jupyter notebook folders
1. The `datasets` section: Path and information on included datasets
1. The `model` section: Path and information on the serialized model
1. The `model` section: Information about the serialized model
1. The `docs` section: Information about documentation for the ModelKit
1. The `code` section: Information about codebases related to the project, including Jupyter notebook folders
1. The `datasets` section: Information on included datasets

A Kitfile only needs the `package` section, plus one of the other sections.
A Kitfile only needs the `package` section, plus one or more of the other sections.

The `package` and `model` sections can only contain a single model (you can chain models by using multiple ModelKits).
The `model` section can only contain a single model (you can chain models by using multiple ModelKits).

The `datasets`, `code`, and `name` sections are lists, so each entry must start with a dash. The dash is required even if you are only packaging a single item of that type.
The `datasets`, `code`, and `docs` sections are lists, so each entry must start with a dash. The dash is required even if you are only packaging a single item of that type.
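
To see why the dash matters: a YAML parser turns dash-prefixed entries into a list, while the same keys written without a dash become a single mapping. The Python sketch below assumes the Kitfile has already been parsed into a dictionary; `non_list_sections` is a hypothetical helper name used for illustration only.

```python
# Sections that must be YAML lists, per the paragraph above.
LIST_SECTIONS = ("datasets", "code", "docs")


def non_list_sections(kitfile: dict) -> list[str]:
    """Return the names of list sections that are present but not lists."""
    return [
        name
        for name in LIST_SECTIONS
        if name in kitfile and not isinstance(kitfile[name], list)
    ]


# With a dash, `docs` parses to a list containing one mapping.
with_dash = {"docs": [{"path": "./README.md"}]}
# Without the dash, the same keys parse to a bare mapping.
without_dash = {"docs": {"path": "./README.md"}}
```

Calling `non_list_sections(without_dash)` reports `docs`, which is the mistake the dash rule is meant to prevent.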

Here's a snippet of a Kitfile that contains two datasets; notice that each entry starts with "-":

@@ -53,37 +54,43 @@ Any relative paths defined within the Kitfile are interpreted as being relative

### Kitfile Examples

Here's a complete Kitfile example, with a model, two datasets, and a codebase:
Here's a complete Kitfile example, with a model, documentation, two datasets, and a codebase:

```yaml
manifestVersion: 1.0
package:
  authors:
    - Jozu
  description: Small language model based on Mistral-7B fine tuned for answering film photography questions.
  license: Apache-2.0
  name: FilmSLM
model:
  name: FilmSLM
  description: Film photography Q&A model using Mistral-7B
  framework: Mistral-7B
  license: Apache-2.0
  path: ./models/film_slm:champion
  version: 1.2.6
docs:
  - path: ./README.md
    description: Readme file for this ModelKit
  - path: ./USAGE.md
    description: Information on how to use this model for inference
datasets:
  - description: Forum postings from sites like rangefinderforum, DPreview, PhotographyTalk, and r/AnalogCommunity
    name: training data
    path: ./data/forum-to-2023-train.csv
  - description: validation data
    name: validation data
    path: ./data/test.csv
code:
  - description: Jupyter notebook with model training code in Python
    path: ./notebooks
```

A minimal ModelKit for distributing a pair of datasets looks like this:
@@ -93,15 +100,15 @@ manifestVersion: v1.0.0
```yaml
manifestVersion: v1.0.0
package:
  authors:
    - Jozu
datasets:
  - name: training data
    path: ./data/train.csv
    license: Apache-2.0
  - description: validation data
    name: validation data
    path: ./data/validate.csv
```

More information on Kitfiles can be found in the [Overview](./kitfile/kf-overview.md) and [Format](./kitfile/format.md) documentation.
@@ -149,6 +156,7 @@ Models and their datasets can be very large and take a long time to push or pull

`unpack` can take arguments for partial unpacking of a ModelKit:
* `--model` to unpack only the model to the destination file system
* `--docs` to unpack only the documentation to the destination file system
* `--datasets` to unpack only the datasets to the destination file system
* `--code` to unpack only the code bases to the destination file system
* `--config` to unpack only the Kitfile to the destination file system
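
As a rough illustration of scripting a partial unpack, the Python sketch below builds a `kit unpack` command line for one of the flags listed above. The helper name `unpack_command` is hypothetical, and placing the ModelKit reference directly after `unpack` is an assumption modeled on the `kit info mymodel:challenger` example on this page; check the CLI reference before relying on it.

```python
# Partial-unpack flags listed above.
UNPACK_PARTS = {"model", "docs", "datasets", "code", "config"}


def unpack_command(reference: str, part: str) -> list[str]:
    """Build an argv list for `kit unpack` limited to a single part."""
    if part not in UNPACK_PARTS:
        raise ValueError(f"unknown part: {part}")
    # Assumed argument order: kit unpack <reference> --<part>
    return ["kit", "unpack", reference, f"--{part}"]
```

The returned list can be passed to `subprocess.run` if the `kit` binary is on your `PATH`.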
@@ -171,7 +179,6 @@ The `unpack` command is part of the typical push and pull commands:

For any ModelKit in your local or remote registry you can use the [info command](./cli/cli-reference.md#kit-info) to easily read the Kitfile without pulling or unpacking it. This is a great way to understand what's in a ModelKit you might be interested in without needing to execute the more time-consuming unpack/pull commands.


```sh
kit info mymodel:challenger
```
32 changes: 10 additions & 22 deletions pkg/artifact/kitfile.md
@@ -4,13 +4,13 @@ The Kitfile manifest for AI/ML is a YAML file designed to encapsulate all the ne

## Overview

The manifest is structured into several key sections: `version`, `package`,`code`, `datasets` and `model`. Each section serves a specific purpose in describing the AI/ML package components and requirements.
The manifest is structured into several key sections: `manifestVersion`, `package`, `code`, `datasets`, `docs`, and `model`. Each section serves a specific purpose in describing the AI/ML package components and requirements.

### `ManifestVersion`
### `manifestVersion`

- **Description**: Specifies the manifest format version.
- **Type**: String
- **Example**: `1.0`
- **Example**: `1.0.0`

### `package`

@@ -54,7 +54,13 @@ This section provides general information about the AI/ML project.
- `path`: Location of the dataset file or directory relative to the context.
- `description`: Overview of the dataset.
- `license`: SPDX license identifier for the dataset.
- `preprocessing`: Reference to preprocessing steps.

#### `docs`

- **Description**: Information about included documentation for the model
- **Type**: Object Array
- `description`: Description of the documentation
- `path`: Location of the documentation relative to the context

#### `model`

@@ -70,12 +76,6 @@ This section provides general information about the AI/ML project.
- `name`: Identifier for the part
- `path`: Location of the file or a directory relative to the context
- `type`: The type of the part (e.g. LoRA weights)
- `training`:
- `dataset`: Name of the dataset
- `parameters`: name value pairs
- `validation`:
- `dataset`: Name of the dataset
- `metrics`: name value pairs


## Example
@@ -97,24 +97,12 @@ datasets:
path: data/dataset.csv
description: Description of the dataset.
license: CC-BY-4.0
preprocessing: Preprocessing steps.
model:
name: ModelName
path: models/model.h5
framework: TensorFlow
version: 1.0
description: Model description.
license: Apache-2.0
training:
dataset: DatasetName
parameters:
learning_rate: 0.001
epochs: 100
batch_size: 32
validation:
- dataset: DatasetName
metrics:
accuracy: 0.95
f1_score: 0.94
```
9 changes: 5 additions & 4 deletions pkg/artifact/spec.md
@@ -7,9 +7,10 @@ A **ModelKit** represents a comprehensive bundle of AI/ML artifacts, including m
**Artifacts:** The building blocks of a ModelKit. Artifacts can be models, datasets, or code, each stored and addressed individually. This modular approach facilitates direct access via tools. Artifact metadata is encapsulated within the kitfile, ensuring comprehensive documentation of each component.

The artifacts and their media types are
* Serialized Model: `application/vnd.kitops.modelkit.model.v1.tar+gzip`
* Datasets: `application/vnd.kitops.modelkit.dataset.v1.tar+gzip`
* Code: `application/vnd.kitops.modelkit.code.v1.tar+gzip`
* Serialized Model: `application/vnd.kitops.modelkit.model.v1.tar`
* Datasets: `application/vnd.kitops.modelkit.dataset.v1.tar`
* Code: `application/vnd.kitops.modelkit.code.v1.tar`
* Docs: `application/vnd.kitops.modelkit.docs.v1.tar`
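
For tooling that inspects a ModelKit manifest, the artifact kind of a layer can be looked up from its media type. The following Python sketch is illustrative only: the function and constant names are hypothetical, and the strings are the `.tar` media types listed above.

```python
from typing import Optional

# Media types as listed above (the `.tar` variants).
MEDIA_TYPES = {
    "model": "application/vnd.kitops.modelkit.model.v1.tar",
    "datasets": "application/vnd.kitops.modelkit.dataset.v1.tar",
    "code": "application/vnd.kitops.modelkit.code.v1.tar",
    "docs": "application/vnd.kitops.modelkit.docs.v1.tar",
}

# Reverse lookup: media type -> artifact kind.
KIND_BY_MEDIA_TYPE = {media: kind for kind, media in MEDIA_TYPES.items()}


def artifact_kind(media_type: str) -> Optional[str]:
    """Return the artifact kind for a layer media type, or None if unknown."""
    return KIND_BY_MEDIA_TYPE.get(media_type)
```

Anything that is not one of the four layer types, such as the Kitfile config media type, returns `None`.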

**ModelKit File (Kitfile)** Acts as a record detailing the properties, relationships, and intended uses of the included artifacts. The Kitfile is central to understanding the structure and purpose of a ModelKit. It adopts the `application/vnd.kitops.modelkit.config.v1+json` media type for easy access and interpretation by tools. See the separate Kitfile specification for details.

@@ -46,4 +47,4 @@ Example of a ModelKit manifest with a single serialized model and kitfile.
}
```

`size` is listed in bytes.
