Update documentation to include references to docs layers.

jozu-ai · Jul 11, 2024 · 1653de5 · 1653de5
1 parent e555301
commit 1653de5

Showing 4 changed files with 52 additions and 53 deletions.
diff --git a/docs/src/docs/kitfile/kf-overview.md b/docs/src/docs/kitfile/kf-overview.md
@@ -16,6 +16,8 @@ Crafted with simplicity and efficiency in mind, the Kitfile organizes project de
 
 **Model Specifications:** Insights into the models themselves, including framework details, training parameters, and validation metrics, to foster understanding and further development.
 
+**Documentation:** Conveniently separated documentation files, to make getting started faster.
+
 ## Designed for Collaboration
 
 By encapsulating the essence of your AI/ML project into a singular, version-controlled document, the Kitfile not only simplifies the packaging process but also enhances collaborative efforts. Whether you're sharing projects within your team or with the global AI/ML community, the Kitfile ensures that every artifact, from datasets to models, is accurately represented and easily accessible.
@@ -35,6 +37,7 @@ There are four main parts to a Kitfile:
 1. Path to the Jupyter notebook folder in the `code` section
 1. Path to the serialized model in the `model` section
 1. Path to the datasets in the `datasets` section (you can have multiple datasets in the same page)
+1. Paths to documentation in the `docs` section
 
 Here's an example Kitfile:
 
@@ -71,7 +74,7 @@ datasets:
 
 The only mandatory parts of the Kitfile are:
 * `manifestVersion`
-* At least one of `code`, `model`, `or datasets` sections
+* At least one of `code`, `model`, `docs` or `datasets` sections
 
 A ModelKit can only contain one model, but multiple datasets or code bases are allowed. Also note that you can only use relative paths (no absolute paths) in your Kitfile. Right now you can only build ModelKits from files on your local system, but we're already working towards allowing you to reference remote files: for example, building a ModelKit from a local notebook and model, but with a dataset hosted on DVC, S3, or anywhere else.
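
The mandatory-parts rule in this hunk can be checked mechanically. The following is a minimal Python sketch, not part of KitOps: it assumes the Kitfile has already been parsed into a dictionary, and the names `check_kitfile` and `CONTENT_SECTIONS` are hypothetical. It covers only what this file states: a `manifestVersion`, at least one of `code`, `model`, `docs`, or `datasets`, and relative paths only.

```python
from pathlib import PurePosixPath

# Sections named in the overview; at least one must be present.
CONTENT_SECTIONS = ("code", "model", "docs", "datasets")


def _entries(kitfile: dict, section: str) -> list:
    """Return a section's entries as a list (`model` is a single mapping)."""
    value = kitfile.get(section)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_kitfile(kitfile: dict) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if "manifestVersion" not in kitfile:
        problems.append("missing manifestVersion")
    if not any(kitfile.get(s) for s in CONTENT_SECTIONS):
        problems.append("need at least one of: " + ", ".join(CONTENT_SECTIONS))
    for section in CONTENT_SECTIONS:
        for entry in _entries(kitfile, section):
            path = entry.get("path", "")
            # Assumption: paths use POSIX-style separators, as in the examples.
            if PurePosixPath(path).is_absolute():
                problems.append(f"{section}: absolute path not allowed: {path}")
    return problems


# A docs-only Kitfile satisfies the rule once `docs` counts as a section.
docs_only = {
    "manifestVersion": "1.0.0",
    "docs": [{"path": "./README.md", "description": "Readme file"}],
}
print(check_kitfile(docs_only))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.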
 

diff --git a/docs/src/docs/next-steps.md b/docs/src/docs/next-steps.md
@@ -16,18 +16,19 @@ If you need a quick way to sign a ModelKit you can follow the same instructions
 
 ## Making your own Kitfile
 
-A Kitfile is the configuration document for your ModelKit. It's written in YAML so it's easy to read. There are four main parts to a Kitfile:
+A Kitfile is the configuration document for your ModelKit. It's written in YAML so it's easy to read. There are five main parts to a Kitfile:
 
 1. The `package` section: Metadata about the ModelKit, including the author, description, and license
-1. The `code` section: Path and information about codebases related to the project, including Jupyter notebook folders
-1. The `datasets` section: Path and information on included datasets
-1. The `model` section: Path and information on the serialized model
+1. The `model` section: Information about the serialized model
+1. The `docs` section: Information about documentation for the ModelKit
+1. The `code` section: Information about codebases related to the project, including Jupyter notebook folders
+1. The `datasets` section: Information on included datasets
 
-A Kitfile only needs the `package` section, plus one of the other sections.
+A Kitfile only needs the `package` section, plus one or more of the other sections.
 
-The `package` and `model` sections can only contain a single model (you can chain models by using multiple ModelKits).
+The `model` section can only contain a single model (you can chain models by using multiple ModelKits).
 
-The `datasets`, `code`, and `name` sections are lists, so each entry must start with a dash. The dash is required even if you are only packaging a single item of that type.
+The `datasets`, `code`, and `docs` sections are lists, so each entry must start with a dash. The dash is required even if you are only packaging a single item of that type.
 
 Here's a snippet of a Kitfile that contains two datasets; notice that each entry starts with "-":
 
@@ -53,37 +54,43 @@ Any relative paths defined within the Kitfile are interpreted as being relative
 
 ### Kitfile Examples
 
-Here's a complete Kitfile example, with a model, two datasets, and a codebase:
+Here's a complete Kitfile example, with a model, documentation, two datasets, and a codebase:
 
 ```yaml
 manifestVersion: 1.0
 
 package:
   authors:
-  - Jozu
+    - Jozu
   description: Small language model based on Mistral-7B fine tuned for answering film photography questions.
   license: Apache-2.0
   name: FilmSLM
 
 model:
+  name: FilmSLM
   description: Film photography Q&A model using Mistral-7B
   framework: Mistral-7B
   license: Apache-2.0
-  name: FilmSLM
   path: ./models/film_slm:champion
   version: 1.2.6
 
+docs:
+  - path: ./README.md
+    description: Readme file for this ModelKit
+  - path: ./USAGE.md
+    description: Information on how to use this model for inference
+
 datasets:
-- description: Forum postings from sites like rangefinderforum, DPreview, PhotographyTalk, and r/AnalogCommunity
-  name: training data
-  path: ./data/forum-to-2023-train.csv
-- description: validation data
-  name: validation data
-  path: ./data/test.csv
+  - description: Forum postings from sites like rangefinderforum, DPreview, PhotographyTalk, and r/AnalogCommunity
+    name: training data
+    path: ./data/forum-to-2023-train.csv
+  - description: validation data
+    name: validation data
+    path: ./data/test.csv
 
 code:
-- description: Jupyter notebook with model training code in Python
-  path: ./notebooks
+  - description: Jupyter notebook with model training code in Python
+    path: ./notebooks
 ```
 
 A minimal ModelKit for distributing a pair of datasets looks like this:
@@ -93,15 +100,15 @@ manifestVersion: v1.0.0
 
 package:
   authors:
-  - Jozu
+    - Jozu
 
 datasets:
-- name: training data
-  path: ./data/train.csv
-  license: Apache-2.0
-- description: validation data
-  name: validation data
-  path: ./data/validate.csv
+  - name: training data
+    path: ./data/train.csv
+    license: Apache-2.0
+  - description: validation data
+    name: validation data
+    path: ./data/validate.csv
 ```
 
 More information on Kitfiles can be found in the [Overview](./kitfile/kf-overview.md) and [Format](./kitfile/format.md) documentation.
@@ -149,6 +156,7 @@ Models and their datasets can be very large and take a long time to push or pull
 
 `unpack` can take arguments for partial unpacking of a ModelKit:
 * `--model` to unpack only the model to the destination file system
+* `--docs` to unpack only the documentation to the destination file system
 * `--datasets` to unpack only the datasets to the destination file system
 * `--code` to unpack only the code bases to the destination file system
 * `--config` to unpack only the Kitfile to the destination file system
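
For scripting, the partial-unpack flags listed above can be assembled programmatically. This is a rough Python sketch under stated assumptions: the helper `unpack_command` is hypothetical, the placement of flags after the reference is assumed rather than taken from the CLI reference, and combining several flags in one call is assumed to be allowed. It only builds the argument list and does not run `kit`.

```python
# Partial-unpack flags listed in the documentation above; anything else is rejected.
PARTIAL_FLAGS = ("model", "docs", "datasets", "code", "config")


def unpack_command(reference: str, *parts: str) -> list:
    """Build an argument list for `kit unpack` limited to the given parts."""
    unknown = [p for p in parts if p not in PARTIAL_FLAGS]
    if unknown:
        raise ValueError("unknown part(s): " + ", ".join(unknown))
    return ["kit", "unpack", reference] + ["--" + p for p in parts]


# Unpack only the documentation and the Kitfile of a ModelKit.
print(unpack_command("mymodel:challenger", "docs", "config"))
```

The resulting list can be passed to `subprocess.run` on a machine where the `kit` CLI is installed.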
@@ -171,7 +179,6 @@ The `unpack` command is part of the typical push and pull commands:
 
 For any ModelKit in your local or remote registry you can use the [info command](./cli/cli-reference.md#kit-info) to easily read the Kitfile without pulling or unpacking it. This is a great way to understand what's in a ModelKit you might be interested in without needing to execute the more time-consuming unpack/pull commands.
 
-
 ```sh
 kit info mymodel:challenger
 ```

diff --git a/pkg/artifact/kitfile.md b/pkg/artifact/kitfile.md
@@ -4,13 +4,13 @@ The Kitfile manifest for AI/ML is a YAML file designed to encapsulate all the ne
 
 ## Overview
 
-The manifest is structured into several key sections: `version`, `package`,`code`, `datasets` and `model`. Each section serves a specific purpose in describing the AI/ML package components and requirements.
+The manifest is structured into several key sections: `manifestVersion`, `package`, `code`, `datasets`, `docs`, and `model`. Each section serves a specific purpose in describing the AI/ML package components and requirements.
 
-### `ManifestVersion`
+### `manifestVersion`
 
 - **Description**: Specifies the manifest format version.
 - **Type**: String
-- **Example**: `1.0`
+- **Example**: `1.0.0`
 
 ### `package`
 
@@ -54,7 +54,13 @@ This section provides general information about the AI/ML project.
   - `path`: Location of the dataset file or directory relative to the context.
   - `description`: Overview of the dataset.
   - `license`: SPDX license identifier for the dataset.
-  - `preprocessing`: Reference to preprocessing steps.
+
+#### `docs`
+
+- **Description**: Information about included documentation for the model
+- **Type**: Object Array
+  - `description`: Description of the documentation
+  - `path`: Location of the documentation relative to the context
 
 #### `model`
 
@@ -70,12 +76,6 @@ This section provides general information about the AI/ML project.
     - `name`: Identifier for the part
     - `path`: Location of the file or a directory relative to the context
     - `type`: The type of the part (e.g. LoRA weights)
-  - `training`:
-    - `dataset`: Name of the dataset
-    - `parameters`: name value pairs
-  - `validation`:
-    - `dataset`: Name of the dataset
-    - `metrics`: name value pairs
 
 
 ## Example
@@ -97,24 +97,12 @@ datasets:
     path: data/dataset.csv
     description: Description of the dataset.
     license: CC-BY-4.0
-    preprocessing: Preprocessing steps.
 model:
     name: ModelName
     path: models/model.h5
     framework: TensorFlow
     version: 1.0
     description: Model description.
     license: Apache-2.0
-    training:
-      dataset: DatasetName
-      parameters:
-        learning_rate: 0.001
-        epochs: 100
-        batch_size: 32
-    validation:
-      - dataset: DatasetName
-        metrics:
-          accuracy: 0.95
-          f1_score: 0.94
 ```
 
diff --git a/pkg/artifact/spec.md b/pkg/artifact/spec.md
@@ -7,9 +7,10 @@ A **ModelKit** represents a comprehensive bundle of AI/ML artifacts, including m
 **Artifacts:** The building blocks of a ModelKit. Artifacts can be models, datasets, or code, each stored and addressed individually. This modular approach facilitates direct access via tools. Artifact metadata is encapsulated within the kitfile, ensuring comprehensive documentation of each component.
 
 The artifacts and their media types are
-* Serialized Model: `application/vnd.kitops.modelkit.model.v1.tar+gzip`
-* Datasets:  `application/vnd.kitops.modelkit.dataset.v1.tar+gzip`
-* Code: `application/vnd.kitops.modelkit.code.v1.tar+gzip`
+* Serialized Model: `application/vnd.kitops.modelkit.model.v1.tar`
+* Datasets:  `application/vnd.kitops.modelkit.dataset.v1.tar`
+* Code: `application/vnd.kitops.modelkit.code.v1.tar`
+* Docs: `application/vnd.kitops.modelkit.docs.v1.tar`
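
A tool that reads a ModelKit manifest could use these media types to tell layers apart. The Python sketch below is illustrative only: `MEDIA_TYPES`, `layer_kind`, and `layers_of_kind` are hypothetical names, and the layer dictionaries assume OCI-style entries with a `mediaType` field. It recognizes only the four media types listed after this change, so the older `+gzip` variants are treated as unknown.

```python
from typing import Optional

# Layer media types as listed in the spec after this change.
MEDIA_TYPES = {
    "model": "application/vnd.kitops.modelkit.model.v1.tar",
    "dataset": "application/vnd.kitops.modelkit.dataset.v1.tar",
    "code": "application/vnd.kitops.modelkit.code.v1.tar",
    "docs": "application/vnd.kitops.modelkit.docs.v1.tar",
}

# Reverse lookup: media type string -> artifact kind.
KIND_BY_MEDIA_TYPE = {media: kind for kind, media in MEDIA_TYPES.items()}


def layer_kind(media_type: str) -> Optional[str]:
    """Return the artifact kind for a media type, or None if unrecognized."""
    return KIND_BY_MEDIA_TYPE.get(media_type)


def layers_of_kind(layers: list, kind: str) -> list:
    """Select the layers of one kind from a list of layer descriptors."""
    return [layer for layer in layers if layer_kind(layer.get("mediaType", "")) == kind]


# Hypothetical layer descriptors; the sizes are made-up illustration values.
layers = [
    {"mediaType": MEDIA_TYPES["model"], "size": 1024},
    {"mediaType": MEDIA_TYPES["docs"], "size": 2048},
]
print(layers_of_kind(layers, "docs"))
```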
 
 **ModelKit File (Kitfile)** Acts as a record detailing the properties, relationships, and intended uses of the included artifacts. The Kitfile is central to understanding the structure and purpose of a ModelKit. It adopts the `application/vnd.kitops.modelkit.config.v1+json` media type for easy access and interpretation by tools. See the separate Kitfile specification for details.
 
@@ -46,4 +47,4 @@ Example of a ModelKit manifest with a single serialized model and kitfile.
 }
 ```
 
-`size` is listed in bytes.
+`size` is listed in bytes.