Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for docs layers #386

Merged
merged 2 commits into from
Jul 9, 2024
Merged

Add support for docs layers #386

merged 2 commits into from
Jul 9, 2024

Conversation

amisevsk
Copy link
Contributor

Description

Adds a new field to Kitfiles:

manifestVersion: 1.0.0
package:
  - name: my-modelkit
docs:
  - path: ./path/to/doc
    description: "short description of the doc"

Currently, docs are stored identically to other layers (code, datasets, model) -- i.e. as tarballs with (optional) compression.

Some considerations before merging:

  • Do we want to do something specific with docs to aid retrieval (i.e. gzip single files so that they can be displayed in a browser?)
  • Fields on each docs entry are preliminary -- do we want name/type/etc., or some way to determine what the doc should be used for (e.g. license vs readme vs ???)?
  • If we stick with just path and description, do we want to limit the length of the description?

Linked issues

Closes #84

Add 'docs' top-level field to Kitfile, which is intended to store
documentation for the model. Paths specified under docs are stored as
separate layer types, allowing for them to be unpacked and/or parsed
independently from the rest of the model.
@amisevsk amisevsk requested a review from gorkem June 26, 2024 20:26
@Burhan-Q
Copy link

I think it could make sense to allow for any arbitrary doc "type" to give finer control on what to unpack. Perhaps all project docs would be relevant to some but more process docs would be relevant to others, and instead of requiring to always fully unpack all docs, it could at least allow for only unpacking what's needed. Additionally I wonder if multiple versions for docs may wish to be stored, for instance say a project charter has multiple iterations, it might be desirable to store all of these for record keeping, but only unpack the most recent version.

On the license type, I think the single field for the license in the Kitfile should suffice in most cases, but the ability to single that out for unpacking could make sense.

@amisevsk
Copy link
Contributor Author

@Burhan-Q Thanks for your feedback! I like the idea of versioning docs layers, though it might be slightly tricky on the packing side -- when you run kit pack it really only looks at what's in the current context, so we'd have to figure out a way to grab previous layers as part of that process. Since there's no a priori versioning implied in building a modelkit, I'm not sure how we would say "this is the previous version of this modelkit".

One workaround for that might be to use tags for versioning the modelkit, e.g. tag versions my-modelkit:1.0, my-modelkit:1.1, and so on. In that case, viewing the earlier version of docs would be kit unpack --docs my-modelkit:1.0.

On the unpacking side, @gorkem has some ideas about making the flags for unpack work as filters, so that you could specify something like kit unpack --filter=docs:README.* [...] to grab just the README from a modelkit. An arbitrary type field could work as well here, though.

@gorkem
Copy link
Contributor

gorkem commented Jun 28, 2024

ModelKits themselves are versioned. If only a document changes between two versions of a ModelKit the cost of storage and network for that ModelKit is just the changed layer. The built-in versioning of ModelKits is probably enough for most cases.

@Burhan-Q
Copy link

If I understand correctly, the ModelKit is a self-contained "full-history" of changes, so if someone wished to grab a specific document version, they could do so by calling the unpack for that particular version and only grab docs (if that's all they want/need). Then it makes sense that nothing extra would be needed.

@gorkem
Copy link
Contributor

gorkem commented Jun 28, 2024

Correct. the unpack can extract only filtered artifact types today and we want to add more advanced filtering going forward.

Copy link
Contributor

@gorkem gorkem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good and it works as expected. I think this is a good starting point for this feature. It is missing documentation updates that needs to be completed before we merge it.

@amisevsk
Copy link
Contributor Author

amisevsk commented Jul 9, 2024

Docs changes are in #395

@amisevsk amisevsk merged commit 77f8440 into jozu-ai:main Jul 9, 2024
3 checks passed
@amisevsk amisevsk deleted the docs-layer branch July 9, 2024 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a docs layer to ModelKits
3 participants