This repository has been archived by the owner on Oct 30, 2024. It is now read-only.

feat: ingestion - include metadata from .knowledge.json on dir level #124

Merged (4 commits) on Sep 18, 2024

Conversation

iwilltry42 (Collaborator) commented Sep 16, 2024

Ref #118

When ingesting a file or directory (recursively or not), we're now checking if there is a .knowledge.json file present in the directory. It's structured like this:

{
  "metadata": {
    "foo.pdf": {
      "baz": "bom",
      "foo": "bar"
    },
    "somedir/bar.pdf": {
      "x": "y"
    }
  }
}

This adds the defined key/value pairs as metadata to the corresponding documents in the vector store.
.knowledge.json files in nested directories are merged (with override) with the metadata files of their parent directories.
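
To make the merge behavior concrete, here is a minimal Go sketch. It is not the code from this PR: the names `knowledgeFile`, `dirMetadata`, `resolveMetadata` and `relativeTo` are hypothetical. It also assumes that keys are paths relative to the directory holding the .knowledge.json file, and that "with override" means the more deeply nested file wins on conflicting keys.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
	"strings"
)

// knowledgeFile mirrors the .knowledge.json layout shown above: a top-level
// "metadata" object keyed by file path, each holding key/value pairs.
type knowledgeFile struct {
	Metadata map[string]map[string]string `json:"metadata"`
}

// dirMetadata pairs a parsed .knowledge.json with the directory it was found
// in, relative to the ingestion root ("." for the root itself).
// Hypothetical type, used only for this sketch.
type dirMetadata struct {
	Dir  string
	File knowledgeFile
}

// relativeTo returns target relative to dir, or false if target is not
// located inside dir. Both paths are slash-separated and root-relative.
func relativeTo(dir, target string) (string, bool) {
	dir, target = path.Clean(dir), path.Clean(target)
	if dir == "." {
		return target, true
	}
	prefix := dir + "/"
	if !strings.HasPrefix(target, prefix) {
		return "", false
	}
	return strings.TrimPrefix(target, prefix), true
}

// resolveMetadata merges the metadata for target from all given files, which
// must be ordered from outermost to innermost directory.
// ASSUMPTION: entries from nested files override those from parent files.
func resolveMetadata(target string, files []dirMetadata) map[string]string {
	merged := map[string]string{}
	for _, dm := range files {
		rel, ok := relativeTo(dm.Dir, target)
		if !ok {
			continue
		}
		for k, v := range dm.File.Metadata[rel] {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	// Root-level file, identical to the example above.
	rootJSON := `{"metadata":{"foo.pdf":{"baz":"bom","foo":"bar"},"somedir/bar.pdf":{"x":"y"}}}`
	// Hypothetical nested file located at somedir/.knowledge.json.
	nestedJSON := `{"metadata":{"bar.pdf":{"x":"z","extra":"1"}}}`

	var root, nested knowledgeFile
	if err := json.Unmarshal([]byte(rootJSON), &root); err != nil {
		panic(err)
	}
	if err := json.Unmarshal([]byte(nestedJSON), &nested); err != nil {
		panic(err)
	}

	files := []dirMetadata{{Dir: ".", File: root}, {Dir: "somedir", File: nested}}
	fmt.Println(resolveMetadata("somedir/bar.pdf", files)) // map[extra:1 x:z]
	fmt.Println(resolveMetadata("foo.pdf", files))         // map[baz:bom foo:bar]
}
```

Under these assumptions, somedir/bar.pdf ends up with x set to "z" from the nested file, while foo.pdf keeps only the root-level entries.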

Notes

  1. I went with .knowledge.json instead of .metadata.json because I felt like the latter could be too "common" and we'd run into conflicts. By default, we're including hidden files in the ingestion process, so .knowledge.json is not explicitly being ignored.
  2. It's JSON with an explicit metadata entry so that we can add fields for new features in the future, e.g. directory content descriptions, which could be merged with dataset metadata for routing retrieval.

pkg/client/metadata.go (outdated review thread, resolved)
@iwilltry42 iwilltry42 merged commit 3785f12 into main Sep 18, 2024
1 check passed
@iwilltry42 iwilltry42 deleted the knowledge-118 branch September 18, 2024 19:32
3 participants