feat: (ragKnowledge) Enhance RAG knowledge handling #2351

Merged

augchan42

  • Added support for double-byte characters (e.g., Chinese) to ensure proper processing.
  • Implemented cleanup of deleted knowledge files to maintain data integrity.
  • Enabled loading knowledge from directories, simplifying configuration and management.

These improvements address performance and usability issues, facilitating better support for large knowledge sets.

Closes #2323

Risks

The changes are confined to ragKnowledge, so the risk should be low.

Testing

In your character card, set:
"ragKnowledge": true,

and configure your knowledge folders. Direct files and strings are still supported. A sample:

"knowledge": [
    {
        "path": "hexagrams/harvardYenchingHexagrams.md"
    },
    {
        "directory": "shared/warringstates",
        "shared": true
    },
    "The art of war is of vital importance to the state. It is a matter of life and death, a road either to safety or to ruin."
],
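
The three entry shapes in the sample (a file `path`, a `directory`, and an inline string) can be modeled as a union type. The sketch below is only an illustration inferred from the sample config: the type names, the `classifyKnowledgeEntry` helper, and the default of `shared: false` are assumptions, not the definitions used in the PR.

```typescript
// Hypothetical types inferred from the sample config above;
// the actual definitions in the codebase may differ.
type KnowledgeEntry =
    | string
    | { path: string; shared?: boolean }
    | { directory: string; shared?: boolean };

type ClassifiedEntry =
    | { kind: "text"; text: string }
    | { kind: "file"; path: string; shared: boolean }
    | { kind: "directory"; directory: string; shared: boolean };

// Sort an entry into one of the three supported shapes.
// Assumption: entries without a "shared" flag are treated as private.
function classifyKnowledgeEntry(entry: KnowledgeEntry): ClassifiedEntry {
    if (typeof entry === "string") {
        return { kind: "text", text: entry };
    }
    if ("directory" in entry) {
        return {
            kind: "directory",
            directory: entry.directory,
            shared: entry.shared ?? false,
        };
    }
    return { kind: "file", path: entry.path, shared: entry.shared ?? false };
}
```

For example, `classifyKnowledgeEntry({ directory: "shared/warringstates", shared: true })` yields a `directory` entry, while a bare string yields a `text` entry.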

Then, place these files and directories under eliza/characters/knowledge.

Scenarios to test:

  1. Index multiple knowledge files just by placing them in a directory (no need to list each file).
  2. Remove files, then check the logs to confirm that the entries are removed from the knowledge table.
  3. Change a file's contents; this re-triggers indexing on the next startup, and the re-indexing is shown in the log.
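
The three scenarios amount to comparing what was indexed previously with what is on disk at startup. A minimal sketch of that comparison follows, assuming each file is tracked by a relative path and some content fingerprint (for example a hash). `planKnowledgeSync` and `SyncPlan` are hypothetical names for illustration, not functions from this PR.

```typescript
// Maps a relative file path to a fingerprint of its content (e.g. a hash).
// Assumption: how the fingerprint is computed is left to the caller.
type KnowledgeIndex = Record<string, string>;

interface SyncPlan {
    toIndex: string[];   // new files found on disk
    toReindex: string[]; // files whose content changed
    toRemove: string[];  // files that were indexed but no longer exist
}

// Hypothetical helper: decide what to do with each file at startup.
function planKnowledgeSync(stored: KnowledgeIndex, onDisk: KnowledgeIndex): SyncPlan {
    const plan: SyncPlan = { toIndex: [], toReindex: [], toRemove: [] };

    for (const path of Object.keys(onDisk)) {
        if (!(path in stored)) {
            plan.toIndex.push(path);
        } else if (stored[path] !== onDisk[path]) {
            plan.toReindex.push(path);
        }
    }

    for (const path of Object.keys(stored)) {
        if (!(path in onDisk)) {
            plan.toRemove.push(path);
        }
    }

    return plan;
}
```

Unchanged files appear in none of the three lists, so they are skipped on restart.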

I start my agent like this:
pnpm start:debug --character="characters/qwen.character.json" > agent.log 2>&1

This uses the start target that shows debug messages from the elizaLogger.

Where should a reviewer start?

runtime.ts - initialize() is the main entry point that calls processCharacterRAGKnowledge() and processCharacterRAGDirectory(). Of the two, processCharacterRAGDirectory() has most of the changes.

ragknowledge.ts - IDs are now scoped to private or shared, to prevent issues when the same knowledge is moved between shared and private folders. Double-byte characters were being stripped during pre-processing, so that line was commented out. Embedding of knowledge files was also optimized.
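
To illustrate the two ragknowledge.ts points, here is a rough sketch. It is not the code from this PR: `scopedKnowledgeId`, `stripNonAscii`, and `normalizeWhitespace` are hypothetical helpers, and the ID format and the ASCII-only filter are assumptions chosen to show the idea.

```typescript
// Hypothetical ID scheme: the scope is part of the key, so the same file
// under a shared folder and under a private folder gets different IDs.
function scopedKnowledgeId(relativePath: string, isShared: boolean, agentId: string): string {
    const scope = isShared ? "shared" : `private:${agentId}`;
    return `${scope}:${relativePath}`;
}

// An ASCII-only filter of this kind (an assumption, not the removed line
// itself) deletes every double-byte character, so Chinese text vanishes.
function stripNonAscii(text: string): string {
    return text.replace(/[^\x20-\x7E\s]/g, "");
}

// Collapsing whitespace only leaves double-byte characters intact.
function normalizeWhitespace(text: string): string {
    return text.replace(/\s+/g, " ").trim();
}
```

With these helpers, `stripNonAscii("兵者 war")` keeps only the ASCII part, while `normalizeWhitespace("兵者 war")` returns the text unchanged.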

The only additions to generation.ts were logging, so there is no need to focus much on that file.

localembeddingmanager.ts just has embedding logging commented out, as it was too verbose (it was displaying a long vector string in debug mode).

Discord username

hosermage

@augchan42

hey @azep-ninja hope you like this newer version. It maintains the original file knowledge loading, adds recursive directory support, and removes knowledge if files/directories are removed.

@augchan42

I extended SQLite support for this feature; Postgres was already supported. This is needed for removing old knowledge entries for files/directories that have been removed. We can add other database plugins as the need arises.

@odilitime odilitime merged commit 7b13d77 into elizaOS:develop Jan 16, 2025
6 checks passed
Successfully merging this pull request may close these issues.

Feature - ragKnowledge enhancements (double byte support, caching, load from directories)
2 participants