Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid full vault scan on incremental indexing #1148

Merged
merged 2 commits into from
Feb 3, 2025

Conversation

zeroliu
Copy link
Collaborator

@zeroliu zeroliu commented Feb 2, 2025

Previously, we added an incremental indexing feature for copilot plus users. On any note update, It generates the excluded file list and checks whether the current file should be updated. This process can take 5 seconds for large vaults and causes the plugin to be unusable in those extreme scenarios. This PR fixes the issue by stopping the generation of included/excluded file lists and matching the current file path against the matching patterns.

  • Added a new shouldIndexFile method that checks whether a TFile should be indexed. It consolidates all the custom validation logic that checks inclusions and exclusions.
  • Updated the pattern-matching logic to use native obsidian API when possible.

@zeroliu zeroliu force-pushed the zero/improve-indexing branch 6 times, most recently from 1201034 to df7bb23 Compare February 2, 2025 04:54
@zeroliu zeroliu requested a review from logancyang February 2, 2025 06:04
@zeroliu zeroliu marked this pull request as ready for review February 2, 2025 06:04
@logancyang
Copy link
Owner

Thanks for fixing this! How did you test it?

@zeroliu
Copy link
Collaborator Author

zeroliu commented Feb 2, 2025

In terms of functionality, I tested inclusion and exclusion with tags, file extension, and single file match with [[]]. I observe whether shouldIndexFile returns true for the matching file path. I also tested different patterns with a full vault index and verified that the number of files indexed was correct.

In terms of performance, this part gets confusing. I created a vault with 40000 files. I was hoping to see the 5-second delay as described in the issue, but I couldn't reproduce it. As you can see in my screenshot, even after I added ,,,,,,,,,,, as exclusion patterns that create a number of excluded strings. It still only takes ~40ms to run the file change handler. There must be a scenario that the user is in, but I can't reproduce it. My change shows that the change handler is now consistently faster, so this is still a good PR to merge. I hope it will resolve the reported performance issue, too.

Before:
Screenshot 2025-02-01 at 10 20 10 PM

After:
Screenshot 2025-02-01 at 10 14 53 PM

@zeroliu zeroliu force-pushed the zero/improve-indexing branch from df7bb23 to ddb3a63 Compare February 2, 2025 06:28
src/utils.ts Outdated
* @param vault - The vault to get tags from.
* @returns An array of lowercase tags without the hash symbol.
*/
export function getTagsFromNote(file: TFile): string[] {
Copy link
Owner

@logancyang logancyang Feb 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now returning both inline and frontmatter tags, while before it only returns frontmatter tags.

The reason I only get frontmatter tags is to

  1. Limit the number of notes from tags for custom prompt templating {#tag1, #tags2...} and avoiding overcrowding the context window
  2. Have a differentiation between frontmatter tags and inline tags. Consider frontmatter tags a "note-level" property and inline tags as "line/block-level" property.

So this change will affect not only index filtering but also custom prompt processing. Currently, inline tags are used via salient terms during vault search.

Should we put a parameter here to control what tags to return? Such as frontmatterOnly = true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

getNotesFromPath,
getNotesFromTags,
isFolderMatch,
isPathInList,
processVariableNameForNotePath,
} from "../src/utils";

// Mock Obsidian's TFile class
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to mock these because of the usage of app.vault.metadataCache

src/utils.ts Outdated
* @param vault - The vault to get tags from.
* @returns An array of lowercase tags without the hash symbol.
*/
export function getTagsFromNote(file: TFile): string[] {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@logancyang logancyang merged commit ef3b000 into logancyang:master Feb 3, 2025
2 checks passed
@logancyang logancyang mentioned this pull request Feb 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants