indexer/walker: Avoid running jobs where not needed #1006
Merged
Closes #1002
Depends on #1027
Background
The removal of de-duplication upon job enqueuing is necessary to avoid various bugs and race conditions where we'd deduplicate jobs which appear to be the same, but have different `Defer` or `DependsOn`, i.e. we'd be overly aggressive in the deduplication efforts.

It is worth clarifying here that this PR doesn't affect jobs which run as part of text synchronization, i.e. `textDocument/didOpen`, `textDocument/didChange` or `workspace/didChangeWatchedFiles`. It only aims to reduce the duplicated work as part of the walker (which is triggered by `initialize`), which would occur if the user has a deep workspace which takes a while to index via the walker, and opens a few (yet unindexed) modules, which get indexed as a priority.

For `textDocument/didChange` and `workspace/didChangeWatchedFiles` it is expected that the jobs do need to run since we know something has changed. We could attempt to do some de-duplication there as well, but it's probably going to have low impact.

For `textDocument/didOpen` we currently treat any opened document as a change, because we have no way to tell whether it matches the contents on disk. This is IMO a common case, where the user would start without any open files, the walker indexes everything, and then they open any module and we re-index the whole module again, and we do it again for every single file they open. Reducing duplicated work here is just a little more involved, so I filed a separate ticket for that: #1031

Benchmarks
I am (finally) updating the benchmarks as part of the PR, which may suggest that this PR has a negative performance impact, but it's a little more complicated as we're really reflecting a few PRs, each contributing to those numbers in a slightly different way:

- We now avoid running `terraform providers schema -json` when it's not necessary. Previously we'd run it on every indexed & initialized module.
- #1027 (`required_version` constraint) removed 1 job (`terraform version`) entirely from the walker indexing. It now runs only on the text synchronization methods.
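
To illustrate the de-duplication concern from the Background section, here is a minimal sketch of how a key built only from the job type and directory would drop a job that differs in `DependsOn` or `Defer`. The `Job` and `Queue` types, the `NewQueue`/`Enqueue`/`Len` functions and the job names (`ParseModule`, `DecodeReferences`) are all hypothetical and simplified for illustration; they are not the scheduler's actual API.

```go
package main

import "fmt"

// Job is a hypothetical, simplified stand-in for a queued indexing job.
type Job struct {
	Dir       string       // module directory the job operates on
	Type      string       // kind of work (hypothetical names below)
	DependsOn []string     // IDs of jobs that must finish first
	Defer     func() []Job // follow-up jobs to enqueue afterwards; may be nil
}

// Queue is a hypothetical in-memory queue which can optionally
// de-duplicate on a naive key made of Type and Dir only.
type Queue struct {
	dedupe bool
	seen   map[string]bool
	jobs   []Job
}

// NewQueue creates an empty queue with de-duplication on or off.
func NewQueue(dedupe bool) *Queue {
	return &Queue{dedupe: dedupe, seen: make(map[string]bool)}
}

// Enqueue reports whether the job was accepted into the queue.
func (q *Queue) Enqueue(j Job) bool {
	if q.dedupe {
		// The naive key ignores DependsOn and Defer entirely.
		key := j.Type + "@" + j.Dir
		if q.seen[key] {
			return false
		}
		q.seen[key] = true
	}
	q.jobs = append(q.jobs, j)
	return true
}

// Len returns the number of queued jobs.
func (q *Queue) Len() int { return len(q.jobs) }

func main() {
	plain := Job{Dir: "/work/mod", Type: "ParseModule"}
	// Same Type and Dir, but with a dependency and a deferred follow-up.
	chained := Job{
		Dir:       "/work/mod",
		Type:      "ParseModule",
		DependsOn: []string{"job-1"},
		Defer: func() []Job {
			return []Job{{Dir: "/work/mod", Type: "DecodeReferences"}}
		},
	}

	for _, dedupe := range []bool{true, false} {
		q := NewQueue(dedupe)
		q.Enqueue(plain)
		accepted := q.Enqueue(chained)
		fmt.Printf("dedupe=%v chained accepted=%v queued=%d\n", dedupe, accepted, q.Len())
	}
}
```

With de-duplication on, the second job is rejected and its deferred follow-up would never be scheduled. With it off, both jobs are queued, at the cost of possibly repeating work.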