@MichaReiser MichaReiser commented Jun 29, 2025

Summary

The semantic index stores a map from expression to scope because we need to know in which TypeInference (scope) to look up the expression's type. Today, we use a hash map to store the expression-to-scope mapping.

This PR replaces the hash map with an interval map (vector-based) that maps a range of node IDs (expressions) to their corresponding scope. The advantage of an interval map over a hash map is that it reduces memory consumption from O(expressions) to O(~scopes).

The main downside (other than increased complexity) is that the lookup complexity increases from O(1) to O(log(~scopes)). Looking at the benchmark results, the fact that we need to write less data outweighs the slightly slower lookup times.
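A minimal sketch of the idea, not the code from this PR: `NodeIndex`, `ScopeId`, and `IntervalMap` are hypothetical stand-ins for the real types. It assumes the intervals are sorted and non-overlapping, so a lookup is a single binary search over roughly one entry per scope (a scope that is interrupted by a nested scope gets more than one entry).

```rust
/// Hypothetical stand-in for an expression's node ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct NodeIndex(u32);

/// Hypothetical stand-in for a scope ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ScopeId(u32);

/// Maps inclusive, non-overlapping ranges of node indices to a scope.
/// Assumption: `entries` is sorted by range start.
struct IntervalMap {
    entries: Vec<(NodeIndex, NodeIndex, ScopeId)>,
}

impl IntervalMap {
    /// O(log n) lookup, where n is the number of intervals.
    fn get(&self, node: NodeIndex) -> Option<ScopeId> {
        // Binary search for the first interval whose end is not before `node`.
        let idx = self.entries.partition_point(|(_, end, _)| *end < node);
        let (start, _end, scope) = *self.entries.get(idx)?;
        // That interval only matches if it also starts at or before `node`.
        (start <= node).then_some(scope)
    }
}
```

For example, with the entries `(0..=2, scope 0)`, `(3..=7, scope 1)`, `(8..=9, scope 0)`, ten expressions are covered by three entries instead of ten hash map entries.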

The instrumented benchmarks show a 1-2% performance improvement. I measured memory consumption on a large project, and the overall memory consumption of all semantic indices decreased by about 5%.

Test plan

cargo test

@MichaReiser MichaReiser added internal An internal refactor or improvement ty Multi-file analysis & type inference labels Jun 29, 2025

github-actions bot commented Jun 29, 2025

mypy_primer results

No ecosystem changes detected ✅

Memory usage changes were detected when running on open source projects
flake8 (https://github.com/pycqa/flake8)
-     memo fields = ~66MB
+     memo fields = ~63MB

prefect (https://github.com/PrefectHQ/prefect)
-     memo fields = ~568MB
+     memo fields = ~541MB

github-actions bot commented Jun 29, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser MichaReiser force-pushed the micha/scopes-by-expression-interval-map branch from 8ac1ed4 to d4ef8d6 Compare June 29, 2025 15:16
Base automatically changed from micha/ast-ids to main July 2, 2025 15:57
@MichaReiser MichaReiser force-pushed the micha/scopes-by-expression-interval-map branch from a55036c to e021a77 Compare July 11, 2025 17:02
@MichaReiser MichaReiser force-pushed the micha/scopes-by-expression-interval-map branch from e021a77 to 8d0af39 Compare July 11, 2025 17:05
@MichaReiser MichaReiser force-pushed the micha/scopes-by-expression-interval-map branch from 8d0af39 to 7bc278e Compare July 11, 2025 17:36
@MichaReiser MichaReiser marked this pull request as ready for review July 12, 2025 16:26

@sharkdp sharkdp left a comment

Very nice — thank you!

/// Builds an interval-map that matches expressions (by their node index) to their enclosing scopes.
///
/// The interval map is built in a two-step process because the expression ids are assigned in source order,
/// but we visit the expressions in semantic order. Few expressions are registered out of order.

@sharkdp sharkdp Jul 14, 2025

Is this something that would change with the proposal in #19271?

Member Author

Yes, but I don't think it would invalidate the entire approach. Instead, we would have to use a regular sort call in build before building the interval map (and Rust's sorting claims to be pretty good at sorting mostly sorted data).
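A rough sketch of that sort-then-build fallback, under stated assumptions: `build_intervals` is a hypothetical helper, registrations are plain `(node index, scope id)` pairs recorded in visit order, and this is not the two-step builder used in the PR.

```rust
/// Hypothetical helper: turns `(node index, scope id)` registrations, recorded
/// in visit order, into sorted `(start, end, scope id)` intervals (inclusive).
fn build_intervals(mut registrations: Vec<(u32, u32)>) -> Vec<(u32, u32, u32)> {
    // Sort by node index. The input is expected to be mostly sorted already,
    // which the standard library's stable sort is designed to handle well.
    registrations.sort_by_key(|&(node, _)| node);

    let mut intervals: Vec<(u32, u32, u32)> = Vec::new();
    for (node, scope) in registrations {
        if let Some(last) = intervals.last_mut() {
            if last.2 == scope {
                // Same scope as the previous node: extend the current interval.
                // Note: this also covers unregistered node indices in between.
                last.1 = node;
                continue;
            }
        }
        // The scope changed (or this is the first node): start a new interval.
        intervals.push((node, node, scope));
    }
    intervals
}
```

Registering nodes `0, 1, 3, 2, 4, 5` with scopes `0, 0, 1, 1, 1, 0` collapses into three intervals after sorting, one per contiguous run of the same scope.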

@MichaReiser MichaReiser merged commit 3560f86 into main Jul 14, 2025
37 checks passed
@MichaReiser MichaReiser deleted the micha/scopes-by-expression-interval-map branch July 14, 2025 11:51
dcreager added a commit that referenced this pull request Jul 14, 2025
* dcreager/merge-arguments: (223 commits)
  fix docs
  Combine CallArguments and CallArgumentTypes
  [ty] Sync vendored typeshed stubs (#19334)
  [`refurb`] Make example error out-of-the-box (`FURB122`) (#19297)
  [refurb] Make example error out-of-the-box (FURB177) (#19309)
  [ty] ignore errors when reformatting codemodded typeshed (#19332)
  [ty] Provide docstrings for stdlib APIs when hovering over them in an IDE (#19311)
  [ty] Add virtual files to the only project database (#19322)
  Add t-string fixtures for rules that do not need to be modified (#19146)
  [ty] Remove `FileLookupError` (#19323)
  [ty] Fix handling of metaclasses in `object.<CURSOR>` completions
  [ty] Use an interval map for scopes by expression (#19025)
  [ty] List all `enum` members (#19283)
  [ty] Handle configuration errors in LSP more gracefully (#19262)
  [ty] Use python version and path from Python extension (#19012)
  [`pep8_naming`] Avoid false positives on standard library functions with uppercase names (`N802`) (#18907)
  Update Rust crate toml to 0.9.0 (#19320)
  [ty] Fix server version (#19284)
  Update NPM Development dependencies (#19319)
  Update taiki-e/install-action action to v2.56.13 (#19317)
  ...
Labels

internal An internal refactor or improvement ty Multi-file analysis & type inference

3 participants