Skip to content

Conversation

@dcreager
Copy link
Member

@dcreager dcreager commented Jul 17, 2025

This is a follow-on to #19410 that further reduces the memory usage of our reachability constraints. When finishing the building of a use-def map, we walk through all of the "final" states and mark only those reachability constraints as "used". We then throw away the interior TDD nodes of any reachability constraints that weren't marked as used.

(This helps because we build up quite a few intermediate TDD nodes when constructing complex reachability constraints. These nodes can never be accessed if they were only used as an intermediate TDD node. The marking step ensures that we keep any nodes that ended up being referred to in some accessible use-def map state.)

@dcreager dcreager added internal An internal refactor or improvement ty Multi-file analysis & type inference labels Jul 17, 2025
@dcreager dcreager requested review from MichaReiser and removed request for AlexWaygood, carljm and sharkdp July 17, 2025 21:55
@carljm
Copy link
Contributor

carljm commented Jul 17, 2025

Looks like we are somehow failing to mark some nodes as used that we later try to query?

@carljm carljm marked this pull request as draft July 17, 2025 22:10
@dcreager
Copy link
Member Author

Looks like we are somehow failing to mark some nodes as used that we later try to query?

blerp blorp. Glad I added that assertion!

@github-actions
Copy link
Contributor

github-actions bot commented Jul 17, 2025

mypy_primer results

No ecosystem changes detected ✅

Memory usage changes were detected when running on open source projects
trio (https://github.com/python-trio/trio)
- TOTAL MEMORY USAGE: ~176MB
+ TOTAL MEMORY USAGE: ~167MB
-     memo fields = ~138MB
+     memo fields = ~131MB

sphinx (https://github.com/sphinx-doc/sphinx)
- TOTAL MEMORY USAGE: ~301MB
+ TOTAL MEMORY USAGE: ~287MB
-     memo fields = ~236MB
+     memo fields = ~224MB

prefect (https://github.com/PrefectHQ/prefect)
- TOTAL MEMORY USAGE: ~626MB
+ TOTAL MEMORY USAGE: ~596MB
-     memo fields = ~490MB
+     memo fields = ~467MB

@codspeed-hq
Copy link

codspeed-hq bot commented Jul 17, 2025

CodSpeed WallTime Performance Report

Merging #19414 will not alter performance

Comparing dcreager/smoosh-reachability (848c573) with main (dc66019)

Summary

✅ 7 untouched benchmarks

@github-actions
Copy link
Contributor

github-actions bot commented Jul 17, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/mcp/databricks_mcp_cookbook.ipynb:12:1:8: Simple statements must be separated by newlines or semicolons

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/mcp/databricks_mcp_cookbook.ipynb:12:1:8: Simple statements must be separated by newlines or semicolons

dcreager added 6 commits July 17, 2025 19:33
* main:
  [ty] Use `…` as the "cut" indicator in diagnostic rendering (#19420)
  [ty] synthesize __setattr__ for frozen dataclasses (#19307)
  [ty] Fixed bug in semantic token provider for parameters. (#19418)
  [ty] Shrink reachability constraints (#19410)
  Move JUnit rendering to `ruff_db` (#19370)
@dcreager
Copy link
Member Author

I've fixed all of the test failures here and tweaked how we keep track of the mapping between indexes in the pre-filtered and post-filtered vectors here. The ecosystem memory usage is showing a slight memory increase for some small projects — this is expected since the index mapping adds a slight overhead if there are not a lot of reachability constraints that are thrown away. But @MichaReiser reports that we are seeing a 3× reduction in memory usage for very large projects. I don't know if we have a way to get some of those into CI, but I even without that I consider this PR to be a win.

@dcreager dcreager marked this pull request as ready for review July 18, 2025 14:31
@dcreager
Copy link
Member Author

The ecosystem memory usage is showing a slight memory increase for some small projects

Okay and now I fixed this by shrinking a vector that we were shrinking before

@dcreager
Copy link
Member Author

Somehow requesting Micha's review earlier removed everyone else! I've added you all back not because I specifically want 👀 from everyone, but just in case any of you want to take a look at this. It does have implications on the finish step of the semantic index builder.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. There's a small regression on the instrumented benchmarks but the walltime benchmarks are mostly neutral which makes this a huge win. Thank you!

}
}

impl get_size2::GetSize for RankBitBox {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same as what the automatically derived GetSize implementation gives you

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to use a helper function to get the size of the bits field since bitvec::BitBox doesn't implement GetSize. I found a way to use the derive impl with a size_fn helper for just that one field.


/// Returns the number of bits _before_ (and not including) the given index that are set.
pub(crate) fn rank(&self, index: usize) -> u32 {
let chunk_index = index / CHUNK_SIZE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, too sad that as_raw_slice isn't const. I wonder if it is worth to mark this function and get_bit as #[inline]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will #[inline] be necessary if this is only used within the same crate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still acts as a hint to the compiler. But yes, it's less useful compared to cross crate inlining without LTO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Got it! Added

@MichaReiser
Copy link
Member

I don't know if we have a way to get some of those into CI, but I even without that I consider this PR to be a win.

You can get more detailed numbers locally using TY_MEMORY_REPORT=full

@AlexWaygood AlexWaygood removed their request for review July 21, 2025 13:34
dcreager added 7 commits July 21, 2025 10:35
* main: (25 commits)
  [ty] Sync vendored typeshed stubs (#19461)
  [ty] Extend tuple `__len__` and `__bool__` special casing to also cover tuple subclasses (#19289)
  [ty] bump docstring-adder pin (#19458)
  [ty] Disallow assignment to `Final` class attributes (#19457)
  Update dependency ruff to v0.12.4 (#19442)
  Update pre-commit hook astral-sh/ruff-pre-commit to v0.12.4 (#19443)
  Update rui314/setup-mold digest to 702b190 (#19441)
  Update taiki-e/install-action action to v2.56.19 (#19448)
  Update Rust crate strum_macros to v0.27.2 (#19447)
  Update Rust crate strum to v0.27.2 (#19446)
  Update Rust crate rand to v0.9.2 (#19444)
  Update Rust crate serde_json to v1.0.141 (#19445)
  Fix `unreachable` panic in parser (#19183)
  [`ruff`] Support byte strings (`RUF055`) (#18926)
  [ty] Avoid second lookup for `infer_maybe_standalone_expression` (#19439)
  [ty] Implemented "go to definition" support for import statements (#19428)
  [ty] Avoid secondary tree traversal to get call expression for keyword arguments (#19429)
  [ty] Add goto definition to playground (#19425)
  [ty] Add support for `@warnings.deprecated` (#19376)
  [ty] make `del x` force local resolution of `x` in the current scope (#19389)
  ...
…ability

* origin/main:
  [ty] Expansion of enums into unions of literals (#19382)
  [ty] Avoid rechecking the entire project when changing the opened files (#19463)
  [ty] Add warning for unknown `TY_MEMORY_REPORT` value (#19465)
@dcreager dcreager merged commit 88de572 into main Jul 21, 2025
37 checks passed
@dcreager dcreager deleted the dcreager/smoosh-reachability branch July 21, 2025 18:16
dcreager added a commit that referenced this pull request Jul 22, 2025
* main: (76 commits)
  Move fix suggestion to subdiagnostic (#19464)
  [ty] Implement non-stdlib stub mapping for classes and functions (#19471)
  [ty] Disallow illegal uses of `ClassVar` (#19483)
  [ty] Disallow `Final` in function parameter/return-type annotations (#19480)
  [ty] Extend `Final` test suite (#19476)
  [ty] Minor change to diagnostic message for invalid Literal uses (#19482)
  [ty] Detect illegal non-enum attribute accesses in Literal annotation (#19477)
  [ty] Reduce size of `TypeInference` (#19435)
  Run MD tests for Markdown-only changes (#19479)
  Revert "[ty] Detect illegal non-enum attribute accesses in Literal annotation"
  [ty] Detect illegal non-enum attribute accesses in Literal annotation
  [ty] Added semantic token support for more identifiers (#19473)
  [ty] Make tuple subclass constructors sound (#19469)
  [ty] Pass down specialization to generic dataclass bases (#19472)
  [ty] Garbage-collect reachability constraints (#19414)
  [ty] Implicit instance attributes declared `Final` (#19462)
  [ty] Expansion of enums into unions of literals (#19382)
  [ty] Avoid rechecking the entire project when changing the opened files (#19463)
  [ty] Add warning for unknown `TY_MEMORY_REPORT` value (#19465)
  [ty] Sync vendored typeshed stubs (#19461)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal An internal refactor or improvement ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants