@ibraheemdev ibraheemdev commented Jun 24, 2025

Summary

Setting TY_MEMORY_REPORT=full will generate and print a memory usage report to the CLI after a ty check run:

=======SALSA STRUCTS=======
`Definition`                                       metadata=7.24MB   fields=17.38MB  count=181062
`Expression`                                       metadata=4.45MB   fields=5.94MB   count=92804
`member_lookup_with_policy_::interned_arguments`   metadata=1.97MB   fields=2.25MB   count=35176
...
=======SALSA QUERIES=======
`File -> ty_python_semantic::semantic_index::SemanticIndex`
    metadata=11.46MB  fields=88.86MB  count=1638
`Definition -> ty_python_semantic::types::infer::TypeInference`
    metadata=24.52MB  fields=86.68MB  count=146018
`File -> ruff_db::parsed::ParsedModule`
    metadata=0.12MB   fields=69.06MB  count=1642
...
=======SALSA SUMMARY=======
TOTAL MEMORY USAGE: 577.61MB
    struct metadata = 29.00MB
    struct fields = 35.68MB
    memo metadata = 103.87MB
    memo fields = 409.06MB

Eventually, we should integrate these numbers into CI in some form. The one current limitation is that heap allocations in salsa structs (e.g. interned values) are not tracked, but memoized values should have full coverage. We may also want a peak memory usage counter (one that accounts for non-salsa memory), but that is relatively simple to profile manually (e.g. with GNU time: /usr/bin/time -v ty check), and building it in would require a compile-time option to avoid runtime overhead.

Depends on salsa-rs/salsa#925.

@ibraheemdev ibraheemdev requested a review from carljm as a code owner June 24, 2025 23:53
@ibraheemdev ibraheemdev added the internal An internal refactor or improvement label Jun 24, 2025
@ibraheemdev ibraheemdev requested a review from AlexWaygood as a code owner June 24, 2025 23:53
@ibraheemdev ibraheemdev added the ty Multi-file analysis & type inference label Jun 24, 2025
@ibraheemdev ibraheemdev force-pushed the ibraheem/memory-usage-dump branch from 3582d8c to bc95677 Compare June 24, 2025 23:55
github-actions bot commented Jun 24, 2025

mypy_primer results

No ecosystem changes detected ✅

github-actions bot commented Jun 25, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser MichaReiser left a comment

This is great

I have a few smaller nits. The only downside of the design is that it's very easy to forget the heap_size attribute on a salsa query, which will result in undercounting. That makes me wonder if we should change the design in salsa so that the stack size and heap size are reported separately for each query (we can show a total as well), with the heap size shown as Unknown if the heap_size attribute isn't set. This would make it more apparent where heap_size attributes are missing (as opposed to assuming, ah, this query doesn't allocate much).
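To make the suggestion above concrete, here is a minimal sketch of a per-query report row that keeps stack and heap sizes separate and shows an explicit unknown heap size. All names (`HeapSize`, `QueryReport`, `total_bytes`) are hypothetical and are not part of salsa's API; this only illustrates the reporting shape being proposed.

```rust
/// Hypothetical: heap usage of a memoized value, or `Unknown` when the
/// query has no `heap_size` attribute set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HeapSize {
    Known(usize),
    Unknown,
}

/// Hypothetical: one row of a per-query memory report.
struct QueryReport {
    name: &'static str,
    stack_bytes: usize,
    heap: HeapSize,
}

impl QueryReport {
    /// The total is only meaningful when the heap size is known, so a
    /// missing attribute yields `None` instead of a silent undercount.
    fn total_bytes(&self) -> Option<usize> {
        match self.heap {
            HeapSize::Known(heap) => Some(self.stack_bytes + heap),
            HeapSize::Unknown => None,
        }
    }
}

impl std::fmt::Display for QueryReport {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self.heap {
            HeapSize::Known(heap) => write!(
                f,
                "{} stack={}B heap={}B total={}B",
                self.name,
                self.stack_bytes,
                heap,
                self.stack_bytes + heap
            ),
            // Printing "unknown" makes a missing attribute visible in the report.
            HeapSize::Unknown => {
                write!(f, "{} stack={}B heap=unknown", self.name, self.stack_bytes)
            }
        }
    }
}
```

With this shape, a query that lacks the attribute stands out as `heap=unknown` rather than looking like a query that simply allocates little.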

Comment on lines 147 to 151
if std::env::var("TY_MEMORY_REPORT").as_deref() == Ok("1") {
    salsa_memory_dump(&db);
}

It would be nice if:

  • We would print a condensed memory report when running with -vv
  • We would print the full memory report when running with -vvv

You can get the verbosity from args.verbosity.

I would probably skip the environment variable for now. If we don't, make sure to add it here https://github.com/astral-sh/ty/blob/main/docs/reference/env.md

Member Author

I think having to run -vvv is a little difficult because of the amount of tracing logs you have to wait for :) I kept the environment variable but added short and full options. We might eventually have to add an option for mypy primer that keeps the diff less sensitive to minor changes.
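As a rough illustration of the short and full options mentioned above, the environment variable could be mapped to a report mode along these lines. This is a sketch, not the merged implementation: the `MemoryReportMode` enum and both function names are hypothetical, and the value `short` is assumed by analogy with the `full` value shown in the summary.

```rust
/// Hypothetical report modes; the names are illustrative, not ty's actual types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryReportMode {
    Short,
    Full,
}

/// Interpret the raw value of `TY_MEMORY_REPORT`.
/// Returns `None` when the variable is unset or holds an unrecognized value,
/// in which case no report would be printed.
fn parse_memory_report_mode(raw: Option<&str>) -> Option<MemoryReportMode> {
    match raw {
        Some("short") => Some(MemoryReportMode::Short), // assumed value name
        Some("full") => Some(MemoryReportMode::Full),
        _ => None,
    }
}

/// Read the mode from the process environment.
fn memory_report_mode_from_env() -> Option<MemoryReportMode> {
    let value = std::env::var("TY_MEMORY_REPORT").ok();
    parse_memory_report_mode(value.as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the mapping easy to unit test without mutating process state.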

@AlexWaygood AlexWaygood removed their request for review June 25, 2025 08:04
@ibraheemdev ibraheemdev force-pushed the ibraheem/memory-usage-dump branch 4 times, most recently from 007f92f to 23844f1 Compare June 26, 2025 18:45
@ibraheemdev ibraheemdev enabled auto-merge (squash) June 26, 2025 18:46
@ibraheemdev ibraheemdev force-pushed the ibraheem/memory-usage-dump branch from 23844f1 to c59855c Compare June 26, 2025 21:24
@ibraheemdev ibraheemdev merged commit 6f7b1c9 into main Jun 26, 2025
35 checks passed
@ibraheemdev ibraheemdev deleted the ibraheem/memory-usage-dump branch June 26, 2025 21:27
dcreager added a commit that referenced this pull request Jun 27, 2025
* main:
  [ty] Add builtins to completions derived from scope (#18982)
  [ty] Don't add incorrect subdiagnostic for unresolved reference (#18487)
  [ty] Simplify `KnownClass::check_call()` and `KnownFunction::check_call()` (#18981)
  [ty] Add micro-benchmark for #711 (#18979)
  [`flake8-annotations`] Make `ANN401` example error out-of-the-box (#18974)
  [`flake8-async`] Make `ASYNC110` example error out-of-the-box (#18975)
  [pandas]: Fix issue on `non pandas` dataframe `in-place` usage (PD002) (#18963)
  [`pylint`] Fix `PLC0415` example (#18970)
  [ty] Add environment variable to dump Salsa memory usage stats (#18928)
  [`pylint`] Fix `PLW0108` autofix introducing a syntax error when the lambda's body contains an assignment expression (#18678)
  Bump 0.12.1 (#18969)
  [`FastAPI`] Add fix safety section to `FAST002` (#18940)
  [ty] Add regression test for leading tab mis-alignment in diagnostic rendering (#18965)
  [ty] Resolve python environment in `Options::to_program_settings` (#18960)
  [`ruff`] Fix false positives and negatives in `RUF010` (#18690)
  [ty] Fix rendering of long lines that are indented with tabs
  [ty] Add regression test for diagnostic rendering panic
  [ty] Move venv and conda env discovery to `SearchPath::from_settings` (#18938)
ibraheemdev added a commit that referenced this pull request Jun 28, 2025
## Summary

Print the [new salsa memory usage
dumps](#18928) in mypy primer CI
runs to help us catch memory regressions. The numbers are rounded to the
nearest power of 1.1 (about a 5% threshold between buckets) to avoid overly sensitive diffs.
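The rounding described in this commit message can be sketched as follows. This is an assumption about how "nearest power of 1.1" might be computed (nearest exponent in log space), not the code from the commit; the function name `round_to_power_of` is hypothetical.

```rust
/// Round `value` to the nearest power of `base`, measured in log space.
/// With `base = 1.1`, the result differs from the input by less than 5%,
/// so small fluctuations usually land in the same bucket.
/// Non-positive inputs are mapped to 0.0, since they have no logarithm.
fn round_to_power_of(value: f64, base: f64) -> f64 {
    if value <= 0.0 {
        return 0.0;
    }
    // log_base(value), rounded to the nearest integer exponent.
    let exponent = (value.ln() / base.ln()).round();
    base.powf(exponent)
}
```

For example, under this scheme 100.0 and 101.0 fall into the same bucket, while 100.0 and 110.0 fall into adjacent ones, so a 1% change would leave the diff untouched but a 10% change would show up.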