Skip to content

Conversation

@MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Jul 26, 2025

Summary

This PR shrinks the size of the members table by ~28% (from 611MB to 438MB on a large project).

This is achieved by changing how we store MemberExpr.

Before

pub(crate) struct MemberExpr {
    symbol_name: Name,
	segments: SmallVec<[MemberSegment; 1]>,
}

pub(crate) enum MemberSegment {
    /// An attribute access, e.g. `.y` in `x.y`
    Attribute(Name),
    /// An integer-based index access, e.g. `[1]` in `x[1]`
    IntSubscript(ast::Int),
    /// A string-based index access, e.g. `["foo"]` in `x["foo"]`
    StringSubscript(String),
}

There are a few issues with this layout: MemberSegment itself is 32 bytes because it stores a String (or int) for each segment. This is an issue for multiple reasons:

  1. We can only store a single segment inline
  2. Even then, the small vec with only a single inline segment accounts for 40 bytes (32 bytes for a single segment + its metadata).
  3. Each heap allocated segment also takes 32 bytes

This PR shrinks the size of MemberExpr and MemberSegment and reduces the cases where heap allocating the segments is necessary.

New

pub(crate) struct MemberExpr {
    /// The entire path as a single Name
    path: Name,
    /// Metadata for each segment (in forward order)
    segments: Segments,
}

The first change is that we change symbol_name to a path. The path stores the entire path of the member expression. For example, a.b[4] becomes ab4. This avoids the need for each segment to store its own Name (or Int)

The second important change is that Segments is now an enum over a small and a heap variant:

#[derive(Clone, Debug, PartialEq, Eq, get_size2::GetSize)]
enum Segments {
    /// Inline storage for up to 7 segments with 6-bit relative offsets (max 63 bytes per segment)
    Small(SmallSegments),
    /// Heap storage for expressions that don't fit inline
    Heap(Box<[SegmentInfo]>),
}

SmallSegments packs up to 7 segments into a single u64 (see docstring for more details).

SegmentInfo packs both an offset and a kind, so that a single segment only takes 4 bytes (it's a u32).

All these improvements together shrink MemberExpr from 64 bytes to 40 bytes

@MichaReiser MichaReiser added the ty Multi-file analysis & type inference label Jul 26, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Jul 26, 2025

mypy_primer results

No ecosystem changes detected ✅

Memory usage changes were detected when running on open source projects
trio (https://github.com/python-trio/trio)
- TOTAL MEMORY USAGE: ~167MB
+ TOTAL MEMORY USAGE: ~159MB

sphinx (https://github.com/sphinx-doc/sphinx)
-     memo fields = ~214MB
+     memo fields = ~204MB

@github-actions
Copy link
Contributor

github-actions bot commented Jul 26, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 3, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

@MichaReiser MichaReiser changed the title [ty] Prototype shrink MemberSize [ty] Reduce size of memory table Aug 3, 2025
@MichaReiser MichaReiser force-pushed the micha/member-size branch 5 times, most recently from 88cd5f2 to 892f555 Compare August 6, 2025 09:36
@MichaReiser MichaReiser marked this pull request as ready for review August 6, 2025 09:46
@AlexWaygood AlexWaygood removed their request for review August 6, 2025 10:35
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@MichaReiser MichaReiser changed the title [ty] Reduce size of memory table [ty] Reduce size of member table Aug 6, 2025
@MichaReiser MichaReiser merged commit b96aa46 into main Aug 7, 2025
37 checks passed
@MichaReiser MichaReiser deleted the micha/member-size branch August 7, 2025 09:16
dcreager added a commit that referenced this pull request Aug 7, 2025
* origin/main:
  [ty] Implemented support for "rename" language server feature (#19551)
  [ty] Reduce size of member table (#19572)
  [ty] Move server capabilities creation (#19798)
  [ty] Repurpose `FunctionType.into_bound_method_type` to return `BoundMethodType` (#19793)
  [ty] Validate writes to `TypedDict` keys (#19782)
  [ty] Add support for using the test command emitted when a mdtest fails (#19794)
dcreager added a commit that referenced this pull request Aug 7, 2025
* dcreager/bound-typevar: (24 commits)
  more comment fix
  this isn't true anymore
  fix inner function in overload implementation ecosystem error
  Update Rust toolchain to 1.89 (#19807)
  [ty] Add `ty.inlayHints.variableTypes` server option (#19780)
  synthetic typevars for type materializations are bound
  [ty] Add failing tests for tuple subclasses (#19803)
  [ty] Add `ty.experimental.rename` server setting (#19800)
  clippy
  make BoundTypeVarInstance interned
  [ty] Implemented support for "rename" language server feature (#19551)
  [ty] Reduce size of member table (#19572)
  [ty] Move server capabilities creation (#19798)
  separate types!
  tmp
  allow KnownInstance::TypeVar in type expressions
  bind typevar in Generic/Protocol base class reference
  [ty] Repurpose `FunctionType.into_bound_method_type` to return `BoundMethodType` (#19793)
  remove unneeded GenericContext::with_binding_context
  create KnownInstanceType::TypeVar for PEP 695 too
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants