Skip to content

Conversation

@yossivainshtein
Copy link

What does this PR do and why?

This PR attempts to fix this performance issue when replacing large array.

When replacing an array which contains object with an identifier attribute, it's possible to lookup the object for reconciliation by using a map lookup instead of linear scan of the entire array.

Steps to validate locally

I added a new test case named it should reconciliate large array instances efficiently.
This test takes a long time before the fix and its quick after.

@yossivainshtein yossivainshtein marked this pull request as draft December 8, 2025 18:07
@coolsoftwaretyler
Copy link
Collaborator

Thanks @yossivainshtein - this is great. I will take a look once you mark it as ready for review.

@coolsoftwaretyler coolsoftwaretyler self-requested a review December 8, 2025 18:29
Copy link
Collaborator

@thegedge thegedge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like @coolsoftwaretyler mentioned, we can give a full review when you're ready, @yossivainshtein.

Thanks for your contribution! I've dropped in some thoughts that may help improve your PR :)

Comment on lines +354 to +358
// Creates a map of node by identifier value.
// In theory, several nodes can have the same identifier, if the array contains different types, so every identifier is mapped to an array of nodes with the same values.
// In practice this in probably a rare case, so we can live with the performance hit.
//
// If not all nodes have identifier, we can't use the map for lookups, so we return null.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love that you've thought through this!

I'd highly recommend capturing various edge cases in test (if we haven't already done so). Some cases that come to mind would be tests for arrays of:

  1. all scalars,
  2. model types without identifiers,
  3. model types with identifiers,
  4. union of two distinct model types with and without identifiers
  5. union of scalar and model type (I believe MST supports this, but double check).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, of course these should be tested. I'll try to add those soon


function buildObjectByIdMap(nodes: AnyNode[]): [Set<string>, Map<string, Array<AnyNode>> | null] {
// Creates a map of node by identifier value.
// In theory, several nodes can have the same identifier, if the array contains different types, so every identifier is mapped to an array of nodes with the same values.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line of your comment is an interesting one. Basically, it states that we still have the same worst case complexity.

That being said, in those scenarios we're likely to fall back to the old code path. In the case where we have an array with models of the same type containing identifiers, we get a nice perf boost 🚀

(nothing to action here, I just like talking things through)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'm trying to understand when it's OK to use the map and when it isn't...

@yossivainshtein
Copy link
Author

Thanks for the comments @thegedge , @coolsoftwaretyler !

I obviously should add some tests for various edge cases, I hope to get to it soon.

I made it draft because I wanted to make sure first that this direction which I took makes sense. Perhaps I should have opened a discussion (not sure how to do that).

Another question - My test is more of a performance test, not really a UT. I saw that there's a separate file for performance tests but i couldn't get it to work... do they work?

@coolsoftwaretyler
Copy link
Collaborator

Hey @yossivainshtein - you can always open discussions at https://github.com/mobxjs/mobx-state-tree/discussions, but I like collaborating in a PR anyways.

I'm not sure if the perf tests work. They've been in the codebase a long time. I wouldn't use them as a bar to clear for a PR at the moment, although for this change it makes sense to:

  1. Write your own set of perf tests for this change in particular
  2. We may want to revisit the existing perf tests and see if we can get them working to check for regressions from your change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replacing large array very slow (possibly due to reconciliation)

4 participants