v1.7.0 - Improve Memory Management for LDV Full Recalculations #636

jamessimone · 2024-10-31T21:10:24Z

This is the first (potentially) non-breaking change medium version bump, but in reflecting on the scope of the changes being made, it feels appropriate to make the leap to version 1.7.0. Here's a breakdown of the changes:

Overview

RollupFullBatchRecalculator is no longer an actual batch class, per se, though it still does batch processing. Instead, it takes advantage of the Apex Cursor beta to iterate through children records when doing full recalculations where the number of records exceeds the value for RollupControl__mdt.MaxLookupRowsBeforeBatching__c. Like its batch class predecessor, it will continue to default to calculating rollup values for 500 children at a time - still configurable using RollupControl__mdt.BatchChunkSize__c. Using Database.Cursor has several big advantages over using a batch class, and while the feature is still in beta, I expect that it will GA soon
Previously, Apex Rollup took advantage of Database.Stateful, which is a marker interface that's part of the Database.Batchable framework to calculate what a parent record's prior value was when that parent record had children split between batches. This happens more often than you'd expect, even with very fine-tuned ordering of the children records. With large data volume Salesforce orgs, keeping track of which parents have been updated while iterating through all of the children records represents a serious problem, and at a certain point when relying on Database.Stateful to calculate millions of records and dozens of rollups for a single child type, the rollup calculation will inevitably fail as memory runs out.
While managing everything in-memory is extremely convenient as the package author, I'd basically arrived at the end of the road when it came to optimizing the batch full recalc process to try to avoid running out of memory. There were no tricks left - just the cold, hard fact that 12 MB of space will eventually run out with a one-to-many relationship between each parent record and the rollups being applied to that record when there are millions of parent records involved. This was the problem reported in FATAL ERROR: Database.executeBatch cannot be called from a batch start, batch execute, or future method. #615
⚠️ Salesforce orgs have a MAXIMUM number of active DataWeave scripts that can be used. ⚠️
If you saw the phrase "(potentially) non-breaking change" above and went 🤔, v1.7.0 adds a DataWeave script to the package. While I view it as unlikely that there are orgs out there running up against that limit, it's worth calling out here. Contact your AE if you truly cannot accommodate additional active DataWeave scripts and want to update. See the section titled "Easily Parsing Data" below for more info on why this change is being made
Because records are now created during "batch" full recalculations, if you abort a full recalculation using the Apex Jobs page in setup, or via some other means, there is the potential that RollupState__c records will be orphaned and won't be cleaned up. You will have to delete those records manually. More info is available in the "Easily Deleting Data" section.

Challenges

There's one natural conclusion when an in-memory data management strategy is no longer sufficient, and that's a record-based memory management solution. As a package author, however, it would be irresponsible (not to mention untidy) to implement a feature that contributes significantly to the data storage on a subscriber org. This means that the "easy" approach - a data record (henceforth referred to as "state", or "Rollup State", or the API name for the object) for each parent - was immediately disqualified. The solution would have to compress data such that the previously calculated values for a parent record could be:

easily retrieved,
easily parsed
easily deleted

Easily Retrieving Data

In order to facilitate the easy retrieval of data, v1.7.0 uses a new custom object, RollupState__c, which has on it 11 different custom text fields ranging from RollupState__c.RelatedRecordKeys0__c all the way through RollupState__c.RelatedRecordKeys10__c. Because you can't filter on long text area fields within SOQL, and because the maximum length of a standard text field is 255 characters, creating many fields like this is a crucial piece of the data compression strategy that makes it possible to store state for many parents within a single record. Assuming that most rollups will be keyed to standard Salesforce ids, it follows then that each related record key field can hold (255/(18 + 1) = 13 parent ids per field (the + 1 accounts for the comma separating each key).

Despite using 11 custom fields for this (which might sound like a lot), that ends up allowing for a maximum of 13 x 11 = 143 parent records' data being compressed into a single RollupState__c. Now that the kinks have been ironed out with the logic that keeps track of which related record key field is being used at any given time, it will be possible to further expand the number of custom fields being used if there's any concern with the data storage use of Rollup State records for these calculations (which would allow for the data to be even further compressed). That being said, since relevant records are deleted within each cursor "chunk" that's processed, I am hopeful that the data storage concerns being discussed here are purely theoretical.

Easily Parsing Data

This proved to be an unexpected hurdle - initial benchmarking on deserializing the long text area on each Rollup State record was poor. As in, for some chunks that had only a dozen state records, the deserialization of those records back into Apex-defined types was causing the 60 second Apex CPU time limit to be exceeded 🤯.

This necessitated the next big change to Apex Rollup - the introduction of a DataWeave script to perform the deserialization. Using DataWeave, the deserialization time was cut from a process that at times was taking 30+ seconds to an operation that has always finished within a single second. That's enough of a justification for me to proceed with this piece of tech, though I do still have plans to circle back on the poor performance of the native deserializer.

Easily Deleting Data

At the end of each recalculation job, any remaining RollupState__c records are deleted. This is something to be mindful of. If you were to abort a currently running recalculation job, this would effectively "orphan" those records, and they'll need to be deleted separately (by you, presumably). The framework will not (at this time) do any periodic maintenance on removing older RollupState__c records that could be orphaned in this manner, though I am open to the idea of giving users a way to conveniently do so via the Apex Rollup app should the need for that arise.

Special Considerations

One of the... downstream effects... of moving away from using Batch Apex was that the majority of the Apex Rollup framework now relies on Queueables to perform async processing. In order to accommodate this, v1.7.0 takes (greater) advantage of Queueable Finalizers to coordinate work between different threads, as finalizers are the only place where it's possible to fully chain different jobs together without exceeding limits.

As I had many failing tests when replacing the Batchable version of RollupFullBatchRecalculator with a Queueable, and I've been performing integration testing to sanity check huge swathes of this work, I don't expect that this change should be visible to people in anything other than a possible increase in Debug Logs being generated (as Finalizers create their own, separate, log file). I've mentioned it here, though, on the off-chance that this change does somehow lead to deleterious effects.

jamessimone · 2024-10-31T23:25:53Z

@jongpie just pushed up a few other small changes that I noticed while re-reviewing, but otherwise I think this is good to review!

rollup/core/objects/RollupState__c/RollupState__c.object-meta.xml

rollup/core/objects/RollupState__c/fields/RelatedRecordKeys0__c.field-meta.xml

rollup/core/classes/RollupState.cls

rollup/core/classes/RollupFullRecalcProcessor.cls

rollup/core/classes/RollupParentResetProcessor.cls

rollup/core/classes/RollupState.cls

extra-tests/classes/RollupFullRecalcTests.cls

extra-tests/classes/RollupStateTests.cls

jongpie · 2024-11-01T02:48:56Z

@jamessimone I made it through all of the changes, and don't see anything major to change, just some little things for cleanup, and some questions. If I have some more time this weekend, I'll try to review it again just to be safe, but it's looking really good 🥳

jamessimone · 2024-11-01T09:16:34Z

Amazing. Thank you. You caught a lot of great stuff that I am going to clean up today. I'll let you know when I've pushed up my code review changes!

jamessimone · 2024-11-01T15:44:36Z

@jongpie okay, I just pushed up another commit which I think addresses all of the code review items!

…ollections variable

… batch chunks accessing state for parents that exist in multiple batch chunks

…stress testing was showing JSON.deserialize was taking as little as a half second but up to a FULL MINUTE to deserialize large amounts of state

…ight cleanup of RollupFinalizer and RollupParentResetProcessor

codecov · 2024-11-05T03:38:34Z

Codecov Report

Attention: Patch coverage is 92.76190% with 38 lines in your changes missing coverage. Please review.

Project coverage is 96.29%. Comparing base (c67a403) to head (0a7781d).

Files with missing lines	Patch %	Lines
rollup/core/classes/RollupState.cls	94.27%	13 Missing ⚠️
rollup/core/classes/RollupSObjectUpdater.cls	70.37%	8 Missing ⚠️
rollup/core/classes/Rollup.cls	83.33%	4 Missing ⚠️
rollup/core/classes/RollupFullRecalcProcessor.cls	86.66%	4 Missing ⚠️
rollup/core/classes/RollupAsyncProcessor.cls	95.58%	3 Missing ⚠️
rollup/core/classes/RollupFinalizer.cls	90.00%	3 Missing ⚠️
rollup/core/classes/RollupCalculator.cls	97.91%	1 Missing ⚠️
...ollup/core/classes/RollupFullBatchRecalculator.cls	95.23%	1 Missing ⚠️
rollup/core/classes/RollupParentResetProcessor.cls	96.77%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #636      +/-   ##
==========================================
+ Coverage   92.02%   96.29%   +4.27%     
==========================================
  Files           5       33      +28     
  Lines         326     7133    +6807     
  Branches       61       61              
==========================================
+ Hits          300     6869    +6569     
- Misses         21      259     +238     
  Partials        5        5

Flag	Coverage Δ
Apex	`96.50% <92.76%> (?)`
LWC	`92.02% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

jamessimone requested a review from jongpie October 31, 2024 21:10

jamessimone temporarily deployed to Test October 31, 2024 21:10 — with GitHub Actions Inactive

jamessimone had a problem deploying to Test October 31, 2024 23:21 — with GitHub Actions Error

jamessimone temporarily deployed to Test October 31, 2024 23:25 — with GitHub Actions Inactive