Helper peak memory usage can be reduced #3285

Closed
divergentdave opened this issue Jul 9, 2024 · 1 comment · Fixed by #3303

@divergentdave
Collaborator

I set up a harness to run one big aggregation job under Valgrind, using the DHAT tool. For both the leader and the helper, the peak memory usage was similar: double the total measurement share/output share size, plus various smaller allocations. (In the helper's case, the next-largest allocation is the aggregation job initialization request body, held in a trillium_http::ReceivedBody.)

In the case of the leader, one copy of the measurement share is allocated when the aggregation job driver loads report aggregations from the database. The other copy is made inside the VDAF's prepare_init(), because the Prio3 leader's measurement share is cloned for storage in Prio3PrepareState. We could probably eliminate this copy by using Arcs inside the Prio3 implementation's internals to share and then transfer ownership of this vector, but that's outside the scope of this repository.
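For illustration only, here is a minimal Rust sketch of the Arc idea; the types and fields are hypothetical stand-ins, not the real libprio-rs Prio3PrepareState definition:

```rust
use std::sync::Arc;

// Hypothetical stand-ins: the real Prio3 prepare state in libprio-rs has
// different fields and generics. This only illustrates the ownership shape.
struct MeasurementShare(Vec<u64>);

// Today (conceptually): the prepare state keeps its own deep copy of the
// leader's measurement share, doubling that allocation during preparation.
#[allow(dead_code)]
struct PrepareStateCloned {
    measurement_share: MeasurementShare,
}

// With an Arc, the prepare state and the caller share one heap allocation;
// cloning the Arc only bumps a reference count.
struct PrepareStateShared {
    measurement_share: Arc<MeasurementShare>,
}

fn prepare_init_sketch(share: &Arc<MeasurementShare>) -> PrepareStateShared {
    PrepareStateShared {
        measurement_share: Arc::clone(share),
    }
}

fn main() {
    let share = Arc::new(MeasurementShare(vec![0; 1024]));
    let state = prepare_init_sketch(&share);
    drop(share); // the prepare state now holds the only reference
    assert_eq!(state.measurement_share.0.len(), 1024);
}
```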

In the case of the helper, peak memory usage occurs after VDAF preparation, when preparing to write results to the database. The largest allocations come from two copies of the output shares: one is allocated in Flp::truncate(), and the other is allocated when cloning a Vec<ReportShareData> (which in turn contains WritableReportAggregation and A::OutputShare). This vector is cloned for two reasons: to give an async block ownership while still allowing for transaction retries, and to allow the vector to be modified inside the transaction before the report aggregations are finally written out.

One of these copies could be eliminated if the original Vec<ReportShareData> were moved into an Arc, with ownership shared between retries of the transaction, and if the mid-transaction data structure updates were done on a Vec<Cow<'_, _>> whose borrows point back into the original Arc<Vec<ReportShareData>>. This would require an involved refactor, so that AggregationJobWriter can work with Cows for both phases of its operation instead of only the second, but it could cut peak memory usage by almost half, giving us more headroom against OOM errors.
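As a rough, self-contained sketch of that shape (with a hypothetical stand-in for ReportShareData and no database or async machinery), each transaction attempt could borrow from a shared Arc and copy only the entries it actually modifies:

```rust
use std::borrow::Cow;
use std::sync::Arc;

// Hypothetical stand-in: the real ReportShareData holds a
// WritableReportAggregation and an output share, among other fields.
#[derive(Clone, Debug)]
struct ReportShareData {
    output_share: Vec<u64>,
    failed: bool,
}

// One transaction attempt: start from borrowed entries and let Cow::to_mut
// clone only the report aggregations that actually change mid-transaction.
fn transaction_attempt(shared: &Arc<Vec<ReportShareData>>) -> Vec<Cow<'_, ReportShareData>> {
    let mut working: Vec<Cow<'_, ReportShareData>> =
        shared.iter().map(Cow::Borrowed).collect();

    // Pretend the transaction discovers that the first report share failed;
    // only this one element is copied, the rest stay borrowed.
    working[0].to_mut().failed = true;
    working
}

fn main() {
    let shared = Arc::new(vec![
        ReportShareData { output_share: vec![1; 512], failed: false },
        ReportShareData { output_share: vec![2; 512], failed: false },
    ]);

    // A retry loop hands each attempt a cheap borrow of the same Arc, so the
    // full vector of output shares is allocated only once overall.
    for _attempt in 0..2 {
        let working = transaction_attempt(&shared);
        assert!(working[0].failed);
        assert!(matches!(working[1], Cow::Borrowed(_)));
    }
}
```

(The retry loop here stands in for the real transaction-retry machinery; the point is only that retries share one allocation and mutations copy per element.)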

@branlwyd
Contributor

branlwyd commented Jul 9, 2024

I think this is a worthwhile optimization -- I noticed the copying, but without measuring I didn't realize it would be our largest allocation. I may be able to prioritize it soon, unless you plan to take on the implementation.
