I set up a harness to run one big aggregation job under Valgrind, using the DHAT tool. For both the leader and the helper, peak memory usage was similar: double the total measurement share/output share size, plus various smaller allocations. (In the case of the helper, the next-largest allocation is the aggregation job initialization request body, in a `trillium_http::ReceivedBody`.)
In the case of the leader, one copy of the measurement share is allocated when loading report aggregations from the database in the aggregation job driver. The other copy is made inside the VDAF's `prepare_init()`, because the Prio3 leader measurement share is cloned for storage in `Prio3PrepareState`. We could probably eliminate this copy with `Arc`s inside the Prio3 implementation internals, in order to share and then transfer ownership of this vector, but that's outside the scope of this repository.
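For illustration, here is a rough sketch of what `Arc`-based sharing inside the Prio3 implementation could look like. `MeasurementShare`, `PrepareState`, and this `prepare_init()` signature are invented stand-ins, not the prio crate's actual types; the point is only the ownership pattern.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the prio crate's internals; the real
// Prio3PrepareState holds field elements, not raw u64s.
struct MeasurementShare(Vec<u64>);

struct PrepareState {
    // Keeping an Arc rather than an owned copy lets prepare_init() retain
    // the leader's measurement share without duplicating the vector.
    measurement_share: Arc<MeasurementShare>,
}

fn prepare_init(measurement_share: &Arc<MeasurementShare>) -> PrepareState {
    PrepareState {
        // Arc::clone only bumps a reference count; the underlying
        // allocation is shared, not copied.
        measurement_share: Arc::clone(measurement_share),
    }
}

fn main() {
    let share = Arc::new(MeasurementShare(vec![0; 1024]));
    let _state = prepare_init(&share);
    // The caller's handle in `share` and the copy held by the prepare
    // state now point at the same allocation.
}
```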
In the case of the helper, peak memory usage occurs after VDAF preparation, when preparing to write results to the database. The largest allocations come from two copies of the output shares. One is allocated in `Flp::truncate()`, and the other is allocated when cloning a `Vec<ReportShareData>` (which in turn contains `WritableReportAggregation` and `A::OutputShare`). This vector is cloned for two reasons: to give an async block ownership while still allowing for transaction retries, and to allow the vector to be modified inside the transaction before the report aggregations are finally written out. One of these copies could be eliminated if the original `Vec<ReportShareData>` were moved into an `Arc`, with ownership shared between each retry of the transaction, and if the mid-transaction data structure updates were done on a `Vec<Cow<'_, _>>`, with borrow references pointing back to the original `Arc<Vec<ReportShareData>>`. This would require an involved refactor, to allow `AggregationJobWriter` to work with `Cow`s for both phases of operation, instead of only its second phase, but it could cut peak memory usage by almost half, giving us more headroom against OOM errors.
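A minimal sketch of the `Arc`-plus-`Cow` shape such a refactor might take follows. The `failed` field, `run_tx_with_retries`, and `write_report_aggregations` are invented stand-ins (the real code paths are async and go through `AggregationJobWriter` and the datastore), but the ownership pattern is the one described above.

```rust
use std::borrow::Cow;
use std::sync::Arc;

// Hypothetical stand-in for ReportShareData; the real struct wraps a
// WritableReportAggregation and an output share.
#[derive(Clone)]
struct ReportShareData {
    failed: bool,
}

// Stand-in for the datastore's retrying transaction runner: the closure may
// run more than once, so it must not consume the data it writes.
fn run_tx_with_retries<F: Fn() -> Result<(), ()>>(f: F) {
    for _ in 0..3 {
        if f().is_ok() {
            return;
        }
    }
}

fn write_report_aggregations(report_share_data: Vec<ReportShareData>) {
    // One owned copy of the report share data, shared by every transaction
    // attempt through the Arc instead of being cloned per attempt.
    let shared = Arc::new(report_share_data);

    run_tx_with_retries(|| {
        // Each attempt starts from borrowed Cows pointing into the Arc, so
        // no second copy of the output shares is made up front.
        let mut working: Vec<Cow<'_, ReportShareData>> =
            shared.iter().map(Cow::Borrowed).collect();

        // Mid-transaction updates clone only the entries they touch.
        if let Some(entry) = working.first_mut() {
            entry.to_mut().failed = true;
        }

        // ... write `working` out to the datastore ...
        Ok(())
    });
}

fn main() {
    write_report_aggregations(vec![ReportShareData { failed: false }; 4]);
}
```

The point of the pattern is that each transaction attempt begins with borrows into the shared vector, so a second copy of the output shares only materializes element by element, for the entries that are actually modified.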
I think this is a worthwhile optimization -- I noticed the copying but hadn't measured it, so I didn't realize it would be our largest allocation. I may be able to prioritize it soon unless you plan to take on the implementation.