Track time spent in datastore methods; use this to charge energy #1957
base: master
Conversation
Closes #666. Nice.

benchmarks please

Benchmarking failed. Please check the workflow run for details.
Callgrind Benchmark Report

These benchmarks were run using callgrind. Measurement changes larger than five percent are in bold.

In-memory benchmarks: empty transaction, filter, insert bulk, iterate, serialize_product_value, update bulk.

On-disk benchmarks: empty transaction, filter, insert bulk, iterate, serialize_product_value, update bulk.

[Benchmark measurement tables not captured.]
Uhhh, keep in mind callgrind might not simulate delays due to atomics. But if they're uncontended it probably doesn't matter.

Ah, I hadn't considered that. Are the wall-time benchmarks working again? Do I need to merge something into this branch?

@Centril has offered to run some benchmarks. This is low-priority among his work.
Sorry for the delay in providing feedback here.
We're most likely going to have to rethink this approach since the majority of energy usage is going to come from reads, i.e. iterators. There are really only two operations that determine the cost of reads (in the datastore):
- Index seeks
- Row ID fetches, or `RowPointer` dereferences.

Of course the cost of both of these operations is highly data-dependent, but just counting them should give us an initial estimate that's good enough modulo some constant factor. Ultimately we'll want to take the table schema into account to more accurately estimate the cost of a fetch.
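The counting model suggested above could be sketched roughly as follows. All names (`ReadCounters`, the weight parameters) are illustrative stand-ins, not SpacetimeDB's actual API; the idea is just that each index seek and each `RowPointer` dereference bumps a counter, and energy is estimated as a weighted sum:

```rust
/// Sketch of a counting-based read-cost model: instead of timing reads,
/// count the two operations that dominate their cost.
#[derive(Default, Debug)]
struct ReadCounters {
    index_seeks: u64,
    row_pointer_derefs: u64,
}

impl ReadCounters {
    fn record_index_seek(&mut self) {
        self.index_seeks += 1;
    }

    fn record_row_deref(&mut self) {
        self.row_pointer_derefs += 1;
    }

    /// Estimated energy, good up to a constant factor. The weights are
    /// placeholders; eventually they would be informed by the table schema.
    fn estimated_energy(&self, seek_weight: u64, deref_weight: u64) -> u64 {
        self.index_seeks * seek_weight + self.row_pointer_derefs * deref_weight
    }
}

fn main() {
    let mut counters = ReadCounters::default();
    // Simulate an indexed filter: one seek, then ten row dereferences.
    counters.record_index_seek();
    for _ in 0..10 {
        counters.record_row_deref();
    }
    assert_eq!(counters.estimated_energy(100, 10), 200);
    println!("{counters:?}");
}
```

A deterministic count like this would also sidestep the wall-clock noise concern raised in the PR description.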
Description of Changes

`TxId` and `MutTxId` now read from the `Instant` clock around all of their public methods, and accumulate the total `Duration` spent in those methods throughout the life of the transaction. Then, when committing or aborting the transaction, the `RelationalDB` charges energy via a new `EnergyMonitor` method, `record_datastore_time`.

This necessitates having `RelationalDB` hold an `Arc<dyn EnergyMonitor>`, as previously it had no mechanism by which to access the energy monitor.

Note that this approach is intended as a stopgap, rather than a permanent solution. Notable flaws include:

- Energy consumed by datastore operations is not included in the WS broadcasts sent to clients; these contain only WASM execution energy. This is because we construct the `ModuleEvent` before downgrading the TX and running incremental queries, and the total datastore energy cost is only available after those operations.
- Wall time is a very noisy measurement. It is not necessarily incorrect to pass this noise on to our customers, since many of the unpredictable costs come from events, like page allocations, that are triggered by user code, and in general we expect any variance not caused by user code to disappear as the total time grows large. Still, it's not wonderful that we will end up charging different amounts of energy to run the same operations against the same data sets multiple times.
- This patch does not track time spent in iterator or `RowRef` operations, only methods on the (`Mut`-)`TxId` (which are generally accessed through the `RelationalDB`).
- We also need to take care that this does not significantly regress performance, since it involves a potentially large number of added clock reads and relaxed atomic increments. My hope is that the atomics are uncontended and so not a significant cost, but hey, this is why we have benchmarks, right?
API and ABI breaking changes
Will require private PR, which I will do after we determine that this is an acceptable approach and that the overhead is low. No user-facing interfaces change.
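Since the changes are internal, the `Arc<dyn EnergyMonitor>` plumbing might look roughly like the sketch below. The trait shape and all type names here are assumptions for illustration; the real trait lives in the codebase and may differ:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// Assumed shape of the trait with the new `record_datastore_time` method.
trait EnergyMonitor: Send + Sync {
    fn record_datastore_time(&self, elapsed: Duration);
}

/// A toy monitor that just totals the charged time, standing in for the
/// real energy-accounting backend.
#[derive(Default)]
struct TotalingMonitor {
    total_nanos: AtomicU64,
}

impl EnergyMonitor for TotalingMonitor {
    fn record_datastore_time(&self, elapsed: Duration) {
        self.total_nanos
            .fetch_add(elapsed.as_nanos() as u64, Ordering::Relaxed);
    }
}

/// The database holds the monitor as a trait object so that commit/abort
/// can charge the accumulated datastore time.
struct RelationalDb {
    energy_monitor: Arc<dyn EnergyMonitor>,
}

impl RelationalDb {
    fn commit_tx(&self, datastore_time: Duration) {
        self.energy_monitor.record_datastore_time(datastore_time);
    }
}

fn main() {
    let monitor = Arc::new(TotalingMonitor::default());
    let db = RelationalDb { energy_monitor: monitor.clone() };
    db.commit_tx(Duration::from_micros(250));
    assert_eq!(monitor.total_nanos.load(Ordering::Relaxed), 250_000);
}
```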
Expected complexity level and risk
2 - performance risks, and may under- or over-charge energy, but seems unlikely to break functionality.
Testing
Describe any testing you've done, and any testing you'd like your reviewers to do,
so that you're confident that all the changes work as expected!