Collect protocol implementation #105

Closed
4 tasks done
tgeoghegan opened this issue Apr 26, 2022 · 2 comments

@tgeoghegan
Contributor

tgeoghegan commented Apr 26, 2022

To implement the collect protocol, we need:

tgeoghegan self-assigned this Apr 26, 2022
@tgeoghegan
Contributor Author

tgeoghegan commented Apr 26, 2022

When servicing an `AggregateShareReq`, the helper needs to check whether any of the minimum batch intervals that make up `AggregateShareReq.batch_interval` have already been included in some other `AggregateShareReq`. To that end, the helper will need to persist the parameters of the `AggregateShareReq`s it services.

• `max_batch_lifetime` enforcement in leader

The leader needs similar enforcement; this should be done in the `/collect` handler.
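
A minimal sketch of this check, assuming hypothetical types and an in-memory count of prior collections per batch unit in place of the datastore:

```rust
use std::collections::HashMap;

/// Half-open time interval [start, start + duration), in seconds.
#[derive(Clone, Copy)]
struct Interval {
    start: u64,
    duration: u64,
}

/// Split a requested batch interval into `min_batch_duration`-sized batch units.
fn batch_unit_starts(batch_interval: Interval, min_batch_duration: u64) -> Vec<u64> {
    (batch_interval.start..batch_interval.start + batch_interval.duration)
        .step_by(min_batch_duration as usize)
        .collect()
}

/// Reject a request if any batch unit it covers has already been collected
/// `max_batch_lifetime` times. `prior_collections` stands in for the persisted
/// parameters of previously serviced requests.
fn check_batch_lifetime(
    batch_interval: Interval,
    min_batch_duration: u64,
    max_batch_lifetime: u64,
    prior_collections: &HashMap<u64, u64>, // batch unit start -> times collected
) -> Result<(), String> {
    for unit_start in batch_unit_starts(batch_interval, min_batch_duration) {
        let count = prior_collections.get(&unit_start).copied().unwrap_or(0);
        if count >= max_batch_lifetime {
            return Err(format!(
                "batch unit starting at {unit_start} has exhausted max_batch_lifetime"
            ));
        }
    }
    Ok(())
}

fn main() {
    let mut prior = HashMap::new();
    prior.insert(0, 1); // unit [0, 300) was already collected once
    let request = Interval { start: 0, duration: 600 };
    // With max_batch_lifetime = 1, the request must be rejected.
    assert!(check_batch_lifetime(request, 300, 1, &prior).is_err());
}
```
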

tgeoghegan added a commit that referenced this issue Apr 27, 2022
Implements the helper's `aggregate_share` endpoint. The assumption is
that the process of preparing input shares into output shares will
create rows in `batch_aggregations` and update the `checksum` and
`aggregate_share` columns as individual reports are prepared. Then, all
the `/aggregate_share` handler has to do is sum the aggregate shares it
finds.

Note that this does not include support for the protocol changes in [1],
nor does it include enforcement of a task's `max_batch_lifetime`.

[1]: ietf-wg-ppm/draft-ietf-ppm-dap#224

Part of #105
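
A rough sketch of the summation the handler performs, with toy stand-ins for aggregate shares and checksums (a real aggregate share is a vector of field elements and the checksum is a digest; the names here are illustrative):

```rust
/// One per-batch-unit row, with toy types standing in for the real columns.
struct BatchUnitAggregation {
    aggregate_share: Vec<u64>, // stand-in for a vector of field elements
    report_count: u64,
    checksum: [u8; 32],
}

/// Collapse the rows covering a batch interval into the values the handler
/// returns: summed share, total report count, and XOR of per-unit checksums.
fn sum_batch_units(rows: &[BatchUnitAggregation]) -> (Vec<u64>, u64, [u8; 32]) {
    let len = rows.first().map_or(0, |row| row.aggregate_share.len());
    let mut share = vec![0u64; len];
    let mut report_count = 0u64;
    let mut checksum = [0u8; 32];
    for row in rows {
        for (acc, x) in share.iter_mut().zip(&row.aggregate_share) {
            *acc = acc.wrapping_add(*x); // the real code adds in the VDAF's field
        }
        report_count += row.report_count;
        for (acc, b) in checksum.iter_mut().zip(&row.checksum) {
            *acc ^= *b;
        }
    }
    (share, report_count, checksum)
}

fn main() {
    let rows = [
        BatchUnitAggregation { aggregate_share: vec![1, 2], report_count: 10, checksum: [1; 32] },
        BatchUnitAggregation { aggregate_share: vec![3, 4], report_count: 5, checksum: [1; 32] },
    ];
    let (share, count, checksum) = sum_batch_units(&rows);
    assert_eq!((share, count, checksum[0]), (vec![4, 6], 15, 0));
}
```
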
tgeoghegan added a commit that referenced this issue Apr 28, 2022
`AggregateShareReq` now includes the aggregation parameter[1], which we
now reflect in `messages::AggregateShareReq`. We store the encoded
aggregation parameter in the datastore, and rename `batch_aggregations`
to `batch_unit_aggregations` along the way, for clarity.

[1]: ietf-wg-ppm/draft-ietf-ppm-dap#224

Part of #105
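
A rough sketch of the shape this gives the data, assuming the encoded aggregation parameter is treated as opaque bytes and using illustrative field names (not Janus's actual definitions):

```rust
use std::collections::HashMap;

/// Illustrative subset of AggregateShareReq once it carries the aggregation parameter.
struct AggregateShareReq {
    task_id: [u8; 32],
    batch_interval: (u64, u64), // (start, duration)
    aggregation_param: Vec<u8>, // opaque, VDAF-specific encoding
}

/// batch_unit_aggregations rows are keyed by task, batch unit start, and the
/// encoded aggregation parameter, so different parameters accumulate separately.
type BatchUnitKey = ([u8; 32], u64, Vec<u8>);

fn main() {
    let req = AggregateShareReq {
        task_id: [0; 32],
        batch_interval: (0, 300),
        aggregation_param: Vec::new(),
    };
    let mut batch_unit_aggregations: HashMap<BatchUnitKey, u64> = HashMap::new();
    let key = (req.task_id, req.batch_interval.0, req.aggregation_param.clone());
    *batch_unit_aggregations.entry(key).or_insert(0) += 1;
    assert_eq!(batch_unit_aggregations.len(), 1);
}
```
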
tgeoghegan added a commit that referenced this issue Apr 29, 2022
Adds a new database table `aggregate_share_jobs`, used by helper to
store the results of successfully serviced `AggregateShareReq`s. This
allows leaders to retry an `AggregateShareReq` indefinitely, provided
the parameters don't change. The leader's `collect_jobs` table now also
has some nullable columns where it can cache the leader and helper's
aggregate shares, to similarly allow the collector to retry requests to
the collect job URI.

Part of #105
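
A sketch of the retry behavior this enables, with an in-memory map standing in for the `aggregate_share_jobs` table and illustrative field names:

```rust
use std::collections::HashMap;

/// The parameters that identify an AggregateShareReq (illustrative fields).
#[derive(Clone, PartialEq, Eq, Hash)]
struct RequestParams {
    task_id: [u8; 32],
    batch_interval: (u64, u64),
    aggregation_param: Vec<u8>,
}

/// If an identical request was already serviced, return the stored result so the
/// leader can retry indefinitely without consuming more batch lifetime. The map
/// stands in for the `aggregate_share_jobs` table.
fn service_aggregate_share_req(
    jobs: &mut HashMap<RequestParams, Vec<u8>>,
    params: RequestParams,
) -> Vec<u8> {
    if let Some(stored_share) = jobs.get(&params) {
        return stored_share.clone();
    }
    let share = vec![42]; // placeholder for a freshly computed aggregate share
    jobs.insert(params, share.clone());
    share
}

fn main() {
    let mut jobs = HashMap::new();
    let params = RequestParams {
        task_id: [0; 32],
        batch_interval: (0, 300),
        aggregation_param: Vec::new(),
    };
    assert_eq!(service_aggregate_share_req(&mut jobs, params.clone()), vec![42]);
    // A retry with identical parameters gets the same answer back.
    assert_eq!(service_aggregate_share_req(&mut jobs, params), vec![42]);
}
```
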
tgeoghegan added a commit that referenced this issue May 5, 2022
To enforce a task's `max_batch_lifetime`, we need to know how many times
each batch unit in an `AggregateShareReq`'s `batch_interval` has been
collected, that is, how many rows in `aggregate_share_jobs` have a
`batch_interval` that contains the batch unit's interval.

`datastore::Transaction::get_aggregate_share_job_count_by_batch_unit` is
meant to be used with one batch unit interval at a time. I suspect this
could be optimized into a single SQL query that checks multiple batch
units at once.

Part of #105
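
The per-unit query might look roughly like the following; the table columns and interval representation are assumptions for illustration, not the actual schema:

```rust
// Issued once per batch unit: count the aggregate share jobs whose batch
// interval contains the unit. Column names and the (start, duration) interval
// representation are illustrative, not the actual schema.
const AGGREGATE_SHARE_JOB_COUNT_BY_BATCH_UNIT_SQL: &str = "
    SELECT COUNT(*) FROM aggregate_share_jobs
    WHERE task_id = $1
      AND batch_interval_start <= $2
      AND batch_interval_start + batch_interval_duration >= $2 + $3";

fn main() {
    // $1 = task ID, $2 = batch unit start, $3 = the task's min_batch_duration.
    println!("{AGGREGATE_SHARE_JOB_COUNT_BY_BATCH_UNIT_SQL}");
}
```
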
tgeoghegan added a commit that referenced this issue May 5, 2022
Helper aggregator now rejects aggregate share requests referencing batch
units which have already been collected enough times.

Part of #105
References #104
tgeoghegan added a commit that referenced this issue May 5, 2022
When servicing a collect request, the leader must generate a collect job
URI relative to the public base URL from which the API is served and
then stick that in a `Location` header. We now provide that base URL in
the aggregator's config file.

Part of #105
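
A sketch of the URI construction, assuming a configured base URL and an already-generated collect job ID (the real handler also persists the job before responding):

```rust
/// Build the collect job URI that the leader places in the `Location` header of
/// its response to POST /collect. Here `base_url` comes from configuration; a
/// later change derives it from the task's own endpoint instead.
fn collect_job_uri(base_url: &str, collect_job_id: &str) -> String {
    format!("{}/collect_jobs/{}", base_url.trim_end_matches('/'), collect_job_id)
}

fn main() {
    // The job ID here is just an example value.
    let uri = collect_job_uri("https://leader.example.com/", "4cb6e13e-8882-4e43-a13e-3b6274e2e969");
    println!("Location: {uri}");
}
```
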
tgeoghegan added a commit that referenced this issue May 5, 2022
Refactors some existing code that supports the helper's
`max_batch_lifetime` enforcement so that it can be re-used in the
leader's `collect` endpoint. All the logic for that endpoint now moves
into methods on `VdafOps`.

Part of #105
tgeoghegan added a commit that referenced this issue May 10, 2022
Helper aggregator now rejects aggregate share requests referencing batch
units which have already been collected enough times.

In support of this, we add
`datastore::Transaction::get_aggregate_share_job_counts_for_intervals`.

Part of #105
References #104
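
One way such a batched query could look (purely illustrative; the real method's SQL and column names may differ):

```rust
// Expand the batch unit starts with unnest(), then count containing jobs per
// unit in a single round trip. Column names, parameter types, and the query
// itself are illustrative.
const AGGREGATE_SHARE_JOB_COUNTS_FOR_INTERVALS_SQL: &str = "
    SELECT units.unit_start, COUNT(jobs.task_id) AS collections
    FROM unnest($2::BIGINT[]) AS units(unit_start)
    LEFT JOIN aggregate_share_jobs AS jobs
      ON jobs.task_id = $1
     AND jobs.batch_interval_start <= units.unit_start
     AND jobs.batch_interval_start + jobs.batch_interval_duration
         >= units.unit_start + $3
    GROUP BY units.unit_start";

fn main() {
    // $1 = task ID, $2 = array of batch unit starts, $3 = min_batch_duration.
    println!("{AGGREGATE_SHARE_JOB_COUNTS_FOR_INTERVALS_SQL}");
}
```
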
tgeoghegan added a commit that referenced this issue May 11, 2022
Leader now consults the task parameters to determine what base URL to
use when constructing collect job URIs. This assumes that a leader will
serve collect jobs from the same base URL that it serves other endpoints
like `/upload` or `/collect`.

Part of #105
tgeoghegan added a commit that referenced this issue May 12, 2022
Adds a warp filter for path `/collect_jobs/{collect_job_id}` to the
leader. Adds support for querying collect jobs from the datastore as
well as updating them with helper and leader aggregate shares. The
latter is currently only needed for tests, but will soon be used when
running collect jobs.

Part of #105
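
A minimal warp sketch of the route shape, assuming `warp` and `tokio` as dependencies; the closure stands in for the datastore lookup and response logic:

```rust
use warp::Filter;

#[tokio::main]
async fn main() {
    // GET /collect_jobs/{collect_job_id}: the closure stands in for looking the
    // job up in the datastore and deciding whether its aggregate shares are
    // available yet or the collector should poll again later.
    let collect_jobs = warp::path!("collect_jobs" / String)
        .and(warp::get())
        .map(|collect_job_id: String| format!("collect job {collect_job_id}: not ready yet"));

    warp::serve(collect_jobs).run(([127, 0, 0, 1], 8080)).await;
}
```
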
tgeoghegan added a commit that referenced this issue May 19, 2022
Factors logic for enumerating tasks and creating per-task jobs out of
`aggregation_job_creator` and into a new module. Also adds a skeleton of
`collect_job_creator` to show how this is used across multiple binary
targets.

Part of #105
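
A sketch of the factoring being described: a small "create jobs for one task" abstraction shared by both binaries. The trait and names below are illustrative, not the new module's actual API:

```rust
/// Minimal stand-in for a task's parameters.
struct Task {
    id: u32,
}

/// The piece that differs between the aggregation job creator and the collect
/// job creator; the surrounding loop over tasks is shared.
trait PerTaskJobCreator {
    fn create_jobs_for_task(&self, task: &Task);
}

struct AggregationJobCreator;
impl PerTaskJobCreator for AggregationJobCreator {
    fn create_jobs_for_task(&self, task: &Task) {
        println!("creating aggregation jobs for task {}", task.id);
    }
}

struct CollectJobCreator;
impl PerTaskJobCreator for CollectJobCreator {
    fn create_jobs_for_task(&self, task: &Task) {
        println!("creating collect jobs for task {}", task.id);
    }
}

/// Shared driver loop: enumerate tasks and hand each to the per-task creator.
fn run_job_creator(creator: &dyn PerTaskJobCreator, tasks: &[Task]) {
    for task in tasks {
        creator.create_jobs_for_task(task);
    }
}

fn main() {
    let tasks = [Task { id: 1 }, Task { id: 2 }];
    run_job_creator(&AggregationJobCreator, &tasks);
    run_job_creator(&CollectJobCreator, &tasks);
}
```
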
tgeoghegan added a commit that referenced this issue May 20, 2022
Factors logic for discovering incomplete jobs out of
`aggregation_job_driver` and into a new module.
Adds a skeleton of `collect_job_creator` to show how this is used across
multiple binary targets.

Part of #105
tgeoghegan added a commit that referenced this issue May 25, 2022
Adds support for acquiring and releasing leases on collect jobs to the
datastore module, which will soon be used by the collect job driver to
drive jobs.

Part of #105
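
A sketch of the lease semantics, using an in-memory map in place of the datastore transaction (all names are illustrative):

```rust
use std::collections::HashMap;

/// A time-limited claim on a collect job held by one job driver instance.
#[derive(Clone, Copy)]
struct Lease {
    holder: u32,
    expires_at: u64,
}

/// Acquire a lease on a job if it is unleased or its previous lease has expired.
fn acquire_lease(
    leases: &mut HashMap<u64, Lease>, // collect job ID -> current lease
    job_id: u64,
    holder: u32,
    now: u64,
    lease_duration: u64,
) -> bool {
    match leases.get(&job_id) {
        Some(lease) if lease.expires_at > now => false, // still held
        _ => {
            leases.insert(job_id, Lease { holder, expires_at: now + lease_duration });
            true
        }
    }
}

/// Release a lease once the job is finished, so another driver can pick it up.
fn release_lease(leases: &mut HashMap<u64, Lease>, job_id: u64, holder: u32) {
    if leases.get(&job_id).map_or(false, |lease| lease.holder == holder) {
        leases.remove(&job_id);
    }
}

fn main() {
    let mut leases = HashMap::new();
    assert!(acquire_lease(&mut leases, 7, 1, 100, 60)); // driver 1 claims job 7
    assert!(!acquire_lease(&mut leases, 7, 2, 120, 60)); // still leased to driver 1
    release_lease(&mut leases, 7, 1);
    assert!(acquire_lease(&mut leases, 7, 2, 130, 60)); // available again
}
```
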
tgeoghegan added a commit that referenced this issue May 27, 2022
Fleshes out the implementation of the Janus collect job driver.

Some existing logic used for the helper's `/aggregate_share` handler is
refactored into `mod aggregate_share` so it can be used in
`collect_job_driver`.

Additionally, the existing `update_collect_job_*` methods on
`datastore::Transaction` are collapsed into a single method that sets
helper aggregate share, leader aggregate share, report count and
checksum in a single operation. This simplifies the logic of the collect
job driver since it doesn't have to deal with the case where the
leader's aggregate share was computed but the helper's isn't known yet.
The downside is that if a collect job fails because the helper failed to
compute its aggregate share, then the leader will recompute its share
"from scratch" the next time the collect job is run. If helpers fail
often enough to make caching the leader aggregate share worthwhile, then
we probably have bigger problems than this performance issue.

Part of #105
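
A condensed sketch of the per-job flow under the single-update design described above; all types and the helper round trip are stand-ins:

```rust
/// Toy stand-ins for the pieces the collect job driver touches.
struct CollectJob {
    leader_aggregate_share: Option<Vec<u64>>,
    helper_aggregate_share: Option<Vec<u8>>,
    report_count: Option<u64>,
}

/// One step of driving a collect job: compute the leader share from the batch
/// unit aggregations, ask the helper for its (encrypted) share, and only then
/// write everything back in a single update. If the helper call fails, nothing
/// is written and the leader share is recomputed on the next attempt.
fn step_collect_job(
    job: &mut CollectJob,
    batch_unit_shares: &[Vec<u64>],
    report_counts: &[u64],
    helper_response: Result<Vec<u8>, String>, // stand-in for the AggregateShareReq round trip
) -> Result<(), String> {
    let len = batch_unit_shares.first().map_or(0, |unit| unit.len());
    let mut leader_share = vec![0u64; len];
    for unit in batch_unit_shares {
        for (acc, x) in leader_share.iter_mut().zip(unit) {
            *acc = acc.wrapping_add(*x);
        }
    }
    let helper_share = helper_response?; // bail out before updating anything

    // Single update: leader share, helper share, and report count together.
    job.leader_aggregate_share = Some(leader_share);
    job.helper_aggregate_share = Some(helper_share);
    job.report_count = Some(report_counts.iter().sum());
    Ok(())
}

fn main() {
    let mut job = CollectJob {
        leader_aggregate_share: None,
        helper_aggregate_share: None,
        report_count: None,
    };
    let shares = [vec![1, 2], vec![3, 4]];
    // First attempt: helper fails, so the job stays untouched and will be retried.
    assert!(step_collect_job(&mut job, &shares, &[10, 5], Err("helper unreachable".into())).is_err());
    assert!(job.leader_aggregate_share.is_none());
    // Second attempt: helper succeeds and all fields are set in one go.
    step_collect_job(&mut job, &shares, &[10, 5], Ok(vec![0xAB])).unwrap();
    assert_eq!(job.report_count, Some(15));
}
```
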
@tgeoghegan
Contributor Author

All this work is done, closing!
