Collect protocol implementation #105

Closed
4 tasks done
tgeoghegan opened this issue Apr 26, 2022 · 2 comments

@tgeoghegan
Contributor

tgeoghegan commented Apr 26, 2022

To implement the collect protocol, we need:

tgeoghegan self-assigned this Apr 26, 2022
@tgeoghegan
Contributor Author

tgeoghegan commented Apr 26, 2022

When servicing an `AggregateShareReq`, the helper needs to check whether any of the minimum batch intervals that make up `AggregateShareReq.batch_interval` have already been included in some other `AggregateShareReq`. To that end, the helper will need to persist the parameters of the `AggregateShareReq`s it services.

• `max_batch_lifetime` enforcement in leader

The leader needs similar enforcement; this should be done in the `/collect` handler.
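
A minimal sketch of this check, assuming hypothetical types and an in-memory count of prior collections per batch unit in place of the datastore:

```rust
use std::collections::HashMap;

/// Half-open time interval [start, start + duration), in seconds.
#[derive(Clone, Copy)]
struct Interval {
    start: u64,
    duration: u64,
}

/// Split a requested batch interval into `min_batch_duration`-sized batch units.
fn batch_unit_starts(batch_interval: Interval, min_batch_duration: u64) -> Vec<u64> {
    (batch_interval.start..batch_interval.start + batch_interval.duration)
        .step_by(min_batch_duration as usize)
        .collect()
}

/// Reject a request if any batch unit it covers has already been collected
/// `max_batch_lifetime` times. `prior_collections` stands in for the persisted
/// parameters of previously serviced requests.
fn check_batch_lifetime(
    batch_interval: Interval,
    min_batch_duration: u64,
    max_batch_lifetime: u64,
    prior_collections: &HashMap<u64, u64>, // batch unit start -> times collected
) -> Result<(), String> {
    for unit_start in batch_unit_starts(batch_interval, min_batch_duration) {
        let count = prior_collections.get(&unit_start).copied().unwrap_or(0);
        if count >= max_batch_lifetime {
            return Err(format!(
                "batch unit starting at {unit_start} has exhausted max_batch_lifetime"
            ));
        }
    }
    Ok(())
}

fn main() {
    let mut prior = HashMap::new();
    prior.insert(0, 1); // unit [0, 300) was already collected once
    let request = Interval { start: 0, duration: 600 };
    // With max_batch_lifetime = 1, the request must be rejected.
    assert!(check_batch_lifetime(request, 300, 1, &prior).is_err());
}
```
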

tgeoghegan added a commit that referenced this issue Apr 27, 2022
Implements the helper's `aggregate_share` endpoint. The assumption is
that the process of preparing input shares into output shares will
create rows in `batch_aggregations` and update the `checksum` and
`aggregate_share` columns as individual reports are prepared. Then, all
the `/aggregate_share` handler has to do is sum the aggregate shares it
finds.

Note that this does not include support for the protocol changes in [1],
nor does it include enforcement of a task's `max_batch_lifetime`.

[1]: ietf-wg-ppm/draft-ietf-ppm-dap#224

Part of #105
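
A rough sketch of the summation the handler performs, with toy stand-ins for aggregate shares and checksums (a real aggregate share is a vector of field elements and the checksum is a digest; the names here are illustrative):

```rust
/// One per-batch-unit row, with toy types standing in for the real columns.
struct BatchUnitAggregation {
    aggregate_share: Vec<u64>, // stand-in for a vector of field elements
    report_count: u64,
    checksum: [u8; 32],
}

/// Collapse the rows covering a batch interval into the values the handler
/// returns: summed share, total report count, and XOR of per-unit checksums.
fn sum_batch_units(rows: &[BatchUnitAggregation]) -> (Vec<u64>, u64, [u8; 32]) {
    let len = rows.first().map_or(0, |row| row.aggregate_share.len());
    let mut share = vec![0u64; len];
    let mut report_count = 0u64;
    let mut checksum = [0u8; 32];
    for row in rows {
        for (acc, x) in share.iter_mut().zip(&row.aggregate_share) {
            *acc = acc.wrapping_add(*x); // the real code adds in the VDAF's field
        }
        report_count += row.report_count;
        for (acc, b) in checksum.iter_mut().zip(&row.checksum) {
            *acc ^= *b;
        }
    }
    (share, report_count, checksum)
}

fn main() {
    let rows = [
        BatchUnitAggregation { aggregate_share: vec![1, 2], report_count: 10, checksum: [1; 32] },
        BatchUnitAggregation { aggregate_share: vec![3, 4], report_count: 5, checksum: [1; 32] },
    ];
    let (share, count, checksum) = sum_batch_units(&rows);
    assert_eq!((share, count, checksum[0]), (vec![4, 6], 15, 0));
}
```
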
tgeoghegan added a commit that referenced this issue Apr 28, 2022
`AggregateShareReq` now includes the aggregation parameter[1], which we
now reflect in `messages::AggregateShareReq`. We store the encoded
aggregation parameter in the datastore, and rename `batch_aggregations`
to `batch_unit_aggregations` along the way, for clarity.

[1]: ietf-wg-ppm/draft-ietf-ppm-dap#224

Part of #105
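
A rough sketch of the shape this gives the data, assuming the encoded aggregation parameter is treated as opaque bytes and using illustrative field names (not Janus's actual definitions):

```rust
use std::collections::HashMap;

/// Illustrative subset of AggregateShareReq once it carries the aggregation parameter.
struct AggregateShareReq {
    task_id: [u8; 32],
    batch_interval: (u64, u64), // (start, duration)
    aggregation_param: Vec<u8>, // opaque, VDAF-specific encoding
}

/// batch_unit_aggregations rows are keyed by task, batch unit start, and the
/// encoded aggregation parameter, so different parameters accumulate separately.
type BatchUnitKey = ([u8; 32], u64, Vec<u8>);

fn main() {
    let req = AggregateShareReq {
        task_id: [0; 32],
        batch_interval: (0, 300),
        aggregation_param: Vec::new(),
    };
    let mut batch_unit_aggregations: HashMap<BatchUnitKey, u64> = HashMap::new();
    let key = (req.task_id, req.batch_interval.0, req.aggregation_param.clone());
    *batch_unit_aggregations.entry(key).or_insert(0) += 1;
    assert_eq!(batch_unit_aggregations.len(), 1);
}
```
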
tgeoghegan added a commit that referenced this issue Apr 29, 2022
Adds a new database table `aggregate_share_jobs`, used by helper to
store the results of successfully serviced `AggregateShareReq`s. This
allows leaders to retry an `AggregateShareReq` indefinitely, provided
the parameters don't change. The leader's `collect_jobs` table now also
has some nullable columns where it can cache the leader and helper's
aggregate shares, to similarly allow the collector to retry requests to
the collect job URI.

Part of #105
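
A sketch of the retry behavior this enables, with an in-memory map standing in for the `aggregate_share_jobs` table and illustrative field names:

```rust
use std::collections::HashMap;

/// The parameters that identify an AggregateShareReq (illustrative fields).
#[derive(Clone, PartialEq, Eq, Hash)]
struct RequestParams {
    task_id: [u8; 32],
    batch_interval: (u64, u64),
    aggregation_param: Vec<u8>,
}

/// If an identical request was already serviced, return the stored result so the
/// leader can retry indefinitely without consuming more batch lifetime. The map
/// stands in for the `aggregate_share_jobs` table.
fn service_aggregate_share_req(
    jobs: &mut HashMap<RequestParams, Vec<u8>>,
    params: RequestParams,
) -> Vec<u8> {
    if let Some(stored_share) = jobs.get(&params) {
        return stored_share.clone();
    }
    let share = vec![42]; // placeholder for a freshly computed aggregate share
    jobs.insert(params, share.clone());
    share
}

fn main() {
    let mut jobs = HashMap::new();
    let params = RequestParams {
        task_id: [0; 32],
        batch_interval: (0, 300),
        aggregation_param: Vec::new(),
    };
    assert_eq!(service_aggregate_share_req(&mut jobs, params.clone()), vec![42]);
    // A retry with identical parameters gets the same answer back.
    assert_eq!(service_aggregate_share_req(&mut jobs, params), vec![42]);
}
```
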
tgeoghegan added a commit that referenced this issue May 5, 2022
To enforce a task's `max_batch_lifetime`, we need to know how many times
each batch unit in an `AggregateShareReq`'s `batch_interval` has been
collected, that is, how many rows in `aggregate_share_jobs` have a
`batch_interval` that contains the batch unit's interval.

`datastore::Transaction::get_aggregate_share_job_count_by_batch_unit` is
meant to be used with one batch unit interval at a time. I suspect this
could be optimized into a single SQL query that checks multiple batch
units at once.

Part of #105
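
The per-unit query might look roughly like the following; the table columns and interval representation are assumptions for illustration, not the actual schema:

```rust
// Issued once per batch unit: count the aggregate share jobs whose batch
// interval contains the unit. Column names and the (start, duration) interval
// representation are illustrative, not the actual schema.
const AGGREGATE_SHARE_JOB_COUNT_BY_BATCH_UNIT_SQL: &str = "
    SELECT COUNT(*) FROM aggregate_share_jobs
    WHERE task_id = $1
      AND batch_interval_start <= $2
      AND batch_interval_start + batch_interval_duration >= $2 + $3";

fn main() {
    // $1 = task ID, $2 = batch unit start, $3 = the task's min_batch_duration.
    println!("{AGGREGATE_SHARE_JOB_COUNT_BY_BATCH_UNIT_SQL}");
}
```
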
tgeoghegan added a commit that referenced this issue May 5, 2022
Helper aggregator now rejects aggregate share requests referencing batch
units which have already been collected enough times.

Part of #105
References #104
tgeoghegan added a commit that referenced this issue May 5, 2022
When servicing a collect request, the leader must generate a collect job
URI relative to the public base URL from which the API is served and
then stick that in a `Location` header. We now provide that base URL in
the aggregator's config file.

Part of #105
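
A sketch of the URI construction, assuming a configured base URL and an already-generated collect job ID (the real handler also persists the job before responding):

```rust
/// Build the collect job URI that the leader places in the `Location` header of
/// its response to POST /collect. Here `base_url` comes from configuration; a
/// later change derives it from the task's own endpoint instead.
fn collect_job_uri(base_url: &str, collect_job_id: &str) -> String {
    format!("{}/collect_jobs/{}", base_url.trim_end_matches('/'), collect_job_id)
}

fn main() {
    // The job ID here is just an example value.
    let uri = collect_job_uri("https://leader.example.com/", "4cb6e13e-8882-4e43-a13e-3b6274e2e969");
    println!("Location: {uri}");
}
```
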
tgeoghegan added a commit that referenced this issue May 5, 2022
Refactors some existing code that supports the helper's
`max_batch_lifetime` enforcement so that it can be re-used in the
leader's `collect` endpoint. All the logic for that endpoint now moves
into methods on `VdafOps`.

Part of #105
tgeoghegan added a commit that referenced this issue May 10, 2022
Helper aggregator now rejects aggregate share requests referencing batch
units which have already been collected enough times.

In support of this, we add
`datastore::Transaction::get_aggregate_share_job_counts_for_intervals`.

Part of #105
References #104
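
One way such a batched query could look (purely illustrative; the real method's SQL and column names may differ):

```rust
// Expand the batch unit starts with unnest(), then count containing jobs per
// unit in a single round trip. Column names, parameter types, and the query
// itself are illustrative.
const AGGREGATE_SHARE_JOB_COUNTS_FOR_INTERVALS_SQL: &str = "
    SELECT units.unit_start, COUNT(jobs.task_id) AS collections
    FROM unnest($2::BIGINT[]) AS units(unit_start)
    LEFT JOIN aggregate_share_jobs AS jobs
      ON jobs.task_id = $1
     AND jobs.batch_interval_start <= units.unit_start
     AND jobs.batch_interval_start + jobs.batch_interval_duration
         >= units.unit_start + $3
    GROUP BY units.unit_start";

fn main() {
    // $1 = task ID, $2 = array of batch unit starts, $3 = min_batch_duration.
    println!("{AGGREGATE_SHARE_JOB_COUNTS_FOR_INTERVALS_SQL}");
}
```
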
tgeoghegan added a commit that referenced this issue May 11, 2022
Leader now consults the task parameters to determine what base URL to
use when constructing collect job URIs. This assumes that a leader will
serve collect jobs from the same base URL that it serves other endpoints
like `/upload` or `/collect`.

Part of #105
tgeoghegan added a commit that referenced this issue May 12, 2022
Adds a warp filter for path `/collect_jobs/{collect_job_id}` to the
leader. Adds support for querying collect jobs from the datastore as
well as updating them with helper and leader aggregate shares. The
latter is currently only needed for tests, but will soon be used when
running collect jobs.

Part of #105
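
A minimal warp sketch of the route shape, assuming `warp` and `tokio` as dependencies; the closure stands in for the datastore lookup and response logic:

```rust
use warp::Filter;

#[tokio::main]
async fn main() {
    // GET /collect_jobs/{collect_job_id}: the closure stands in for looking the
    // job up in the datastore and deciding whether its aggregate shares are
    // available yet or the collector should poll again later.
    let collect_jobs = warp::path!("collect_jobs" / String)
        .and(warp::get())
        .map(|collect_job_id: String| format!("collect job {collect_job_id}: not ready yet"));

    warp::serve(collect_jobs).run(([127, 0, 0, 1], 8080)).await;
}
```
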
tgeoghegan added a commit that referenced this issue May 19, 2022
Factors logic for enumerating tasks and creating per-task jobs out of
`aggregation_job_creator` and into a new module. Also adds a skeleton of
`collect_job_creator` to show how this is used across multiple binary
targets.

Part of #105
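
A sketch of the factoring being described: a small "create jobs for one task" abstraction shared by both binaries. The trait and names below are illustrative, not the new module's actual API:

```rust
/// Minimal stand-in for a task's parameters.
struct Task {
    id: u32,
}

/// The piece that differs between the aggregation job creator and the collect
/// job creator; the surrounding loop over tasks is shared.
trait PerTaskJobCreator {
    fn create_jobs_for_task(&self, task: &Task);
}

struct AggregationJobCreator;
impl PerTaskJobCreator for AggregationJobCreator {
    fn create_jobs_for_task(&self, task: &Task) {
        println!("creating aggregation jobs for task {}", task.id);
    }
}

struct CollectJobCreator;
impl PerTaskJobCreator for CollectJobCreator {
    fn create_jobs_for_task(&self, task: &Task) {
        println!("creating collect jobs for task {}", task.id);
    }
}

/// Shared driver loop: enumerate tasks and hand each to the per-task creator.
fn run_job_creator(creator: &dyn PerTaskJobCreator, tasks: &[Task]) {
    for task in tasks {
        creator.create_jobs_for_task(task);
    }
}

fn main() {
    let tasks = [Task { id: 1 }, Task { id: 2 }];
    run_job_creator(&AggregationJobCreator, &tasks);
    run_job_creator(&CollectJobCreator, &tasks);
}
```
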
tgeoghegan added a commit that referenced this issue May 20, 2022
Factors logic for discovering incomplete jobs out of
`aggregation_job_driver` and into a new module.
Adds a skeleton of `collect_job_creator` to show how this is used across
multiple binary targets.

Part of #105
tgeoghegan added a commit that referenced this issue May 25, 2022
Adds support for acquiring and releasing leases on collect jobs to the
datastore module, which will soon be used by the collect job driver to
drive jobs.

Part of #105
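
A sketch of the lease semantics, using an in-memory map in place of the datastore transaction (all names are illustrative):

```rust
use std::collections::HashMap;

/// A time-limited claim on a collect job held by one job driver instance.
#[derive(Clone, Copy)]
struct Lease {
    holder: u32,
    expires_at: u64,
}

/// Acquire a lease on a job if it is unleased or its previous lease has expired.
fn acquire_lease(
    leases: &mut HashMap<u64, Lease>, // collect job ID -> current lease
    job_id: u64,
    holder: u32,
    now: u64,
    lease_duration: u64,
) -> bool {
    match leases.get(&job_id) {
        Some(lease) if lease.expires_at > now => false, // still held
        _ => {
            leases.insert(job_id, Lease { holder, expires_at: now + lease_duration });
            true
        }
    }
}

/// Release a lease once the job is finished, so another driver can pick it up.
fn release_lease(leases: &mut HashMap<u64, Lease>, job_id: u64, holder: u32) {
    if leases.get(&job_id).map_or(false, |lease| lease.holder == holder) {
        leases.remove(&job_id);
    }
}

fn main() {
    let mut leases = HashMap::new();
    assert!(acquire_lease(&mut leases, 7, 1, 100, 60)); // driver 1 claims job 7
    assert!(!acquire_lease(&mut leases, 7, 2, 120, 60)); // still leased to driver 1
    release_lease(&mut leases, 7, 1);
    assert!(acquire_lease(&mut leases, 7, 2, 130, 60)); // available again
}
```
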
tgeoghegan added a commit that referenced this issue May 27, 2022
Fleshes out the implementation of the Janus collect job driver.

Some existing logic used for the helper's `/aggregate_share` handler is
refactored into `mod aggregate_share` so it can be used in
`collect_job_driver`.

Additionally, the existing `update_collect_job_*` methods on
`datastore::Transaction` are collapsed into a single method that sets
helper aggregate share, leader aggregate share, report count and
checksum in a single operation. This simplifies the logic of the collect
job driver since it doesn't have to deal with the case where the
leader's aggregate share was computed but the helper's isn't known yet.
The downside is that if a collect job fails because the helper failed to
compute its aggregate share, then the leader will recompute its share
"from scratch" the next time the collect job is run. If helpers fail
often enough to make caching the leader aggregate share worthwhile, then
we probably have bigger problems than this performance issue.

Part of #105
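
A condensed sketch of the per-job flow under the single-update design described above; all types and the helper round trip are stand-ins:

```rust
/// Toy stand-ins for the pieces the collect job driver touches.
struct CollectJob {
    leader_aggregate_share: Option<Vec<u64>>,
    helper_aggregate_share: Option<Vec<u8>>,
    report_count: Option<u64>,
}

/// One step of driving a collect job: compute the leader share from the batch
/// unit aggregations, ask the helper for its (encrypted) share, and only then
/// write everything back in a single update. If the helper call fails, nothing
/// is written and the leader share is recomputed on the next attempt.
fn step_collect_job(
    job: &mut CollectJob,
    batch_unit_shares: &[Vec<u64>],
    report_counts: &[u64],
    helper_response: Result<Vec<u8>, String>, // stand-in for the AggregateShareReq round trip
) -> Result<(), String> {
    let len = batch_unit_shares.first().map_or(0, |unit| unit.len());
    let mut leader_share = vec![0u64; len];
    for unit in batch_unit_shares {
        for (acc, x) in leader_share.iter_mut().zip(unit) {
            *acc = acc.wrapping_add(*x);
        }
    }
    let helper_share = helper_response?; // bail out before updating anything

    // Single update: leader share, helper share, and report count together.
    job.leader_aggregate_share = Some(leader_share);
    job.helper_aggregate_share = Some(helper_share);
    job.report_count = Some(report_counts.iter().sum());
    Ok(())
}

fn main() {
    let mut job = CollectJob {
        leader_aggregate_share: None,
        helper_aggregate_share: None,
        report_count: None,
    };
    let shares = [vec![1, 2], vec![3, 4]];
    // First attempt: helper fails, so the job stays untouched and will be retried.
    assert!(step_collect_job(&mut job, &shares, &[10, 5], Err("helper unreachable".into())).is_err());
    assert!(job.leader_aggregate_share.is_none());
    // Second attempt: helper succeeds and all fields are set in one go.
    step_collect_job(&mut job, &shares, &[10, 5], Ok(vec![0xAB])).unwrap();
    assert_eq!(job.report_count, Some(15));
}
```
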
@tgeoghegan
Contributor Author

All this work is done, closing!
