Ensure sample_id is always populated #1265

jklukas · 2020-04-27T15:59:09Z

As we continue to discuss data retention policies, sampling may become an even more important concern, particularly if we permanently delete data after a certain period of time based on sample_id.

We could consider adding more robust support for sampling, with schemas defined in mozilla-pipeline-schemas including metadata about the field(s) to use for calculating sample_id. Absent such a field, we could fall back to per-document sampling rather than per-client by calculating sample_id from document_id.
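A minimal sketch of the fallback described above, not the pipeline's actual implementation. It assumes sample_id is a CRC32 hash of the chosen field taken modulo 100. The `sampling_fields` argument is a hypothetical stand-in for whatever metadata a schema in mozilla-pipeline-schemas might declare, and `compute_sample_id` and `sample_id_from` are illustrative names.

```python
import zlib
from typing import Mapping, Optional, Sequence

NUM_BUCKETS = 100  # assumption: sample_id ranges over 0..99


def sample_id_from(value: str) -> int:
    """Map a string to a stable bucket via CRC32 (assumed hashing scheme)."""
    return zlib.crc32(value.encode("utf-8")) % NUM_BUCKETS


def compute_sample_id(
    doc: Mapping[str, str],
    sampling_fields: Sequence[str] = ("client_id",),  # hypothetical schema metadata
) -> Optional[int]:
    """Return a sample_id, preferring schema-declared fields over document_id."""
    # Per-client (or per-entity) sampling: use the first declared field present.
    for field in sampling_fields:
        value = doc.get(field)
        if value:
            return sample_id_from(value)
    # Fallback: per-document sampling based on document_id.
    document_id = doc.get("document_id")
    if document_id:
        return sample_id_from(document_id)
    # Neither a sampling field nor document_id is available.
    return None
```

With this shape, a ping that carries a client_id is bucketed per client, while a ping without one still receives a deterministic sample_id derived from its document_id.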

cc @mreid-moz

jklukas added the "pipeline metadata" label (description: "Should be solved by capturing new metadata in JSON schemas") on Mar 3, 2021
