Allow multiple prefixes in aggregation jobs #67

Open
wojtek-rybak opened this issue Jul 19, 2024 · 2 comments

@wojtek-rybak

According to the API specification, when creating a job, we must provide the bucket name and the prefix of the files with aggregatable reports to be included in the aggregation. Since I want to perform the aggregation every hour, it seems necessary to have a separate prefix for each hour. For example:

  • /data/2024-07-19/00/...
  • /data/2024-07-19/01/...
  • /data/2024-07-19/02/...
  • /data/2024-07-19/03/...
  • etc.

However, if I need to perform an aggregation over a 6-hour interval (using a different filtering ID), I run into a problem. The API only allows one prefix, which means I would need to copy the data to a new location. This approach seems impractical and inefficient.

It would be highly beneficial if the aggregation service could accept a list of prefixes. This change would allow more flexibility in specifying the data intervals for aggregation without needing to duplicate data.
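To make the request concrete, here is a minimal Python sketch of what the job request body looks like today with a single prefix, and what a multi-prefix variant could look like. The field names mirror the public createJob examples, and `input_data_blob_prefixes` in particular is a hypothetical field name used purely for illustration, not part of the current API.

```python
import json
import urllib.request

# Current shape: one bucket and ONE input prefix per createJob request,
# so each hourly folder needs its own job.
single_prefix_job = {
    "job_request_id": "hourly-2024-07-19-00",
    "input_data_bucket_name": "my-reports-bucket",
    "input_data_blob_prefix": "data/2024-07-19/00/",  # a single hour
    "output_data_bucket_name": "my-summaries-bucket",
    "output_data_blob_prefix": "summaries/2024-07-19/00/",
}

# Proposed shape (hypothetical): a LIST of prefixes, so a 6-hour aggregation
# can reference the existing hourly folders without copying any data.
multi_prefix_job = {
    "job_request_id": "six-hour-2024-07-19-00-05",
    "input_data_bucket_name": "my-reports-bucket",
    "input_data_blob_prefixes": [f"data/2024-07-19/{h:02d}/" for h in range(6)],
    "output_data_bucket_name": "my-summaries-bucket",
    "output_data_blob_prefix": "summaries/2024-07-19/six-hour/",
}


def create_job(frontend_url: str, body: dict) -> int:
    """POST a job request to the aggregation service frontend (URL is a placeholder)."""
    req = urllib.request.Request(
        f"{frontend_url}/v1alpha/createJob",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With a prefix list, the same hourly folders could feed both the hourly jobs and the 6-hour job (run with a different filtering ID), with no duplication of the underlying reports.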

@nlrussell

Hi @wojtek-rybak, thanks for providing this feedback. Can you say more about how costly this workaround is and how much of a blocker it is for you, so we can factor that into the priority of this request?

@wojtek-rybak (Author)

Hi @nlrussell,

At RTB House, we are currently in the testing phase, working with a small amount of data from a subset of users, which amounts to tens of gigabytes per day. At this stage, the issue described is only a minor inconvenience.

However, we plan to start working on the final, production-ready solution around early October. For that phase, we estimate processing tens of terabytes of data per day. It would be highly beneficial if support for multiple prefixes in the aggregation service could be added by then, as it would let us avoid copying data in our final design.
