Description
Update the CD crawler so that the timeToLive for queue messages is configurable. Messages currently use the default of 7 days, which is too short: any message not processed within that window expires and is lost. This affects our internal harvester and the missing license backfill process. If this can't be fixed, the DAG will have to reduce the number of packages it sends to the harvester, which will likely slow down processing. Processing currently averages only 125k per day, though it gets closer to 500k on some days; the primary driver is the number of files being scanned by scancode. We will need to think through how best to keep the process running without losing packages to queue expiration.
Rationale
The backfill DAG puts messages on the queue faster than the GH CD harvester can process them. If those messages expire and drop off the queue unprocessed, the packages will still appear to be missing their license, which may be incorrect.
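To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the queue drains in FIFO order, that throughput is constant, and that the 125k and 500k per-day figures from the description can be treated as messages per day. The function names and the 2,000,000-message backlog are hypothetical and only there for illustration.

```python
def max_safe_backlog(throughput_per_day: int, ttl_days: int) -> int:
    """Largest backlog that can be fully drained before the oldest
    unprocessed message reaches its time-to-live (FIFO, constant rate)."""
    return throughput_per_day * ttl_days


def days_to_drain(backlog: int, throughput_per_day: int) -> float:
    """Days needed to work through a backlog at a constant rate."""
    return backlog / throughput_per_day


def expired_count(backlog: int, throughput_per_day: int, ttl_days: int) -> int:
    """Messages that would expire unprocessed if the whole backlog
    were enqueued at once."""
    return max(0, backlog - max_safe_backlog(throughput_per_day, ttl_days))


if __name__ == "__main__":
    # Figures from the issue: 7-day default TTL, 125k/day average, 500k/day on good days.
    print(max_safe_backlog(125_000, 7))          # 875000
    print(max_safe_backlog(500_000, 7))          # 3500000
    # Hypothetical backlog of 2,000,000 messages at the average rate.
    print(days_to_drain(2_000_000, 125_000))     # 16.0
    print(expired_count(2_000_000, 125_000, 7))  # 1125000
```

Under these assumptions, anything the DAG enqueues beyond roughly 875k outstanding messages at the average rate would expire before the harvester reaches it, unless the TTL is raised or the DAG sends fewer packages.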
Definition of Done
There is a new config setting for the message expiration, and messages in the queue show the configured expiration.
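As a minimal sketch of what such a setting could look like, the Python snippet below resolves a TTL from the environment and falls back to the 7-day default when nothing is configured. The variable name `CRAWLER_QUEUE_MESSAGE_TTL_SECONDS` and the function `resolve_message_ttl` are hypothetical, not existing crawler options, and the crawler itself may read its configuration differently. The resolved value would then be passed to whatever enqueue call the queue backend provides.

```python
import os
from typing import Mapping, Optional

# 7-day default mentioned in the issue, expressed in seconds.
DEFAULT_TTL_SECONDS = 7 * 24 * 60 * 60

# Hypothetical setting name, used here for illustration only.
TTL_ENV_VAR = "CRAWLER_QUEUE_MESSAGE_TTL_SECONDS"


def resolve_message_ttl(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the message TTL in seconds, using the default when unset."""
    env = os.environ if env is None else env
    raw = env.get(TTL_ENV_VAR)
    if raw is None or raw.strip() == "":
        return DEFAULT_TTL_SECONDS
    try:
        ttl = int(raw)
    except ValueError:
        raise ValueError(f"{TTL_ENV_VAR} must be an integer, got {raw!r}") from None
    # Assumption: only positive values are accepted in this sketch.
    if ttl <= 0:
        raise ValueError(f"{TTL_ENV_VAR} must be positive, got {ttl}")
    return ttl


if __name__ == "__main__":
    print(resolve_message_ttl({}))                         # 604800
    print(resolve_message_ttl({TTL_ENV_VAR: "2592000"}))   # 2592000 (30 days)
```

Rejecting malformed values at startup, rather than silently falling back to the default, makes it easier to confirm that the expiration seen on queued messages is the one that was configured.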