Set appropriate cache-related headers for database dumps. #1916
Labels
A-backend ⚙️
C-enhancement ✨
In #1800 we introduced database dumps that can be downloaded from https://static.crates.io/db-dump.tar.gz. The dumps are updated every 24 hours. However, CloudFront may cache them for up to 24 hours, so in the worst case users will see a new dump only shortly before the next dump is generated.
We can fix this by setting appropriate caching headers for the dump. Here are some ideas:
We could set an `Expires` header to, say, 24.5 hours after the dump was created. This gives some wiggle room for varying dump creation times while still ensuring that the new dump becomes available roughly half an hour after it is created. However, the dump frequency is configured in the Heroku scheduler, so if we ever change that frequency we would also need to remember to update the code; if we decide to use this option we should at least introduce a command line parameter to `enqueue-job`. Another downside is that if a dump job fails, we will be left with a dump whose expiry is in the past, so it won't be cached at all anymore.

It's probably possible to set the `ETag` header together with a low TTL in the `Cache-Control` header. I believe this will result in CloudFront frequently asking S3 whether a version with a different ETag is available, but only retransferring the dump if it has actually changed. This option has the advantage of being independent of the dump frequency, but it needs further investigation whether things really work the way I seem to remember.
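To make the two options concrete, here is a minimal Rust sketch. It is illustrative only: the function names are hypothetical, the 24.5-hour figure is the one proposed above, and the conditional-GET check merely models how an origin decides between `304 Not Modified` and a full `200` response when CloudFront revalidates with `If-None-Match`.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Option 1 (hypothetical helper): compute an absolute expiry
/// 24.5 hours after the dump was created.
fn expires_at(created: SystemTime) -> SystemTime {
    // 24 h dump interval + 30 min wiggle room = 88_200 s
    created + Duration::from_secs(24 * 3600 + 30 * 60)
}

/// Option 2 (simplified model): with a low TTL plus an ETag, the CDN
/// revalidates often but only re-downloads when the ETag changed.
/// This models the origin's conditional-GET decision.
fn status_for(etag: &str, if_none_match: Option<&str>) -> u16 {
    match if_none_match {
        Some(tag) if tag == etag => 304, // Not Modified: cached copy stays valid
        _ => 200,                        // full response with the new dump
    }
}
```

With option 2, the large tarball only crosses the wire again when its ETag differs from the one CloudFront already holds, regardless of how often the dump job runs.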
There may be other options as well – we can discuss this here on the issue.
Related: #1871, #1826, #1915