Set appropriate cache-related headers for database dumps. #1916

Closed
smarnach opened this issue Nov 22, 2019 · 2 comments
Labels
A-backend ⚙️ C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works

Comments

@smarnach
Contributor

smarnach commented Nov 22, 2019

In #1800 we introduced database dumps that can be downloaded from https://static.crates.io/db-dump.tar.gz. The dumps are updated every 24 hours. However, CloudFront may cache them for up to 24 hours, so in the worst case users will see a new dump only shortly before the next dump is generated.

We can fix this by setting appropriate caching headers for the dump. Here are some ideas:

  • We could set an "expires" header to, say, 24.5 hours after the dump was created. This would leave some wiggle room for varying dump creation times while ensuring that a new dump becomes available roughly half an hour after it was created. However, the dump frequency is configured in the Heroku scheduler, so if we change the frequency, we would need to remember to update the code as well. If we choose this option, we should at least introduce a command line parameter to enqueue-job. Another downside is that if a dump job fails, we are left with a dump whose expiry is in the past, so it won't be cached anymore.

  • It's probably possible to set the "etag" header together with a low TTL in the "cache-control" header. I believe this will result in CloudFront frequently asking S3 whether a version with a different etag is available, while only retransferring the dump if it has actually changed. This option has the advantage of being independent of the dump frequency, but it needs further investigation to confirm that things really work the way I seem to remember.
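
For illustration, here is a rough sketch of the header values the two options above would produce. It is not taken from the crates.io code base: the function names `http_date`, `expires_header` and `revalidate_header` are hypothetical, the 30-minute slack and the 5-minute TTL are placeholder values, and attaching the headers to the S3 upload is not shown. It uses only the standard library, so the HTTP date is formatted by hand.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Format a timestamp as an HTTP date, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
/// Hypothetical helper; a real implementation would likely use a date crate.
fn http_date(t: SystemTime) -> String {
    // 1970-01-01 was a Thursday, so day 0 maps to "Thu".
    const DAYS: [&str; 7] = ["Thu", "Fri", "Sat", "Sun", "Mon", "Tue", "Wed"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    let secs = t
        .duration_since(UNIX_EPOCH)
        .expect("timestamp before 1970")
        .as_secs();
    let (days, rem) = (secs / 86_400, secs % 86_400);

    // Convert days since the epoch to a civil date (proleptic Gregorian).
    let z = days + 719_468;
    let era = z / 146_097;
    let doe = z % 146_097; // day of the 400-year era
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, starting in March
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };

    format!(
        "{}, {:02} {} {} {:02}:{:02}:{:02} GMT",
        DAYS[(days % 7) as usize],
        day,
        MONTHS[(month - 1) as usize],
        year,
        rem / 3_600,
        rem % 3_600 / 60,
        rem % 60
    )
}

/// Option 1: an "Expires" value of dump creation time + dump interval + slack.
/// `dump_interval` stands in for the command line parameter suggested above.
fn expires_header(created: SystemTime, dump_interval: Duration, slack: Duration) -> String {
    http_date(created + dump_interval + slack)
}

/// Option 2: a "Cache-Control" value with a low TTL, so the cache revalidates often.
/// Whether the revalidation is cheap depends on the etag comparison, which is not modeled here.
fn revalidate_header(ttl: Duration) -> String {
    format!("max-age={}", ttl.as_secs())
}

fn main() {
    let created = SystemTime::now();
    let interval = Duration::from_secs(24 * 3_600); // assumed to match the scheduler
    let slack = Duration::from_secs(30 * 60); // placeholder value
    println!("Expires: {}", expires_header(created, interval, slack));
    println!("Cache-Control: {}", revalidate_header(Duration::from_secs(300)));
}
```

Note that the first function has exactly the coupling described above: if the scheduler interval changes and `dump_interval` is not updated, the expiry is wrong. The second function has no such parameter.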

There may be other options as well – we can discuss this here on the issue.

Related: #1871, #1826, #1915

@smarnach smarnach added C-bug 🐞 Category: unintended, undesired behavior E-help-wanted P-medium A-backend ⚙️ labels Nov 22, 2019
@smarnach
Contributor Author

smarnach commented Mar 2, 2020

@jtgeibel @carols10cents This issue may contribute to the database dumps being older than expected, which may be the root cause for the emailed report mentioned in Friday's meeting.

@Turbo87 Turbo87 removed the P-medium label Feb 11, 2021
@Turbo87 Turbo87 removed the A-s3 label Mar 11, 2021
@Turbo87 Turbo87 added C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works and removed C-bug 🐞 Category: unintended, undesired behavior E-help-wanted E-medium labels Sep 26, 2021
@Turbo87
Member

Turbo87 commented Jun 25, 2024

We implemented explicit cache invalidation a while ago, so this is probably not strictly needed anymore :)

@Turbo87 Turbo87 closed this as completed Jun 25, 2024

2 participants