Collecting input: validating releases and improving cache #2375
My first idea was to have any CI output with destination "dist" land in a staging folder. Before passing it on to cold storage (read: never write twice, never delete) there would be some kind of validation. Preferably automatic (checksums, extracting, binary checks, ...), or alternatively manual, since parts of our release process are already intentionally manual. Based on the input I collect, this is subject to change.
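A minimal sketch of what that staging gate could look like, assuming artifacts land next to a SHASUMS256.txt-style checksum list (the function names and checks here are illustrative, not existing tooling):

```python
import hashlib
import tarfile


def sha256_of(path):
    """Stream the file so large tarballs don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def validate_artifact(path, expected_sha256):
    """Gate promotion from staging to cold storage: the checksum must
    match, and tarballs must at least list their members without error."""
    if sha256_of(path) != expected_sha256:
        return False
    if path.endswith((".tar.gz", ".tar.xz")):
        try:
            with tarfile.open(path) as tar:
                tar.getnames()
        except tarfile.TarError:
            return False
    return True
```

Only files passing validation would ever be copied into the write-once tree; anything failing stays in staging for manual inspection, which fits the partly manual release flow described above.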
We invalidate cache because brute-force "invalidate all" is the best tool we have: when we publish new website updates and directory indexes, those updates need to be pushed out to the edges rather than go stale. CF has more fine-grained controls now, but I think using them means tagging everything into groups with nginx so we can invalidate just certain things. That work hasn't been done because... it's work, and it's not simple.
I usually read this as "I don't know what I'm changing". My dream scenario would be to understand these changes and introduce a flow where we can replace "restarting windows" with "
Cache tags are one solution. The landscape has changed a bit, so I would say we have more options now. Discovering all the exceptions would probably help us make the best decision and Make It So.
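For reference, the cache-tag route would roughly mean nginx attaching a `Cache-Tag` response header per content group, and a purge then hitting Cloudflare's purge endpoint with only those tags (purge-by-tag is an Enterprise feature last I checked; zone ID and token below are placeholders):

```python
import json

CF_API = "https://api.cloudflare.com/client/v4"


def build_purge_request(zone_id, api_token, tags):
    """Assemble the URL, headers, and JSON body for a Cloudflare
    purge-by-tag call; the caller hands these to urllib or similar.
    Nothing here is nodejs.org-specific."""
    url = f"{CF_API}/zones/{zone_id}/purge_cache"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"tags": list(tags)})
    return url, headers, body
```

The point is that a purge of, say, the "directory-indexes" tag would leave the download binaries untouched at the edge.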
Right, it's always been a crappy setup, but originally we didn't have much choice, and as options started appearing they were just too complicated and time-consuming to implement. We have two broad cases which overlap:
The tragedy of our current situation is all of the download binaries getting invalidated. Even if we could just say "any .tar.?z, .exe, .msi, etc. should never be invalidated, you can have them for as long as you like", it'd be a massive step forward.
This is also what I find being the biggest win (hence #2376).
I don't remember this - can you share a link?
Doable, but ultimately hand-tooling that doesn't scale. We can set timeouts from nginx based on file types and make exceptions for indexes. The maintainable solution here is usually a caching server in front that can more easily express logic for these cases, be it ATS or Varnish (or Cloudflare or Fastly). If we split out downloads, we don't have to change much of this behavior for now, since the cache populates quickly.
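As a sketch of that nginx-only interim, with illustrative values (note `expires` accepts a variable since nginx 1.7.9):

```nginx
# Versioned artifacts are immutable; only indexes need to stay fresh.
map $uri $dist_expires {
    default              max;  # .tar.gz, .exe, .msi and friends never change
    ~/$                  5m;   # directory indexes
    ~\.(json|tab|html)$  5m;   # index.json, index.tab, generated pages
}

server {
    # ... existing listen/root/location config ...
    expires $dist_expires;
}
```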
What I mean is that "release" type builds are special cases among all the build types. nightly, test and rc builds just get auto-promoted and don't interact with the rest of the website at all: each is just a new subdirectory, a change to index.json and index.tab in its respective parent directory, and a refresh of that parent directory's index. For release builds on either the current "current" line or the current "lts" line, the main website page gets updated to list them, and https://nodejs.org/en/download/ (and friends) gets updated to show them. We run this in crontab every 5 minutes: https://github.com/nodejs/build/blob/master/ansible/www-standalone/resources/scripts/check-build-site.sh checks the release build index.tab against the website index.html; if the former is newer, the website needs a rebuild. This is also very relevant, and also in need of attention if you're keen for something to chew on: #2123 (comment)
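The crontab check described above boils down to a timestamp comparison, roughly like this (a sketch of the shell script's logic, with illustrative paths, not a drop-in replacement):

```python
import os


def website_needs_rebuild(index_tab, index_html):
    """Mirror of the check-build-site.sh idea: rebuild when the release
    index is newer than the generated page, or the page doesn't exist."""
    if not os.path.exists(index_html):
        return True
    return os.path.getmtime(index_tab) > os.path.getmtime(index_html)
```

Anything that changes when "current" or "lts" releases land ultimately funnels through this freshness check before the site rebuild fires.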
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
I'm looking at how we can improve our end-to-end storage of distfiles. Part of that also touches our release process: pushing distfiles and making them public (in the eyes of a user as well as the cache).
To extend my train of thought a bit: I want our end storage to be somewhat more distributed as well as "colder", be it GCS or something S3-like. This way we can offload a lot of traffic from our server(s) and improve resiliency when catering to the majority of our bandwidth.
As far as I understand, we invalidate cache (read: change user expectations) as part of release processes. One reason seems to be that some releases need to be re-baked. What other reasons do we have? Incomplete releases, perhaps?