
Proposal: remove .gz downloads for some platforms #1584

Closed

rvagg opened this issue Nov 21, 2018 · 22 comments

@rvagg
Member

rvagg commented Nov 21, 2018

I'd really like to explore ways to simplify our deliverables. The complexity of our infra constantly gets us into trouble.

The Linux kernel stopped offering .gz downloads long ago, so perhaps it's time to follow suit? Unfortunately xz isn't on many minimal distributions, like some popular Docker images: it's not on any of the debian or ubuntu images, though it is on centos:7. Of course it's a pretty trivial install almost everywhere. But tar comes with everything and -J is natively supported, so I'm pretty sure that on any modern distro you can unpack .tar.xz files. On alpine:latest you can even do wget https://nodejs.org/download/release/latest-v11.x/node-v11.2.0-linux-x64.tar.xz -O - | tar -Jxv without installing anything extra. And since none of the Ubuntu or Debian images come with either curl or wget, you have to install stuff there anyway.
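To make that concrete, a minimal sketch of what consumption looks like on stock images (same URL as above; xz-utils is the Debian package providing xz, an assumption worth checking on other distros):

    # alpine:latest: busybox wget and tar's -J flag work out of the box
    wget https://nodejs.org/download/release/latest-v11.x/node-v11.2.0-linux-x64.tar.xz -O - | tar -Jxv

    # debian/ubuntu: no curl/wget by default, so you're installing tools anyway
    apt-get update && apt-get install -y curl xz-utils
    curl -fsSL https://nodejs.org/download/release/latest-v11.x/node-v11.2.0-linux-x64.tar.xz | tar -Jx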

AIX and macOS are the awkward platforms for .xz files. We don't even offer .xz for AIX, and on macOS you have to install a third-party package to even use them. So we could keep offering .gz there.

Headers are also awkward because they're consumed across all platforms, and gzip is easy everywhere; node-gyp even consumes them in native Node with just a gzip stream. Full source packages are similar. So we could keep offering headers and source in both formats.
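As a sketch of why gzip stays the safe choice for headers and source: stock tar everywhere understands -z, so this works on effectively any platform with no extra packages (headers URL per the deliverables list below):

    # gzip (-z) needs nothing beyond stock tar on any platform
    curl -fsSL https://nodejs.org/download/release/latest-v11.x/node-v11.2.0-headers.tar.gz | tar -zx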

Here's our current list of deliverables:

node-v11.2.0-aix-ppc64.tar.gz
node-v11.2.0-darwin-x64.tar.gz
node-v11.2.0-darwin-x64.tar.xz
node-v11.2.0-headers.tar.gz
node-v11.2.0-headers.tar.xz
node-v11.2.0-linux-arm64.tar.gz
node-v11.2.0-linux-arm64.tar.xz
node-v11.2.0-linux-armv6l.tar.gz
node-v11.2.0-linux-armv6l.tar.xz
node-v11.2.0-linux-armv7l.tar.gz
node-v11.2.0-linux-armv7l.tar.xz
node-v11.2.0-linux-ppc64le.tar.gz
node-v11.2.0-linux-ppc64le.tar.xz
node-v11.2.0-linux-s390x.tar.gz
node-v11.2.0-linux-s390x.tar.xz
node-v11.2.0-linux-x64.tar.gz
node-v11.2.0-linux-x64.tar.xz
node-v11.2.0.pkg
node-v11.2.0-sunos-x64.tar.gz
node-v11.2.0-sunos-x64.tar.xz
node-v11.2.0.tar.gz
node-v11.2.0.tar.xz
node-v11.2.0-win-x64.7z
node-v11.2.0-win-x64.zip
node-v11.2.0-win-x86.7z
node-v11.2.0-win-x86.zip
node-v11.2.0-x64.msi
node-v11.2.0-x86.msi

We could trim that down to:

node-v11.2.0-aix-ppc64.tar.gz
node-v11.2.0-darwin-x64.tar.gz
node-v11.2.0-darwin-x64.tar.xz
node-v11.2.0-headers.tar.gz
node-v11.2.0-headers.tar.xz
node-v11.2.0-linux-arm64.tar.xz
node-v11.2.0-linux-armv6l.tar.xz
node-v11.2.0-linux-armv7l.tar.xz
node-v11.2.0-linux-ppc64le.tar.xz
node-v11.2.0-linux-s390x.tar.xz
node-v11.2.0-linux-x64.tar.xz
node-v11.2.0.pkg
node-v11.2.0-sunos-x64.tar.xz
node-v11.2.0.tar.gz
node-v11.2.0.tar.xz
node-v11.2.0-win-x64.7z
node-v11.2.0-win-x64.zip
node-v11.2.0-win-x86.7z
node-v11.2.0-win-x86.zip
node-v11.2.0-x64.msi
node-v11.2.0-x86.msi

Thoughts @nodejs/build & @nodejs/version-management?

(edit: to be clear, this would be something we do from a particular version forward, not impacting existing release lines. So maybe Node 12+).

@refack
Contributor

refack commented Nov 21, 2018

+1 as a semver-major change

@gdams
Member

gdams commented Nov 21, 2018

+1 as a semver-major change

@refack
Contributor

refack commented Nov 21, 2018

node-v11.2.0-win-x64.7z
node-v11.2.0-win-x64.zip
node-v11.2.0-win-x86.7z
node-v11.2.0-win-x86.zip

Since we're revamping the list, for Windows we could create "self-extracting" 7z archives (add -sfx to the command line) and eliminate the *.zip (these files can be executed or extracted with a client).

P.S. if we switch from .msi to NSIS, those files can be easily extracted without running the installer logic, and/or we could add a "just extract" option to the installer.
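For illustration, the self-extracting variant is just the existing 7z invocation with the extra switch (archive and directory names here mirror the deliverables list; the exact build commands may differ):

    # regular archive, roughly what gets built today
    7z a node-v11.2.0-win-x64.7z node-v11.2.0-win-x64
    # -sfx prepends the 7-Zip extractor stub, producing a runnable .exe
    7z a -sfx node-v11.2.0-win-x64.exe node-v11.2.0-win-x64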

@ljharb
Member

ljharb commented Nov 21, 2018

How does removing an archive format for “not all” platforms reduce complexity?

@refack
Contributor

refack commented Nov 21, 2018

How does removing an archive format for “not all” platforms reduce complexity?

I was just about to ping you to ask that.
Well, as I see it, it reduces complexity for us, and for users who pick a package from the list. But yeah, it pushes that complexity onto the tools (sorry).


@ljharb
Member

ljharb commented Nov 21, 2018

When you say “us”, I’m not sure who you mean - don’t the tools manage creating those archives in the first place?

@refack
Contributor

refack commented Nov 21, 2018

When you say “us”,

I meant the Build WG (and maybe the Releasers). With fewer artifacts, we would have less need to make sure tools are available and properly configured on all platforms, and that pack & deploy scripts are correct and run to completion.

P.S. we could also delegate GZIP compression to the HTTP layer, since AFAIK all (reasonable) HTTP clients have GZIP decompression built in, and if not, the protocol degrades gracefully (that's what S3 does, for example).
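As an illustration of the mechanism (not our current setup): a client that advertises gzip gets a compressed response from a server configured for it, and an uncompressed one otherwise:

    # --compressed makes curl send Accept-Encoding and decode the reply transparently;
    # a server without gzip support simply answers uncompressed
    curl --compressed https://nodejs.org/dist/index.json -o index.json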

@refack
Contributor

refack commented Nov 21, 2018

P.P.S. We could also go for the lowest common denominator and pull the .xz and .7z packs ¯\_(ツ)_/¯

@rvagg
Member Author

rvagg commented Nov 21, 2018

Or we could just offer one compressed bundle per platform: ditch xz on darwin, ditch 7z on windows, only offer .gz source and headers. It'd be really nice to have fewer items to check to confirm a valid release, and fewer requirements for a release build machine.

@ljharb
Member

ljharb commented Nov 22, 2018

I guess I'm unclear on why all of this isn't a single automated tool, such that it wouldn't be any easier to check 1 format than 50.

@refack
Contributor

refack commented Nov 22, 2018

I guess I'm unclear on why all of this isn't a single automated tool, such that it wouldn't be any easier to check 1 format than 50.

We have 10–20 platforms, each with its own edge cases. Just this week we hit one of those edges, and it's always at the 11th hour.
I think what @rvagg is trying to raise is the question of the ROI of all this.
Maybe we should look at some download stats.

@ljharb
Member

ljharb commented Nov 22, 2018

Certainly if the cost of a certain format and/or platform is too high, then it's reasonable to explore removing it - but I'd hope that prior to doing so, all efforts to automate it would be exhausted (or deemed impractical).

@refack
Contributor

refack commented Nov 22, 2018

but I'd hope that prior to doing so, all efforts to automate it would be exhausted (or deemed impractical).

Ack.

@rvagg
Member Author

rvagg commented Nov 22, 2018

This isn't going to be a massive reduction in complexity, fairly minimal in the scheme of things, but it's directional. From the beginning we've expanded our complexity rapidly, some of it for the challenge or apparent convenience, a lot of it just because we can and have the resources. But then when the crunch comes, we feel it. The Build WG is regularly on the receiving end of criticism because when stuff doesn't work, everyone feels it; when stuff's working, of course, it's taken for granted. The breakage happens because of complexity: I suspect if we did a root cause analysis of our failures in the last couple of years, we'd blame complexity for most of them.

So what I'm proposing here is not a huge bite out of complexity, it's a slight reduction but it gets us in the right direction. Lots of small steps like this, and a bit of resistance to expanding complexity will go a long way to making our resources (infra and people) more resilient.

Cost/benefit critique of this proposal in isolation is welcome of course.

@rvagg
Member Author

rvagg commented Nov 22, 2018

Oh, and I also forgot to say that I consider calls for getting the Foundation to pay people to maintain our infrastructure a sign that we've bitten off more than we can chew and should scale back. If we've really grown so complex that we need to pay people to maintain it, then we're doing it wrong.

@richardlau
Member

richardlau commented Nov 22, 2018

P.P.S. We could also go for the lower common denominator and pull the .xz and .7z packs ¯_(ツ)_/¯

My personal view is that if complexity is the driving reason, then it makes more sense to drop the xz packages (given that xz isn't available everywhere by default and has extra logic in the makefiles that we are proposing to build on top of).

On Windows I would keep the zip packages, as Explorer can handle them (i.e. users don't have to install 7-Zip to unpack). In theory we could even create the zip packages using PowerShell (meaning we could drop 7-Zip as a build requirement, at the cost of PowerShell complexity). Self-extracting archives (which we don't currently produce) raise more complexities (we'd probably have to sign them for them to be trusted).
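For reference, a sketch of that PowerShell route using the built-in Compress-Archive cmdlet (available since PowerShell 5; the staged directory name here is assumed to match the artifact name):

    # Compress-Archive ships with PowerShell 5+, so 7-Zip wouldn't be needed for zips
    Compress-Archive -Path node-v11.2.0-win-x64 -DestinationPath node-v11.2.0-win-x64.zip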

Of course this means using less efficient compression formats/algorithms (which is why I think we're using xz and 7z in the first place).

@rvagg
Member Author

rvagg commented Nov 22, 2018

Of course this means using less efficient compression formats/algorithms (which is why I think we're using xz and 7z in the first place).

Not entirely. tar.xz has been the emerging standard for the last 4 or so years, mainly on Linux, and these kinds of changes take a long time. So it's two-pronged: much smaller files, and hopping on an emerging-standard bandwagon.

And, since it's baked into tar, it technically is everywhere on Linux now, at least from a consumption perspective.

The 7z issue is a different matter entirely. TBH I'd be happy to drop it because it's not native and doesn't appear likely to become native. I'm not a user of either the zip or 7z files in any capacity though, so I don't feel I have much authority here. I haven't run any numbers on how much it's downloaded; perhaps it's really popular 🤷‍♂️.

@mhdawson
Member

mhdawson commented Nov 22, 2018

@ljharb from my viewpoint, even though the generation and distribution is automated, the issue is handling the cases where that automation fails. The more files, the higher the possibility of transfer failures etc. that have to be investigated and fixed when failures occur.

I think the key question is what the impact will be on end users if we remove the *.gz files, and as long as we think that will be limited (which it sounds like is the case), then it's worth having fewer files to generate/transfer. Overall I'm +1 on a SemVer boundary.

@ljharb
Member

ljharb commented Nov 22, 2018

@mhdawson when you say “transfer failures”, are resumable rsync and checking file signatures not available? I'd assume a tool could keep hammering until each file was delivered, without human intervention. In general I understand the point, but I'd contrast the effort to fix an occasional failure with the ability for some node users to directly use the smallest archive format they support ¯\_(ツ)_/¯
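For context, both halves of that are scriptable today; a sketch with hypothetical host/path names (releases do ship a SHASUMS256.txt):

    # resume an interrupted transfer instead of restarting it
    rsync --partial --append-verify staging-host:release/node-v11.2.0-linux-x64.tar.xz .
    # verify the artifact against the published checksum list
    grep ' node-v11.2.0-linux-x64.tar.xz' SHASUMS256.txt | sha256sum -c -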

@refack
Contributor

refack commented Nov 22, 2018

The 7z issue is a different matter entirely. TBH I'd be happy to drop it because it's not native and doesn't appear likely to become native. I'm not a user of either the zip or 7z files in any capacity though, so I don't feel I have much authority here. I haven't run any numbers on how much it's downloaded; perhaps it's really popular 🤷‍♂️.

IMHO the "built in" zip in windows is a toy (in my machine-bootstrap-script it's one of the first things I uninstall). And 7z.exe is emerging as the non-default-gold-standard. And in that sense it's a great tool to have on our CI in general to compress artifacts for publishing and for multi-phased jobs.
But if we had to choose between the two, I'm go with the .zip files.

I do like the dual nature of self-extracting files (which could include a setup script) or NSIS installers (which are trivially extractable as archives). We could sign them as well, since we already have signing in our pipeline.

@refack
Contributor

refack commented Nov 22, 2018

@mhdawson when you say “transfer failures”, are resumable rsync and checking file signatures not available? I'd assume a tool could keep hammering until each file was delivered, without human intervention. In general I understand the point, but I'd contrast the effort to fix an occasional failure with the ability for some node users to directly use the smallest archive format they support ¯\_(ツ)_/¯

@ljharb I think the problem we're having boils down (again) to our limited resources. The tools and scripts you're describing exist, and time was invested in developing and even improving them, but they are still not good enough ¯\_(ツ)_/¯

@github-actions

github-actions bot commented Mar 6, 2020

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Mar 6, 2020
@github-actions github-actions bot closed this as completed Apr 6, 2020