checksums for source tarballs downloaded from github.com/.../.../archive can change over time #5151

Open
boegel opened this issue Sep 21, 2017 · 14 comments

@boegel
Member

boegel commented Sep 21, 2017

First reported by @schiotz at #4871 (comment), several other projects have been hit by this as well:

Some more details in libgit2/libgit2#4343 (comment)

Long story short: we should try to avoid downloading from github.com/.../.../archive and use github.com/.../.../releases (or another location where 'packaged' tarballs are available) instead, if at all possible.

If not, the alternatives I see are:

  • doing a git clone on the tagged version and creating the tarball ourselves (directly using tar), which should always give the same tarball?
  • not including any checksums if we need to download from github.com/.../.../archive
@boegel
Member Author

boegel commented Sep 21, 2017

related discussion in Homebrew: Homebrew/homebrew-core#18044

@boegel
Member Author

boegel commented Sep 21, 2017

Interesting quote by someone at GitHub (from Homebrew/homebrew-core#18044 (comment))

Not all hashes will change. Most of the changes are due to the bugfix in git/git@22f0dcd, so only tarballs with filenames greater than 100 characters are affected.

and (from Homebrew/homebrew-core#18044 (comment))

Historically, they don't change that often ...

So short-term, we can figure out which checksums we need to fix, and then figure out a better way of dealing with this.
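Going by the quoted statement that only tarballs with paths longer than 100 characters are affected, a rough way to shortlist candidates is to scan each cached tarball for long member names. This is only a sketch: `long_member_names` is a hypothetical helper, not part of EasyBuild, and the 100-byte limit is the size of the name field in a classic tar header.

```python
import io
import tarfile

USTAR_NAME_LIMIT = 100  # size of the name field in a classic tar header


def long_member_names(tar_fileobj, limit=USTAR_NAME_LIMIT):
    """Return member paths longer than `limit` bytes in a tarball.

    `tar_fileobj` is a seekable binary file object; compression is
    auto-detected by tarfile.
    """
    with tarfile.open(fileobj=tar_fileobj, mode="r:*") as tar:
        return [
            member.name
            for member in tar.getmembers()
            if len(member.name.encode("utf-8")) > limit
        ]
```

A tarball for which this returns a non-empty list is a candidate for a changed checksum; an empty list does not prove the checksum is stable.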

@boegel
Member Author

boegel commented Sep 21, 2017

Idea by @wpoely86: we should also support --ignore-checksums and --force-download

@tgamblin

tgamblin commented Sep 21, 2017

doing a git clone on the tagged version and creating the tarball ourselves (directly using tar), which should always give the same tarball?

Note that this doesn't work. tar is known to be nondeterministic across systems/users/etc., so you can't really rely on this for checksumming. You might be able to steal some of Debian's boilerplate from that discussion, I guess.
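To illustrate where the nondeterminism comes from: a plain tar run records file order, timestamps, owners and permissions, all of which vary between systems. The sketch below normalizes those fields with Python's `tarfile` module before hashing. `deterministic_tar_sha256` is a hypothetical helper, it only handles regular files, and it is not taken from Debian's tooling.

```python
import hashlib
import io
import os
import tarfile


def deterministic_tar_sha256(src_dir, prefix):
    """Return the SHA-256 of an uncompressed tar of src_dir with normalized metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # in-place sort fixes the order in which os.walk descends
            for name in sorted(files):
                path = os.path.join(root, name)
                arcname = os.path.join(prefix, os.path.relpath(path, src_dir))
                info = tar.gettarinfo(path, arcname=arcname)
                # Drop everything that depends on who created the tarball, and when.
                info.mtime = 0
                info.uid = info.gid = 0
                info.uname = info.gname = ""
                info.mode = 0o755 if info.mode & 0o100 else 0o644
                with open(path, "rb") as fh:
                    tar.addfile(info, fh)
    return hashlib.sha256(buf.getvalue()).hexdigest()
```

Even with this normalization, the result is only stable as long as everyone uses the same tar writer and the same settings, which is the core of the objection above.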

But really why do this? If you're already going to use git clone as your fetch mechanism, just clone at a particular commit and you're done. The commit hash is already verified by git during the clone.
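A minimal sketch of fetching at a pinned commit, assuming `git` is on the PATH and the server allows fetching a commit by its hash. The helper names `pinned_fetch_commands` and `fetch_pinned` are hypothetical, as are the URL and destination passed to them.

```python
import re
import subprocess


def pinned_fetch_commands(url, commit, dest):
    """Build the git commands that fetch exactly one commit into dest."""
    if not re.fullmatch(r"[0-9a-f]{40}", commit):
        # Tags and branches can move; only a full hash pins the content.
        raise ValueError("expected a full 40-character commit hash")
    git = ["git", "-C", dest]
    return [
        ["git", "init", "--quiet", dest],
        git + ["remote", "add", "origin", url],
        git + ["fetch", "--quiet", "--depth", "1", "origin", commit],
        git + ["checkout", "--quiet", "--detach", "FETCH_HEAD"],
    ]


def fetch_pinned(url, commit, dest):
    """Run the commands, then double-check that HEAD is the requested commit."""
    for cmd in pinned_fetch_commands(url, commit, dest):
        subprocess.run(cmd, check=True)
    head = subprocess.run(
        ["git", "-C", dest, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    if head != commit:
        raise RuntimeError(f"expected {commit}, got {head}")
```

With this approach no separate tarball checksum is needed, since the commit hash itself identifies the content.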

@schiotz
Contributor

schiotz commented Sep 21, 2017

When fixing checksums, perhaps leave the old ones in place as well. Otherwise all users who already have the old files in their sources folder will be hit. Here I am assuming that the sha256 of a file just has to match one of the checksums in the checksums list.

@boegel
Member Author

boegel commented Sep 21, 2017

@schiotz You mean, as a comment? EasyBuild currently doesn't support specifying multiple alternatives for checksums (since it doesn't make much sense in general).

@schiotz
Contributor

schiotz commented Sep 21, 2017

@boegel No, I had misunderstood how EB does it. The checksums are a list, and I just thought that all source files needed their checksum somewhere on that list, but that extra checksums would be ignored.

@schiotz
Contributor

schiotz commented Sep 22, 2017

@boegel Follow-up to my previous comment: You are going to be hit by some people downloading the new tarballs, some people having the old ones cached.

If EB cannot support alternate checksums, then maybe it should react to a wrong checksum by attempting a new download and checking again. That may however be a non-trivial change, if the download and the checksum check are well-separated in the code.
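The retry idea could look roughly like the sketch below. It is not EasyBuild's code: `obtain_verified` is a hypothetical helper, and `download` stands for any callable that (re)writes the file at the given path.

```python
import hashlib
import os


def sha256_of(path):
    """Compute the SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def obtain_verified(path, expected_sha256, download):
    """Use the cached file if it matches; otherwise download once and re-check."""
    if os.path.exists(path) and sha256_of(path) == expected_sha256:
        return "cached"
    download(path)  # overwrites a stale cached copy
    if sha256_of(path) != expected_sha256:
        raise ValueError(f"checksum mismatch for {path} after fresh download")
    return "downloaded"
```

A stale cached tarball is replaced silently, and a mismatch is only reported if the freshly downloaded file fails as well.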

It might be easier to change the checksum code, so it does not assume a specific order of the checksums but just checks that the checksums of source files are on the list. I cannot see any risk in this, since the space of sha256 checksums is beyond astronomical. There could be a minor performance issue if there are packages with hundreds of source files, but casting the checksum list to a set should fix that.
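As a sketch of that order-independent check, assuming a flat list of accepted sha256 strings (`unmatched_sources` is a hypothetical helper, not the EasyBuild implementation):

```python
import hashlib


def sha256_of(path):
    """Compute the SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unmatched_sources(paths, accepted_checksums):
    """Return the source files whose checksum is not among the accepted ones."""
    accepted = set(accepted_checksums)  # set lookup keeps the check cheap
    return [path for path in paths if sha256_of(path) not in accepted]
```

The trade-off is that the link between a particular file and a particular checksum is lost: any listed checksum validates any file.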

@boegel
Member Author

boegel commented Sep 22, 2017

I've opened a PR to fix the broken checksums that I could find in #5162.

@schiotz I'm not sure if keeping multiple checksums around for a single source file is a good idea in the long run...
I think it may lead to confusion later, and it will result in needless additional bookkeeping.

Also, it makes it significantly more difficult to trust contributors and make sure they don't (by accident or willingly) add checksum alternatives which validate source tarballs that are somehow malicious (see for example https://mail.python.org/pipermail/security-announce/2017-September/000000.html).

We should just bite the bullet and replace the checksums, and force people to re-download the sources; the new --force-download option that has been implemented in easybuilders/easybuild-framework#2313 is going to be a big help in dealing with this...

@fgeorgatos
Contributor

2 comments on this one (interesting thread btw):

  • would we like autogenerating alt-downloads of the style filename@hash ?! (not as default, as an option)
  • which would be the right place to report the challenges upstream (GitHub? git?)?

@fgeorgatos
Contributor

fgeorgatos commented Sep 25, 2017

hm, some more info on this case!
a) this is not the first time we've reported tar pax issues b) see the original comment on the git tree

@boegel
Member Author

boegel commented Sep 20, 2018

closing for now, doesn't seem to be an issue anymore for the time being...

@boegel boegel closed this as completed Sep 20, 2018
zklaus pushed a commit to zklaus/antlr-feedstock that referenced this issue Sep 17, 2020
Unfortunately, checksums for tarballs in the github archive are not stable.
It seems that the checksum for this tarball has changed.
See also eg. easybuilders/easybuild-easyconfigs#5151
netgate-git-updates pushed a commit to pfsense/FreeBSD-ports that referenced this issue Apr 27, 2021
It seems that github again delivers different archives.
Therefore the checksums changed.
It is maybe required to download the files from
github.com/.../release/...

As reference I found many reported problems:
spack/spack#5411
easybuilders/easybuild-easyconfigs#5151
libgit2/libgit2#4343

As the archives are generated on the fly, they can change at any time.
PR:		255423
Reported by:	lysfjord.daniel@smokepit.net, pkg-fallout
@boegel
Member Author

boegel commented Jan 31, 2023

Re-opening this, since the problem has re-emerged, see https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/

We should probably look into a different way of determining checksums, based on the unpacked sources, like Nix does - see https://nixos.wiki/wiki/Nix_Hash
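A simplified illustration of checksumming the unpacked sources rather than the archive: the hash below depends only on relative paths and file contents, so it is unaffected by how the tarball was packed. This is not Nix's actual serialization format, `tree_sha256` is a hypothetical helper, and the sketch ignores permissions, symlinks and empty directories.

```python
import hashlib
import os


def tree_sha256(root):
    """Hash an unpacked source tree by relative path and content only."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, full))

    digest = hashlib.sha256()
    for rel, full in sorted(entries):  # fixed order, independent of the filesystem
        size = os.path.getsize(full)
        # Path and size delimit each entry so file boundaries are unambiguous.
        digest.update(f"{rel}\0{size}\0".encode("utf-8"))
        with open(full, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Two archives that unpack to the same files give the same value, regardless of compression level, tar header format or timestamps.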

@boegel
Member Author

boegel commented Jan 31, 2023
