Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure downloads are correct via checksums #6773

Merged
merged 3 commits into from
May 14, 2014
Merged

Conversation

staticfloat
Copy link
Member

This commit adds in SHA512 and MD5 checksums for all dependencies I could find. I'd appreciate some testing, (easiest way is to mess up the checksums put in deps/deps_checksums.{sha512,md5} and then rebuild the dependency whose checksum you altered) to ensure both that normal compilation continues to work, as well as altered files don't allow compilation to continue.

I tested on OSX and Linux, but didn't test on Windows or Windows cross-compile.

This PR is in reference to #6558

@tkelman
Copy link
Contributor

tkelman commented May 10, 2014

Needs updating for just-bumped Arpack version

@staticfloat
Copy link
Member Author

@tkelman thanks!

@tkelman
Copy link
Contributor

tkelman commented May 10, 2014

Windows is using a different version of OpenBlas, needs another checksum entry?

ERROR: SHA512 checksum failure on openblas-v0.2.9.rc1.tar.gz, should be:
but is actually:
db1fa3ac4572869dbdf4c887fa6112419464accbfb311d9c9ca8bcde8882208001963dbbeb6afa350ca0d72c13fdda0ab7c6bc464de93d9af0889a0b361c0f97

@ivarne
Copy link
Member

ivarne commented May 10, 2014

I really like how you store the hashes in a simple and flexible format. It is an advantage that one can add hashes for new releases, before deciding to actually bump the version for everybody. I still wonder if keeping everything in Versions.make would be a better idea. Many files to update makes things harder to do correctly.

We should probably have some documentation somewhere. This is likely to cause failures, so it would be nice if people unfamiliar with grep/ack would be able to figure out what is going on. The error message should probably also have some borders/ascii art to make it easier to spot, and give some debugging steps. Not everybody bothers to ask for help, and it would be sad if this feature make people turn their back on Julia.

@staticfloat
Copy link
Member Author

Alright. I think I've managed to make this a little more human-friendly, added in OpenBLAS 0.2.9-rc1, and addressed all of the comments so far. Note that we only interrupt the build if we have a ground truth hash stored, and it doesn't match the downloaded file. If we don't have a hash stored, then the compilation will just continue with a little warning.

@@ -0,0 +1,98 @@
# This is the checksums file, lines starting with "#" are ignored as comments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file stores checksums for the automatically downloaded dependencies, in an attempt to ensure that the correct file has been downloaded and that it has not been tampered with. deps/jlchecksum contains the code for the actual verification.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

@StefanKarpinski
Copy link
Member

I would propose for discussion another format: using a directory structure. Something like this:

deps/checksums/
  libfoo-1.2.3.tar.bz2/
    sha512 => contains SHA512 hash of libfoo-1.2.3.tar.bz2
    md5 => contains MD5 hash of libfoo-1.2.3.tar.bz2
  libbar-3.2.1.tar.gz/
    sha512 => contains SHA512 hash of libbar-3.2.1.tar.gz
    md5 => contains MD5 hash of libbar-3.2.1.tar.gz

This way there's nothing to parse – just read and chomp the appropriate file. If no hash file exists for a given file/hash pair, we could automatically generate it when building on a system that has the desired hashing

@staticfloat
Copy link
Member Author

Ah, that does seem like a better way to do things. I'll do this later, when I have some spare time again. :)

@StefanKarpinski
Copy link
Member

... program and has downloaded the file, checked the other hashes, and built it's dependent targets successfully (this would prevent accidentally saving hashes for corrupted downloads). That way new files get hashes generated automatically without additional steps.

@staticfloat
Copy link
Member Author

I don't think automatically generating hashes is a good idea. I think it'd be a good idea to have a script that can spit out the checksums/<filename>/{md5,sha512} files, but I wouldn't create those files without a developer specifically asking for it. Something like jlchecksum --create <filename> to generate the hashes for <filename>.

@ivarne
Copy link
Member

ivarne commented May 10, 2014

Preferably the hashes should come from the project that released the dependency. Automatically generating the files would be pretty convenient though, and not all dependencies publish hashes.

@StefanKarpinski
Copy link
Member

Maybe. I figure that the first time a dependency gets downloaded and built, we should checksum it. If that download was spoofed, they get a wrong checksum and others who get the real version won't be able to build it and this will quickly be discovered. Realistically, if you download a new version of some dependency and build it and it seems to work, are you going to do anything further to verify it? The only thing that's actually safer would be to require starting with some manually provided checksum from the download source or something like that. But is that really more likely to be secure? If someone is intercepting your download request, aren't they likely to also make the checksum match? Automatically hashing the first version means that only the first download is vulnerable to attack, which is a very small surface as compared to every single download ever being vulnerable to attack. It seems to me that this protection isn't really about preventing someone from spoofing all downloads of a dependency but about preventing them from spoofing some of the downloads without being detected. Automatically hashing the first version still accomplishes that.

@staticfloat
Copy link
Member Author

@StefanKarpinski alright, you convinced me. ;)

@nalimilan
Copy link
Member

Have considered making the error messages more user-friendly? For now it's OK, but in the long-term they will most commonly be printed to users with network issues (proxy...), and they won't understand what the checksum problem means. This was asked on the R mailing list recently.

Something like: "Downloaded file checksum is incorrect. Check your network connection, especially network proxies." would be nice (realistically, while I agree an improved security is very important, in practice it won't be that common -- because of the presence of the check).

@tkelman
Copy link
Contributor

tkelman commented May 11, 2014

In case anyone was curious, this branch works fine for from-scratch builds on Windows, both MSYS2 and Cygwin-to-MinGW cross. Also works fine in RHEL5 which is old enough that it doesn't have shasum.

(P.S anyone want to return the testing favor and let us know if JuliaMath/openlibm#56 works on Mac and 32 bit Linux? 😉 would be nice to get the Windows build passing tests again, it's been nearly a week )

@tkelman tkelman mentioned this pull request May 11, 2014
@staticfloat
Copy link
Member Author

Alright. Anything else to improve on this, or does this satisfy everything?

@ivarne
Copy link
Member

ivarne commented May 12, 2014

How does it work with the make sourcedist target?

@staticfloat
Copy link
Member Author

It should "just work", since it does all the checking in the configure targets, and so doesn't care that the files came from someplace other than jldownload; it will check each tarball before extracting it.

@ivarne
Copy link
Member

ivarne commented May 12, 2014

Nice!

@staticfloat
Copy link
Member Author

Hah, make source-dist wasn't even working properly. Fixed now, and verified that jlchecksum works just fine with it.

Dependencies will be checked at extraction-time, and have their checksums stored
in deps/checksums/. The deps/jlchecksum command tests a given filename against the
checksums stored in that file, supporting MD5 and SHA512 checksums for all
recorded files.  SHA512 is preferred, with MD5 available as a fallback in the
case that `shasum` is not available.  If the hashes do not match, an error is
thrown and compilation is halted.  If a file is asked for and no hash has been
given for that file, the hash is generated for both SHA512 and MD5, and stored
@nalimilan
Copy link
Member

Fine with me with the improved error message!

@staticfloat
Copy link
Member Author

@StefanKarpinski are there any other comments on this, or am I free to merge?

@StefanKarpinski
Copy link
Member

No, this looks awesome.

staticfloat added a commit that referenced this pull request May 14, 2014
Ensure downloads are correct via checksums
@staticfloat staticfloat merged commit 65ff846 into master May 14, 2014
@staticfloat
Copy link
Member Author

Bam. Another bug bites the dust.

@StefanKarpinski StefanKarpinski deleted the sf/depchecksums branch May 14, 2014 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
building Build system, or building Julia or its dependencies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants