
SHA1 -> SHA256 #816

Closed
jorangreef opened this issue Oct 12, 2016 · 67 comments

Comments

@jorangreef

I have done a quick check but it looks like Yarn is using SHA1 for checksumming?

If so, I think it would be a good time to switch to SHA256.

@ljharb

ljharb commented Oct 12, 2016

it's worth noting that SHA1 is achievable in POSIX, but SHA256 is much harder to make work cross-platform.

@Daniel15
Member

SHA256 is much harder to make work cross-platform.

Just need to use PowerShell everywhere; it has it built in ($hash = (Get-FileHash -Path $file -Algorithm SHA256).Hash) 😛

Node's crypto library supports SHA256; I think we'd use that.

> var crypto = require('crypto')
undefined
> crypto.createHash('sha256').update('Hello World').digest('hex');
'a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e'

@Daniel15
Member

Looks like there are three hashing functions in the codebase:

"hash" in src/util/misc, uses SHA256
"HashStream" in src/util/crypto, uses SHA1
"hash" in src/util/crypto, takes the algorithm as an argument and defaults to MD5

hash in src/util/misc and hash in src/util/crypto should probably be unified and they should all use SHA256, I think we'd need to think about backwards compatibility though.

@sebmck
Contributor

sebmck commented Oct 12, 2016

Thanks for the issue! The shasums we use are given to us by the npm registry so there's not much we can do about this unfortunately. 😢

@sebmck sebmck closed this as completed Oct 12, 2016
@Daniel15
Member

@kittens - Can we compute our own hash once the file has been downloaded, and store that in the local cache?

@graingert
Contributor

might as well jump straight to SHA3

@bestander
Member

Now it is time to reconsider the issue.
A few problems to solve first:

  • forwards compatibility / migration path
  • speed concerns
  • should we just ignore sha1 coming from npm or use it anyway?

Maybe sha2/3 could be an additional field?

@graingert
Contributor

graingert commented Mar 6, 2017

It looks like "hash" in src/utils/misc.js is gone now, and I can only see md5 used in:

const dest = this.config.getTemp(crypto.hash(url));

and
https://github.com/yarnpkg/yarn/blob/master/src/util/git.js#L35

@graingert
Contributor

looks like it would be safe to change the default to sha256. I'm not sure where the yarn.lock tgz hash files are checked though.

@bestander
Member

bestander commented Mar 6, 2017 via email

@graingert
Contributor

@bestander where is the value from the yarn.lock file passed in?

@bestander
Member

@zkat
Contributor

zkat commented May 13, 2017

Hi y'all.

Just dropping in to point out that the npm registry has started serving more secure hashes with packages that have been published with it.

You'll find the new hashes in the dist.integrity field of package manifests, in the same object as dist.tarball and dist.shasum. The integrity field is a proper subresource integrity field. Please note that in order to be fully forward-compatible here, you should follow the subresource integrity spec and parse this field out, pick algorithms, etc (or just use the linked library). We will almost certainly use/abuse the ?opt syntax from the spec in the future, and eventually start shipping multi-hash strings. It's also possible we'll have multiple hashes for the same algorithm, only one of which will apply to the specific tarball being downloaded, but that's more a door to leave open in the future.

On the publishing side, you should continue generating sha1 hashes into shasum, but add an integrity field with a stronger algorithm. See https://github.com/npm/npm-registry-client/pull/157/files for the npm-registry-client implementation of it.

Right now, the registry only supports sha512 for tarballs that have been published using npm5, but we'll be generating integrity strings for all tarballs on the registry at some point in the near-ish future, so they should be more widely available then.

lmk if y'all have any questions! It's exciting to start moving on from sha1 and setting ourselves up for a smoother transition to sha3 and beyond :)

edit: p.s. I saw something about generating these from the command line. If you're relying on sha1sum for that, srisum should be a near-drop-in replacement.
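For illustration, the integrity field zkat describes is a whitespace-separated list of `<algorithm>-<base64 digest>[?<option>]` entries. A hand-rolled parse might look like the sketch below, though real consumers should use the ssri library linked above rather than this simplified version (which, for example, does not validate the base64 digest):

```javascript
// Simplified SRI parser (illustrative only; use ssri in real code).
// Splits "sha512-abc?opt sha1-def" into { algorithm, digest, options } records.
function parseIntegrity(integrity) {
  return integrity.trim().split(/\s+/).map(entry => {
    // Options are '?'-delimited suffixes after the hash expression.
    const [hash, ...options] = entry.split('?');
    const sep = hash.indexOf('-'); // base64 never contains '-', so this is safe
    return {
      algorithm: hash.slice(0, sep),
      digest: hash.slice(sep + 1),
      options,
    };
  });
}
```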

@bestander
Member

Thanks for commenting this, @zkat, very much appreciated!

@bestander
Member

Afaik this is not a large change to do now, so a PR with a fix is welcome.

@nikshepsvn

@bestander Is this still open? I would love to give it a shot :) -- Also, I hope I'm not asking for too much, but can you give me an idea on how I should get started?

@bestander
Member

@nikshepsvn yes it is, give it a go.

A good start would be to check where Yarn checks hashes, for example https://github.com/yarnpkg/yarn/blob/master/src/fetchers/tarball-fetcher.js#L83.
Another point - npm started returning sha256 or sha512 in package info, we probably should start using it.

@nikshepsvn

@bestander Thank you so much, I'll do my best to understand as much as I can and get back to you with any questions -- is that okay?

@bestander
Member

bestander commented Aug 24, 2017 via email

@ghost

ghost commented Oct 4, 2017

Currently NPM has SHA-512 for new packages and SHA-1 for old ones.

@bestander
Member

Well, someone send us a PR.

@imsnif
Member

imsnif commented Nov 28, 2017

@BYK - So, my reasoning goes like this.

The pros of having the integrity field as a string are:

  1. yarn would be adhering to the standard: https://www.w3.org/TR/CSP2/#source-list-syntax, as linked from the sri standard: https://w3c.github.io/webappsec-subresource-integrity/#integrity-metadata-description.
  2. The yarn.lock integrity field would be 100% consistent with the package-lock.json integrity field, even going so far as using the same module for parsing/stringifying/validating. I think this is very valuable and would help protect against future changes in npm.

The cons are:

  1. A slightly less readable yarn.lock file.
  2. Potential problems with consistency and sorting.

Regarding the cons:

  1. I think this is the major rub. And I admit it's a trade-off. But on the other hand, I don't think there will be many cases in which users would need to scrutinize the hashes in the yarn.lock file, seeing as all errors that are given for inconsistencies are quite verbose. But I will of course defer to your experience in the matter.
  2. Even if providing a consistently sorted string would be an issue for the ssri module (re-reading @zkat's response, I think she mentioned the ordering of the hashes might be important - so I'd of course understand if they'd want to keep things as they are), the only thing that would change a yarn.lock in such a case is a change in the Object.keys function or in the ssri module itself (please correct me if I'm wrong and didn't think of something!) And even then, the only thing that would happen is a slight diff in the file, with all other functionality remaining intact.

In my opinion, the pros outweigh the cons here. But I will of course defer to your judgement if you disagree. :)

@BYK
Member

BYK commented Nov 28, 2017

@imsnif I understand where you are coming from and thanks for sharing the relevant specs which helped me see a clearer picture. That said, the only reason for the large string blob concatenated with spaces seems to be the restrictions in HTML (you cannot have the same attribute name more than once in an element so when you need to express an array you either need children, which is not possible with every tag, or need a serialization method).

These are indeed a list of hash specs. If you ask me, we should denormalize this data even more, to make it explicit which part is the hash specifier and which part is the hash itself. I don't see any value in doing another pass of string parsing when we already have a serialization format that supports lists and objects.

I haven't been a part of the package-lock.json file format work, so I cannot comment on how or why they made the decision to use a single string blob for this field, but I do not see a good reason to store serialized data in an already serialized data format.

@zkat - is it possible for you to provide the reasons why you have chosen to use this field as a string blob instead of a JSON array?

@BYK
Member

BYK commented Nov 28, 2017

I don't see any value in doing another pass of string parsing when we already have a serialization format that supports lists and objects.

To contradict myself, URLs themselves are serialized string blobs in an already serialized data format here but when it comes to lists, I think this makes a significant improvement in terms of reviewing changes in the lockfile. Does that make sense? :)

@imsnif
Member

imsnif commented Nov 28, 2017

@BYK - alright, I understand the reasoning. Thanks for taking the time to explain it.

@zkat - I too would be very interested in hearing your reasoning, if you'd be willing.

Otherwise though, I'll continue working on this (I've got some other issues to get through, so this won't block me for now). For the time being though, just to make sure, our agreed solution is @zkat's suggestion:

{
    integrity: [
      'sha1-deadbeef',
      'sha1-c0ffee',
      'sha512-deadc0ffee?blah'
    ]
}

right?

@imsnif
Member

imsnif commented Nov 28, 2017

Or, rather:

    integrity:
      sha1-deadbeef
      sha1-c0ffee
      sha512-deadc0ffee?blah

@BYK
Member

BYK commented Nov 28, 2017

@imsnif yup :)

@graingert
Contributor

graingert commented Nov 28, 2017

what's the behavior with two different sha1s?

@graingert
Contributor

what's the behavior if the sha1 matches but the sha512 does not match?

@imsnif
Member

imsnif commented Nov 28, 2017

@graingert - You raise valid and important questions, whose answers I plan to articulate with tests. So would it be fine if we go into the details after I submit the PR? I think it'll be much easier. :)
If you'd like a sneak preview, I'm validating with https://github.com/zkat/ssri (my current implementation uses integrityStream, but it's subject to change).

@zkat
Contributor

zkat commented Nov 28, 2017

The reason for having had a string was purely to be close to the actual SRI spec. That is, providing them as single, whitespace-separated strings. I can see npm bumping the lockfileVersion in the future for the sake of making it an array, but it'll be a long time before it starts making any real impact on diffs at all.

While in theory there can be a bunch of stuff in that integrity field, in practice, it's almost always going to be a single entry (until we start upgrading algorithms or using more advanced options). The idea that it would significantly affect diffs didn't really cross my mind, cause it's a relatively stable field: if you get a diff, you'll likely get a diff for the entire entry, so you won't get individual lines for each hash anyway.

As far as ordering goes: I may have misspoken here as far as whether ordering within an algorithm matters. I think it's actually fine because individual hashes within an algorithm will just be tried until one succeeds.

When you have multiple algorithms, though, for security reasons, you must always use the strongest one that you know you can use. So if you have sha512 and sha1, you must always use sha512 and never, ever fall back to sha1. OTOH, if you see a sha3 and sha1, and you're on a version of Yarn that does not support sha3, you can use sha1. This is a future-proofing feature. You'll notice that ssri does its own algorithm ordering, and it's fairly conservative about it (again, for security reasons).
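The algorithm-selection rule in the last two paragraphs can be sketched as follows. The strength ranking here is illustrative only; ssri maintains its own, authoritative ordering:

```javascript
// Illustrative "use the strongest algorithm you support" selection.
// The weakest-to-strongest ranking is an assumption for this sketch,
// not ssri's actual table.
const STRENGTH = ['sha1', 'sha256', 'sha384', 'sha512'];

function pickAlgorithm(available) {
  // Ignore algorithms this client does not support (e.g. a future sha3).
  const supported = available.filter(a => STRENGTH.includes(a));
  if (supported.length === 0) return null;
  // Of the supported ones, never fall back past the strongest.
  return supported.sort((a, b) => STRENGTH.indexOf(b) - STRENGTH.indexOf(a))[0];
}
```

With `['sha3', 'sha1']` this falls back to `sha1`, matching the forward-compatibility behaviour described above, while `['sha512', 'sha1']` always selects `sha512`.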

@imsnif
Member

imsnif commented Nov 30, 2017

@BYK - slight hitch with the array approach. It is not backward compatible. Issuing the yarn command from latest with this yarn.lock file:

# THIS IS AN AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY.
# yarn lockfile v1


array-flatten@1.1.1:
  version "1.1.1"
  resolved "https://registry.npmjs.org/array-flatten/-/array-flatten-1.1.1.tgz#9a5f699051b1e7073328f2a008968b64ea2955d2"
  integrity:
    sha1-ml9pkFGx5wczKPKgCJaLZOopVdI=

Results in:

$ yarn
yarn install v1.3.2
error An unexpected error occurred: "Invalid value type 9:0 in /home/aram/code/yarn-test/yarn.lock".

If I am not mistaken, this is because the current lockfile format does not support arrays.

So seeing as we cannot use arrays (unless I'm missing something with the above? please do correct me as I'm fairly new to the codebase and could be overlooking something!)
And we can't use objects (because the spec allows for multiple identical keys and npm plans on using that option)... that brings us back to strings?
I know that is not an ideal solution as you mentioned, but given the circumstances - if my reasoning is correct - I think it's the best option. What say you?

Edit:
I already got the lockfile to parse/stringify arrays before thinking about this. It is not extremely difficult, but I feel it would be easier to save this breaking change for another occasion and handle all its ramifications separately.

@zkat
Contributor

zkat commented Nov 30, 2017

Like I said above: you can totally get away with strings until such a time when you're ready for the breaking change. integrity will almost definitely only contain sha512 for at least a couple of years, is my bet. So if the issue is visual diffs, there will be no difference between the two approaches until a future where node integrates new algorithms and npm starts publishing using them (I think it'll be a while after node even adds sha3 support for npm to actually start using it -- sha512 is much more stable, implementation-wise, and there's no reason to believe it's going to be vulnerable any time soon)

@BYK
Member

BYK commented Dec 1, 2017

@imsnif that's a bummer, for some reason I thought arrays were possible in the lockfile. Alright, let's go with the string and see how badly it hurts. I don't like this, but the only alternative without a breaking change seems to be treating arrays as objects with numeric indexes, which would make both the lockfile (0: sha256-xxxx) and the code using it quite weird.

@vog

vog commented May 22, 2018

Has there been any progress on this?

Given the strong security promises on the website, it is very disappointing to see SHA-1 checksums in use.

@imsnif
Member

imsnif commented May 22, 2018

Hey @vog - you can follow our progress here: #5042
Hoping to have this merged really soon.

@Gudahtt
Member

Gudahtt commented Aug 19, 2018

Now that that PR has been merged, should this issue be closed?

The only missing piece now is that some dependencies (e.g. those installed from git repositories) still don't have an integrity field. npm includes an integrity field for these packages, so I'm assuming they must calculate it after downloading the package.

@zkat
Contributor

zkat commented Aug 20, 2018

npm has not been generating integrity hashes for git dependencies for a while, because git packages can often change hashes unless devs are careful to make builds reproducible.

@rstuven

rstuven commented Jan 8, 2020

We have computed the very first chosen-prefix collision for SHA-1. In a nutshell, this means a complete and practical break of the SHA-1 hash function, with dangerous practical implications if you are still using this hash function. To put it in another way: all attacks that are practical on MD5 are now also practical on SHA-1.

https://sha-mbles.github.io/

@graingert
Contributor

Perhaps it would be worth using SHA1DC as a drop-in replacement?

@arcanis
Member

arcanis commented Jan 23, 2020

Closing since SHA256 support has been released more than a year ago 🙂

@arcanis arcanis closed this as completed Jan 23, 2020
@graingert
Contributor

@arcanis how did this manage to get left open for so long in "high-priority" when it was already fixed?
