Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a custom SHA1 digest implementation to no longer depend on the digest gem before we know which version to activate #4989

Merged

Conversation

casperisfine
Copy link

@casperisfine casperisfine commented Oct 15, 2021

Ref: https://bugs.ruby-lang.org/issues/17873

Ruby 3.1. is making digest a default gem, because of this if bundler
depends on it, it might cause already activated issues.

This currently happens if your Gemfile contains a git gem, because
the GitSource makes an SHA1 of the repository URL.

What was the end-user or developer problem that led to this PR?

Currently it is very complicated if not impossible to use a gem that has digest in its dependencies if you also have a gitst gem.

source "https://rubygems.org"

gem "net-imap"
gem "bootsnap", github: "Shopify/bootsnap"
$ ruby -v 
ruby 3.1.0dev (2021-10-06T06:42:37Z master d53493715c) [x86_64-darwin20]
$ bundle
Using msgpack 1.4.2
Using bootsnap 1.9.1 from https://github.com/Shopify/bootsnap.git
Using bundler 2.2.25
Using digest 3.0.0
Using io-wait 0.1.0
Using timeout 0.2.0
Using net-protocol 0.1.1
Using strscan 3.0.0
Using net-imap 0.2.2
Bundle complete! 2 Gemfile dependencies, 9 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.
$ bundle exec ruby -r bundler/setup -r digest -e 'p :hello'
gems/bundler-2.2.25/lib/bundler/runtime.rb:309:in `check_for_activated_spec!': You have already activated digest 3.1.0.pre2, but your Gemfile requires digest 3.0.0. Prepending `bundle exec` to your command may solve this. (Gem::LoadError)
	from gems/bundler-2.2.25/lib/bundler/runtime.rb:25:in `block in setup'
	from gems/bundler-2.2.25/lib/bundler/spec_set.rb:136:in `each'
	from gems/bundler-2.2.25/lib/bundler/spec_set.rb:136:in `each'
	from gems/bundler-2.2.25/lib/bundler/runtime.rb:24:in `map'
	from gems/bundler-2.2.25/lib/bundler/runtime.rb:24:in `setup'
	from gems/bundler-2.2.25/lib/bundler.rb:149:in `setup'
	from gems/bundler-2.2.25/lib/bundler/setup.rb:10:in `block in <top (required)>'
	from gems/bundler-2.2.25/lib/bundler/ui/shell.rb:136:in `with_level'
	from gems/bundler-2.2.25/lib/bundler/ui/shell.rb:88:in `silence'
	from gems/bundler-2.2.25/lib/bundler/setup.rb:10:in `<top (required)>'
	from <internal:/opt/rubies/3.1.0-dev/lib/ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
	from <internal:/opt/rubies/3.1.0-dev/lib/ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:85:in `require'

What is your fix for the problem, implemented in this PR?

I found https://github.com/Solistra/ruby-digest which is under public domain and could be vendored. It conveniently implement the 3 hash algorithms used by bundler.

This is a rough first draft, it could certainly be cleaned a bit more.

@deivid-rodriguez how does this sound?

@casperisfine casperisfine force-pushed the fix-digest-dependency branch 4 times, most recently from 7aa6a43 to e4dcb33 Compare October 15, 2021 10:52
@deivid-rodriguez
Copy link
Member

@casperisfine Yes, the explanation of the issue makes sense and the solution seems like the way to go.

The scary part is to start depending on a bunch of code from a random public repository. But as long as we properly audit the code, it should be fine. Did you do that?

As a side note, we could also workaround the issue by using OpenSSL::Digest instead of Digest directly, thanks to openssl not activating digest as a gem. But at ruby/digest@1617261#commitcomment-57698673 we agreed that was an openssl issue, which has already been fixed at ruby/openssl#463. So it would still be an issue if ruby 3.1 ships an openssl version with that patch. And in any case, it would just trick bundler into not detecting the problem, but you would still get redefinition warnings and your application would be using two different copies of the gem at the same time. Well, actually, you might not even get any warnings due to ruby/digest@1617261, but the problem will still be there and I believe ruby/digest@1617261 should be reverted anyways.

@casperisfine
Copy link
Author

But as long as we properly audit the code, it should be fine. Did you do that?

Only very quickly, but if you think it's the way to go I can do a proper pass and even delete the parts we don't need. Overall this is all fairly well known and stable algorithms, and we can also vendor digest's test suite and run it against to make sure it works well.

As for the security risk, it's not hard to audit that it doesn't do anything funky, it's 99% bitwise operations on strings.

we could also workaround the issue by using OpenSSL::Digest i

Good point. Whatever you think is best, you have the most context here. I'll happily submit a PR for either solution. I mostly wanted to start a discussion as I really fear digest will cause a huge mess when 3.1.0 is released.

@simi
Copy link
Member

simi commented Oct 15, 2021

@casperisfine There is one general rule - "don't roll your own crypto". Some info could be found at https://arxiv.org/abs/2107.04940 (or just google that term). I'm 👎 on this.

I tried to look around, but it is not clear to me why we can't vendor ruby/digest as we do with other gems (like fileutils). Can anyone briefly explain?

@casperisfine
Copy link
Author

I tried to look around, but it is not clear to me why we can't vendor ruby/digest as we do with other gems (like fileutils).

Because it's a native gem (most of the code is in C)

@deivid-rodriguez
Copy link
Member

@simi We can't really vendor C code, at least I don't know how.

@casperisfine
Copy link
Author

There is one general rule - "don't roll your own crypto".

I don't disagree with the idea, but in this instance I don't think it qualifies as our own crypto. It's just a pure ruby implementation of MD5/SHA1/SHA256, it's very easy to unit test it.

It's not like we're implementing our own cryptographic algo.

@simi
Copy link
Member

simi commented Oct 15, 2021

Just to ensure I do understand it well, is this is the problematic code doing MD5 too early?

uri_digest = SharedHelpers.digest(:MD5).hexdigest(uri_parts.compact.join("."))

@deivid-rodriguez
Copy link
Member

deivid-rodriguez commented Oct 15, 2021

Also, the particular use of hashing algorithms here (at least the one that's causing trouble), it's not really crypto-related. It's used to digest the url for git repositories into something that can be used as a file name on disk. So I think it should pose no issue if the digest could be reversed into the original uri. We could even stop using it and do something different like replacing slashes with other characters that can be used in filenames if it wasn't for backwards compatibility concerns.

@simi
Copy link
Member

simi commented Oct 15, 2021

@deivid-rodriguez that's the way I'm currently looking as well. Remove the need of MD5 at all if not really needed.

@deivid-rodriguez
Copy link
Member

We crossed messages. Yes. that's the line causing the issue.

@casperisfine
Copy link
Author

if it wasn't for backwards compatibility concerns.

Yeah, that was my first though, but then I figured it wasn't possible to just change the algo.

Also re openssl, isn't OpenSSL a gem as well, wouldn't we run into the same problem exactly if we were to load it that soon?

@casperisfine
Copy link
Author

As for the line causing the issue, in my case it's that one:

SharedHelpers.digest(:SHA1).hexdigest(input)

@deivid-rodriguez
Copy link
Member

Also re openssl, isn't OpenSSL a gem as well, wouldn't we run into the same problem exactly if we were to load it that soon?

Yes, that's what I was trying to explain earier. The only reason why bundler doesn't complain when digest is loaded through openssl is because openssl used to require digest from the C-extensions without using rubygems require, so when that happens digest is not activated as a gem and bundler is not able to detecte that the are two versions of digest loaded at the same time. But in any case, it's been fixed by ruby/openssl#463.

@casperisfine
Copy link
Author

Ah, that makes sense now. Ok, so I suppose vendoring MD5/SHA1/SHA256 is the only way to go?

@deivid-rodriguez
Copy link
Member

Yes, sorry, the problematic line is the one @casperisfine pointed out. The one pointed out by @simi doesn't seem to cause issues for the moment, but in any case, it's non crypto-related either.

I'm leaning towards going with the approach in this PR. We can cleanup all the unused code given the liberal license and the lack of updates, and limit this only to "url-digesting" usages (I think they might be our only usages of digest). The library can be unit tested easily as @casperisfine pointed out, and even if it had a bug, I don't see how it could become a security issue.

@casperisfine
Copy link
Author

Ok, thanks. I'll cleanup this PR then, and ping back when it's in a decent state.

@deivid-rodriguez
Copy link
Member

Alright, I'd like to have @simi on board here though. Hopefully our reasoning sounds good to him too.

@casperisfine
Copy link
Author

What if the vendored code is in a namespace like Bundler::CryptoUnsafeDigest, would that solves the concerns of this ever be used in a crypto context?

@simi
Copy link
Member

simi commented Oct 15, 2021

If that's related to git only, maybe we can use git to do the work for us.

echo "abc/cde" | git hash-object --stdin
52fdf764bebd1d866eba5402b89b9d6b4f1d9f53

🤔 Anyway that is still backwards incompatible. Also I wasn't able to find out how stable this hash is (if that depends on your preferred git hash function).

@deivid-rodriguez
Copy link
Member

It's not really related to git specifically. For example, the usage you pointed out is to save a cache of gems for a specific rubygems (not git) source to disk. I think the general issue is that we need to save something related to some url to disk with a unique name. But it can't be the uri itself (I guess mainly because it includes slashes and other forbidden characters in filenames). So we use a hashing function. But we don't really care whether it's a hash function in the cryptographic sense (hard to reverse/decrypt), since we have nothing to hide in this case.

@casperisfine casperisfine force-pushed the fix-digest-dependency branch 3 times, most recently from dfc2777 to 2220385 Compare October 15, 2021 15:26
@simi
Copy link
Member

simi commented Oct 15, 2021

What about to just call it path sanitizer/hasher and the fact it is MD5 algo leave hidden (mentioned in describing comment at the top of the class)?

@casperisfine
Copy link
Author

Sure I can do that. Please note though that this callsite is the one currently causing problem, there might be other callsites causing problems in other situations that might be revealed later.

But OK. I'll likely resume work monday, and simply include a standalone MD5 function inside Source::Git with a comment explaining that MD5 is only used for backward compatibility, and than any simple mangling would work just as well.

@casperisfine
Copy link
Author

Hum, good point. I modified the test to wrap it in a if RUBY_VERSION >= "2.7". Hopefully that's the right call and it's mergeable, otherwise let me know.

@simi
Copy link
Member

simi commented Oct 21, 2021

It fails for 2.7 as well. Is that expected?

@casperisfine
Copy link
Author

Hum, weird I thought it was a default gem in 2.7 already, but I might be wrong. I updated the version check.

@casperisfine
Copy link
Author

Hum, Bundler under a ruby-core setup is failing too, but bot sure what it corresponds to.

@deivid-rodriguez
Copy link
Member

That means the spec will fail when this PR is imported into ruby-core. Let me have a look at it.

@deivid-rodriguez
Copy link
Member

@casperisfine I had a look. In the ruby-core test environment, there are no default gems installed, so dependency resolution fails because the default digest gem can't be used to provide bar's dependency on it. This is expected and the same reason why the spec fails on older rubies, so it's fine to skip the spec in ruby-core. Add the :ruby_repo tag to the spec to skip it and it should be fine 👍.

This allows `Source::Git` to no longer load the `digest` gem as it is causing
issues on Ruby 3.1.
@casperisfine
Copy link
Author

@deivid-rodriguez done. Thank you!

@simi
Copy link
Member

simi commented Oct 22, 2021

Nice!

@deivid-rodriguez deivid-rodriguez changed the title Vendor RubyDigest::SHA1 to no longer depends on the digest gem before we know which version to activate Add a custom SHA1 digest implementation to no longer depend on the digest gem before we know which version to activate Oct 25, 2021
Copy link
Member

@deivid-rodriguez deivid-rodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, thank you so much @casperisfine. I'll try to get this released today!

@deivid-rodriguez deivid-rodriguez merged commit e75f245 into rubygems:master Oct 25, 2021
@casperisfine
Copy link
Author

Thanks for the merge!

@deivid-rodriguez
Copy link
Member

Thanks for your work ❤️

size = string.bytesize * 8
buffer = string.bytes << 128
buffer << 0 while buffer.size % 64 != 56
[size].pack("Q").bytes.reverse_each {|b| buffer << b }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why converting to native endian and reverse?
I suspect that this doesn't work on big endian platforms.
Isn't it buffer.concat([size].pack("Q>").bytes)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why converting to native endian and reverse?

It directly comes from the library i vendored: https://github.com/Solistra/ruby-digest/blob/d15f906caf09171f897efc74645c9e31373d7fd1/lib/ruby_digest.rb#L521

But you are right, something's off.

casperisfine pushed a commit to casperisfine/rubygems that referenced this pull request Oct 25, 2021
As noticed by @nobu rubygems#4989 (comment)

From wikipedia: https://en.wikipedia.org/wiki/SHA-1#SHA-1_pseudocode

> append ml, the original message length in bits, as a 64-bit big-endian integer.

`Q` is native endian, so little-endian on most modern hardware.
The original code from RubyDigest reverses the bytes:
https://github.com/Solistra/ruby-digest/blob/d15f906caf09171f897efc74645c9e31373d7fd1/lib/ruby_digest.rb#L521

But that makes the code non-portable, the correct way is to directly ask
for a big-endian representation.
matzbot pushed a commit to ruby/ruby that referenced this pull request Oct 25, 2021
As noticed by @nobu rubygems/rubygems#4989 (comment)

From wikipedia: https://en.wikipedia.org/wiki/SHA-1#SHA-1_pseudocode

> append ml, the original message length in bits, as a 64-bit big-endian integer.

`Q` is native endian, so little-endian on most modern hardware.
The original code from RubyDigest reverses the bytes:
https://github.com/Solistra/ruby-digest/blob/d15f906caf09171f897efc74645c9e31373d7fd1/lib/ruby_digest.rb#L521

But that makes the code non-portable, the correct way is to directly ask
for a big-endian representation.

rubygems/rubygems@ba2be01ea4
deivid-rodriguez added a commit that referenced this pull request Oct 25, 2021
Add a custom SHA1 digest implementation to no longer depend on the digest gem before we know which version to activate

(cherry picked from commit e75f245)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants