Add a custom SHA1 digest implementation to no longer depend on the digest gem before we know which version to activate #4989
@casperisfine Yes, the explanation of the issue makes sense and the solution seems like the way to go. The scary part is starting to depend on a bunch of code from a random public repository. But as long as we properly audit the code, it should be fine. Did you do that? As a side note, we could also work around the issue by using …
Only very quickly, but if you think it's the way to go, I can do a proper pass and even delete the parts we don't need. Overall, these are all fairly well-known and stable algorithms, and we can also vendor … As for the security risk, it's not hard to audit that it doesn't do anything funky; it's 99% bitwise operations on strings.
Good point. Whatever you think is best, you have the most context here. I'll happily submit a PR for either solution. I mostly wanted to start a discussion, as I really fear …
@casperisfine There is one general rule: "don't roll your own crypto". Some info can be found at https://arxiv.org/abs/2107.04940 (or just google that term). I'm 👎 on this. I tried to look around, but it is not clear to me why we can't vendor ruby/digest as we do with other gems (like fileutils). Can anyone briefly explain?
Because it's a native gem (most of the code is in C).
@simi We can't really vendor C code, at least I don't know how.
I don't disagree with the idea, but in this instance I don't think it qualifies as rolling our own crypto. It's just a pure Ruby implementation of MD5/SHA1/SHA256, and it's very easy to unit test. It's not like we're implementing our own cryptographic algorithm.
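As a rough illustration of how small and testable such code can be, here is a minimal pure-Ruby SHA-1 sketch. It follows the standard SHA-1 algorithm (FIPS 180-4) and is not the vendored RubyDigest code; the module name `PureSHA1` is hypothetical. Because it uses only integer and string operations, it can be checked directly against the C-backed `Digest::SHA1` from Ruby's standard library.

```ruby
# Hypothetical module: a minimal pure-Ruby SHA-1, for illustration only.
module PureSHA1
  MASK = 0xffffffff

  # 32-bit left rotation.
  def self.rotl(x, n)
    ((x << n) | (x >> (32 - n))) & MASK
  end

  # Round function and constant for step i (0..79).
  def self.round_constants(i, b, c, d)
    case i
    when 0...20  then [(b & c) | (~b & d), 0x5A827999]
    when 20...40 then [b ^ c ^ d, 0x6ED9EBA1]
    when 40...60 then [(b & c) | (b & d) | (c & d), 0x8F1BBCDC]
    else              [b ^ c ^ d, 0xCA62C1D6]
    end
  end

  def self.hexdigest(message)
    bytes = message.b.bytes
    bit_len = bytes.size * 8

    # Padding: a single 0x80 byte, zeros up to 56 mod 64, then the
    # message length in bits as a 64-bit big-endian integer.
    bytes << 0x80
    bytes << 0 while bytes.size % 64 != 56
    bytes.concat([bit_len].pack("Q>").bytes)

    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    bytes.each_slice(64) do |chunk|
      # Message schedule: 16 big-endian words expanded to 80.
      w = chunk.pack("C*").unpack("N16")
      (16...80).each { |i| w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1) }

      a, b, c, d, e = h
      80.times do |i|
        f, k = round_constants(i, b, c, d)
        temp = (rotl(a, 5) + f + e + k + w[i]) & MASK
        e = d
        d = c
        c = rotl(b, 30)
        b = a
        a = temp
      end

      h = h.zip([a, b, c, d, e]).map { |x, y| (x + y) & MASK }
    end

    # Serialize the five state words big-endian and hex-encode them.
    h.pack("N5").unpack1("H*")
  end
end
```

`PureSHA1.hexdigest("abc")` should return `"a9993e364706816aba3e25717850c26c9cd0d89d"`, the standard test vector. Comparing it with `Digest::SHA1.hexdigest` on inputs around the 55/56/64-byte padding boundaries is a cheap way to cover the trickiest part of the code.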
Just to make sure I understand it correctly: is this the problematic code that computes MD5 too early?
Also, the particular use of hashing algorithms here (at least the one that's causing trouble) is not really crypto-related. It's used to digest the URL of git repositories into something that can be used as a file name on disk. So I think it would pose no issue even if the digest could be reversed into the original URI. We could even stop using it and do something different, like replacing slashes with other characters that are allowed in filenames, if it weren't for backwards compatibility concerns.
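For comparison, the "replace slashes" alternative can be made lossless by percent-escaping every byte outside a small safe set, so that, for example, `a/b` and `a_b` never map to the same name. This is a minimal sketch with hypothetical helper names (`sanitize_uri`, `unsanitize_uri`); it is not what Bundler does.

```ruby
# Hypothetical helpers, not part of Bundler: reversibly turn a URI into a
# file name made only of [A-Za-z0-9._-] characters and "%XX" escapes.
def sanitize_uri(uri)
  # Operate on raw bytes so every unsafe byte (including "%" itself and each
  # byte of a multi-byte UTF-8 character) becomes exactly one "%XX" escape.
  uri.b.gsub(/[^A-Za-z0-9._-]/) { |byte| format("%%%02X", byte.ord) }
end

def unsanitize_uri(name)
  # Decode every "%XX" escape back into its byte, then restore UTF-8.
  name.b.gsub(/%([0-9A-F]{2})/) { Regexp.last_match(1).hex.chr }.force_encoding(Encoding::UTF_8)
end
```

`sanitize_uri("https://github.com/rubygems/rubygems.git")` gives `"https%3A%2F%2Fgithub.com%2Frubygems%2Frubygems.git"`. Unlike a digest, the name grows with the URI (and can hit the usual 255-byte file name limit), and switching to it would change every existing on-disk path, which is exactly the backwards compatibility concern.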
@deivid-rodriguez that's the direction I'm currently exploring as well: remove the need for MD5 entirely if it isn't really needed.
We crossed messages. Yes, that's the line causing the issue.
Yeah, that was my first thought, but then I figured it wasn't possible to just change the algo. Also re …
As for the line causing the issue, in my case it's this one: rubygems/bundler/lib/bundler/source/git.rb, line 310 (at c9f072a).
Yes, that's what I was trying to explain earlier. The only reason why Bundler doesn't complain when …
Ah, that makes sense now. OK, so I suppose vendoring MD5/SHA1/SHA256 is the only way to go?
Yes, sorry, the problematic line is the one @casperisfine pointed out. The one pointed out by @simi doesn't seem to cause issues for the moment, but in any case, it's not crypto-related either. I'm leaning towards going with the approach in this PR. We can clean up all the unused code, given the liberal license and the lack of updates, and limit this only to "URL-digesting" usages (I think they might be our only usages of digest). The library can be unit tested easily, as @casperisfine pointed out, and even if it had a bug, I don't see how it could become a security issue.
OK, thanks. I'll clean up this PR then, and ping back when it's in a decent state.
Alright, I'd like to have @simi on board here, though. Hopefully our reasoning sounds good to him too.
What if the vendored code is in a namespace like …
If that's related to git only, maybe we can use git to do the work for us.
🤔 Anyway, that is still backwards incompatible. Also, I wasn't able to find out how stable this hash is (whether it depends on your preferred git hash function).
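Assuming the git-based idea would go through `git hash-object --stdin`: as far as I know, in a default (SHA-1) repository that command hashes a `blob <size>\0` header followed by the content, not the raw string. Its output therefore differs from a plain SHA-1 of the URL, and repositories created with `git init --object-format=sha256` use SHA-256 instead. A minimal sketch of that computation, using Ruby's standard `digest` library purely for illustration (`git_blob_id` is a hypothetical name):

```ruby
require "digest"

# Hypothetical helper: the object id `git hash-object --stdin` would print
# for `content` in a SHA-1 repository. Git hashes "blob <size>\0" + content,
# not the raw bytes, so the result differs from Digest::SHA1.hexdigest(content).
def git_blob_id(content)
  data = content.b
  Digest::SHA1.hexdigest("blob #{data.bytesize}\0".b + data)
end
```

For example, `git_blob_id("")` is `"e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"`, git's well-known empty-blob id, which is not the SHA-1 of the empty string.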
It's not really related to git specifically. For example, the usage you pointed out is to save a cache of gems for a specific rubygems (not git) source to disk. I think the general issue is that we need to save something related to some URL to disk with a unique name. But it can't be the URI itself (I guess mainly because it includes slashes and other characters that are forbidden in filenames). So we use a hashing function. But we don't really care whether it's a hash function in the cryptographic sense (hard to reverse), since we have nothing to hide in this case.
What about just calling it a path sanitizer/hasher, and leaving the fact that it uses the MD5 algorithm as an implementation detail (mentioned in a descriptive comment at the top of the class)?
Sure, I can do that. Please note, though, that this callsite is the one currently causing problems; there might be other callsites that cause problems in other situations and that will only be revealed later. But OK. I'll likely resume work on Monday, and simply include a standalone MD5 function inside …
Hum, good point. I modified the test to wrap it in a …
It fails for 2.7 as well. Is that expected?
Hum, weird, I thought it was already a default gem in 2.7, but I might be wrong. I updated the version check.
Hum, …
That means the spec will fail when this PR is imported into ruby-core. Let me have a look at it.
@casperisfine I had a look. In the ruby-core test environment, there are no default gems installed, so dependency resolution fails because the default digest gem can't be used to provide …
This allows `Source::Git` to no longer load the `digest` gem as it is causing issues on Ruby 3.1.
@deivid-rodriguez done. Thank you!
Nice!
This is awesome, thank you so much @casperisfine. I'll try to get this released today!
Thanks for the merge!
Thanks for your work ❤️
```ruby
size = string.bytesize * 8
buffer = string.bytes << 128
buffer << 0 while buffer.size % 64 != 56
[size].pack("Q").bytes.reverse_each {|b| buffer << b }
```
Why converting to native endian and reverse?
I suspect that this doesn't work on big-endian platforms.
Isn't it `buffer.concat([size].pack("Q>").bytes)`?
> Why converting to native endian and reverse?

It comes directly from the library I vendored: https://github.com/Solistra/ruby-digest/blob/d15f906caf09171f897efc74645c9e31373d7fd1/lib/ruby_digest.rb#L521
But you are right, something's off.
As noticed by @nobu in rubygems/rubygems#4989 (comment).

From Wikipedia (https://en.wikipedia.org/wiki/SHA-1#SHA-1_pseudocode):

> append ml, the original message length in bits, as a 64-bit big-endian integer.

`Q` is native endian, so little-endian on most modern hardware. The original code from RubyDigest reverses the bytes: https://github.com/Solistra/ruby-digest/blob/d15f906caf09171f897efc74645c9e31373d7fd1/lib/ruby_digest.rb#L521

But that makes the code non-portable; the correct way is to directly ask for a big-endian representation.

rubygems/rubygems@ba2be01ea4
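To make the portability point concrete, here is a minimal sketch (hypothetical helper names) contrasting the explicit big-endian `Q>` directive with the `Q`-then-reverse pattern from the reviewed code. Both compute the 64-bit bit-length suffix that SHA-1 padding appends.

```ruby
# Portable: "Q>" always packs a 64-bit unsigned integer in big-endian order.
def sha1_length_suffix(message)
  [message.bytesize * 8].pack("Q>").bytes
end

# Pattern from the reviewed code: native-endian "Q", then reversed.
# This only yields big-endian bytes when the host is little-endian.
def reversed_native_suffix(message)
  [message.bytesize * 8].pack("Q").bytes.reverse
end

# True when the native byte order matches the explicit little-endian one.
def little_endian_host?
  [1].pack("S") == [1].pack("S<")
end
```

For `"abc"` (24 bits), `sha1_length_suffix` returns `[0, 0, 0, 0, 0, 0, 0, 24]` on every host, while on a big-endian machine `reversed_native_suffix` would return `[24, 0, 0, 0, 0, 0, 0, 0]`, a little-endian length and therefore a wrong digest.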
Add a custom SHA1 digest implementation to no longer depend on the digest gem before we know which version to activate (cherry picked from commit e75f245)
Ref: https://bugs.ruby-lang.org/issues/17873
Ruby 3.1 is making `digest` a default gem. Because of this, if Bundler depends on it, it might cause "already activated" issues.
This currently happens if your Gemfile contains a git gem, because the GitSource makes a SHA1 of the repository URL.
What was the end-user or developer problem that led to this PR?
Currently it is very complicated, if not impossible, to use a gem that has `digest` in its dependencies if you also have a git gem.
What is your fix for the problem, implemented in this PR?
I found https://github.com/Solistra/ruby-digest, which is in the public domain and could be vendored. It conveniently implements the 3 hash algorithms used by Bundler.
This is a rough first draft, it could certainly be cleaned a bit more.
@deivid-rodriguez how does this sound?