Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove hashes from library file names? #10188

Closed
brson opened this issue Oct 31, 2013 · 18 comments · Fixed by #10593
Closed

Remove hashes from library file names? #10188

brson opened this issue Oct 31, 2013 · 18 comments · Fixed by #10593
Milestone

Comments

@brson
Copy link
Contributor

brson commented Oct 31, 2013

These hashes are the version number plus some additional metadata, mostly the #[link] metadata. Right now we don't have any actual use cases where distinguishing libraries by this metadata matters, nor do we have any mechanism for changing the link metadata for different compilation configurations. I don't know what this naming convention is buying us.

Nominating.

@metajack
Copy link
Contributor

A lot of pain? :)

How would these get used? My understanding was that they would disambiguate multiple crates so that two versions of the same library could be loaded at once (if I depended on X version 1 and some dependency of mine depended on X version 2, they could both have what they wanted, instead of beign out of luck unless all dependency requirements matched). I'm not sure how one would use this though, or what this buys over a version number.

@brandonson
Copy link
Contributor

I'm willing to jump in and come up with a patch that does this (hopefully without removing too much code other than whatever includes the hash in the name, leaving the code for generating said metadata in place - still could be useful in the future). Anyone else want this? If not, I'll take a look into it, see if it's something I can do. As simple as it sounds right now, I haven't read enough of the compiler code to be sure...

Of course, there's not even any assurance this is wanted, but I feel that just removing it from the library file name could be a good start point. I've definitely had a couple times I've tried to use libraries and had conflicts, and if none of it is used, removal of that metadata sounds good, even if it does end up temporary,

@mletterle
Copy link
Contributor

I was wondering where this came from.. Why does it use a hash? Why can't it just use the actual text? That would allow the v1 + v2 scenario and at least make sense. i.e. liba-1.so, liba-1.1.so, etc, etc.

@catamorphism
Copy link
Contributor

@mletterle Graydon had a sophisticated plan for how Rust should handle versioning, and the hashes are part of that. They're meant to encode dependencies and not just versions. However, now that Graydon has stepped down from the team, that plan lacks a champion, and we may have to punt on it.

@metajack
Copy link
Contributor

Did that plan get documented? It's hard to evaluate without knowing what the plan was and how the hash relates.

@brandonson
Copy link
Contributor

IMHO, versioning is important, but not so important that it should cause the difficulties that it currently does, with no appreciable benefit. Ending up with multiple files containing essentially the same library just from a small change is often a bit of a pain, and I'd imagine hashes in names could cause difficulties in linking from C code. (Though I haven't tried that, so I'm not sure)

This all lacks perspective on the versioning plan graydon had. Filenames including info about dependencies, versions, etc. could offer interesting benefits. Until that plan is implemented or at least confirmed, however, I feel working with libraries could be simplified somewhat.

@brson
Copy link
Contributor Author

brson commented Oct 31, 2013

The only description I'm aware of for how our version hashing operates is here: https://github.com/mozilla/rust/blob/master/src/librustc/back/link.rs#L462. It may not perfectly reflect reality.

@huonw
Copy link
Member

huonw commented Oct 31, 2013

There is this too: https://github.com/mozilla/rust/wiki/Doc-crate-hashes (which is based on an IRC conversation with graydon according to the history).

@alexcrichton
Copy link
Member

From my understanding of the wiki @huonw pointed out and the comment in the code, the only real reason for a hash in the crate name is for two same-named crates to actually be different libraries. I could create my own libstd and because it's location (pkgid) was different than rust's libstd, the two libraries would have different filenames (because the hash is different).

This seems a little nice, but it may not be worth having to explain to everyone why we have this weird hash in the crates (and this makes writing scripts working on crates incredibly difficult). It's worth considering this use case though.

@metajack
Copy link
Contributor

metajack commented Nov 1, 2013

That use case actually makes a lot of sense, and solves a real problem. In that case, the hash can be absolutely predictable, since you can just use the linkid UUID for the hash and require those be unique. Unfortunately using a UUID here would greatly expand the size of the hash value.

If the hash is predictable, all the problems of using it pretty much disappear. The problem right now is that the output of compilation is effectively a randomly named file with a known prefix. It's easy enough to grep the link args out of a file to determine the filename needed.

@mletterle
Copy link
Contributor

I can see the benefits of having such a hash, I just don't understand the need for it to be in the filename. Can't it be placed in the binary itself?

@brson
Copy link
Contributor Author

brson commented Nov 1, 2013

@mletterle it does need to go in the library symbols, but it's in the filename so they can coexist next to each other in the same directory on the filesystem.

@mletterle
Copy link
Contributor

I had a hunch it might be something like that, but I don't think it's worth the pain personally. If a developer actually needs two copies of the same library hanging around, they can go through the trouble to uniquely rename them, no? Or maybe I'm missing something fundamental here...

@lucab
Copy link
Contributor

lucab commented Nov 2, 2013

Reading the wiki, it looks like the only purpose of the hash is to resolve soname clashes. This makes sense, also taking into account the fact that for (far future) system-wide libraries, the soname space is shared also with non-rust libraries.

As the hash is expected to be static (like the library name and in opposition to the library version/interface signature), I'd like to adopt @metajack proposal: re-use the first group of the crate UUID as the hash. This gives us 8 random alphadigits with a low probability of collision, which are statically defined and known to rustpkg, as well as short and easily greppable. Eg, this will result in a libhttp-d2ad8df0.so.0.1-pre for the current httplib version.

@metajack
Copy link
Contributor

metajack commented Nov 3, 2013

Yes, with using the UUID as the hash, you can just pick a new uuid if you find a collision that is particularly bad. Probably needs a bit more thinking but it seems like this will salvage the main benefits while also fixing all the tooling issues.

@brson
Copy link
Contributor Author

brson commented Nov 7, 2013

If we simply stopped hashing the upstream dependencies into the hash, then they would be stable - it would just be the link metadata, including any potential uuid.

@pnkfelix
Copy link
Member

pnkfelix commented Nov 7, 2013

Need to decide something for 1.0. P-backcompat-lang.

@metajack
Copy link
Contributor

The link metadata is supposed to be disappearing (see #8523), so using a UUID from that is probably not the best.

Here's what I propose:

  1. Change #[link(package_id="...")]; to #[pkgid="..."];
  2. Remove all link metadata
  3. Calculate the CMH with SHA1(pkgid)
  4. Calculate the STH(sym) with SHA1(CMH + type_str(sym))
  5. For libraries, the output name is NAME-CMH-VERSION or NAME-CMH depending on whether the pkgid contains a version.

This means:

  • the hash is completely predictable and easily calculated. either you can grep it out of lib.rs or you can hardcode the package id in a build script.
  • name collisions for different package ids with the same name are avoided (important for /usr/lib use case)
  • libraries that don't contain a version in the pacakge id won't conflict with ones that do.

What do you think?

metajack added a commit to metajack/rust that referenced this issue Nov 21, 2013
This replaces the link meta attributes with a pkgid attribute and uses a hash
of this as the crate hash. This makes the crate hash computable by things
other than the Rust compiler. It also switches the hash function ot SHA1 since
that is much more likely to be available in shell, Python, etc than SipHash.

Fixes rust-lang#10188, rust-lang#8523.
bors added a commit that referenced this issue Dec 11, 2013
This replaces the link meta attributes with a pkgid attribute and uses a hash
of this as the crate hash. This makes the crate hash computable by things
other than the Rust compiler. It also switches the hash function ot SHA1 since
that is much more likely to be available in shell, Python, etc than SipHash.

Fixes #10188, #8523.
flip1995 pushed a commit to flip1995/rust that referenced this issue Jan 12, 2023
…fate

expl_impl_clone_on_copy: ignore packed structs with type/const params

changelog: [`expl_impl_clone_on_copy`]: Ignore `#[repr(packed)]` structs with type or const paramaters

Fixes rust-lang#10188

A more involved solution that checks if any bound on the trait impl aren't present on the struct definition would be ideal, but I couldn't see a nice way to go about that
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants