Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LD_LIBRARY_PATH order is non-deterministic #3800

Closed
jethrogb opened this issue Mar 6, 2017 · 14 comments · Fixed by #4724
Closed

LD_LIBRARY_PATH order is non-deterministic #3800

jethrogb opened this issue Mar 6, 2017 · 14 comments · Fixed by #4724
Labels
A-environment-variables Area: environment variables A-linkage Area: linker issues, dylib, cdylib, shared libraries, so C-bug Category: bug

Comments

@jethrogb
Copy link
Contributor

jethrogb commented Mar 6, 2017

# Cargo.toml
[dependencies]
openssl-sys = "0.7"
pq-sys = "0.4"
// main.rs
fn main() {
    println!("{:#?}",std::env::var("LD_LIBRARY_PATH").unwrap().split(":").collect::<Vec<_>>());
}

Running this program with PQ_LIB_DIR=/test cargo run will result in either of the following outputs:

[
    "/usr/lib/x86_64-linux-gnu",
    "/test",
    "/tmp/ctest/target/debug",
    "/tmp/ctest/target/debug/deps",
    "/home/jethro/.multirust/toolchains/nightly-2016-10-03-x86_64-unknown-linux-gnu/lib"
]
[
    "/test",
    "/usr/lib/x86_64-linux-gnu",
    "/tmp/ctest/target/debug",
    "/tmp/ctest/target/debug/deps",
    "/home/jethro/.multirust/toolchains/nightly-2016-10-03-x86_64-unknown-linux-gnu/lib"
]

This can lead to very confusing intermittent problems, especially with crates such as openssl-sys 0.7 which include the system lib directory.

@alexcrichton
Copy link
Member

Right now Cargo doesn't maintain a DAG between these paths, so otherwise it's basically correct for it to be slightly nondeterministic (because how else would Cargo order paths?)

What's the correct ordering here?

@jethrogb
Copy link
Contributor Author

jethrogb commented Mar 7, 2017

I don't think there's a “correct” ordering. If libraries have the same name, there will be conflicts, period. In particular, crates should just not be outputting the system directory as a search path (and newer versions of openssl-sys don't, luckily).

However, I think it's important that the order is deterministic to aid in debugging these issues. Possible orderings include:

  • lexically
  • link order of corresponding crates
  • reverse of the link order
  • others...?

@alexcrichton
Copy link
Member

In some sense though I'd figure that a slighlty nondeterministic order is good because it helps weed out issues like this...

@IvanUkhov
Copy link
Contributor

@alexcrichton, I wonder if it could be related to #3809.

@alexcrichton
Copy link
Member

Perhaps yeah, although I wouldn't be certain

@lilith
Copy link
Contributor

lilith commented Mar 10, 2017

@alexcrichton Except rare and nondeterministic issues are hell to debug. Better if certain environments trigger the bug instead, so that they can be reliably reproduced.

Non-deterministic behavior in cargo has cost me more time than all other Rust bugs combined. If something makes me leave Rust, it will be the non-determinism.

I'd suggest we at least sort the output.

@alexcrichton
Copy link
Member

@nathanaeljones can you point me to the other bugs you've experience? 99.9% of the time I've found at least that nondeterminism is not the bug but rather a legitimate other bug is being exposed, and sorting just hides it. That may not be the case always though!

@lilith
Copy link
Contributor

lilith commented Mar 13, 2017

Well, I ran into the static linking form of this bug very painfully, over, and over again.

Linking can be a house of cards, but with non-determinism it's trying to stack them on a spinning table.

I realize that randomized failure may cause a wider number of environments to experience a bug at least once, but it also diminishes the probability that anyone can make a useful or reproducible bug report.

I'm an independent solo developer who bet the house (literally) on Rust. I abandoned Microsoft dev tools (primarily) for their propensity towards non-deterministic failure. Rust seemed to offer the best guarantees for building reliable and predictable systems. I realize Cargo is not Rust, but I did expect that determinism would be considered a virtue.

I realize that purging HashMaps from Cargo won't solve all the non-determinism, but I think it's good to consider the cost/benefit for things that risk inducing I-can't-do-my-job rage. Sometimes list.sort() is cheap after all.

@jethrogb
Copy link
Contributor Author

jethrogb commented Mar 13, 2017

I realize that randomized failure may cause a wider number of environments to experience a bug at least once, but it also diminishes the probability that anyone can make a useful or reproducible bug report.

100x this.

Non-determinism in the build chain is also an impediment to reproducible builds.

@lilith
Copy link
Contributor

lilith commented Mar 13, 2017

As I push the edges of Cargo's less tested feature set, I often run into things that don't do what I'd expect. When they're deterministic, I know what I changed before it broke, I change it back - and I'm back to work.

With non-determinism, I've gone as long as 9 days between the crucial change and the error appearing. My project is < 12kloc, and I'm doing everything I can to decrease compile times and avoid problematic areas. But I have no idea how I'm going to keep things working at 20 or 30kloc and perhaps 2x as many dependencies...

@alexcrichton
Copy link
Member

@nathanaeljones @jethrogb

I realize that this can be painful and I'm sorry you've had to go through this sort of build pain. To be clear a major goal of Cargo is reproducible and deterministic builds. I'm definitely not a proponent that we should just randomize everything.

Most of the breakage so far seems to have been accidental non-determinism simply inherited through use of a HashMap. It should be ok for a build to always be slightly nondeterministic (Cargo compiles in parallel after all) but only in controlled fashions. If nondeterminism breakages a build, then that is a bug.

@nathanaeljones it's interesting you bring up #3659 though. The initial commit to fix that (22096de) was actually incorrect. The final result actually addressed the underlying bug with using a HashMap to order insertion into a vector. I just mean to highlight that blindly sorting everything or blindly trying to remove HashMap will tend to cover up more bugs than it actually fixes.


With that in mind, is there a way to fix this bug? If there's no correct choice, how does Cargo select a choice?

@aturon
Copy link
Member

aturon commented Mar 13, 2017

TL;DR, I think we should sort.

@alexcrichton I feel like you and the others on this thread are talking past each other a bit.

  • I think everyone is in agreement that non-determinism can sometimes reveal the existence of a bug.

    • However, as @nathanaeljones points out (rightly IMO), finding out that there's a bug when you can't reproduce it is usually worse than the bug being hidden. And in particular, the bugs we're talking about here, AIUI, will tend to result in very visible failures when they're triggered. So "masking" them doesn't mean that there's problems lurking in production code, but rather that your build could fail if something gets perturbed.
  • @alexcrichton, you say that it's OK for the build system to have "controlled" nondeterminism (and you cite parallel builds as an example). I think that's probably true at a fundamental level, since build scripts can observe arbitrary machine state.

    • However, I think a basic goal of reducing the amount of nondeterminism over time is a good one for Cargo. This isn't about blindly sorting in an effort to fix bugs, but rather about systematically making builds easier to reproduce, so that you have some hope of debugging things when they do go wrong. So, in particular, I don't think we should choose to be nondeterministic when there's an easy way around it.
  • Finally, @alexcrichton you've mentioned a couple times that there's no "right" order for these items to appear in. But that seems beside the point, to me. What we're after is just determinism, so we can choose any order as long as its consistent. Sorting seems totally reasonable to me.

@lilith
Copy link
Contributor

lilith commented Mar 14, 2017

@alexcrichton I see #3659 differently; the input is deterministic and has explicit order, but that order is lost due to storage in a HashMap. Given the source information is deterministic, it makes sense that one could hash it in natural order... but the HashMap causes random permutation, and likely caused the assumption behind the bug to be incorrect. Using order-preserving maps would have solved both issues. When randomness is hidden behind abstraction (as it was here in fundamental data structures), it tends to cause errors.

@aturon
Copy link
Member

aturon commented Mar 14, 2017

@nathanaeljones BTW, would you be up for a video call with me and @alexcrichton at some point soon? I know that you've been dealing with non-determinism and other Cargo-related issues, as well as some pkg-config issues, and I think some high-bandwidth discussion might be helpful for making progress together.

If that sounds good, drop me a line at aturon@mozilla.com

@carols10cents carols10cents added A-environment-variables Area: environment variables A-linkage Area: linker issues, dylib, cdylib, shared libraries, so C-bug Category: bug labels Oct 2, 2017
kornelski added a commit to kornelski/cargo that referenced this issue Nov 16, 2017
bors added a commit that referenced this issue Jan 23, 2018
Sort native library paths for deterministic builds

Fixes #3800 by using a `BTreeSet`, which guarantees a deterministic iteration order.

Since the order was previously arbitrary, a sorted order is just as good. The list is so small, that any performance difference between `BTreeSet` and `HashSet` is negligible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-environment-variables Area: environment variables A-linkage Area: linker issues, dylib, cdylib, shared libraries, so C-bug Category: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants