Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vendor all rustbuild dependencies in this repo #37524

Merged
merged 2 commits into from
Nov 9, 2016

Conversation

alexcrichton
Copy link
Member

This commit vendors all crates.io dependencies into the rust-lang/rust repository using the cargo-vendor tool. This is done in an effort to make rustbuild distro-ready by ensuring that our source tarballs are self-contained units which don't need extraneous network downloads.

A new src/vendor directory is created with all vendored crates, and Cargo, when using rustbuild, is configured to use this directory. Over time we can deduplicate this directory with the actual src tree (e.g. src/librustc_serialize, src/liblibc, src/libgetopts, ...). For now though that's left to a separate commit.

@alexcrichton
Copy link
Member Author

r? @brson

@rust-highfive
Copy link
Collaborator

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive assigned brson and unassigned aturon Nov 1, 2016
@alexcrichton
Copy link
Member Author

Note that the first commit is purely just adding src/vendor, where the second commit is the build system changes and such. (e.g. shouldn't be a need to look at the first commit)

@retep998
Copy link
Member

retep998 commented Nov 2, 2016

Isn't it only the source tarballs which need to be self contained and vendor everything? If so, why does the git repo need to vendor everything? Can't rustbuild just acquire the external dependencies when creating the source tarball in the first place?

@alexcrichton
Copy link
Member Author

Yes we only strictly need this source tarballs, but it's far easier to just check it into the repo (like LLVM) and not worry about that automation.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Nov 2, 2016

LLVM uses git submodules. As far as I can tell, this just copies all the files. I don't know if it makes a difference but it does seem odd at a glance, could you elaborate?

@brson
Copy link
Contributor

brson commented Nov 3, 2016

@rkruppe This is not using submodules because it is using cargo's vendoring support. So this is dogfooding the tools we are having our users use for similar purposes.

@brson
Copy link
Contributor

brson commented Nov 3, 2016

@alexcrichton what is the process for updating dependencies after this patch? That's going to be something we do a lot so it needs to be easy. Where is it documented (the diff is truncated so maybe it's in here somewhere)?

@alexcrichton
Copy link
Member Author

@brson hm yeah that's a good point about updating dependencies (or just restructuring the DAG)

Right now we're always passing --frozen to Cargo everywhere, so it's not allowed to change the lock file. This means that if you change it then ./x.py build will fail by default. You can fix this by just running cargo generate-lockfile manually inside src, but perhaps that's not the best interface?

Alternatively we could change all the bots to always pass --frozen and users by default don't pass that (allowing updates). @brson do you know of a way to detect whether we're running on the bots or not?

@brson
Copy link
Contributor

brson commented Nov 3, 2016

I don't know a definite way to detect when running on the bots, but we could add a flag to configure to indicate such. I don't know it's worth it for this though.

A good workflow to me might be for bootstrap to somehow detect when the frozen build fails because it would change the lockfile and say "it looks like you are changing a dependency. run x.py generate-lockfiles" or something.

@brson
Copy link
Contributor

brson commented Nov 3, 2016

This vendor directory seems to provide a convenient way to implement a license-check on third-party crates...

@retep998
Copy link
Member

retep998 commented Nov 3, 2016

In my opinion one of the great benefits of rustbuild was that Rust itself would be able to easily use external crates and not have to shove everything in-tree. This PR will undo a significant amount of that progress. Updating these dependencies will be a painfully manual process, and if it could be automated then there would be no reason for this PR as rustbuild would be able to automatically do it when creating a source tarball.

@alexcrichton
Copy link
Member Author

@brson unfortunately I don't know of a great way to detect that Cargo failed because it would change the lock file, scraping the output tends to be harder than it's worth

I figured we could change to a system like this and then if it becomes a problem we could look for a solution (likely adding a "I'm a bot" flag).

@retep998 this PR is just a baby step towards making rustbuild more official. I wanted to reduce the chances this is blocked by a stray "zomg 3MB commit" comment by vendoring as little as possible. This is not a declaration that we'll never use dependencies from crates.io, as that's exactly the point of rustbuild. We'll get there in time, we're just not there yet.

@brson
Copy link
Contributor

brson commented Nov 3, 2016

In my opinion one of the great benefits of rustbuild was that Rust itself would be able to easily use external crates and not have to shove everything in-tree. This PR will undo a significant amount of that progress.

This is still making it easy to use external crates. It is still using cargo to do the build. This is the exact tooling we expect our users to use when they need to store code in-tree. The only difference between this arrangement and downloading from crates.io is that there is one extra step to do the vendoring when updating crates.

At the moment we believe we must provide such trees for source distribution, and offhand I think it would be unnecessarily risky for our own builds to be that different from the builds we are giving to packagers, though we do have distcheck that could help ensure the source tarball works.

Although I wonder how true it is that packagers expect the source to be vendored - they are already packaging cargo, and cargo does not vendor it's dependencies does it?

@brson
Copy link
Contributor

brson commented Nov 3, 2016

Another approach we could take here is to have a tool that produces a vendored source tree on demand, but we don't use it by default. So in the cases where distro packagers do need it, they can do it themselves.

Maybe we should consult some distro folks and see what their opinions are.

@brson
Copy link
Contributor

brson commented Nov 3, 2016

(On the other hand, I kinda want to just do something so we can get rid of the makefiles, perfect it later; and having vendored source is the conservative option that distros won't complain about).

@alexcrichton
Copy link
Member Author

Yeah I was kinda hoping we wouldn't block this on too many concerns. We're already "vendoring" hoedown, miniz, jemalloc, and llvm. While LLVM is normally a special case, all these other pieces I really hope we don't try to split out into separate packages as it's not worth it right now.

@liigo
Copy link
Contributor

liigo commented Nov 4, 2016

You can download these files to local disk before compile/build starting. Don't pushing them into git repo, before you must fork & modify their source code.

@cuviper
Copy link
Member

cuviper commented Nov 4, 2016

having vendored source is the conservative option that distros won't complain about

I do complain a little... but to the extent this stuff is used just at build time and not installed code, I care less. If there's a way to eventually override the vendoring to point to a distro directory registry instead, that would be even better. (We don't have any packaged crates in Fedora yet, but that's the plan.)

We're already "vendoring" hoedown, miniz, jemalloc, and llvm.

And in Fedora I've disabled jemalloc and use system llvm. The other two I just haven't got around to bothering with, since support for those as external libraries isn't already there.

@cuviper
Copy link
Member

cuviper commented Nov 4, 2016

they are already packaging cargo, and cargo does not vendor it's dependencies does it?

It does not, so in Fedora's cargo.src.rpm I have my own cargo-vendor'ed tarball as an extra source. But again, I eventually want that to point to a distro directory registry.

Actually, last I checked there was no cargo source distribution at all, so even for that I'm just grabbing from the git tag.

@brson
Copy link
Contributor

brson commented Nov 4, 2016

Here are my outstanding concerns:

  • If we are going to vendor crates, then the workflow for devs needs to be simple and obvious.
  • I would like there to be a build system command to do the vendoring, like x.py generate-lockfiles, and for it to be documented how to deal with the vendoring. I'm not sure where the build system is documented presently.
  • So I think with --frozen the workflow would be that cargo complains about lockfiles, some dev asks on IRC "what do I do?", then they run x.py generate-lockfiles.
  • @cuviper's comments make me think that distros do generally need a vendoring solution (at least until we have a way to user their packaged crates ... which is a whole nother ball of wax).
  • I'm not sure that keeping these crates in-tree is better than providing a tool for ditros to do the vendoring themselves, though as I mentioned before it has the dogfooding advantage.

@brson
Copy link
Contributor

brson commented Nov 4, 2016

Don't pushing them into git repo, before you must fork & modify their source code.

@liigo I don't follow what you mean hear by "before you must fork and modify their source code". When would we be forking and modifying the vendored code?

@alexcrichton
Copy link
Member Author

@brson ok I've implemented:

  • --frozen is no longer passed by default, it's only passed on ./configure --enable-vendor.
  • Travis passes --enable-vendor

That way devs adding deps should work normally, it's just when you land an updated dep you'll have to run cargo-vendor

@alexcrichton
Copy link
Member Author

@est31 using submodules precludes standard usage of Cargo overrides, so no, unfortunately it can't be used.

Yes tarballs should have dependencies in them, and this is the easiest way to do so.

@brson
Copy link
Contributor

brson commented Nov 8, 2016

As formulated now nothing in this PR will immediately affect developers, nor distros, though distros will likely want to turn on --enable-vendor. It seems to me like the argument against this PR now is the extra source code that will need to be maintained in the history forever.

I'm not convinced by this argument, primarily because it is a small amount of source code. Git is very good at managing source code and we've already churned through orders of magnitude more than this in Rust's history. It will accumulate over time though and we can never take it back.

So I'm inclined to land this now to proceed with removal of the make build system. It doesn't commit us to anything beyond maintaining 20k lines of code in the git history, and can be re-considered later if needed.

cc @rust-lang/compiler @rust-lang/libs this is a somewhat significant change to the Rust source tree that you might want to be aware of. The TL;DR:

In order to make building Rust with cargo not hit the network (primarily for distros and our CI), this vendors all out-of-tree source in the src/vendor directory. Right now that only applies to rustbuild itself, but this strategy will extend to any out-of-tree components that become part of Rust in the future.

The developer workflow change is: during CI, builds will be run against the vendored tree. If the build fails, the PR author will need to run cargo vendor in the src directory and ammend their PR (in other words, when developers update cargo dependencies they need to vendor the deps).

The downside of this is that we will be storing the source code to deps in-tree, increasing the repo's size.

@eddyb
Copy link
Member

eddyb commented Nov 8, 2016

One argument for this is that people might want to automate distro builds from git directly instead of using tarballs. It's probably fine in the short term.

@nrc
Copy link
Member

nrc commented Nov 8, 2016

+1, but nit, can these be in vendor rather than src/vendor? I often grep src/* -r to avoid llvm etc., and I would like to easily ignore these dependencies. It also feels morally right too - these are not our source.

@est31
Copy link
Member

est31 commented Nov 8, 2016

At the very least, can there some policy be established to never ever modify the code in the vendor subdir? I've seen many C++ projects going down that path, and it has caused very bad issues every time, because updating now becomes something dangerous and scary.

There should also be some automatic verification by travis or bors to make sure a PR that updates some dependency doesn't mess with its source.

@eddyb
Copy link
Member

eddyb commented Nov 8, 2016

@est31 If possible, these should be hashed (in Cargo.lock), so the build can't ever start with a modification.

@steveklabnik
Copy link
Member

steveklabnik commented Nov 8, 2016

At the very least, can there some policy be established to never ever modify the code in the vendor subdir?

Fear not, @est31 :

[steve@becoming lol (master)]$ cat Cargo.toml 
[package]
name = "lol"
version = "0.1.0"
authors = ["Steve Klabnik <steve@steveklabnik.com>"]

[dependencies]
ref_slice = "1.1.0"

[steve@becoming lol (master)]$ cargo build
   Compiling ref_slice v1.1.1
   Compiling lol v0.1.0 (file:///home/steve/tmp/lol)
    Finished debug [unoptimized + debuginfo] target(s) in 0.12 secs
[steve@becoming lol (master)]$ echo "// testing" >> vendor/ref_slice/src/lib.rs 
[steve@becoming lol (master)]$ cargo clean
[steve@becoming lol (master)]$ cargo build
error: the listed checksum of `/home/steve/tmp/lol/vendor/ref_slice/src/lib.rs` has changed:
expected: 54152bc0580b0f05b37bc8a97263a36bcf43af25ec8d13c90d58dce67f919ed4
actual:   e04c52abc3c719cdba6ed6a38ad10dba75ee276617fc8a43a325ccc1c5c03f40

directory sources are not intended to be edited, if modifications are required then it is recommended that [replace] is used with a forked copy of the source

@alexcrichton
Copy link
Member Author

can these be in vendor rather than src/vendor

Yeah this was my initial inclination but jemalloc/llvm/hoedown/etc all stopped my as the only "convention" we have now is code goes under src basically

@est31
Copy link
Member

est31 commented Nov 8, 2016

Thanks for clarifying @eddyb @steveklabnik !

@brson
Copy link
Contributor

brson commented Nov 9, 2016

@bors r+

Let's do it as-is. Maybe once we've got the other makefiles gone we can move all third-party code to a directory outside src/.

@bors
Copy link
Contributor

bors commented Nov 9, 2016

📌 Commit 31a8638 has been approved by brson

eddyb added a commit to eddyb/rust that referenced this pull request Nov 9, 2016
Vendor all rustbuild dependencies in this repo

This commit vendors all crates.io dependencies into the rust-lang/rust repository using the `cargo-vendor` tool. This is done in an effort to make rustbuild distro-ready by ensuring that our source tarballs are self-contained units which don't need extraneous network downloads.

A new `src/vendor` directory is created with all vendored crates, and Cargo, when using rustbuild, is configured to use this directory. Over time we can deduplicate this directory with the actual src tree (e.g. src/librustc_serialize, src/liblibc, src/libgetopts, ...). For now though that's left to a separate commit.
bors added a commit that referenced this pull request Nov 9, 2016
Rollup of 15 pull requests

- Successful merges: #36868, #37134, #37229, #37250, #37370, #37428, #37432, #37472, #37524, #37614, #37622, #37627, #37636, #37644, #37654
- Failed merges: #37463, #37542, #37645
@bors bors merged commit 31a8638 into rust-lang:master Nov 9, 2016
@alexcrichton alexcrichton deleted the vendor branch November 11, 2016 22:01
frewsxcv added a commit to frewsxcv/rust that referenced this pull request Feb 13, 2017
…hton

Automate vendoring by invoking cargo-vendor when building src dist tarballs.

This avoids rust-lang#39633 bringing the `src/vendor` checked into git by rust-lang#37524, past 200,000 lines of code.

I believe the strategy of having rustbuild run `cargo vendor` during the `dist src` step is sound.

However, the only way to be sure `cargo-vendor` exists is to run `cargo install --force cargo-vendor`, which will recompile it every time (not passing `--force` means you can't tell between "already exists" and "build error"). ~~This is quite suboptimal and I'd like to somehow do it in each `Dockerfile` that would need it.~~

* [ ] Cache `CARGO_HOME` (i.e. `~/.cargo`) between CI runs
  * `bin/cargo-vendor` and the actual caches are the relevant bits
* [x] Do not build `cargo-vendor` all the time
  * ~~Maybe detect `~/.cargo/bin/cargo-vendor` already exists?~~
  * ~~Could also try to build it in a `Dockerfile` but do we have `cargo`/`rustc` there?~~
  * Final solution: check `cargo install --list` for a line starting with `cargo-vendor `

cc @rust-lang/tools
bors added a commit that referenced this pull request Feb 14, 2017
Automate vendoring by invoking cargo-vendor when building src dist tarballs.

This avoids #39633 bringing the `src/vendor` checked into git by #37524, past 200,000 lines of code.

I believe the strategy of having rustbuild run `cargo vendor` during the `dist src` step is sound.

However, the only way to be sure `cargo-vendor` exists is to run `cargo install --force cargo-vendor`, which will recompile it every time (not passing `--force` means you can't tell between "already exists" and "build error"). ~~This is quite suboptimal and I'd like to somehow do it in each `Dockerfile` that would need it.~~

* [ ] Cache `CARGO_HOME` (i.e. `~/.cargo`) between CI runs
  * `bin/cargo-vendor` and the actual caches are the relevant bits
* [x] Do not build `cargo-vendor` all the time
  * ~~Maybe detect `~/.cargo/bin/cargo-vendor` already exists?~~
  * ~~Could also try to build it in a `Dockerfile` but do we have `cargo`/`rustc` there?~~
  * Final solution: check `cargo install --list` for a line starting with `cargo-vendor `

cc @rust-lang/tools
bors added a commit that referenced this pull request Feb 14, 2017
Automate vendoring by invoking cargo-vendor when building src dist tarballs.

This avoids #39633 bringing the `src/vendor` checked into git by #37524, past 200,000 lines of code.

I believe the strategy of having rustbuild run `cargo vendor` during the `dist src` step is sound.

However, the only way to be sure `cargo-vendor` exists is to run `cargo install --force cargo-vendor`, which will recompile it every time (not passing `--force` means you can't tell between "already exists" and "build error"). ~~This is quite suboptimal and I'd like to somehow do it in each `Dockerfile` that would need it.~~

* [ ] Cache `CARGO_HOME` (i.e. `~/.cargo`) between CI runs
  * `bin/cargo-vendor` and the actual caches are the relevant bits
* [x] Do not build `cargo-vendor` all the time
  * ~~Maybe detect `~/.cargo/bin/cargo-vendor` already exists?~~
  * ~~Could also try to build it in a `Dockerfile` but do we have `cargo`/`rustc` there?~~
  * Final solution: check `cargo install --list` for a line starting with `cargo-vendor `

cc @rust-lang/tools
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.