Resolve deps in cache and mirrors #547

Merged
merged 16 commits into from
Aug 18, 2016

mattfarina
Member

@mattfarina mattfarina commented Aug 12, 2016

This implements two complementary features that are linked.

First, dependencies are resolved in the cache rather than the vendor/ directory. If the dependency tree is successfully figured out, the source is exported to vendor/. This deprecates a number of flags that dealt with stripping VCS information, as they are no longer needed. For the time being, their use will display a deprecation warning, which will be removed in a future version.

There are a few reasons for this change:

  1. When the dependency tree is worked out in the vendor folder and there is a problem Glide can exit. This leaves the tree within the vendor directory in a bad state which is not ideal.
  2. When vendoring is used a lot of time is spent re-downloading the repos when updates and installs happen. Using a cache and copying from there is slow (when written generically) or very platform specific when you have to deal with multiple platforms (as we do).
  3. The VCS and metadata source can now be shared across projects.

Using the cache removes the flags for pulling dependencies from the GOPATH. To compensate for that change, a mirrors feature has been added. This allows a repo location (rather than a package) to be overridden. This feature is currently global to a user on a system.

So, you can take a repo location such as https://github.com/example/foo and tell Glide to use https://git.example.com/example/foo.git instead. Or, file:///path/to/local/repo if you want to fetch from the GOPATH or other local location.
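The substitution described above can be pictured as a lookup keyed by repo location. The following is only a minimal sketch: the `mirrors` table and `resolveRepo` are hypothetical names chosen for illustration, not Glide's actual types or API, and the URLs are the example ones from the description.

```go
package main

import "fmt"

// mirrors maps an original repo location to its replacement.
// Hypothetical table; the entry uses the example URLs from the description.
var mirrors = map[string]string{
	"https://github.com/example/foo": "https://git.example.com/example/foo.git",
}

// resolveRepo returns the configured mirror for repo, or repo itself
// when no mirror is configured.
func resolveRepo(repo string) string {
	if m, ok := mirrors[repo]; ok {
		return m
	}
	return repo
}

func main() {
	fmt.Println(resolveRepo("https://github.com/example/foo"))
	fmt.Println(resolveRepo("https://github.com/example/bar"))
}
```

Keying on the repo location rather than the package import path is the point of the sketch: every package hosted in that repo follows the same substitution.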

This is a major change and feedback is requested.

@mattfarina mattfarina added this to the 0.12 milestone Aug 12, 2016
@mattfarina
Member Author

Not sure why GitHub didn't register the passing Travis tests. They're at https://travis-ci.org/Masterminds/glide/builds/151840337.

// Export from the cache to the vendor directory
func (i *Installer) Export(conf *cfg.Config) error {
	msg.Info("Removing existing vendor dependencies")
	err := os.RemoveAll(i.VendorPath())
Member

it's probably preferable to write out to a temporary directory, then remove the old vendor dir and move the new one into place if and only if creating the new one succeeded

Member Author

Good call. I'll make that change.

gpath "github.com/Masterminds/glide/path"
)

var overrides map[string]*override
Member

I haven't seen any, but asking just in case - is there any possibility of parallel goroutine access to this map?

Member Author

@mattfarina mattfarina Aug 15, 2016

That is a great question. I'm pretty sure there is, so I'm not sure why it hasn't complained. I'll fix this. Thanks.

Edit: There is only parallel read. No chance of parallel write.

Member

cool cool, then we're good.
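The conclusion of this thread rests on a property of Go maps: concurrent reads are safe as long as no goroutine writes to the map at the same time. A small sketch of that access pattern, with a hypothetical `lookupAll` helper standing in for the real lookups:

```go
package main

import (
	"fmt"
	"sync"
)

// lookupAll reads from table in one goroutine per key. The map is never
// written after the goroutines start, so no lock is needed.
func lookupAll(table map[string]string, keys []string) []string {
	out := make([]string, len(keys))
	var wg sync.WaitGroup
	for i, k := range keys {
		wg.Add(1)
		go func(i int, k string) {
			defer wg.Done()
			out[i] = table[k] // read-only map access; each goroutine owns out[i]
		}(i, k)
	}
	wg.Wait()
	return out
}

func main() {
	// The table is fully populated before any goroutine starts.
	table := map[string]string{"a": "1", "b": "2"}
	fmt.Println(lookupAll(table, []string{"a", "b", "missing"}))
}
```

If a write were ever introduced while readers are running, the map would need a `sync.RWMutex` or similar guard; running the tests with `go test -race` is one way to catch that.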

@sdboyer
Member

sdboyer commented Aug 15, 2016

in general, i think this looks good. 🎉

i do have some concerns about the implications of overrides, though. i'm going to try to return to write that up this evening.

@sdboyer
Member

sdboyer commented Aug 16, 2016

So, overrides. Lemme preface by saying that I think this is a crucial use case - #548 is just the most recent example. I just want to make sure we do it in such a way that we minimize the potentially nasty second-order effects.

My basic concern is, from the examples given, overrides allow/encourage the user to name local paths as substitutions for ordinary upstream paths. As a result, lock files produced when such overrides are in place may be non-portable, as they now rely on potentially arbitrary versions of code that just happen to be on people's local systems. If the goal is to create a portable lock file (which...yes, right?), then the list of weird failure modes preventing that goes something like this:

  1. There is no .git/.bzr/.hg dir at the root to which the user points
  2. There is a .git/.bzr/.hg dir, but in a parent or a child of the dir to which the user points
  3. There are committed changes that have not been pushed upstream
  4. There are uncommitted changes
  5. There are extra files the VCS knows nothing about

All of these are cases in which glide would either be unable to write out a rev to the lock, or would write out one that's incorrect or unusable.

Now, I realize that these were all situations that could occur without this PR if the appropriate flags were passed and packages were sourced from GOPATH (although I'm not sure how lock files were written in those cases). That was always a problem for reproducibility, but it was at least somewhat implicit and aligned with how existing go tooling works.

This PR, though, introduces a new element - instead of implicitly searching GOPATH, the user can specify a local path. I don't have a specific example of how "misuse" could become problematic, but it's introducing new user choice to cover use cases that I suspect we'll find better solutions for, reasonably soon. When they arrive, we may want to roll this back.

Sadly, I don't have a really good alternative right now. The work I just finished (sdboyer/gps#83) lays the groundwork for "path-based" import satisfaction, which is how I envision satisfying this class of requirement. But it's not there yet. The only real idea I have at this point is to define "portability" levels, and incorporate that information into the lock file, so that we can at least provide saner errors to other users when they fail to install from a glide.lock that was generated with overrides on.

@sdboyer
Member

sdboyer commented Aug 16, 2016

Oh, also - do you have specific stories around referencing GOPATH in your mind? The two important use cases I can currently recall that would be helped by this feature are

  1. non-vendor mode? #548
  2. Facilitating organizations that want to have a bloc of dependencies pre-loaded on a machine at specific, set versions, rather than having them be negotiated by the tool

I do not think the "performance/waste" argument for this is valid - the one that goes, "it's already on my GOPATH, why redownload it?" That can be mitigated, and is a one-time cost anyway.

@sdboyer
Member

sdboyer commented Aug 16, 2016

Ugh, I did forget a couple things. Sorry.

First, kinda more a nit, but - gps has a thing called overrides (which is generally consistent with what other comparable systems call overrides). These allow the root project to specify constraints and source locations for imports that will override anything declared by any dep that's brought in. You said you were considering naming this mirroring, though, so...

Second - if this kind of information is going to be included, there's also an approach here that puts this information in the manifest. Yes, it's a bunch more work to decide on how to reconcile it with everything else, but...having it in the manifest at least makes it publicly visible (to humans, and to solvers on other machines/importing this as a dep) that "this path is not sourced from a repo."

If that doesn't match the use case you have in mind, then a) good, let's talk use cases and b) i think that should highlight how varied the use of this may be, and maybe some of the classes of problems that could arise.

@mattfarina
Member Author

@sdboyer A few things...

  1. I think changing override to mirror makes the intent clearer.
  2. The GOPATH is a mirror here. I would not recommend using it so the docs should be updated for that. The mirror was the goal of the old flags though it was not clearly communicated.
  3. We need to help people be successful and mirrors are required for that. But, with that power people can shoot themselves in the foot. No idea how to avoid that.
  4. Is extra metadata required for any tooling to work? Sure, someone could put an errant commit somewhere, like a mirror, and distribution will be a problem. But, how often is that situation a reality? In other tools that already support mirrors?

@mattfarina mattfarina changed the title Resolve deps in cache and overrides Resolve deps in cache and mirrors Aug 16, 2016
@mattfarina mattfarina mentioned this pull request Aug 16, 2016
@sdboyer
Member

sdboyer commented Aug 16, 2016

> We need to help people be successful and mirrors are required for that.

Could you please explain that requirement in terms of use cases? Are there ones beyond what I already described?

> Is extra metadata required for any tooling to work?

"any tooling" referring to, e.g., bundler/npm? Well, bundler calls them "local git repos," and yes, they require that it be a valid repository, and they have several other requirements as well, in order to ensure that the generated lock file remains sane:

> Bundler does many checks to ensure a developer won't work with invalid references. Particularly, we force a developer to specify a branch in the Gemfile in order to use this feature. If the branch specified in the Gemfile and the current branch in the local git repository do not match, Bundler will abort. This ensures that a developer is always working against the correct branches, and prevents accidental locking to a different branch. (docs link)

emphasis mine, to highlight that yes, they consider it important that the generated lock file is portable.

> But, how often is that situation a reality?

My subjective experience? Often. Just last night, when I was updating #384, I had an error occur because I had extra uncommitted files in my local gps tree that I unintentionally rsynced into glide's vendored copy (I have to use a weird workflow to pull in updates). Even though that workflow is weird, it's symptomatic of case no. 5. Unpushed changes (no. 3) happened to me all the time when using that feature in bundler, and no. 4 bit me on a hobby project this past winter.

> But, with that power people can shoot themselves in the foot. No idea how to avoid that.

I think we do it by focusing on the strict use cases the feature is trying to meet, satisfying those, and then constraining its power or layering on additional checks/requirements that reduce and clarify the failure modes. IMO, not having bad failure modes is also an important component of success.

Users may still be shooting themselves in the foot, but we can give them a BB-gun instead of a bazooka.

@mattfarina
Member Author

@sdboyer thanks for the link and all the thoughts on this. I think there might be a use case gap here. So, let me outline a couple.

  1. As a developer, I need my CI/CD system to pull from a local mirror to my environment. An example of this, in a network sense, is the CI system for OpenStack. It does more volume than Travis CI, last I checked, and they do a lot of dependency installation. The CI system is distributed in several regions of several public clouds. To put the dependencies close to the CI runs for fast installation they have mirrors in each region.
  2. As a developer, I need to only use dependencies from my company's trusted store. These are public packages but the trusted ones are stored in a mirror for me. A place I've seen this several times is with Debian packages of libraries. Many enterprises use them.

Your example of Bundler installing Gems is an interesting one. The link you provided is about switching from Gems to the source and the rules around that switch. In Go we always use the source. What caught my attention was the way you can configure bundler to use a mirror for your gems.

Bundler is a different case because it's a central package repo. Go is distributed so the mirroring functionality needs to handle distributed.

Another place to look is at PyPI mirrors and caches. Once you have a mirror up you can specify it in your pip.conf file.

I renamed it to mirrors rather than overrides because a true mirror use case is the primary one I had in mind.

Now, some folks are going to use this to route to their GOPATH or possibly some other development environment. This is where the real possibility of trouble comes in.

What kinds of problems can come up if a developer doesn't push commits from a dependency sitting in a dev environment to their normal public distribution locations?

  • A CI system or other developer that needs to install a version (release or revision) cannot find that version and exits in error.
  • A CI system or other developer resolving releases installs an older release, because the newer one isn't accessible in the public/sharing location.

The first one, where it exits in error, is due to a glide.lock file, and there will be an error. For someone other than the original developer who failed to push the commits publicly, this may be a perplexing error.

The second one, at first, seemed a bit more painful. But if the resolver matches compatible versions, then you should be OK. It's not ideal, but it matches the version range support.

Can you think of other bad situations and explain how they would be bad in ways where something doesn't tell you? I imagine there are other cases.

Again, using mirrors for a dev environment is not the goal of this. It never was the goal of the gopath flags in glide either. I just realize there will be some abuse of that situation.

@mattfarina
Member Author

Two more notes:

  • I'm open to additional changes to stop people from hurting themselves. This PR is already large enough so those can go in as additional changes. I'm a fan of more small changes than fewer large ones.
  • The mirrors.yaml file being a separate file from any configuration file was intentional. It can be shared to different environments, like CI systems separately from any other configuration.

@sdboyer
Member

sdboyer commented Aug 17, 2016

Ahh cool, OK - I realize my comments were probably too focused in on file:///. It seems we both fundamentally agree that relying on GOPATH is more or less an abuse - which means I think we're mostly on the same page here.

> As a developer, I need to only use dependencies from my company's trusted store. These are public packages but the trusted ones are stored in a mirror for me. A place I've seen this several times is with Debian packages of libraries. Many enterprises use them.

Cool, this is the use case I was trying to describe in my earlier comment.

> As a developer, I need my CI/CD system to pull from a local mirror to my environment. An example of this, in a network sense, is the CI system for OpenStack. It does more volume than Travis CI, last I checked, and they do a lot of dependency installation. The CI system is distributed in several regions of several public clouds. To put the dependencies close to the CI runs for fast installation they have mirrors in each region.

Ah, yes. Pretty much functionally identical to the other use case (which is why they can both be serviced by the one feature).

> The link you provided is about switching from Gems to the source and the rules around that switch. In Go we always use the source.

Right, so, let's dispatch with this first - the issue I was really centrally focused on was allowing file:///. From the OP:

> Or, file:///path/to/local/repo if you want to fetch from the GOPATH or other local location.

If we allow file:///, then it's no longer true that, as you put it, we "always use the source." Now, sometimes we use the source, and sometimes we just use whatever happens to be at that location. Maybe it's source, maybe it isn't, but because we can't reliably infer anything from it, it creates all those nasty failure modes I enumerated above. This is a garbage-in makes for garbage-out-type situation. If we want to avoid spitting out garbage, we have to avoid taking it in.

> Can you think of other bad situations and explain how they would be bad in ways where something doesn't tell you? I imagine there are other cases.

Yes, I gave five, one of which I think is equivalent to the two you gave. The latter three all involve subtle, reasonably easy slip-ups that the dev could make during the course of normal development - I did.


> Bundler is a different case because it's a central package repo. Go is distributed so the mirroring functionality needs to handle distributed.

Right, SO! This is the crux of the issue. It's also a facet of a general problem in distributed architectures - who decides what names mean? (I won't wax poetic - just want to point out that this is known, and not easy.)

I think the safest way to approach the issue is to allow for a URL rewrite - perhaps as a regex - that is applied during the process of transforming an import path into a URL for source retrieval. This would, I think, satisfy #372. I haven't written support for doing it into gps yet, but doing so would be trivial - one of the express design goals in sdboyer/gps#83 is supporting this. There's also validation in there that precludes expressing file:///.

Now, URL rewrites + scheme validation would, I think, cover the true mirroring cases. True mirrors don't need to be reflected in the lock file, because they generally don't affect build outcomes (in practice they do, but those are distributed systems problems that we don't need to touch in this discussion).

URL rewrite + scheme validation would not cover #548, however, as that use case, while important, is not for a mirror; it's a fundamentally different type of source. I'm moving gps in the direction of supporting different source types - sdboyer/gps#83 introduced the explicit notion of a source, and that there can be types of them, and that there's a formal system for mapping different import paths to different source types, and that all that can ultimately be reflected in the lock.

@mattfarina
Member Author

@sdboyer I don't think we really can block file:// paths because you could have your mirror be a shared filesystem. Mounting a shared filesystem (like NFS) isn't uncommon.

Rewrite rules are an interesting idea some of us have talked about in the past. That's worth doing.

@sdboyer
Member

sdboyer commented Aug 17, 2016

@mattfarina I don't see how NFS is relevant? I could have source files mounted RO through ZFS into a container with the host machine synchronizing them over the network through a Cassandra-backed FUSE filesystem...or just have them on local disk. The filesystem is not the problem; provenance is. Knowing provenance is how a tool can create a lock file, because provenance tells us how we can recreate the source code later. This is exactly why bundler imposes the requirements it does.

If file:/// is allowed without restriction, then the word "mirror" is just wishful thinking. We're talking about a different type of source. These different sources - one a proper upstream vcs, the other some random filepath - might happen to contain the same code, but unless the tool has a way of knowing that source code's provenance, it can't provide any guarantees. Right now, vcs interaction is our only tool for handling provenance.

@@ -248,6 +248,16 @@ func (i *Installer) Export(conf *cfg.Config) error {
	if err != nil {
		return err
	}
	// defer func() {
	// err = os.RemoveAll(tempDir)
Member

@sdboyer sdboyer Aug 17, 2016

oh, i may have had this problem before, if the issue was os.RemoveAll() failing on windows? if so, it's fixed in go1.7, but for previous versions, this will fix it.

Member Author

This was a goof from debugging. Thanks for catching it.

@sdboyer
Member

sdboyer commented Aug 17, 2016

In reading #548, i realize that i'd, again, misunderstood something about how you're intending that this be used. Sorry.

You're picturing that whatever's at that location be treated as a source. So, we have to inspect the path, determine if it is, or contains, a repo - or read this from the mirrors.yaml file. And, if any of that doesn't work out, then we have a hard failure.

OK, yeah, I think this is a lot less harmful. Lemme ponder a bit, too. (The mechanism to feed this in to gps will be interesting...and the rewrites are still probably a good idea.)

@mattfarina mattfarina merged commit 6ca050c into master Aug 18, 2016
@mattfarina mattfarina deleted the feat/resolve-in-cache branch August 18, 2016 13:31
@sdboyer
Member

sdboyer commented Aug 18, 2016

ah i see

@VladRassokhin

It seems the new functionality completely removed my 'vendor' folder on update because I have /tmp on tmpfs:

[INFO]  Replacing existing vendor dependencies
[ERROR] Unable to export dependencies to vendor directory: rename /tmp/glide-vendor175416651/vendor /media/data/devel/gopath/src/github.com/mkuzmin/terraform-vsphere/vendor: invalid cross-device link

@sdboyer sdboyer mentioned this pull request Aug 22, 2016
10 tasks