Proposal: Ability to configure VCS rewrites #372

Closed
ascandella opened this issue Apr 4, 2016 · 21 comments

Comments

@ascandella

So I realize this is a bit of a special case, but wanted to run by you to see if you'd be open to accepting a PR.

At my place of employment, we are switching to Glide (from Godeps). At the same time, we'd like to move away from vendoring. Glide is great for this, and we get reproducible builds when GitHub is up, but want to guard against repo renames/GH instability.

Internally, we have a read-through cache/mirror of GitHub repos, so that if we pull from a special git URI (e.g. git clone git@mirror:github/Masterminds/glide) it will read the latest state from GitHub, but still return you the code if GitHub goes down (via gitolite's mirroring options).
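A minimal Go sketch of the read-through behavior described above: try upstream first, remember the last good state, and serve that state when upstream is unreachable. This is not how gitolite implements mirroring; `ReadThrough`, `FetchFunc`, and `ErrUpstreamDown` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUpstreamDown is a hypothetical error standing in for "GitHub is unreachable".
var ErrUpstreamDown = errors.New("upstream down")

// FetchFunc is a hypothetical stand-in for "read the latest state from upstream".
type FetchFunc func(repo string) (string, error)

// ReadThrough keeps the last successfully fetched state per repository.
type ReadThrough struct {
	fetch FetchFunc
	cache map[string]string
}

// NewReadThrough wraps an upstream fetch function with an empty cache.
func NewReadThrough(fetch FetchFunc) *ReadThrough {
	return &ReadThrough{fetch: fetch, cache: make(map[string]string)}
}

// Get tries upstream first and refreshes the cache on success. If upstream
// fails, it serves the last cached state when one exists.
func (c *ReadThrough) Get(repo string) (string, error) {
	state, err := c.fetch(repo)
	if err == nil {
		c.cache[repo] = state
		return state, nil
	}
	if cached, ok := c.cache[repo]; ok {
		return cached, nil
	}
	return "", fmt.Errorf("fetch %s: %w", repo, err)
}

func main() {
	up := true
	mirror := NewReadThrough(func(repo string) (string, error) {
		if !up {
			return "", ErrUpstreamDown
		}
		return "rev-1", nil
	})

	fmt.Println(mirror.Get("github/Masterminds/glide")) // upstream reachable
	up = false
	fmt.Println(mirror.Get("github/Masterminds/glide")) // served from the cache
}
```

A repository that was never fetched while upstream was reachable still fails, which is the gap a pre-populated mirror is meant to close.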

We'd like the ability to configure glide internally, such that if an engineer adds a github.com dependency to their glide.lock/yaml, it will get pulled through our mirror.

Alternatively, we are open to patching glide on the client side, such that any "glide get github.com/..." will result in the proper VCS settings in glide.yaml. This is slightly less preferable because then we have to distribute a patched glide.

I'm open to implementing this on an internal fork if you don't foresee others needing this functionality.

What I'm thinking is a ~/.glide.yaml that has a block like:

vcs:
  rewrites:
    from: git@github.com
    to: git@mirror:github/

What do you think?
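For illustration, here is a rough Go sketch of how a single prefix rule like the one above might be applied. `RewriteRule` is a hypothetical type, not part of glide. The sketch also assumes `from` includes the trailing `:` of the scp-style remote, so that the result carries the `github/` path prefix the mirror expects.

```go
package main

import (
	"fmt"
	"strings"
)

// RewriteRule is a hypothetical prefix rule mirroring the proposed
// vcs.rewrites block; it is not an existing glide type.
type RewriteRule struct {
	From string
	To   string
}

// Apply returns the rewritten remote and true when the rule's From prefix
// matches; otherwise it returns the remote unchanged and false.
func (r RewriteRule) Apply(remote string) (string, bool) {
	if !strings.HasPrefix(remote, r.From) {
		return remote, false
	}
	return r.To + strings.TrimPrefix(remote, r.From), true
}

func main() {
	// Assumption: From ends with ":" so the path is appended directly
	// after the "github/" prefix.
	rule := RewriteRule{From: "git@github.com:", To: "git@mirror:github/"}

	out, ok := rule.Apply("git@github.com:Masterminds/glide")
	fmt.Println(out, ok) // git@mirror:github/Masterminds/glide true
}
```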

@sdboyer
Member

sdboyer commented Apr 5, 2016

I think you might be able to do this today using the repo field that you can declare on deps. Ordinarily it's used to declare aliases/forks, but I think it could be bent to this purpose.

If glide was to support this properly - particularly for the use pattern you're describing - I'd argue it would be preferable to have it be something where the URIs are switched in transparently - not something declared in glide.yaml. Yes, that would probably entail compiling a custom version of glide. Though, glide does have a "plugin" system already - perhaps that could be leveraged.

Point is, having to repeatedly add that information in glide.yaml is not only likely to promote errors and inconsistencies, since it relies on user input, but would also make the default behavior of commands like glide get suboptimal for your workflow (users would need to go into glide.yaml later to add that information). Best to avoid design choices that effectively cripple other parts of the tool, if possible.

PS - does your internal gitolite service really have more uptime 9's than github? 😁

@ascandella
Author

It actually does, believe it or not :)

Anyway, yes, we can (and do) accomplish this today by hand, by editing the glide.yaml file to point to our mirror. This all works well. The problem is that it's opt-in, so if somebody forgets to do it (and really, it's a pain to do manually anyway), then our build hosts will be pulling from github.com, which is a non-starter for our SRE team.

What you've suggested, transparently rewriting the URLs in a custom glide, is the approach that seems easiest (nothing to maintain for you, we can write code specifically tailored to our needs internally).

Before I start going down that road, I was curious whether this issue has come up before, and whether it's worth making generic for you. Happy to do it either way. It sounds like a custom glide on our build servers is the best way to go for now.

@sdboyer
Member

sdboyer commented Apr 5, 2016

It actually does, believe it or not :)

I can! Mostly ribbing.

The problem is that it's opt-in, so if somebody forgets to do it (and really, it's a pain to do manually anyway)

Glad we're on the same page about that :) And I realize that I think I misread your original proposal - I was assuming you were suggesting an additional stanza on a per-dep basis, but instead you were suggesting a single, additional config stanza that would perform a general rewrite on all dep URIs it runs into. My bad.

If the concern is really just over ensuring that the build servers aren't reaching directly out to github, wouldn't the easier solution - and more fully within the SREs' control, I imagine - be to intercept that via DNS? I didn't mention that approach at first because I thought the goal was also to affect developers' environments, but if it's just build boxes you're concerned about...

In any case, yes, the issue's come up in discussion, and there's probably at least one issue with a similar flavor somewhere in the queue. IMO, there is space for adding something like this.

Since you're asking about this, though, and it does intersect a bit with something else, let me ask you: would the presence of a local cache of repositories address any of the issues here? If/when we get around to integrating the vsolver engine, glide will maintain a cache of repositories at some non-GOPATH location (configurable, defaulting to under your home dir), and reach out to upstream (e.g. github) as little as possible. They would behave in a similar fashion to a read-through cache, and replicating the cache dir/creating VM images with the cache dir prepopulated should be pretty trivial for your SREs. (That said, I don't know what synchronization strategy you're using to keep the gitolite mirrors up to date, but in both of the two I can imagine, the local-only caches will be slightly more stale.)

To be clear, these 'caches' are necessary whether or not you have the mirrored intermediaries. But I'm asking because one of the design choices I'm actively considering right now is how well to support "offline" use of these caches. left-pad-style problems, github going down, or being truly offline all fall loosely under this header; right now, I care about those in decreasing order.

@ascandella
Author

DNS rewriting is interesting, but wouldn't be an out-of-the-box solution, in our case. It's not just hostname rewriting (regardless of the implications of SSH key verification), but the path attribute that comes after needs to have a "github/" prefix.

I think the local cache would totally solve our use case. The only behavior I'm not sure would be defined for the filesystem cache, as opposed to the gitolite cache, is what happens when somebody force-pushes their master branch, and this orphans a ref that eventually gets garbage collected? Rare, and totally avoidable if people are being good citizens, but we really don't want a production build to ever fail due to changes happening outside of our control.

Is there a timeline on the vsolver engine? Sounds like "TBD".

I think in the interim we will patch our internal glide build to have this behavior, and then at some point we may be able to drop it.

I'll close this for now. If somebody finds this issue at a future point in time and would like to see our code upstreamed, please re-open and I'll share.

Thanks!

@sdboyer
Member

sdboyer commented Apr 5, 2016

is what happens when somebody force-pushes their master branch, and this orphans a ref that eventually gets garbage collected?

Or, gasp, when they move a tag to a different commit. Yep, we have to handle the bad citizen cases, and we do. Though I've opened an issue to cover it in particular: sdboyer/gps#6 . Kind of less "issue" and more "dump of thoughts," but gotta start somewhere.

we really don't want a production build to ever fail due to changes happening outside of our control.

IMO, that's a significant responsibility of a package management tool.

Is there a timeline on the vsolver engine? Sounds like "TBD".

We're trying to sort that out now. The code's coming along nicely - hopefully will be ready for integration in a matter of weeks. But there are a number of changes to glide's workflows and options that we need to sort out. @technosophos, @mattfarina and I need to talk that through.

In the meantime, I wouldn't at all mind you opening an issue (and/or jumping on that one I linked earlier) detailing the expectations you'd have around the behavior of a local cache.

I think in the interim we will patch our internal glide build to have this behavior, and then at some point we may be able to drop it.

Fingers crossed!

@mattfarina
Member

@sectioneight this is actually a duplicate of #39. I identified early on that we need this.

Having overriding aliases is definitely something we need to have for many enterprise environments and some places that require high control. Where I work we need the same level of control your SREs require.

If you're going to craft this for your own needs would you want to try and contribute it?

@technosophos
Member

I believe @arschles is also working on this. If the git protocol were restricted to HTTP, would we be able to sneak in an HTTP proxy solution?

@sdboyer
Member

sdboyer commented Apr 6, 2016

@technosophos restricting to http has its own weirdness, i think - it would require (potentially) ignoring, or at least rewriting, custom-set repo properties on a project. while i can see an argument for how that sorta makes sense...well, it just seems like creating a too-long-for-sanity tail of options overriding one another.

of course, i suppose that'd also depend on the implementation you have in mind :)

@mattfarina
Member

@technosophos @sdboyer I think it's far easier than that. When a vcs.Repo instance is created we do a lookup on the remote from a list. If the one we were going to use is there and an alternative is listed use that instead.
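A minimal sketch of that lookup in Go, assuming the list is a plain map from the remote glide would normally use to its alternative. The `overrides` table, `resolveRemote`, and the mirror URL are hypothetical; where the list actually comes from is left open here.

```go
package main

import "fmt"

// overrides is a hypothetical lookup table from the remote that would
// normally be used to the alternative that should replace it.
var overrides = map[string]string{
	"https://github.com/Masterminds/vcs": "https://mirror.example.com/github/Masterminds/vcs",
}

// resolveRemote returns the listed alternative when one exists, and the
// original remote otherwise.
func resolveRemote(remote string, table map[string]string) string {
	if alt, ok := table[remote]; ok {
		return alt
	}
	return remote
}

func main() {
	// The resolved value is what would be handed to whatever constructs
	// the repo instance.
	fmt.Println(resolveRemote("https://github.com/Masterminds/vcs", overrides))
	fmt.Println(resolveRemote("https://github.com/Masterminds/semver", overrides))
}
```

An exact-match table like this only covers remotes listed one by one; covering a whole host would need prefix or pattern rules instead.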

@sdboyer
Member

sdboyer commented Apr 6, 2016

@mattfarina ok - where is that list coming from?

such a list has also been on my thoughts, because (assuming that i'm thinking what you're thinking) i need to include it as part of the identity of a source URI in the solver. basically, we're talking a list of URLs that are all capable of locating the same URI, right?

@crsmithdev

@technosophos @mattfarina I'm working with @sectioneight and may submit a PR with this functionality in the near future (rewrite rules)...how would you feel about supporting this in some kind of global config (~/.gliderc or similar)? It could be applicable on a per-project or global basis; for us, being able to specify it globally makes it easier to ensure that all of our engineers are using it and they don't have to add it to every single project using Glide.

Then again, it's adding a new global config file, so I figured I'd ask specifically first.

@technosophos
Member

@crsmithdev That is sorta the direction I was thinking, as well. I guess we could do something like "load the global .gliderc, and if there's a local .gliderc (e.g. in same dir as glide.yaml), merge that with the global". But even starting with a global config gives us a strong starting point.
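A rough Go sketch of that merge, assuming both files have already been parsed into a simple from/to map. `Config` and `Merge` are hypothetical, and letting the local file win on conflicts is an assumption; nothing here settles the precedence.

```go
package main

import "fmt"

// Config is a hypothetical in-memory form of a .gliderc file, holding
// rewrite rules keyed by the prefix they replace.
type Config struct {
	Rewrites map[string]string
}

// Merge copies the global rules, then overlays the local ones, so a local
// rule replaces a global rule with the same key (an assumed precedence).
func Merge(global, local Config) Config {
	merged := Config{Rewrites: make(map[string]string, len(global.Rewrites)+len(local.Rewrites))}
	for from, to := range global.Rewrites {
		merged.Rewrites[from] = to
	}
	for from, to := range local.Rewrites {
		merged.Rewrites[from] = to
	}
	return merged
}

func main() {
	global := Config{Rewrites: map[string]string{
		"git@github.com:": "git@mirror:github/",
	}}
	local := Config{Rewrites: map[string]string{
		"git@github.com:": "git@ci-mirror:github/", // project-specific override
	}}

	merged := Merge(global, local)
	fmt.Println(merged.Rewrites["git@github.com:"])
}
```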

@vektah

vektah commented Apr 19, 2016

I guess we could do something like "load the global .gliderc, and if there's a local .gliderc (e.g. in same dir as glide.yaml), merge that with the global". But even starting with a global config gives us a strong starting point.

I would love to see this.

In our CI stack the infrastructure is shared between multiple projects with different config, so writing to $HOME is likely to cause concurrency and cleanup issues. A file that can be git-ignored and put into the project dir by the initialization hook would be great.

@arschles

Late to the party here, responding to @technosophos. I've been working on a proxy (https://github.com/arschles/goprox) that I just got minimally working. It serves the same functionality as the gitolite solution that @sectioneight originally mentioned, but backs each repo to S3. On the functionality side, I'm planning on adding some other small features (like aliasing, for those who don't use glide) and a server admin tool, but it's just about feature complete now on the end-user side.

Anyway, I discussed with @technosophos a few weeks ago, and the topic of rewriting transitive dependencies came up. What's the thinking on this?

As a concrete example, if I have import github.com/gorilla/mux in my code, and the git URL is aliased for the repo that corresponds to (whether with repo: or some future awesomeness), the git URL for github.com/gorilla/context, a dependency of gorilla/mux, would have to be rewritten too. It wasn't clear to me from the above examples (like the content in ~/.glide.yaml) whether the rewriting would apply to all resolved dependencies, not just top-level ones.

@ascandella
Author

Good question about transitive dependencies. We're currently solving this by patching masterminds/vcs internally, but there will likely need to be a PR to vcs as well to support transitive dep rewrites.

@arschles

thanks for letting me know @sectioneight. fwiw, I'd want to rewrite the entire dependency tree by default. rewrites would be significantly less useful for me if I was only able to rewrite direct dependencies (i.e. at the top of the dep tree).

The k8s.io/kubernetes dependency tree is a good example - it's very wide and in some cases, very deep. Ideally I'd like to have all those dependencies (or, ideally packages that match a regex) rewritten. In this specific case, I'd rewrite them to a proxy to (significantly) increase download speeds.

@ascandella
Author

Yes, our end goal is the same: transparent rewrites including transitive dependencies

@sdboyer
Member

sdboyer commented Apr 25, 2016

i started writing a response here, but ended up not putting it in because...reasons. so i'll just note that i suspect that the first post-vsolver integration release of glide (watch #384 for progress) will have this pretty much hammered out. no (more) changes to Masterminds/vcs necessary.

@arschles

@sdboyer @sectioneight is there an issue/issues that I can follow related to that work?

@sdboyer
Member

sdboyer commented Apr 25, 2016

@arschles yep - initial vsolver integration is being worked on in #384 . This particular problem is second or third in line of knotty problems that need addressing before I would feel it's release-ready.

I also jotted down some of my notes in a vsolver issue (sdboyer/gps#10), though I'm now revisiting that thinking a bit. The issue is probably a bit arcane, but if you want to discuss, I'm happy to clarify as needed.

@arschles

Thanks @sdboyer !
