[mrg] Per spec configuration #888
binderhub/builder.py
Outdated
@@ -424,9 +424,14 @@ def escape(s):

    async def launch(self, kube, provider):
        """Ask JupyterHub to launch the image."""
        # Load the spec-specific configuration if it has been overridden
        spec_config = provider.spec_configuration_override

        # check quota first
        if provider.has_higher_quota():
Should we retire this at the same time as we introduce the new functionality?
Probably good to have a deprecation transition, in which case include a warning if the old way is used.
I'm fine w/ a transition (though we've still only ever had one release of BinderHub so the "higher_quota" never actually made it into a release lol)
True on paper, but in practice we release several times a week and people actually use BinderHub, so I think we should be nice to people. The "higher quota" thing has only existed for a week or two so :-/
warnings.warn(
    "XXX is deprecated, use YYY instead",
    DeprecationWarning
)
is the generic Python warnings thing. It isn't ideal (most people won't see deprecation warnings except in their tests). Does traitlets have a tool for deprecations?
Due to the invisible-by-default deprecation warnings, I think that application-level deprecation warnings should actually not use DeprecationWarning, so they are visible by default. It's a tricky choice since deprecation warnings are always safely ignored "for now" but if folks never see them, they may as well not exist. As for a transition, I was mostly thinking of ourselves at mybinder.org, since it would allow us to decouple "adopt the new way" from "bump binderhub" PRs without config changes.
FWIW, I don't think the deprecation needs to be a blocker here, as the old way still works. Cleanly deprecating the old way can be a follow-up PR.
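A minimal sketch of the visible-by-default idea, using only the standard library. Under Python's default warning filters, `FutureWarning` is displayed while `DeprecationWarning` is hidden unless it is triggered from `__main__`. The helper name `warn_old_quota_config` and the message text are hypothetical and are not BinderHub code.

```python
import warnings


def warn_old_quota_config():
    """Warn about a deprecated option with a category that is shown by default.

    Hypothetical helper for illustration; not part of BinderHub.
    """
    warnings.warn(
        "the old quota option is deprecated, use the per-spec configuration instead",
        FutureWarning,  # shown by default, unlike DeprecationWarning
        stacklevel=2,   # attribute the warning to the caller
    )


# Record the warning instead of printing it, to inspect its category.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_old_quota_config()

print(caught[0].category.__name__)
```

Whether `FutureWarning` or a custom `UserWarning` subclass is the better fit is a judgment call; both are visible under the default filters.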
> Does traitlets have a tool for deprecations?
Traitlets doesn't have any helpers for deprecations, but the way traitlets works can make writing deprecations handy, since deprecations can go in observers and be completely removed from the bodies of methods, etc. For example:
@observe('old_way')
def _old_way_changed(self, change):
    warnings.warn("old way is deprecated as of...new way is...")
    self.new_way = modified_form(change.new)
    # never need to look directly at self.old_way after this

...

def method(self):
    only_use(self.new_way)
The simplest version of this can lead to undefined behavior (which wins?) if both the old and new way are specified in the same config, but I think that's not a big deal if the old-way warning is visible.
I think using the traitlets observe together with printing a warning (via warnings.warn()) is a good way to do this.
binderhub/builder.py
Outdated
        # check quota first
        if provider.has_higher_quota():
            quota = self.settings.get('per_repo_quota_higher')
        elif "per_repo_quota" in spec_config:
            quota = spec_config.get('per_repo_quota')
- quota = spec_config.get('per_repo_quota')
+ quota = spec_config.get('quota')
(also on the line above)
Though you could write this as:
quota = spec_config.get('quota', self.settings.get('per_repo_quota'))
which is a well used pattern in Python for saying "get this from the dictionary and if it doesn't exist use this thing as default"
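The fallback pattern above can be sketched in isolation. The function name `resolve_quota` and the sample values are assumptions made for illustration; only the `.get(key, default)` idiom comes from the comment.

```python
def resolve_quota(spec_config, settings):
    """Return the per-spec quota, falling back to the global default."""
    return spec_config.get("quota", settings.get("per_repo_quota"))


settings = {"per_repo_quota": 100}  # assumed global setting

print(resolve_quota({"quota": 10}, settings))  # override present
print(resolve_quota({}, settings))             # falls back to the global value
```

One detail worth keeping in mind: the default expression is evaluated even when the key exists, which is harmless for a dictionary lookup like this one.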
How about
For documentation of which keys/options are valid, I'd put a pointer in the traitlets' doc that says "repo providers are free to define their own options, see each repo provider for a list of valid keys and their meaning". Then add appropriate docs to the repo providers/base repo provider.
binderhub/repoproviders.py
Outdated
@@ -111,6 +121,21 @@ def has_higher_quota(self):
                return True
        return False

    def spec_configuration_override(self):
What do you think of naming this spec_config() and taking care of finding and populating all the default values in this function? Then in the builder we wouldn't need to have any checks for "do we override this or not, what is the default". We'd have one place that is already ready to use.
The reason to call it spec_config is that it would be for one particular spec and not a whole bunch any more. Maybe we could even call it repo_config? Depends a bit on what words the builder uses (does it refer to building a spec or a repo). But this might be getting a bit OCD on naming :)
TL;DR: should we compute the configuration (combining overrides and defaults) for the spec here?
I like it - that seems reasonable to me, lemme give it a stab and see how it looks
see the latest push for an attempt at what this would look like, is this the kinda thing you had in mind?
1399d58#diff-c5688934f1e6dc3e932b6c84c1bbbd5dR133
In this case, spec_config would be defined at the helm chart config level, and repo_config is a method of RepoProvider that returns a dictionary of configuration values for that repository (which might have been updated from the spec_config setting).
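A rough sketch of that split, under stated assumptions: `RepoProviderSketch` is a hypothetical stand-in rather than the BinderHub class, the default quota of 100 is invented, and `spec_config` is assumed to be a list of entries with a `pattern` regex and a `config` dictionary.

```python
import re


class RepoProviderSketch:
    """Hypothetical stand-in for a repo provider; not the BinderHub class."""

    # assumed defaults, for illustration only
    default_config = {"quota": 100}

    def __init__(self, spec, spec_config=()):
        self.spec = spec
        # assumed shape: list of {"pattern": <regex>, "config": <dict>}
        self.spec_config = list(spec_config)

    def repo_config(self):
        """Combine defaults and matching overrides into one ready-to-use dict."""
        config = dict(self.default_config)
        for item in self.spec_config:
            if re.match(item["pattern"], self.spec, re.IGNORECASE):
                config.update(item["config"])
        return config


provider = RepoProviderSketch(
    "ds-100/textbook/master",
    spec_config=[{"pattern": r"^ds-100/textbook.*", "config": {"quota": 200}}],
)
quota = provider.repo_config()["quota"]  # the caller needs no override checks
print(quota)
```

With this shape the builder only reads values from the returned dictionary and never has to know whether a value came from a default or an override.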
binderhub/repoproviders.py
Outdated
        for spec, config in self.spec_configuration_override:
            # Ignore case, because most git providers do not
            # count DS-100/textbook as different from ds-100/textbook
            if re.match(spec, self.spec, re.IGNORECASE):
FWIW, using re.match compiles the regex on each call, which means we are compiling every regex here on every repo provider. Compiling them on application load would make this a bit more efficient.
Related to this: dicts are in a random order, but overrides might want to have a priority if more than one pattern matches the repo. With a dict, the behavior is undefined, with a list it would be consistent. So perhaps a list of tuples is more appropriate than a dict?
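Both suggestions (compile once, keep a defined order) can be sketched together. The function names and the sample patterns are assumptions for illustration. Note that the `re` module also keeps an internal cache of recently used patterns, so the speed gain may be modest; explicit compilation mainly makes the cost predictable.

```python
import re


def compile_overrides(raw_overrides):
    """Compile each pattern once, preserving list order.

    raw_overrides is a list of (pattern_string, config_dict) tuples.
    """
    return [
        (re.compile(pattern, re.IGNORECASE), config)
        for pattern, config in raw_overrides
    ]


def matching_configs(compiled_overrides, spec):
    """Return the config dicts whose pattern matches spec, in list order."""
    return [config for pattern, config in compiled_overrides if pattern.match(spec)]


# Compiled once, e.g. at application start-up (hypothetical values).
OVERRIDES = compile_overrides([
    (r"^ds-100/textbook.*", {"quota": 200}),
    (r"^ds-100/.*", {"quota": 50}),
])

print(matching_configs(OVERRIDES, "DS-100/textbook/master"))
```

Because the overrides live in a list, the order in which matches are returned is the order in which they were written.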
mmmm, I am not sure how to do this :-) feel free to suggest the code that would still allow for config patterns like the one in #888 (comment) and I'll give it a shot
Good point about the random order! I had pondered that one might have multiple patterns and if they should be additive or "last one wins" or what, then decided that for a first pass we should ignore that. However the random order thing is something we need to address (or inform the user that they have more than one pattern that matches and that this is an error (for now)).
I think what we want is a list like:
specs = [{"pattern": "some-pattern", "quota": 10}, {"pattern": "some-other-pattern", "quota": 12}]
in YAML it would look like:
# not super sure about the first one but the second two should work
- {pattern: "yet another pattern", quota: 33}
- pattern: some-pattern
  quota: 10
- pattern: some-other-pattern
  quota: 12
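The "more than one matching pattern is an error (for now)" option mentioned above could look roughly like this. `find_single_match` is a hypothetical helper, and the entries use the flat form from this comment, with `pattern` and `quota` side by side.

```python
import re

specs = [
    {"pattern": "some-pattern", "quota": 10},
    {"pattern": "some-other-pattern", "quota": 12},
]


def find_single_match(specs, spec):
    """Return the one entry whose pattern matches spec, or None.

    Raises ValueError when more than one pattern matches.
    """
    matches = [
        entry for entry in specs
        if re.match(entry["pattern"], spec, re.IGNORECASE)
    ]
    if len(matches) > 1:
        patterns = ", ".join(entry["pattern"] for entry in matches)
        raise ValueError(f"multiple patterns match {spec!r}: {patterns}")
    return matches[0] if matches else None


print(find_single_match(specs, "some-pattern/repo"))
```

Failing loudly keeps the first version simple and leaves room to define a priority rule later without silently changing behavior.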
Ah I like this pattern because it’s more explicit! @minrk does it look good to you as well?
Yes, that looks great to me! The only question is that 'pattern' is in the same namespace as the config overrides. This is nice since it's more concise, but could allow collisions or confusion. The more rigorous, but slightly more tedious, approach would be to explicitly separate the match from the override config:
- pattern: some-pattern
  config:
    quota: 10
    other_config: "x"
This would allow adding other siblings to pattern if we e.g. had some other options that should influence the selection rather than the overridden config. I don't have strong feelings, but I've occasionally regretted not separating things like this in JupyterHub in the past (see spawner options in the REST API).
You can see the difference in code where 'pattern' needs to be handled specially:
# copy because we need to modify the dict
update_config = item.copy()
# remove pattern before updating config because pattern is not part of the config:
pattern = update_config.pop('pattern')
if pattern.match(...):
    config.update(update_config)
vs:
pattern = item['pattern']
update_config = item['config']
if pattern.match(spec):
    config.update(update_config)
That is a good point. Let's follow that suggestion. Being able to bump binderhub on mybinder.org without having to also change the config makes life easier (and lets us use henchbot!).
See the latest push for the following updates:
Left a few comments in-line, but this is looking good to me. I think the only decision left is what do we want to happen when two patterns match:
1. both config overrides are applied, later in the list takes priority for any multi-matches (current behavior)
2. only first match is applied (break after match)
3. only last match is applied (break after match, reverse iteration order)
4. both applied, but earlier entries have higher priority (reverse iteration order)
I think any choice is reasonable (current behavior is most powerful, but I suspect folks who don't think carefully about the implementation could expect items higher in their yaml config to have higher priority), but we should mention which one we chose in the docs.
For example, the current behavior:
If multiple patterns match a given repository, all matching overrides will be applied in the order they appear in the list, meaning that overrides in the last item in the list will have highest priority.
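To make the difference concrete, here is a small sketch contrasting the "apply all, later entries win" behavior with the "first match only" alternative. The function `resolve_config`, its `first_match_only` flag, and the sample patterns are assumptions for illustration and are not taken from the PR.

```python
import re


def resolve_config(spec, defaults, overrides, first_match_only=False):
    """Merge matching overrides into a copy of defaults, in list order."""
    config = dict(defaults)
    for item in overrides:
        if re.match(item["pattern"], spec, re.IGNORECASE):
            config.update(item["config"])
            if first_match_only:
                break  # stop at the first matching entry
    return config


overrides = [
    {"pattern": r"^org/.*", "config": {"quota": 10, "other_config": "x"}},
    {"pattern": r"^org/big-repo.*", "config": {"quota": 500}},
]
defaults = {"quota": 100}

apply_all = resolve_config("org/big-repo/master", defaults, overrides)
first_only = resolve_config(
    "org/big-repo/master", defaults, overrides, first_match_only=True
)
print(apply_all)
print(first_only)
```

In the apply-all case the later, more specific entry overrides `quota` while `other_config` from the earlier entry is kept; in the first-match case only the first entry has any effect.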
    provider = GitHubRepoProvider(
        spec='jupyterhub/zero-to-jupyterhub-k8s/v0.4',
        config=[base_config]
spec_config?
I'd go with option (1) for the moment in terms of priority/when to stop.
Thanks for the feedback on tests etc. The latest commit:
Nice work! Tests, docs and a new feature! What is the etiquette for marking conversations as "resolved"? Can I do it in a PR that isn't mine? Should the author do it? When I am the author I tend to mark things as resolved (if they don't auto-collapse) when I have implemented the feedback or the discussion has somehow ended. In this PR there were several people involved, so it felt weird to do that, but I also found myself scrolling through each discussion several times trying not to miss new comments.
Re: "resolved": that's a good question. I'm generally happy for people to resolve things even if I'm the author on the PR, as it often means that my to-do list has shrunk :-) As with the other team compass guidelines, I trust the judgment of other folks on the project, so I'm happy for them to take the initiative there.
In #887 we talked about how it'd be cool to have per-repo configuration. This is a first attempt at implementing it.
The basic idea is that we operate similar to how "banned" and "quota increase" functionality works, with one difference:
Instead of a list of specification regexes to match to "quota increase" or "banned", we have a dictionary where keys are specification regexes, and values are dictionaries of "key:value" pairs that can override configuration on a repo-specific basis.
For example, you could do something like:
Questions to answer
quota: <int> key/value. In the future, should we just expose this with good documentation for what the accepted key/values are? (We could also imagine some key/values being repoprovider-specific.)
Need to add