-
Notifications
You must be signed in to change notification settings - Fork 801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
values.schema.json ships with chart and configuration reference now covers all options #2033
values.schema.json ships with chart and configuration reference now covers all options #2033
Conversation
af9f7d7
to
a3d6140
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I AM ABSOLUTELY AMAZED BY THE AMOUNT OF WORK AND CARE YOU PUT INTO THIS. WOW
import yaml | ||
|
||
# Change current directory to this directory | ||
os.chdir(os.path.dirname(sys.argv[0])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.chdir(os.path.dirname(sys.argv[0])) | |
os.chdir(os.path.abspath(os.path.dirname(__file__))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also see comment below in validate-against-schema - I don't think os.chdir
in scripts is usually something we want to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like the script to function the same way independently of where they are executed from.
I guess the crux of using os.chdir is that it can influence the environment calling the script? Running it from my terminal didn't influence the terminal, but it could perhaps?
I'm not sure on the path forward here, suggestions?
UPDATE: Will resolve this using https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/2033/files#r576103689 logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved by cb0ae7d
jupyterhub/values.yaml
Outdated
url: | ||
password: | ||
subPath: "" | ||
storageClassName: "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's set to empty rather than unset, will the resulting field be present? I think many fields in Kubernetes have adifferent effect when it's set to empty string - particularly, the defaults might not apply.
For example, an empty storageClassName has different behavior from having it be not present (see doc).
So my instinct is to allow anything that can be a non-container type (so strings, numbers, bools) be nullable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I think null for unset is often the right thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes storageClassName was one of those fields where we differentiate between unset and empty string!
{{- if typeIs "string" .Values.hub.db.pvc.storageClassName }}
storageClassName: {{ .Values.hub.db.pvc.storageClassName | quote }}
{{- end }}
I agree. Resolved by 1153aa3.
This is awesome! ❤️
I think we need to be careful with this a bit. Setting to null and leaving unset are not always equivalent, but often are, and that seems to be the norm for helm/kubernetes, but not for traitlets. If they are interpreted by the helm chart with things like The exception is the subset of config that is not direct passthrough, and instead is specifically transformed by the chart from values.yaml to traitlets config (e.g. |
import yaml | ||
|
||
# Change current directory to this directory | ||
os.chdir(os.path.dirname(sys.argv[0])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also see comment below in validate-against-schema - I don't think os.chdir
in scripts is usually something we want to do.
jupyterhub/values.yaml
Outdated
url: | ||
password: | ||
subPath: "" | ||
storageClassName: "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I think null for unset is often the right thing to do.
@minrk regarding comments on Accept null or not? I think we are in agreement and have arrived at the same conclusions, but I find the topic quite complicated in general. Is it correct that you don't suggest changes other than those you have already explicitly pointed out? |
jupyterhub/values.yaml
Outdated
@@ -35,11 +35,11 @@ hub: | |||
annotations: {} | |||
ports: | |||
nodePort: | |||
loadBalancerIP: | |||
loadBalancerIP: "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also be unset, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand why some vars are empty string and some are unset. My intuition is that default should be either a value we want (like uid: 1000
), unset, or empty container ([]
, {}
, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The principle ive had while writing these evolved to be:
1: if the helm chart provide an explicit default value: set it.
2: else if the helm chart config is a passthrough config for
jupyterhub/kubespawner/authenticator etc: we set it to null to represent the intent for the helm chart to not change the default value of the underlying config, and a falsy value to represent a wish for it to be explicitly set to falsy value.
3: else if, null snd a falsy instance type both imply a unset value, set it to the falsy instance type
4: else set it to null
To me, the most controversial point is the third. The question in my mind boils down to: if both null and a falsy type ("", {}, [], 0) by the helm templates imply that a value wont be set, should we specify the values falsy type of null?
I lean towards specifying the falsy instance type rather than null if the config isn't passthrough.
I think we break this logic on point 2 for array/objects passthrough config, which i think is motivated by avoiding warnings and help helm templste logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So with loadBalancerIP, I see the following code
{{- with .Values.proxy.service.loadBalancerIP }}
loadBalancerIP: {{ . }}
{{- end }}
So if I understand correctly, the loadBalancerIP
key in the Service object won't be present at all in the following circumstances:
.Values.proxy.service.loadBalancerIP
is specified in helm chart values but not set (current master situation).Values.proxy.service.loadBalancerIP
is set to""
(as in the PR)
Is this understanding correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yepp! The with keyword using a falsy value is like a do nothing statement, it has become a practice to use it throughout the helm ecosystem over if ... Then ... - probably to avoid duplicating the reference needed in both the conditional statement and within
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. So if I see the values.yaml
file, it looks like I'll actually get loadBalancerIP: ""
in my Service object, which is confusing! Since k8s differentiates between unset values and ""
values, so should we - otherwise it can be pretty confusing. I think the mental model of if a value is unspecified (null), it is not present in the target
is better than if a value is falsey (by go template's definition of falsey) it will not be present in the target
. The latter also has exceptions, like storageClassName, making it even more confusing.
So ideally, I'd like to hold the mental model of 'if it is null in my values.yaml, it will not be present in the output Service object'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuvipanda I can make it so for strings/integers without problem I think, but if we apply the same principle to {}
and []
I believe there will be trouble such as helm emitting warnings, broken assumptions by helm template logic, and potentially a schema that fail to capture otherwise easy to capture issues.
I suggest we apply the principle that you suggest anyhow to our scalar strings and numbers specifically, but don't try to apply it on falsy values for object {}
and array []
, does that sound okay to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's perfect! I totally agree, @consideRatio :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for quickly iterating with me on this @yuvipanda!! I love it when I arrive at a concrete action point!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a quite hard rule principle to follow without making exceptions. I was able to allow many strings to be null so our default values are now null
rather than ""
by default, but plenty of booleans remain required to be "boolean" rather than "boolean or null" still.
Example of complexity arisen:
If helm templates have logic making an eq
comparison between a variable and a string, then it will error if it holds a null value. And, since helm template logic will evaluate Y in statements like like if X and Y
, one is forced to do if X then if Y then ...
or if X and (Y | default "")
etc.
Anyhow, things work okay now, don't want to change back, I don't see a perfect tradeoff with regards to these considerations.
Resolved by 2c1cc57 (that also includes some misc changes deemed relevant)
I think I've addressed all review points again and that this PR is potentially ready for a merge. |
@consideRatio happy to hit merge after the merge conflict is resolved. |
The idea is to first bootstrap our schema with entries, and then do the work to describe all entries further for the configuration reference.
I decided that schema native values should always be set and not allowed to be null. But, for configuration values that are pass through configuration to JupyterHub and KubeSpawner for example, it made more sense to let them be null as the default of the Helm chart may be to not set it at all unless explicitly passed by a user.
35fd5db
to
6c2a38a
Compare
@yuvipanda 🎉 rebased to resolve simple merge conflict |
\o/ THANK YOU, @consideRatio! |
Wieeee ❤️ 🎉 thank you @yuvipanda!! |
Summary
helm
validation logic is packaged with the chart in values.schema.json, generated from schema.yaml - closes Helm 3: Values.yaml schema validations #1316Breaking changes
If someone has a configuration that overrides a default value to null or similarly, then they can get validation errors before the k8s api-server has been contacted by the
helm
CLI.Implementation notes
Helper scripts
tools/compare-values-schema-content.py
This new script isn't run as part of the CI system, but was helping me to spot if there were values in values.yaml that weren't in
schema.yaml and vice versa.
tools/validate-against-schema.py
This script was previously located in the chart folder and named validate.py, I've just moved it and made it validate with the lint-
and-validate-values.yaml file as well. This script now runs in a dedicated job part of the test-chart workflow, before it was undocumented and not part of the CI system.
tools/generate-json-schema.py
This new script reads schema.yaml, cleans it from descriptions that isn't relevant for the
helm
CLI's logic as far as I know, and emits jupyterhub/values.schema.json. This script runs in all chart tests and before we package and publish the Helm chart.Accept null or not?
I have considered the type enforcement of all values that require a string or integer and considered if they should allow null or not. The principle I concluded was that if the field is a Helm chart native configuration is should typically not allow null. But, if it is a passthrough configuration that isn't required by the Helm chart templates, then we should allow it be null in order to be able to use the default values of the actual software such as JupyterHub or KubeSpawner. For such values, null implies the default value used in the software rather than the default value of the Helm chart typically.
If we would support null for all values, we would risk introducing unneeded runtime errors helm template rendering errors that are far harder to interpret. Due to this, I have opted to avoid allowing null.
Sometimes the chart has given a blank string different meaning to null, such as for
fullnameOverride
. At this times null is also allowed.Notes on
helm
validation logicI don't think it is possible to be very strict and warn for passed configuration not part of the schema which would be useful if a user has misspelled an option. What we can do, is to declare required fields (
required
), and enforce the fields passed have accepted data types (type
).In other words, this is not an end to the mistakes you can make, but it is a good improvement. While there may be some fields that are required, I have only used that keyword to require the root objects such as
hub
andproxy
to be set.