Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

129 validate compose yml #1808

Merged
merged 17 commits into from
Aug 12, 2015
Merged

Conversation

mnowster
Copy link

@mnowster mnowster commented Aug 4, 2015

Introducing schema validation for compose yml.

Utilising jsonschema, we have a defined schema we can pass to it so it can validate the config that the user has specified.

The validation errors that are raised replicate existing behaviour and print out human readable error messages. It also expands to cover new scenarios and provide helpful error messages rather than stack traces.

Functionality in here should address each of the points raised in this issue: #129

✨ 🐈

@aanand
Copy link

aanand commented Aug 4, 2015

Looking great.

@thaJeztah
Copy link
Member

Nice!! thank you :-)

@mnowster mnowster force-pushed the 129-validate-compose-yml branch from 6bd2819 to 489edf7 Compare August 5, 2015 10:24
@mnowster mnowster force-pushed the 129-validate-compose-yml branch 2 times, most recently from 6d233bc to 096fd41 Compare August 7, 2015 10:34
@mnowster mnowster changed the title WIP: 129 validate compose yml 129 validate compose yml Aug 7, 2015
@mnowster mnowster force-pushed the 129-validate-compose-yml branch from 096fd41 to 368639b Compare August 7, 2015 10:53
@mnowster mnowster force-pushed the 129-validate-compose-yml branch from 368639b to 4cb7dc0 Compare August 7, 2015 11:03
Define a schema that we can pass to jsonschema to validate against the
config a user has supplied. This will help catch a wide variety of common
errors that occur.

If the config does not pass schema validation then it raises an exception
and prints out human readable reasons.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
We validate the config against our schema before a service is created
so checking whether a service name is valid at time of instantiation of
the Service class is not needed.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
Improves readability.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
jsonschema provides a rich error tree of info, by parsing each error
we can pull out relevant info and re-write the error messages.

This covers current error handling behaviour.

This includes new error handling behaviour for types and formatting of
the ports field.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
These functions weren't being called by anything.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
Move validation out into its own file without causing circular
import errors.

Fix some of the tests to import from the right place.

Also fix tests that were not using valid test data, as the validation
schema is now firing telling you that you couldn't "just" have this
dict without a build/image config key.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
@mnowster mnowster force-pushed the 129-validate-compose-yml branch from 4cb7dc0 to 2e428f9 Compare August 7, 2015 11:07
@mnowster
Copy link
Author

mnowster commented Aug 7, 2015

@aanand ok, I've squashed commits as far as I'm happy too without it becoming difficult to follow, imo.


"ports": {
"oneOf": [
{"type": "string", "format": "ports"},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually support a string value for ports - it must be a list.

Rather than implement the logic a second time, use docker-py
split_port function to test if the ports is valid.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
@mnowster mnowster added this to the 1.5.0 milestone Aug 7, 2015
@funkyfuture
Copy link

ciao @mnowster, as i was already working on a config-validation, and formulated a lot of invalid service-dictionaries on that behalf. i tested these against your implementation and 80 of them failed. (and it's a total of 80, what baffles me the more. but i checked with debug-statements twice.)
shall i open a pull request on your repo to integrate these?

i'm in a mix of feelings here. i was about to do some work on my implementation, respectively the used validation-library first. (it deserves a better handling of error-messages incl. l11n.) another point that imo makes that library preferable is the possibility to use yaml for the schema, but that may be possible with jsonschema too.

on the other hand i'm relieved that i don't necessarily have that on my agenda anymore. and your implementation certainly is less invasive to the config-module than mine. seems to have gotten some proper refactoring in the meantime. :-)

@funkyfuture
Copy link

oh, the heat here… i found out that my tests do not cover the code, as they assume that validation happens after calling ServiceLoader.make_service_dict.

and it's to be criticized that it doesn't, as config-mixins are left out of validation. furthermore it makes more sense if you use the ServiceLoader in client-code.

i'll look into the test-behaviour with config.load when it cooled down.

@funkyfuture
Copy link

i opened a pull request with test-generators that will test a bunch of valid and invalid configurations. of these tests, 42 are failing.

i also roughly reviewed what i did in #1355 so far and want to emphasize which checks are imo missing here. my paradigm for that is the assumption that any configuration-error can possibly lead to data-loss and thus any validation must take racetime-conditions into consideration and anticipate any possible human- or machine-generated mistake.

  • validate complete configuration including anything resolved into it (paths, extends-, environment-files and such)
  • validate values for keys with a restricted set of allowed values (capabilities, net, memory, etc)
  • validate accessibility for referenced filesystem-objects
  • validate legal posix-paths for mount-points in containers
  • validate network-configuration like hostnames, ips, matching port-ranges, …

furthermore a nice-to-have:

  • propose keys that best fuzzy-match an invalid config-key

@mnowster
Copy link
Author

Hi @funkyfuture,

Thank you for your contribution to the initial spike on the approach.

I reviewed both your initial PR and draghuram's with @aanand and @bfirsh, and we decided on the jsonschema approach. An initial scope was agreed that we were seeking to replace stack traces with better error messages but doesn't go much further than what we already have with existing validation functionality, eg nothing extra for network config as the docker daemon takes care of this.

The requirement to get validation schema in for 1.5 release was increased in priority, as is the way of open source projects, so I took over the issue.

Thanks again for contributing.

🎈

@bfirsh
Copy link

bfirsh commented Aug 10, 2015

Does the schema get included in the Python source package? It might need adding to MANIFEST.in or something or rather...

@funkyfuture
Copy link

propably it's not a good idea to discuss this here, but i don't want to open a new issue unless you actually regard my concerns as an issue (or if you prefer: user story).

first of all, let me understand you correctly @mnowster:

The requirement to get validation schema in for 1.5 release was increased in priority, as is the way of open source projects

there surely are diverse ways to develop an open-source-project. do you rather mean "as is the way of agile development"? that'd make sense to me. otherwise i'd respond that there is always an alternative. if you mean "agile" and if this is the paradigm, me and my team will reconsider whether Compose is still the horse to bet on for orchestration in our production environments.

eg nothing extra for network config as the docker daemon takes care of this.

i remember this argument, but i also recall that it was regarded as a non-satisfying answer, because the Docker daemon handles single containers, not a set of 'em. thus, a misconfigured composition of containers can run into unknown outcomes if one container fails to start, but the others are started anyway.

to not validate anything that get's mixed in is imo a significant design flaw.

my perspective on this is one of a paranoid and extra-cautious sysadmin. is this perspective relevant for y'all? if so, what do you propose how to deal with it?

@aanand
Copy link

aanand commented Aug 10, 2015

@funkyfuture You're right - this isn't the place to discuss whether or not to validate things on the client that are already validated on the daemon. That's a complicated question, and you should feel free to open a separate issue for that.

As @mnowster said, this PR is an incremental step that should result in a significant reduction in cryptic stack traces for Compose users. We don't need to have a completely laid-out plan for our general approach to validation in order to merge it.

@mnowster
Copy link
Author

@bfirsh ah, that is a good point, I'll look into it, thank you.

While it was intended as a positive to be stricter in validation
it would in fact break backwards compatibility, which we do not
want to be doing.

Consider re-visiting this later and include a deprecation warning if
we want to be stricter.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
Unfortunately the way that jsonschema is calling %r on its property
and then encoding the complete message means I've had to do this
manual way of removing the literal string prefix, u'.

eg:

key = 'extends'
message = "Invalid value for %r" % key
error.message = message.encode("utf-8")"

results in:

"Invalid value for u'extends'"

Performing a replace to strip out the extra "u'", does not change the
encoding of the string, it is at this point the character u followed
by a '.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
The validation message was confusing by displaying only 1 level of
property of the service, even if the error was another level down.

Eg. if the 'files' property of 'extends' was the incorrect format,
it was displaying 'an invalid value for 'extends'', rather than
correctly retrieving 'files'.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
When a schema type is set as unique, we should display the validation
error to indicate that non-unique values have been provided for a key.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
Tiny bit of refactoring to make it clearer and only pop service_name
once.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
We use $ref in the schema to allow us to specify multiple type, eg
command, it can be a string or a list of strings.

It required some extra parsing to retrieve a helpful type to display
in our error message rather than 'string or string'. Which while
correct, is not helpful. We value helpful.

Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
Signed-off-by: Mazz Mosley <mazz@houseofmnowster.com>
@aanand
Copy link

aanand commented Aug 12, 2015

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants