-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add metadata validation #1562
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
""" | ||
Validation logic for package metadata, supporting PEP 566. | ||
See: https://www.python.org/dev/peps/pep-0566/ | ||
""" | ||
|
||
|
||
def single_line_str(val): | ||
"""A string that does not contain newline characters""" | ||
if val is None: | ||
return | ||
assert isinstance(val, str), \ | ||
"is of type %s but should be of type str" % type(val) | ||
assert '\n' not in val, \ | ||
"to contain a newline character (not allowed)" | ||
|
||
|
||
def multi_line_str(val): | ||
"""A string that can contain newline characters""" | ||
if val is None: | ||
return | ||
assert isinstance(val, str), \ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do not use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would be similarly elegant? I don't want to do an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
In this specific case, you cannot use it because it will not work. Assertions are to be treated as optional statements and if you run Python in optimized mode, they will simply not run at all, for example: $ python -Oc "assert False"
$ python -c "assert False"
Traceback (most recent call last):
File "<string>", line 1, in <module>
AssertionError In this particular case, I would either raise There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @bittner Please update the code and use raise exception. Assert should be used only inside tests and I am sure that some linters are even able to identify accidental use outside tests. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ssbarnea I think at this point this PR needs a much bigger overhaul, and most of it may need to live in this repo. Probably best to hold off on cleaning up this specific code until the overhaul is done. |
||
"is of type %s but should be of type str" % type(val) | ||
|
||
|
||
def list_of_str(val): | ||
"""A list of single-line text strings""" | ||
if val is None: | ||
return | ||
assert isinstance(val, list), \ | ||
"is of type %s but should be of type list(str)" % type(val) | ||
for item in val: | ||
single_line_str(item) | ||
|
||
|
||
def set_of_str(val): | ||
"""A set (i.e. a list of unique entries) of single-line text strings""" | ||
if val is None: | ||
return | ||
assert isinstance(val, set), \ | ||
"is of type %s but should be of type set(str)" % type(val) | ||
for item in val: | ||
single_line_str(item) | ||
|
||
|
||
default_validator = single_line_str | ||
validators = dict( | ||
long_description=multi_line_str, | ||
classifiers=list_of_str, | ||
keywords=list_of_str, | ||
platforms=list_of_str, | ||
project_urls=list_of_str, | ||
requires=list_of_str, | ||
provides=list_of_str, | ||
provides_extras=set_of_str, | ||
obsoletes=list_of_str, | ||
) | ||
|
||
|
||
class Metadata: | ||
""" | ||
A validation object for PKG-INFO data fields | ||
""" | ||
|
||
def __init__(self, *data_dicts, **kwargs): | ||
""" | ||
Takes JSON-compatible metadata, as described in PEP 566, which | ||
can be added as both dictionaries or keyword arguments. | ||
""" | ||
self.errors = [] | ||
self.fields = [] | ||
|
||
for data in data_dicts: | ||
for key in data: | ||
setattr(self, key, data[key]) | ||
self.fields += [key] | ||
|
||
for key in kwargs: | ||
setattr(self, key, kwargs[key]) | ||
self.fields += [key] | ||
|
||
def validate(self, throw_exception=False): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's no need for try:
metadata.validate()
invalid = False
except InvalidMetadataError:
invalid = True There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please note that @di suggested to add validation functionality that behaves like I felt that throwing an exception by default would make code more clumsy. It's just not beautiful. -- I wouldn't want to miss that possibility in general, though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
He suggested no such thing. He suggested that something like the warehouse functionality go into a different project, pypa/packaging, not this one. I think you may have misunderstood him.
This is in general not an example of good design, because there are now two different error handling pathways. Additionally, the Pythonic way to signal an error condition is to raise an exception. It is not clumsy and in fact it leads to a lot of clumsy code in downstream consumers. The reason is that in most cases, the functions calling your function can do nothing about an error except abort early and notify the user, if you raise an exception in error conditions, anyone who isn't going to try to recover from the error can just use your function as if it always works because the exception you raise will bubble up the stack until someone either catches it or the program is aborted and the user is notified of the error. If you return an error code instead, you are relying on your consumers to know that your function can either return an error code of some sort or not and raise an error or a handler. |
||
""" | ||
Runs validation over all data, and sets the ``error`` attribute | ||
in case of validation errors. | ||
""" | ||
self.errors = [] | ||
|
||
for field in self.fields: | ||
validator = validators.get(field, default_validator) | ||
val = getattr(self, field) | ||
try: | ||
validator(val) | ||
except AssertionError as err: | ||
self.errors += [(field, err)] | ||
|
||
if throw_exception and self.errors: | ||
raise InvalidMetadataError(self.errors) | ||
|
||
return not self.errors | ||
|
||
|
||
class InvalidMetadataError(ValueError): | ||
"""An error that can be raised when validation fails""" | ||
|
||
def __init__(self, error_list): | ||
self.errors = error_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you wrapping
DistributionMetadata
in aMetadata
object? Even if there were not other problems with this approach, I would think that the validator could work just as well onDistributionMetadata
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. You're right.
On the other hand, consider the validation functionality suggested by @di: it requires that you pass a dictionary to the validation class constructor. How would you solve this with the
DistributionMetadata
constructor?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe @di was referring to more generic functionality for a package other than
setuptools
, so it depends on the details there. One option is to just generate a non-validated PKG-INFO and pass it to a function that validatesPKG-INFO
files.Another is to have an intermediate format that represents
PKG-INFO
as JSON (thus retaining some type information and avoiding the problem where content and format are indistinguishable) which is convertable bypackaging
into aPKG-INFO
file.packaging
could then validate the JSON file before aPKG-INFO
file is generated.If we're already trying to validate a
DistributionMetadata
file, then I would just validate the results of the actual getter operations.