-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide a warning when a regex in a provider isn't valid #3258
Conversation
When a provider has an invalid regular expression, datalad currently provides obtuse error messages like "bad character range [re.py_compile:251] (error)". This catches any exceptions while compiling the re to explicitly warn the user that there is an invalid regular expression in the provider. Unfortunately, there doesn't seem to be any way to provide the path to the provider config file at this point in the code, so we simply give the provider name.
datalad/downloaders/providers.py
Outdated
if re.match(url_re, url): | ||
lgr.debug("Returning provider %s for url %s", provider, url) | ||
matching_providers.append(provider) | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't exception more specific? but oh my:
$> python -c "import re; re.match('(?<ZZ>.*)', '1')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python2.7/re.py", line 141, in match
return _compile(pattern, flags).match(string)
File "/usr/lib/python2.7/re.py", line 251, in _compile
raise error, v # invalid expression
sre_constants.error: syntax error
remnants of the past syntax (pre 2.5?)
Overall, here it is - re.error
import re;
try:
re.match('(?<ZZ>.*)', '1')
except re.error as exc:
print(type(exc))
Please add a unittest to datalad/downloaders/tests/test_providers.py
to assure that no exception is raised and the warning is logged ( with swallow_logs(new_level=logging.WARNING) as cml: ...assert_in(...)
- search for those to see how it is done)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't exception more specific? but oh my
The reason I just caught all exceptions is that I can't think of any type of exception that would fail to compile the regex that you wouldn't want to turn into a warning. Regardless of how it's invalid, it should probably warn.
Please add a unittest to [...]
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is just a wrong regex - it is one thing. But may be it managed to run out of RAM and issued MemoryError right then -- should we just keep plowing forward? If user preseed Ctrl-C - should we ignore and just skip this particular regex issuing a warning that regex was invalid? ;-)
Regardless how unlikely, unless there is really an intention to ignore all errors, better to be specific. If something else unexpected happens (e.g. KeyboardInterrupt
) - it might need to be handled "upstairs".
yeah, ATM IIRC they are all identified by their names, not config paths. I guess could be changed/extended if there is a use case for it |
It would be especially useful for the error below this where it says multiple configs are found and it's using the first one. For this regex error it would be useful for the user to know the path, but the name should be enough to hunt it down, it's just not as easy. |
Codecov Report
@@ Coverage Diff @@
## master #3258 +/- ##
==========================================
- Coverage 91.02% 90.94% -0.08%
==========================================
Files 263 263
Lines 34125 34240 +115
==========================================
+ Hits 31062 31140 +78
- Misses 3063 3100 +37
Continue to review full report at Codecov.
|
Attempting to provide the error message from the exception was too ambitious. The exception is not the same in Python 2.7 and Python 3.
sweet - thanks! |
When a provider has an invalid regular expression, datalad
currently provides obtuse error messages like
"bad character range [re.py_compile:251] (error)".
This catches any exceptions while compiling the re to explicitly
warn the user that there is an invalid regular expression in the
provider.
Unfortunately, there doesn't seem to be any way to provide the
path to the provider config file at this point in the code, so we
simply give the provider name.