Fix inheritance with quotes using shlex #3003
Pythex is great. How do you find this stuff?! (Kind of a rhetorical question ... I might be just asking for a sarcastic "Let Me Google That For You" link 😁 ) |
Codacy can be pretty retarded - "possible hard code password". 😬 |
LGTM
cc59f68 to 04364eb
I think I found it looking for something like "rubular for python". Rubular is the equivalent for Ruby (which is also a good joke... ruby... regular... rubular...).
That really didn't make sense to me, so I had to look it up in their source code. Looks like using token rings an alarm for bandit 🤷♂️ So edited the commit to use |
Some quick comments for you!
04364eb to 4b5fe77
@kinow By the way,
@hjoliver I must confess to having a bit of a problem with the current comma-delimited syntax. While it works OK for a list of values such as a list of durations, it does not work well for a list of shell logic. E.g. the example you have raised does not feel natural to my brain when I attempt to read it:
echo hello %(event)s, echo goodbye %(event)s
Where my brain would normally expect:
echo hello %(event)s; echo goodbye %(event)s
# OR
echo hello %(event)s
echo goodbye %(event)s
# OR
echo hello %(event)s && echo goodbye %(event)s
# OR
echo hello %(event)s & echo goodbye %(event)s & wait
# OR
echo hello %(event)s &
echo goodbye %(event)s &
wait |
@matthewrmshin - I agree, but that's what we currently support! Because |
(Why do we support unquoted config values? Presumably a ConfigObj legacy thing), |
But |
Sorry, I'm confused as to what your point is! I was merely pointing out above that this is currently valid suite.rc syntax:
The RHS is not valid shell syntax as a whole, but that's irrelevant because the RHS is the value of a config item that contains a comma-separated list of shell command lines (only each list item has to be valid shell syntax). And ... if we decide to enforce quoting of suite.rc item values, then it will break some existing suites that have this sort of construct without quoting. I'm sure I'm stating the obvious here, must be just misunderstanding what your point is! |
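The point about each list item being its own command line can be sketched in a few lines of Python. This is only an illustration, not parsec's actual code: the function name expand_handlers is hypothetical, the split on commas is deliberately naive (it would break on a comma inside quotes), and it assumes the %(event)s placeholders are filled with ordinary Python %-formatting.

```python
def expand_handlers(raw_value, context):
    """Split a comma-separated handler list and fill in the placeholders.

    Hypothetical helper for illustration only; the comma split is naive
    and ignores quoting.
    """
    handlers = [item.strip() for item in raw_value.split(",") if item.strip()]
    # Each item is formatted on its own, so only each item (not the
    # whole right-hand side) has to be a valid shell command line.
    return [handler % context for handler in handlers]


commands = expand_handlers(
    "echo hello %(event)s, echo goodbye %(event)s",
    {"event": "failed"},
)
print(commands)
```

Seen this way, the comma is a list separator belonging to the config format, which is why the right-hand side as a whole does not need to read as shell.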
Sorry, probably reading your comments upside-down! 🙃 Yes, I do understand the current syntax (and probably where it comes from). I was just saying that it does not feel natural when it is used to specify multiple event handlers. I am not suggesting that we remove backward compatibility just yet (unless that's what you want to discuss as well 😉). |
I think this line won't be parsed by But if there are other configuration values that could start with these quotes, and contain spaces, it would be good to know and see if we have coverage for those cases. I've changed how the lexer is created, and removed the space
All produce the same unquoted value, |
Not sure if we do cover that, we need to check. |
I always wondered what would be the use for a list of event handlers where the single handler wouldn't do. |
Off topic ( |
I think it would be nice to have a formal grammar for the formats supported by parsec. Most formats have something like a BNF spec that can be interpreted by a parser generator (e.g. ply, or antlr). If we had one, it could be used as the source of truth for other lexers, documentation, code highlighting, auto-complete, and also when writing a parser. |
@kinow, do you mean a formal grammar for |
And if we switched to YAML, we would not need to do this ourselves(?) |
A formal grammar for
Not all of this. We could use pyaml to parse the YAML, but then there would be certain rules to be enforced (e.g. that a cyclepoint is an integer or a valid iso8601 time point). The e.g. for JSON: https://github.com/aws/aws-cli/blob/b01fb388e9e627bf512b11197e3e85d250822998/awscli/schema.py e.g. for YAML: https://github.com/aws/aws-cli/blob/b01fb388e9e627bf512b11197e3e85d250822998/awscli/customizations/cloudformation/yamlhelper.py (used around here later) |
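To make the "rules to be enforced" part concrete, here is a rough sketch of a check that would still be needed after a YAML library has produced a plain dict. Everything in it is an assumption for illustration: the function names, the dict layout, and the three date-time layouts, which are a simplified stand-in for real ISO 8601 handling.

```python
import re
from datetime import datetime

# Simplified, hypothetical set of accepted date-time layouts.
DATE_TIME_FORMATS = ("%Y%m%dT%H%MZ", "%Y%m%dT%H%M", "%Y-%m-%dT%H:%MZ")


def is_valid_cycle_point(value):
    """Return True for an integer or a date-time in one of the layouts above."""
    text = str(value).strip()
    if re.fullmatch(r"[+-]?\d+", text):
        return True
    for fmt in DATE_TIME_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def validate_scheduling(config):
    """Raise ValueError with a clear message if the cycle point is invalid."""
    point = config.get("scheduling", {}).get("initial cycle point")
    if point is not None and not is_valid_cycle_point(point):
        raise ValueError(f"Illegal initial cycle point: {point!r}")


# A dict standing in for already-parsed YAML.
validate_scheduling({"scheduling": {"initial cycle point": "20190315T0600Z"}})
```

The YAML parser would only guarantee well-formed syntax; checks like this one would remain the project's own responsibility.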
Another reason to seriously consider switching to YAML @dpmatthews (when/if we have time etc.!) |
@kinow -
Is it broken? (According to Travis CI?) ... your comment above #3003 (comment) seems to imply you got it working with and without quotes? |
17f5fb0 to ce43824
Yup, working with and without quotes, but I am not sure if that's the correct behaviour for all the keys in parsec parsing. Rebased and fixed conflicts. So I think if tests pass on Travis, then it would be ready for review again. Just needs someone who understands the format well enough to confirm if the change is sensible 😬 |
I think this fix will be fine, but will take a closer look in a bit. |
ce43824 to 04f6b09
Travis build was broken here, but the link is 404 now that the repository was renamed. So just rebased onto |
04f6b09 to d7703ba
Travis build failed due to the new lexer approach returning blank strings too (e.g. |
Travis CI still unhappy... |
Oh, two builds failed. Kicked the first one, and it passed. So decided to kick the second one. If that does not fail, now my working copy is all sorted out, so will re-run any failed tests and update the PR. |
Failure of |
It is failing for
It is a valid suite on the I had a quick look at our docs, but the example there uses commas.
Will fix the conflict later 👍 |
I'd say yes. Others might say no. |
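If yes, But with quoted values being separated by spaces, it looks like it becomes too ambiguous for I used the following while testing and trying different parameters for
import shlex

# Sample value mixing comma-separated items with space-separated quoted items
value = '''some value = 123, another value, "style=filled" "fillcolor=green"'''

lexer = shlex.shlex(value, posix=True, punctuation_chars=True)
lexer.debug = True  # print the lexer's state transitions while tokenising
lexer.commenters = '#'
lexer.whitespace_split = False
lexer.whitespace = "\t\n\r,"  # commas, but not spaces, separate tokens
lexer.wordchars += " "  # keep spaces inside a token
tokens = list(lexer)
print(tokens)
# Drop stray commas and blank tokens, then trim the rest
values = [t.strip() for t in tokens if t != "," and t.strip()]
print(values)
# For comparison: plain whitespace splitting, with quotes retained
print(shlex.split(value, posix=False))
Which returned: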
So perhaps the best will be to close this pull request and find an alternative solution. Maybe keeping the current approach with regular expressions and trying to work around the case described in #2700 |
I say no! As far as I was aware we only ever supported comma-separated list values. I hadn't noticed that aberrant case in a test suite. So I'm happy just to add commas to that, and stick with the approach on this PR. @matthewrmshin might want to say whether or not he has seen many users relying on the wrong syntax though (I haven't). Even if that is the case, it's not a difficult change to force on users so long as validation gives a clear error message. |
d7703ba to 0024e25
Added a new unit test with some cases raised during a meeting. Build - finally - passing on Travis. Ready for review 👍 |
(Two approvals). |
Add changelog entry for #3003 (fix inheritance PR)
Hi,
I was doing some clean up in my local branches, and found one with a fix for #2700. As I am waiting for some code to be fixed for the setup.py, decided to work on this issue today in the afternoon.
The issue #2700 happens due to the regex used, which misses the second unquoted value of the inherit configuration.
Here's an example of what happens: https://pythex.org/?regex=%27(%5B%5E%27%5C%5C%5D*(%3F%3A%5C%5C.%5B%5E%27%5C%5C%5D*)*)%27&test_string=%27BIGFAM%27%2C%20SOMEFAM&ignorecase=1&multiline=1&dotall=1&verbose=0
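The same miss can be reproduced locally with Python's re module. This is a minimal sketch that uses the pattern and test string decoded from the Pythex link; the variable names are only for illustration.

```python
import re

# Pattern and test string decoded from the Pythex URL
SINGLE_QUOTED = re.compile(r"'([^'\\]*(?:\\.[^'\\]*)*)'")
raw_value = "'BIGFAM', SOMEFAM"

# findall() returns the captured group of every match
matches = SINGLE_QUOTED.findall(raw_value)
print(matches)
```

Only the quoted BIGFAM is captured. The unquoted SOMEFAM never matches the pattern, which is the symptom reported in #2700.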
I think we could try to fix that regex... possibly combining more regexes to match other cases? But an alternative solution would be to use the shlex module (used in other parts of Cylc I believe) in strip_and_unquote_list().
Passing the raw configuration string value through shlex removes quotes as per POSIX. And I believe it uses lookup tables to parse the values, and only uses regex for putting quotes back (which we are not doing here). So there could be a tiny performance improvement too (maybe?).
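As a rough sketch of the shlex idea (not the code in this pull request), the lexer can be told to treat only commas as separators, so that POSIX-mode quote removal happens per list item. The function name split_and_unquote is hypothetical.

```python
import shlex


def split_and_unquote(value):
    """Split a comma-separated config value and strip POSIX-style quotes.

    Hypothetical helper for illustration; not the implementation of
    strip_and_unquote_list().
    """
    lexer = shlex.shlex(value, posix=True)
    lexer.whitespace = ","         # only commas separate items
    lexer.whitespace_split = True  # everything else stays inside an item
    lexer.commenters = ""          # do not treat '#' as a comment marker
    # Trim the padding around each item and drop blank items
    return [token.strip() for token in lexer if token.strip()]


print(split_and_unquote("'BIGFAM', SOMEFAM"))
print(split_and_unquote('"a, b", c'))
```

With this setup a comma inside quotes stays part of its item, and quoted and unquoted items come out the same way, which the single-quote regex could not do.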
Added some unit tests too. Will add @oliver-sanders as one of the reviewers as I've seen him working on even harder issues with regexes, and I think I saw the shlex module in his Python 3 pull request.
Not sure if it will be fixed in time for next Cylc 7.x, so setting milestone as 8.0a1.
Cheers
Bruno