Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inheritance with quotes using shlex #3003

Merged
merged 4 commits into from
Jun 25, 2019

Conversation

kinow
Copy link
Member

@kinow kinow commented Mar 14, 2019

Hi,

I was doing some clean up in my local branches, and found one with a fix for #2700. As I am waiting for some code to be fixed for the setup.py, decided to work on this issue today in the afternoon.

The issue #2700 happens due to the regex used, which misses the second unquoted value of the inherit configuration.

Here's an example of what happens: https://pythex.org/?regex=%27(%5B%5E%27%5C%5C%5D*(%3F%3A%5C%5C.%5B%5E%27%5C%5C%5D*)*)%27&test_string=%27BIGFAM%27%2C%20SOMEFAM&ignorecase=1&multiline=1&dotall=1&verbose=0

I think we could try to fix that regex... possibly combining more regexes to match other cases? But an alternative solution would be to use the shlex module (used in other parts of Cylc I believe) in strip_and_unquote_list().

The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell. This will often be useful for writing minilanguages, (for example, in run control files for Python applications) or for parsing quoted strings.

Passing the raw configuration string value through shlex removes quotes as per POSIX. And I believe it uses lookup tables to parse the values, and only uses regex for putting quotes back (which we are not doing here). So there could be a tiny performance improvement too (maybe?).

Added some unit tests too. Will add @oliver-sanders as one of the reviewers as I've seen him working on even harder issues with regexes, and I think I saw the shlex module in his Python 3 pull request.

Not sure if it will be fixed in time for next Cylc 7.x, so setting milestone as 8.0a1.

Cheers
Bruno

@kinow kinow added bug Something is wrong :( small labels Mar 14, 2019
@kinow kinow added this to the cylc-8.0a1 milestone Mar 14, 2019
@hjoliver
Copy link
Member

Pythex is great. How do you find this stuff?! (Kind of a rhetorical question ... I might be just asking for a sarcastic "Let Me Google That For You" link 😁 )

@hjoliver
Copy link
Member

Codacy can be pretty retarded - "possible hard code password". 😬

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kinow kinow force-pushed the fix-inheritance-with-quotes branch from cc59f68 to 04364eb Compare March 14, 2019 05:56
@kinow
Copy link
Member Author

kinow commented Mar 14, 2019

Pythex is great. How do you find this stuff?!

I think I found it looking for something like "rubular for python". Rubular is the equivalent for Ruby (which is also a good joke... ruby... regular... rubular...).

Codacy can be pretty retarded - "possible hard code password".

That really didn't make sense to me, so I had to look it up in their source code. Looks like using token rings an alarm for bandit 🤷‍♂️

https://github.com/PyCQA/bandit/blob/54d75a716ea6b56cef5ad88ab2de6324f36c0bab/bandit/plugins/general_hardcoded_password.py#L25

So edited the commit to use t instead of token (variable in a list comprehension scope only, not used anywhere else, and the lexer should give a hint that that's a token [plus shlex docs]).

@cylc cylc deleted a comment Mar 14, 2019
Copy link
Contributor

@matthewrmshin matthewrmshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick comments for you!

lib/cylc/tests/test_config.py Outdated Show resolved Hide resolved
lib/parsec/validate.py Outdated Show resolved Hide resolved
@kinow kinow force-pushed the fix-inheritance-with-quotes branch from 04364eb to 4b5fe77 Compare March 14, 2019 09:53
@matthewrmshin
Copy link
Contributor

@kinow By the way, shlex should work with bare words as well...

@hjoliver I must confess to have a bit of a problem with the current comma delimited syntax. While it works OK for a list of values such as a list of durations, it does not work well for a list of shell logic.

E.g. The example you have raised does not feel natural to my brain when I attempt to read it:

echo hello %(event)s, echo goodbye %(event)s

Where my brain would normal expect:

echo hello %(event)s; echo goodbye %(event)s
# OR
echo hello %(event)s
echo goodbye %(event)s
# OR
echo hello %(event)s && echo goodbye %(event)s
# OR
echo hello %(event)s & echo goodbye %(event)s & wait
# OR
echo hello %(event)s &
echo goodbye %(event)s &
wait

@hjoliver
Copy link
Member

@matthewrmshin - I agree, but that's what we currently support! Because suite.rc item values (whether list-valued or not) are not shell syntax and do not have to be quoted. I was just saying, suddenly requiring quotes would break some suites, because of this.

@hjoliver
Copy link
Member

(Why do we support unquoted config values? Presumably a ConfigObj legacy thing),

@kinow kinow mentioned this pull request Mar 18, 2019
@matthewrmshin
Copy link
Contributor

But shlex does not require words to be quoted?

@hjoliver
Copy link
Member

But shlex does not require words to be quoted?

Sorry, I'm confused as to what your point is!

I was merely pointing out above that this is currently valid suite.rc syntax:

handlers = echo hello %(event)s, echo goodbye %(event)

The RHS is not valid shell syntax as a whole, but that's irrelevant because the RHS is the value of a config item that contains a comma-separated list of shell command lines (only each list item has to be valid shell syntax). And ... if we decide to enforce quoting of suite.rc item values, then it will break some existing suites that have this sort of construct without quoting.

I'm sure I'm stating the obvious here, must be just misunderstanding what your point is!

@matthewrmshin
Copy link
Contributor

Sorry, probably reading your comments upside-down! 🙃 Yes, I do understand the current syntax (and probably where it comes from). I was just saying that it does not feel natural when it is used to specify multiple event handlers. I am not suggesting that we remove backward compatibility just yet (unless that's what you want to discuss as well 😉).

@kinow
Copy link
Member Author

kinow commented Mar 18, 2019

handlers = echo hello %(event)s, echo goodbye %(event)

I think this line won't be parsed by shlex, as it doesn't start with " or '?

But if there are other configuration values that could start with these quotes, and contain spaces, it would be good to know and see if we have coverage for those cases.

I've changed how the lexer is created, and removed the space from the list of whitespace (ignored), and added it to the lexer's wordchars. This way it will consume all characters and space, so

ABC DEF, "ABC DEF", 'ABC DEF'

All produce the same unquoted value, ABC DEF. Wondering now if Travis-CI will pass, and whether it covers the possible configuration values for Cylc that may contain quote/unquoted tokens.

@hjoliver
Copy link
Member

But if there are other configuration values that could start with these quotes, and contain spaces, it would be good to know and see if we have coverage for those cases

Not sure if we do cover that, we need to check.

@TomekTrzeciak
Copy link
Contributor

I was just saying that it does not feel natural when it is used to specify multiple event handlers.

I always wondered what would be the use for a list of event handlers where the single handler wouldn't do.

@hjoliver
Copy link
Member

I was just saying that it does not feel natural when it is used to specify multiple event handlers.

I always wondered what would be the use for a list of event handlers where the single handler wouldn't do.

Off topic (handlers was just an example - there could be other list valued config items with spaces) but ... it depends on whether or not you want a monolithic event handler that does several things, or several that do one thing each and can be used alone in other contexts. Or you might be using existing event handlers of the latter kind.

@kinow
Copy link
Member Author

kinow commented Mar 19, 2019

I think it would be nice to have a formal grammar for the formats supported by parsec. Most formats have something like a BNF spec that can be interpreted by a parser generated (e.g. ply, or antlr).

If we had one, it could be used as source of truth for other lexers, documentation, code highlighting, auto-complete, and also when writing a parser.

@hjoliver
Copy link
Member

@kinow, do you mean a formal grammar for suite.rc itself, or for non-trivial item values within it (graph strings being a prime example of that).

@hjoliver
Copy link
Member

And if we switched to YAML, we would not need to do this ourselves(?)

@kinow
Copy link
Member Author

kinow commented Mar 19, 2019

kinow, do you mean a formal grammar for suite.rc itself, or for non-trivial item values within it (graph strings being a prime example of that).

A formal grammar for suite.rc. That produced a result object, e.g. SuiteConfiguration (or even reuse the ParsecConfig objects).

And if we switched to YAML, we would not need to do this ourselves(?)

Not all of this. We could use pyaml to parse the YAML, but then there would be certain rules to be enforced (e.g. that a cyclepoint is an integer or a valid iso8601 time point). The aws-cli, for instance, reads JSON and YAML, and validates that the user has entered at least the expected values with the correct types.

e.g. for JSON: https://github.com/aws/aws-cli/blob/b01fb388e9e627bf512b11197e3e85d250822998/awscli/schema.py

e.g. for YAML: https://github.com/aws/aws-cli/blob/b01fb388e9e627bf512b11197e3e85d250822998/awscli/customizations/cloudformation/yamlhelper.py (used around here later)

@hjoliver
Copy link
Member

Another reason to seriously consider switching to YAML @dpmatthews (when/if we have time etc.!)

@hjoliver
Copy link
Member

@kinow -

The fix here is broken, but not sure which direction to go from here 😟

Is it broken? (According to Travis CI?) ... your comment above #3003 (comment) seems to imply you got it working with and without quotes?

@kinow kinow force-pushed the fix-inheritance-with-quotes branch from 17f5fb0 to ce43824 Compare April 23, 2019 00:42
@kinow
Copy link
Member Author

kinow commented Apr 23, 2019

Is it broken? (According to Travis CI?) ... your comment above #3003 (comment) seems to imply you got it working with and without quotes?

Yup, working with and without quotes, but I am not sure if that's the correct behaviour for all the keys in parsec parsing.

Rebased and fixed conflicts. So I think it tests pass on Travis, then it would be ready for review again. Just needs someone that understands the format well enough to confirm if the change is sensible 😬

@hjoliver
Copy link
Member

I think this fix will be fine, but will take a closer look in a bit.

@kinow kinow force-pushed the fix-inheritance-with-quotes branch from ce43824 to 04f6b09 Compare April 30, 2019 21:22
@kinow
Copy link
Member Author

kinow commented Apr 30, 2019

Travis build was broken here, but the link is 404 now that the repository was renamed. So just rebased onto master and push-force to kick a new build 👍

@kinow kinow force-pushed the fix-inheritance-with-quotes branch from 04f6b09 to d7703ba Compare May 1, 2019 02:52
@kinow
Copy link
Member Author

kinow commented May 1, 2019

Travis build failed due to the new lexer approach returning blank strings too (e.g. " "). Ignored these values and pushed a new commit.

@matthewrmshin
Copy link
Contributor

Travis CI still unhappy...

@kinow
Copy link
Member Author

kinow commented May 1, 2019

Oh, two builds failed. Kicked the first one, and it passed. So decided to kick the second one. If that does not fail, now my working copy is all sorted out, so will re-run any failed tests and update the PR.

@matthewrmshin
Copy link
Contributor

Failure of ./tests/validate/04-builtin-suites.t probably genuinely related to this change.

@kinow
Copy link
Member Author

kinow commented May 1, 2019

It is failing for cylc validate for the suite ./etc/examples/parameters/cycle-chunk/. Specifically on these lines:

ERROR - [visualization][node attributes]start = style=filled fillcolor=green
ERROR - [visualization][node attributes]stop = style=filled fillcolor=red
SuiteConfigError: Node attributes must be of the form 'key1=value1', 'key2=value2', etc.

It is a valid suite on the master branch. But in this pull request the logic was changed to support the quotes, but to be always expecting values quoted to be separated by commas. Are we supposed to accept lists as quoted values separated by spaces too?

I had a quick look at our docs, but the example there uses commas.

# excerpt
[visualization]
    [[node attributes]]
        start = "style=filled", "fillcolor=skyblue"
        foo = "style=filled", "fillcolor=slategray"
        bar = "style=filled", "fillcolor=seagreen3"
        stop = "style=filled", "fillcolor=orangered"

@kinow
Copy link
Member Author

kinow commented May 1, 2019

Will fix the conflict later 👍

@matthewrmshin
Copy link
Contributor

Are we supposed to accept lists as quoted values separated by spaces too?

I'd say yes. Others might say no.

@kinow
Copy link
Member Author

kinow commented May 1, 2019

I'd say yes. Others might say no.

If yes, shlex doesn't seem able to parse correctly the value '"style=filled" "fillcolor=green"'. It is able to split by white space too, but I actually had to include whitespace to the list of wordchars (consumed characters), and also use punctuation_chars to force it to accept values like "this is a valid value, second in the list".

But with quoted values being separated by space, looks like it becomes too ambiguous for shlex, and the value is simply returned as a complete string.

I used the following while testing and trying different parameters for shlex in a notebook:

value = '''some value = 123, another value, "style=filled" "fillcolor=green"'''

import shlex

lexer = shlex.shlex(value, posix=True, punctuation_chars=True)
lexer.debug = True
lexer.commenters = '#'
lexer.whitespace_split = False
lexer.whitespace = "\t\n\r,"
lexer.wordchars += " "

lexer = list(lexer)
print(lexer)

values = [t.strip() for t in lexer if t != "," and t.strip()]

print(values)

print(shlex.split(value, posix=False))

Which returned:

shlex: token='some value = 123'
shlex: token=' another value'
shlex: token=' style=filled fillcolor=green'
['some value = 123', ' another value', ' style=filled fillcolor=green']
['some value = 123', 'another value', 'style=filled fillcolor=green']
['some', 'value', '=', '123,', 'another', 'value,', '"style=filled"', '"fillcolor=green"']

So perhaps the best will be to close this pull request and find an alternative solution. Maybe keeping the current approach with regular expressions and trying to work around the case described in #2700

@hjoliver
Copy link
Member

hjoliver commented May 1, 2019

Are we supposed to accept lists as quoted values separated by spaces too?

I'd say yes. Others might say no.

I say no! As far as I was aware we only ever supported comma-separated list values. I hadn't noticed that aberrant case in a test suite. So I'm happy just to add commas to that, and stick with the approach on this PR.

@matthewrmshin might want to say whether or not he has seen many users relying on the wrong syntax though (I haven't). Even if that is the case, it's not a difficult change to force on users so long as validation gives a clear error message.

@kinow kinow force-pushed the fix-inheritance-with-quotes branch from d7703ba to 0024e25 Compare June 25, 2019 10:52
@kinow
Copy link
Member Author

kinow commented Jun 25, 2019

Added a new unit test with some cases raised during a meeting. Build - finally - passing on Travis. Ready for review 👍

@hjoliver hjoliver removed the request for review from oliver-sanders June 25, 2019 19:00
@hjoliver
Copy link
Member

(Two approvals).

@hjoliver hjoliver merged commit b431555 into cylc:master Jun 25, 2019
hjoliver added a commit that referenced this pull request Jun 25, 2019
Add changelog entry for #3003 (fix inheritance PR)
@kinow kinow deleted the fix-inheritance-with-quotes branch July 25, 2021 05:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :( small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants