-
Notifications
You must be signed in to change notification settings - Fork 480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Corner cases where getopt behavior is not mimiced: -- or --help as string values #601
Comments
#607 implements this. The Tokenizer did end up needing a big rewrite, but surprisingly, I was able to leave the parser almost entirely as-is, with the one exception of moving the pre-processor check for |
Lovely - this would help me a lot! I have an adjacent gotcha, which is the use of - in some utilities to mean stdin or stdout. For example, The release version of this tool presently appears to parse |
@Ozzard #607 also treats a bare |
Win! Any feel as to when we might see a new release with this in? |
@moh-hassan Could I get a code review of #608, which solves this issue as well as #600? |
@rmunn The caller program (gzip in our case) call the func getopt stop the processing of options and consider all the next are non-option values (even if start with -/--) Let is show these corner cases:
gzip output:
Why:
gzip output:
getopt find value at start and consider all the followed as values including -- and --help
gzip output: gzip use -a (although it's option ) as a value for -S, getopt allow -- to be a value for scalar option although gnu standard didn't mention that -- can be used as a value. The question: Is CLP Required to do this? getopt didn't return --help and use it as a value and didn't apply gnu standard 4.2 getopt allow an option (-a) to be a value for another option -S. Also, getopt has three modes of scanning and they are completely different:
These modes are controlled by an environment variable It's a wisdom to be care in resolving the corner boundaries in using -- and --help and also following GNU standard with open minded. Notes:
What is your suggestions based on the above behavior of getopt used by gzip? References:
|
@moh-hassan - What version of gzip were you using when you ran those command lines in your comment? From which Linux distribution? (Or was it FreeBSD/OpenBSD/some other Unix that's not Linux?) Because the examples of gzip behavior that you're showing are not the same results that I get when I run it. Here's the output of
This is from the Below, in between my comments on what you wrote, I've also taken the examples of gzip's behavior that you give in your comment above, and run them myself. You'll see that the results I get when I run the exact same command are different from what you see when you run gzip, and are the same results that I designed PR #607 to achieve. My comments on what you wrote:
Correct. The
Correct. Multi-values the way CLP does them (
This isn't exactly right. getopt has three modes of operation; one of them does do what you describe (stop when it finds a free value), but that's only the default in three cases:
In all the real-world Linux software I've ever experienced as a user, however,
What I get when I run
And the exit code is 0. The above is the same text that gzip prints as a result of
In the gzip version on my system, gzip uses
What I got when I ran
and exit code 0.
Again, the version of gzip that I ran did not stop at the first value. So
Here I get the same behavior as you: gzip does not consider
This is against the GNU coding standards I just linked to one paragraph above, which say: "Other options and arguments should be ignored once this" (that is, the
I believe it is, because we are trying to mimic the behavior of getopt. The GNU coding standards don't mention anything about what valid values can be for an option, because that's not something that the programmer using the
Same response as the paragraph above. Yes, CLP should follow this behavior, because that's what getopt is expected to do. After a value-taking option, the next argument should be treated as the value, no matter what it is.
Same response as the paragraph above. Yes, CLP should follow this behavior, because that's what getopt is expected to do. After a value-taking option, the next argument should be treated as the value, no matter what it is.
Almost every program I've seen uses the PERMUTE option (which is the default of
Yes, that could be implemented separately from my PR, quite easily. I fixed it in my PR because it was very easy to do, but if you want to reject my PR, then allowing
My PR honors AutoHelp=false. Or at least, it should; if there's any part of my code that fails to honor AutoHelp=false, that's a bug and I'll fix it.
I don't understand what you mean by "it can passed as """--""" without change", so I'll have to skip commenting on that part of this point. As for the rest, using
I would VERY strongly disagree. The getopt behavior is that any text (no matter what it is) that follows after a value-consuming option should be consumed. If it didn't work that way, then there are two scenarios that would be impossible or very difficult:
My suggestion is to mimic the default behavior of Note that the examples you gave above do not, except for example 3, follow the behavior of getopt_long, so I'm very, VERY curious to know which Linux distro those examples came from, and what the output of That means that:
This is the POSIX standard, which is what
This actually says that the default behavior of
I see this is calling SummarySorry about how long this was. The summary is: AFAICT, the version of gzip that you used to produce those examples is buggy, and the way |
@moh-hassan - I really do want to know what version of gzip you were using when you ran those command lines in your comment, and which Linux distribution it came from. Because if there are Linux distros out there whose standard tools do not allow interleaving options and values the way my gzip examples do (i.e., they stop processing options after the first value is encountered), then I should change the defaults on my PR. So if you got those examples from running a real gzip command, please let me know how to reproduce those results. (If you got them from reading the gzip source code and thinking that's how it would work, then I suspect you made a mistake about the defaults and that if you ran a real gzip command you would get the same results as me.) If I can reproduce the gzip results you got, I'll be better placed to judge whether my PR needs to change. |
This is from the gzip package, version 1.6, in Ubuntu 18.04.2 Linux.
NO, it's absolutely correct, but gzip has no control at all to disable Getopt(getopt_long) IGNORE VALIDATION of the next token, but CLP apply guards and validation (more than 16 validation rule) based on the Option class (including value and data type). It's not logic to take the primitive behavior of getopt_xxx and enforce CLP to take the next token blindly. See this example
It will generate a file: CLP validate the token and fire error and it is controlled by parser setting . For help, getoptxxx is not aware of Verbs and how to call help in verb scenario. If you want to trigger help with --help, -h from any position(as getopt mimiced), this can be done with a minor change in HelpText class using Regex, but again take the syntax of help verbs into account.
Can you imagine that I can do this mistake and imagine output based on source code reading? Summary The goal of this library is not to mimic the behavior of getopt, but to apply GNU standard for using short/long options (vs forward slash) with controlling parser behavior and apply validation rules on tokens and extra features. |
Ah, so you were using POSIXLY_CORRECT in those options. I didn't understand that, since you didn't show it in the And you're arguing that CLP should mimic the POSIX standard by default, whereas I'm arguing that it should mimic getopt's default (non-POSIX) behavior by default. Actually, it will be pretty easy to allow both; I'll tweak PR #607 to add a ParserSettings option called PosixlyCorrect that turns on the POSIX behavior (stop processing optons after first non-option argument), and I'll also make it honor the POSIXLY_CORRECT environment variable so that end users who expect that behavior can make it happen. (And after doing a bit of Googling on the subject myself, I've come to the conclusion that sometimes POSIXLY_CORRECT is what you want, but most of the time it's not since most people write Bash scripts with the assumption that getopt's default mixed-options-and-values behavior is what they're going to get. So allowing for both behaviors is definitely the right thing to do here. I'll leave it defaulting to mixed, since it seems that that's what most people expect, but there will be a ParserSettings option to change that (like putting a As for the question of validation of option values, I am firmly convinced that CLP should do exactly as much validation as is needed to validate the types of the options, and nothing more. I.e., if AFAICT, the changes I made to the parser don't change the validation of ints or other types: |
The goal of this library is to mimic the behavior of getopt, but there are a few corner cases where this library behaves differently than getopt would: in the handling of
--
or--help
when they are the value of a string parameter.How getopt behaves
First, an illustration of how getopt works with the particular corner case I'm demonstrating. Let's look at the standard
gzip
andgunzip
tools found with any Linux distribution. They take many options, but one of them is--suffix
(or-S
for short); this lets you specify a different suffix than the standard.gz
for the compressed file. E.g. if you have a README.md file in the current directory, thengzip -S .compressed README.md
will create a README.md.compressed file instead of README.md.gz.Now, what do you think will happen if I run this command?
The correct answer is that it will create a compressed file named
README.md--
in the current directory. Because the string--
was specified immediately after an option that takes a string value, it was processed as the value for that option (the--suffix
option), and so gzip created a file with a--
suffix instead of.gz
. Now look at these three examples:What do you think these will do? Answer:
--help
in the current directory, and create a file named--help.gz
.--help
in the current directory, and create a file named--help--
.Why did
gzip -S -- --help
print the help text? Because--
was the value for the-S
option, and so it was not treated as the "stop processing options now" marker. Then after the-S
option was fully processed, the only remaining options were--help
. Since--help
was encountered, gzip displayed the help screen and did nothing else.With the
gzip -S -- -- --help
line, OTOH, the first--
became the value for the-S
option. Then the second--
was processed as an option, and had the "stop processing options now" meaning. So the--help
text was treated as a value, and so it looked for a file named--help
to compress. And since I specified that the compressed suffix should be--
, the compressed file was named--help--
.What CommandLine does
The current way CommandLine works is to call a preprocessor function to look for any
--
options and, if found, mark anything found after them as a value. But this would mean that in thegzip -S -- --help
example, where the correct getopt-mimicing behavior would be to print the help text, CommandLine will instead return an error saying that-S
needed a value and didn't get one.This corner case actually shows a fundamental difference between the behavior of CommandLine and the behavior of getopt. CommandLine uses a tokenizer to parse the command-line arguments and decide, based on the presence of
-
or--
at the front, to treat them as Name tokens or Value tokens. But if you read thegetopt
source code and figure out what it's actually doing, it's parsing one argument at a time, deciding whether that argument needs a value, and then if a value is needed, it swallows the next argument without further processing. Which is why you can pass--
as the suffix in gzip, and it will happily accept that.What CommandLine should do
The tokenizer, instead of processing all the arguments at once and deciding whether they're names or values, should process each argument one at a time. Then the decision tree should look like:
--
and EnableDashDash is true? Then stop processing; the rest of the arguments are all values.--
and EnableDashDash is false? Then it is the value--
; continue processing the next argument.--
and contain an equals sign? Then split it into two tokens, the part before the=
is the name, and the part after the equals is the value. (Split at the first equals sign; any equals signs after that point would become part of the value).--
and not contain an equals sign? Then we look at the list of option longnames that the tokenizer was given:AllowMultiple=true
: this is a name token. Resume tokenizing with the next argument (it is NOT swallowed). (This allows for things like-v
or--verbose
to be passed multiple times, like-vvv
, which the parser will turn into Verbose=3 in the final options instance.)-
and contain only letters that match shortnames? Split it into multiple shortnames. (I.e.,-lR
would becomeName("l"), Name("R")
if there are both-l
and-R
options).-
and its first letter matches a shortname, but the rest does not? Split it into first letter & rest, and that's two tokens: Name(first letter) and Value(rest).-
and have only one letter? Then it's a shortname, and we look at the type of the option with that shortname:Conclusion
Unfortunately, if the goal of getopt compatibility is to be achieved, a big rewrite of the guts of CommandLine's tokenizer and parser will be needed, so this is a big job. But if we want to mimic the behavior of getopt, then that's what will be needed. And the behavior I described above is how getopt works.
Also unfortunately, this is probably going to be a breaking change, so it might end up requiring a 3.0 version number. Because some people might be very surprised when
--stringoption --booloption
ends up being parsed with--booloption
as the string value of--stringoption
; they would probably have come to expect that to produce aMissingValueOptionError
for--stringoption
. But surprise or not, the correct way to handle that is for--booloption
to be the string value of--stringoption
in that example.The text was updated successfully, but these errors were encountered: