Corner cases where getopt behavior is not mimicked: -- or --help as string values #601

Open
rmunn opened this issue Mar 19, 2020 · 10 comments


@rmunn
Contributor

rmunn commented Mar 19, 2020

The goal of this library is to mimic the behavior of getopt, but there are a few corner cases where this library behaves differently than getopt would: in the handling of -- or --help when they are the value of a string parameter.

How getopt behaves

First, an illustration of how getopt works with the particular corner case I'm demonstrating. Let's look at the standard gzip and gunzip tools found with any Linux distribution. They take many options, but one of them is --suffix (or -S for short); this lets you specify a different suffix than the standard .gz for the compressed file. E.g. if you have a README.md file in the current directory, then gzip -S .compressed README.md will create a README.md.compressed file instead of README.md.gz.

Now, what do you think will happen if I run this command?

gzip -S -- README.md

The correct answer is that it will create a compressed file named README.md-- in the current directory. Because the string -- was specified immediately after an option that takes a string value, it was processed as the value for that option (the --suffix option), and so gzip created a file with a -- suffix instead of .gz. Now look at these three examples:

1. gzip -- --help
2. gzip -S -- --help
3. gzip -S -- -- --help

What do you think these will do? Answer:

  1. This will compress a file named --help in the current directory, and create a file named --help.gz.
  2. This will print the help text, and do nothing else.
  3. This will compress a file named --help in the current directory, and create a file named --help--.

Why did gzip -S -- --help print the help text? Because -- was the value for the -S option, and so it was not treated as the "stop processing options now" marker. Then after the -S option was fully processed, the only remaining argument was --help. Since --help was encountered, gzip displayed the help screen and did nothing else.

With the gzip -S -- -- --help line, OTOH, the first -- became the value for the -S option. Then the second -- was processed as an option, and had the "stop processing options now" meaning. So the --help text was treated as a value, and so it looked for a file named --help to compress. And since I specified that the compressed suffix should be --, the compressed file was named --help--.
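Python's standard-library getopt.gnu_getopt is not glibc's getopt_long, but it follows the same GNU scanning rules (including swallowing whatever follows a value-taking option), so it offers a quick way to check the three answers above without running gzip. The option table here (-S/--suffix taking a value, -h/--help as a flag) is an assumed subset of gzip's real options, and the sketch clears POSIXLY_CORRECT so that the default mode is the one being exercised.

```python
import os
from getopt import gnu_getopt

# gnu_getopt consults POSIXLY_CORRECT; clear it so the default
# (permuting) GNU behavior is what we observe here.
os.environ.pop("POSIXLY_CORRECT", None)

SHORT = "S:h"               # -S takes a value, -h is a flag
LONG = ["suffix=", "help"]  # --suffix takes a value, --help is a flag


def parse(argv):
    """Return (options, operands) the way a getopt_long-style loop sees them."""
    return gnu_getopt(argv, SHORT, LONG)


examples = [
    ["--", "--help"],              # 1. "--" ends options; --help is a file name
    ["-S", "--", "--help"],        # 2. "--" is the suffix; --help is an option
    ["-S", "--", "--", "--help"],  # 3. first "--" is the suffix, second ends options
]
for argv in examples:
    print(argv, "->", parse(argv))
```

Each line pairs an argument list with the (options, operands) result; in the second case --help comes back as an option because -- was consumed as the value of -S.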

What CommandLine does

The current way CommandLine works is to call a preprocessor function to look for any -- options and, if found, mark anything found after them as a value. But this would mean that in the gzip -S -- --help example, where the correct getopt-mimicking behavior would be to print the help text, CommandLine will instead return an error saying that -S needed a value and didn't get one.

This corner case actually shows a fundamental difference between the behavior of CommandLine and the behavior of getopt. CommandLine uses a tokenizer to parse the command-line arguments and decide, based on the presence of - or -- at the front, to treat them as Name tokens or Value tokens. But if you read the getopt source code and figure out what it's actually doing, it's parsing one argument at a time, deciding whether that argument needs a value, and then if a value is needed, it swallows the next argument without further processing. Which is why you can pass -- as the suffix in gzip, and it will happily accept that.

What CommandLine should do

The tokenizer, instead of processing all the arguments at once and deciding whether they're names or values, should process each argument one at a time. Then the decision tree should look like:

  • Is this option exactly -- and EnableDashDash is true? Then stop processing; the rest of the arguments are all values.
  • Is this option exactly -- and EnableDashDash is false? Then it is the value --; continue processing the next argument.
  • Does this option start with -- and contain an equals sign? Then split it into two tokens, the part before the = is the name, and the part after the equals is the value. (Split at the first equals sign; any equals signs after that point would become part of the value).
  • Does this option start with -- and not contain an equals sign? Then we look at the list of option longnames that the tokenizer was given:
    • Name matches a boolean option: this is a name token. Resume tokenizing with the next argument (it is NOT swallowed).
    • NEW FEATURE: Name matches an int option and the option attribute has AllowMultiple=true: this is a name token. Resume tokenizing with the next argument (it is NOT swallowed). (This allows for things like -v or --verbose to be passed multiple times, like -vvv, which the parser will turn into Verbose=3 in the final options instance.)
    • Name matches an option that's neither of the two cases above (boolean or int with AllowMultiple): this is a name token, and the next argument is a value token no matter what it is. "Swallow" the next argument, and resume tokenizing with the argument after next.
  • Does the option start with - and contain only letters that match shortnames? Split it into multiple shortnames. (I.e., -lR would become Name("l"), Name("R") if there are both -l and -R options).
  • Does the option start with - and its first letter matches a shortname, but the rest does not? Split it into first letter & rest, and that's two tokens: Name(first letter) and Value(rest).
  • Does the option start with - and have only one letter? Then it's a shortname, and we look at the type of the option with that shortname:
    • As above, if boolean, then don't swallow the next argument.
    • NEW FEATURE: As above, if int with AllowMultiple, then don't swallow the next argument.
    • As above, if other type, then swallow the next argument (WHATEVER it is) and treat it as a value.
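The decision tree above can be sketched in Python as a loop that looks at one argument at a time and decides, per option, whether to swallow the next argument. This is a minimal, hypothetical sketch, not CommandLine's actual C# tokenizer: the Kind and Token types and the dictionaries mapping option names to kinds are assumptions made for illustration. Where the tree is ambiguous (for example a cluster such as -S.z in which -S takes a value), the sketch resolves it the way getopt does: the first value-taking letter consumes the rest of the cluster, or the next argument if the cluster ends there.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    FLAG = auto()   # boolean option: never takes a value
    COUNT = auto()  # int option with AllowMultiple: also never takes a value
    VALUE = auto()  # any other option type: takes exactly one value


@dataclass(frozen=True)
class Token:
    is_name: bool  # True for Name tokens, False for Value tokens
    text: str


def tokenize(args, longs, shorts, enable_dash_dash=True):
    """Tokenize one argument at a time; longs/shorts map option names to Kind.

    A value-taking option with nothing after it yields only a Name token;
    reporting the missing value is left to the parser.
    """
    tokens = []
    i = 0
    while i < len(args):
        arg = args[i]
        i += 1
        if arg == "--":
            if enable_dash_dash:
                # Stop processing: every remaining argument is a value.
                tokens.extend(Token(False, a) for a in args[i:])
                break
            tokens.append(Token(False, "--"))  # just the literal value "--"
        elif arg.startswith("--"):
            # Split at the first "=" only; later "=" signs stay in the value.
            name, eq, value = arg[2:].partition("=")
            tokens.append(Token(True, name))
            if eq:
                tokens.append(Token(False, value))
            elif longs.get(name) is Kind.VALUE and i < len(args):
                tokens.append(Token(False, args[i]))  # swallow next, whatever it is
                i += 1
        elif arg.startswith("-") and arg != "-":
            rest = arg[1:]
            while rest:
                letter, rest = rest[0], rest[1:]
                tokens.append(Token(True, letter))
                if shorts.get(letter) is Kind.VALUE:
                    if rest:  # "-S.z": the rest of the cluster is the value
                        tokens.append(Token(False, rest))
                    elif i < len(args):  # "-S .z": swallow the next argument
                        tokens.append(Token(False, args[i]))
                        i += 1
                    break
        else:
            # Plain value; this includes a bare "-" (stdin/stdout by convention).
            tokens.append(Token(False, arg))
    return tokens


# Example option tables (hypothetical, modeled on gzip's -S/--suffix).
SHORTS = {"S": Kind.VALUE, "h": Kind.FLAG, "v": Kind.COUNT}
LONGS = {"suffix": Kind.VALUE, "help": Kind.FLAG, "verbose": Kind.COUNT}

print(tokenize(["-S", "--", "--help"], LONGS, SHORTS))
```

With these tables, ["-S", "--", "--help"] yields Name("S"), Value("--"), Name("help"), and -vvv yields three Name("v") tokens, leaving it to the parser to count them into Verbose=3.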

Conclusion

Unfortunately, if the goal of getopt compatibility is to be achieved, a big rewrite of the guts of CommandLine's tokenizer and parser will be needed, so this is a big job. But if we want to mimic the behavior of getopt, then that's what will be needed. And the behavior I described above is how getopt works.

Also unfortunately, this is probably going to be a breaking change, so it might end up requiring a 3.0 version number. Because some people might be very surprised when --stringoption --booloption ends up being parsed with --booloption as the string value of --stringoption; they would probably have come to expect that to produce a MissingValueOptionError for --stringoption. But surprise or not, the correct way to handle that is for --booloption to be the string value of --stringoption in that example.

@rmunn
Contributor Author

rmunn commented Mar 27, 2020

#607 implements this. The Tokenizer did end up needing a big rewrite, but surprisingly, I was able to leave the parser almost entirely as-is, with the one exception of moving the pre-processor check for --help and --version to be based on Tokens instead of strings. (This allows us to ignore cases where --help was a string value to a different option). And almost no tests needed changing except for the ones whose semantics explicitly change in #607. So this is probably not a breaking change after all.
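To illustrate why the --help/--version check has to run on tokens rather than on raw strings, the Python sketch below compares both approaches. It uses Python's getopt.gnu_getopt as a stand-in tokenizer (the actual change in #607 is in CommandLine's C# code), and the -S/--suffix and -h/--help option table is an assumed example.

```python
import os
from getopt import gnu_getopt

# Use gnu_getopt's default (permuting) mode for this demonstration.
os.environ.pop("POSIXLY_CORRECT", None)


def wants_help_by_string(argv):
    # String-based check: any argument spelled "--help" counts.
    return "--help" in argv


def wants_help_by_token(argv):
    # Token-based check: only arguments actually parsed as options count,
    # so "--help" swallowed as the value of -S is ignored.
    opts, _operands = gnu_getopt(argv, "S:h", ["suffix=", "help"])
    return any(name in ("-h", "--help") for name, _value in opts)


argv = ["-S", "--help", "README.md"]
print(wants_help_by_string(argv), wants_help_by_token(argv))  # True False
```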

@Ozzard

Ozzard commented Apr 5, 2020

Lovely - this would help me a lot! I have an adjacent gotcha, which is the use of - in some utilities to mean stdin or stdout. For example, (cd /src/dir; tar -cf - .) | (cd /dest/dir; tar -xvf -) emits the tarfile to stdout in the first command, and consumes it from stdin in the second.

The release version of this tool presently appears to parse - as an empty option prefix. It shouldn't in the same situation as #607 fixes; a lone - after an option that takes an argument should be the argument.

@rmunn
Contributor Author

rmunn commented Apr 5, 2020

@Ozzard #607 also treats a bare - as a value — see https://github.com/commandlineparser/commandline/pull/607/files#diff-c55127e12f4102753e3927ba25bfba42R59 — for precisely that reason. It's up to you to convert the - into stdin or stdout as appropriate, but it will be a value and not an empty option.
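Python's getopt.gnu_getopt applies the same rule, which makes it handy for checking the tar command lines from the previous comment. The option string below is an assumed subset of tar's options (-c, -x, -v as flags, -f taking a value); turning the resulting "-" into stdin or stdout is, as noted above, left to the program.

```python
from getopt import gnu_getopt

# Assumed tar-style short options: -c, -x, -v are flags, -f takes a value.
SHORT = "cxvf:"

# -f consumes the following "-" as its value.
print(gnu_getopt(["-cf", "-", "."], SHORT))  # ([('-c', ''), ('-f', '-')], ['.'])
print(gnu_getopt(["-xvf", "-"], SHORT))      # ([('-x', ''), ('-v', ''), ('-f', '-')], [])
# A lone "-" that follows no option is an ordinary operand, not an empty option.
print(gnu_getopt(["-"], SHORT))              # ([], ['-'])
```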

@Ozzard

Ozzard commented Apr 5, 2020

Win! Any feel as to when we might see a new release with this in?

@rmunn
Contributor Author

rmunn commented May 20, 2020

@moh-hassan Could I get a code review of #608, which solves this issue as well as #600?

@moh-hassan
Collaborator

@rmunn
Discussion 1.0:
getopt processes the command line one option at a time, in order. getopt is not aware of whether -h or --help is used for displaying help, and cannot force the calling program to use -h/--help for displaying help.

The calling program (gzip in our case) calls getopt in a loop and processes every option either as a switch or as a scalar with a value (only one value). Multiple values are not supported.
The output of getopt is a vector with the options first and the values after them (values can also be interleaved, depending on the scanning mode).

getopt stops processing options, and treats everything that follows as non-option values (even if it starts with - or --):
a) when it finds --
b) when it finds a free value !!!

Let us look at these corner cases:

# Example1:
$ gzip -S -- file --help

gzip output:

gzip: file: No such file or directory
gzip: --help: No such file or directory
gzip didn't display help because it didn't receive --help from getopt.

Why:
gzip uses -- as the value for -S.
It finds the value 'file', stops processing options, and treats everything after it as values, including --help.
It didn't display help and didn't apply rule 4.2 for help.

# Example2:
$ gzip file -S -- --help

gzip output:

gzip: file: No such file or directory
gzip: -S: No such file or directory
gzip: --: No such file or directory
gzip: --help: No such file or directory

getopt finds a value at the start and treats everything that follows as values, including -- and --help.

# Example3:
gzip -S -a --help

gzip output:

gzip uses -a (although it's an option) as the value for -S,
then finds --help and displays help.

Note: CLP displays error messages about missing values for both -S and -a, and displays help along with the errors.

getopt allows -- to be the value of a scalar option, although the GNU standard doesn't mention that -- can be used as a value.

The question: Is CLP required to do this?

getopt didn't return --help as an option; it used it as a value and didn't apply GNU standard 4.2.
The question: Is CLP required to do this?

getopt allows an option (-a) to be the value of another option (-S).
The question: Is CLP required to do this?

Also, getopt has three scanning modes, and they are completely different:

REQUIRE_ORDER, PERMUTE, RETURN_IN_ORDER

These modes are controlled by the POSIXLY_CORRECT environment variable, or by a + or - at the front of the short-options string.
-- and --help can be handled differently depending on the active scanning mode, which is left to the calling program.
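How the three examples above come out depends on the scanning mode. As a rough stand-in for glibc, Python's getopt.gnu_getopt uses PERMUTE by default and REQUIRE_ORDER when the short-options string starts with + (or POSIXLY_CORRECT is set), so it can run each example both ways. The option table (-S/--suffix taking a value, -a and -h/--help as flags) is assumed for illustration.

```python
import os
from getopt import gnu_getopt

os.environ.pop("POSIXLY_CORRECT", None)  # make the default mode deterministic

LONG = ["suffix=", "help"]


def permute(argv):
    # Default GNU mode (PERMUTE): options may follow operands.
    return gnu_getopt(argv, "S:ah", LONG)


def require_order(argv):
    # A leading "+" selects REQUIRE_ORDER: scanning stops at the first operand.
    return gnu_getopt(argv, "+S:ah", LONG)


for argv in (["-S", "--", "file", "--help"],   # Example1
             ["file", "-S", "--", "--help"],   # Example2
             ["-S", "-a", "--help"]):          # Example3
    print(argv)
    print("  PERMUTE:      ", permute(argv))
    print("  REQUIRE_ORDER:", require_order(argv))
```

In REQUIRE_ORDER mode, examples 1 and 2 leave --help among the operands, matching the gzip output shown above; in PERMUTE mode --help is returned as an option. Example 3 comes out the same in both modes, with -a swallowed as the value of -S.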

It would be wise to be careful when resolving the corner cases around -- and --help, and to follow the GNU standard with an open mind.

Notes:

  1. Treating - (single dash) as a value was one of the missing features that developers needed; it can be implemented without conflicting with the GNU standard or the getopt corner cases.
  2. AutoHelp=false gives developers the freedom to provide their own help text and to decide whether to use -h/--help for displaying help.
  3. If -- is allowed as a value, that should be declared as a setting in ParserSettings, although it can be passed as """--""" without change.
  4. Allowing an option to be a value, as in example 3, should be avoided.

What are your suggestions, based on the above behavior of getopt as used by gzip?

References:

@rmunn
Contributor Author

rmunn commented Jun 2, 2020

@moh-hassan - What version of gzip were you using when you ran those command lines in your comment? From which Linux distribution? (Or was it FreeBSD/OpenBSD/some other Unix that's not Linux?) Because the examples of gzip behavior that you're showing are not the same results that I get when I run it. Here's the output of gzip --version on my system:

gzip 1.6
Copyright (C) 2007, 2010, 2011 Free Software Foundation, Inc.
Copyright (C) 1993 Jean-loup Gailly.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Jean-loup Gailly.

This is from the gzip package, version 1.6-5ubuntu1, in Ubuntu Linux.

Below, in between my comments on what you wrote, I've also taken the examples of gzip's behavior that you give in your comment above, and run them myself. You'll see that the results I get when I run the exact same command are different from what you see when you run gzip, and are the same results that I designed PR #607 to achieve.

My comments on what you wrote:

@rmunn
Discussion 1.0:
getopt processes the command line one option at a time, in order. getopt is not aware of whether -h or --help is used for displaying help, and cannot force the calling program to use -h/--help for displaying help.

Correct. The -h/--help is a convention, though a strongly recommended one in the GNU standard. Nevertheless, programs are free not to use -h for --help, which is why in #608 I made it optional (and defaulting to off) to have -h as a shortname for --help. I would recommend defaulting it to on at some point, because I personally find it annoying when someprog -h doesn't display the help text, but I don't know how many people feel the same as me here.

The calling program (gzip in our case) calls getopt in a loop and processes every option either as a switch or as a scalar with a value (only one value). Multiple values are not supported.
The output of getopt is a vector with the options first and the values after them (values can also be interleaved, depending on the scanning mode).

Correct. Multi-values the way CLP does them (--someopt one two three --someotheropt) where someopt is an IEnumerable<string> are an extension to the GNU standard, and I actually don't like them. But I'm not going to remove them from CLP; I'm trying to make a non-breaking change here. (More on that below).

getopt stops processing options, and treats everything that follows as non-option values (even if it starts with - or --):
a) when it finds --
b) when it finds a free value !!!

This isn't exactly right. getopt has three modes of operation; one of them does do what you describe (stop when it finds a free value), but that's only the default in three cases:

  1. When it's called as getopt rather than getopt_long, so that long options (the --foo ones) are not allowed. Note that almost every real example I could find uses getopt_long, so this is not at all widely used in practice.

  2. When the code calling getopt_long requests this mode by putting a + as the first character of the options string.

  3. When the environment variable POSIXLY_CORRECT is set (which allows the end user to select this mode of operations if they want).

In all the real-world Linux software I've ever experienced as a user, however, getopt is being called as getopt_long, so that long options are allowed, and also options can be placed after values. I.e., your point b) here is correct in theory, but in practice nobody asks getopt to do that, and everybody wants options and values to be interspersed freely.

Let us look at these corner cases:

# Example1:
$ gzip -S -- file --help

gzip output:

gzip: file: No such file or directory
gzip: --help: No such file or directory
gzip didn't display help because it didn't receive --help from getopt.

What I get when I run gzip -S -- file --help is:

Usage: gzip [OPTION]... [FILE]...
Compress or uncompress FILEs (by default, compress FILES in-place).

Mandatory arguments to long options are mandatory for short options too.

  -c, --stdout      write on standard output, keep original files unchanged
  -d, --decompress  decompress
  -f, --force       force overwrite of output file and compress links
  -h, --help        give this help
  -k, --keep        keep (don't delete) input files
  -l, --list        list compressed file contents
  -L, --license     display software license
  -n, --no-name     do not save or restore the original name and time stamp
  -N, --name        save or restore the original name and time stamp
  -q, --quiet       suppress all warnings
  -r, --recursive   operate recursively on directories
  -S, --suffix=SUF  use suffix SUF on compressed files
  -t, --test        test compressed file integrity
  -v, --verbose     verbose mode
  -V, --version     display version number
  -1, --fast        compress faster
  -9, --best        compress better
  --rsyncable       Make rsync-friendly archive

With no FILE, or when FILE is -, read standard input.

Report bugs to <bug-gzip@gnu.org>.

And the exit code is 0. The above is the same text that gzip prints as a result of gzip --help. For the sake of keeping this comment as short as I can, I'm going to summarize this text as (help text) in all future examples.

Why:
gzip uses -- as the value for -S.
It finds the value 'file', stops processing options, and treats everything after it as values, including --help.
It didn't display help and didn't apply rule 4.2 for help.

In the gzip version on my system, gzip uses -- as the value for -S, finds the bare value file and does not stop processing options, so it then finds --help and prints the help text, exiting with a 0 exit code since printing the help text is a non-error situation.

# Example2:
$ gzip file -S -- --help

gzip output:

gzip: file: No such file or directory
gzip: -S: No such file or directory
gzip: --: No such file or directory
gzip: --help: No such file or directory

What I got when I ran gzip file -S -- --help:

(help text)

and exit code 0.

getopt finds a value at the start and treats everything that follows as values, including -- and --help.

Again, the version of gzip that I ran did not stop at the first value. So -S was handled as an option and then -- was treated as the argument to -S, so that --help was still processed as an option and printed the help text.

# Example3:
gzip -S -a --help

gzip output:

gzip uses -a (although it's an option) as the value for -S, then finds --help and displays help.

Here I get the same behavior as you: gzip does not consider -a to be an option because it immediately follows -S, which means that -a is never processed. But then it encounters --help, so according to the GNU coding standards it ignores all other arguments and prints the help text.

Note: CLP displays error messages about missing values for both -S and -a, and displays help along with the errors.

This is against the GNU coding standards I just linked to one paragraph above, which say: "Other options and arguments should be ignored once this" (that is, the --help option) "is seen, and the program should not perform its normal function."

getopt allows -- to be the value of a scalar option, although the GNU standard doesn't mention that -- can be used as a value.

The question: Is CLP required to do this?

I believe it is, because we are trying to mimic the behavior of getopt. The GNU coding standards don't mention anything about what valid values can be for an option, because that's not something that the programmer using the getopt library (the person for whom the GNU coding standards document was written) needs to care about. The GNU coding standards just say "use getopt_long", which means that your program will get all of getopt's normal behavior. Including the fact that the next argument after a value-taking option like -S, whatever it is, should be swallowed whole and not interpreted. That's what getopt does, and that's the behavior that CLP should mimic.

getopt didn't return --help as an option; it used it as a value and didn't apply GNU standard 4.2.
The question: Is CLP required to do this?

Same response as the paragraph above. Yes, CLP should follow this behavior, because that's what getopt is expected to do. After a value-taking option, the next argument should be treated as the value, no matter what it is.

getopt allows an option (-a) to be the value of another option (-S).
The question: Is CLP required to do this?

Same response as the paragraph above. Yes, CLP should follow this behavior, because that's what getopt is expected to do. After a value-taking option, the next argument should be treated as the value, no matter what it is.

Also, getopt has three scanning modes, and they are completely different:

REQUIRE_ORDER, PERMUTE, RETURN_IN_ORDER

These modes are controlled by the POSIXLY_CORRECT environment variable, or by a + or - at the front of the short-options string.
-- and --help can be handled differently depending on the active scanning mode, which is left to the calling program.

It would be wise to be careful when resolving the corner cases around -- and --help, and to follow the GNU standard with an open mind.

Almost every program I've seen uses PERMUTE mode (which is the default of getopt_long), and the GNU standards say "Use getopt_long to decode arguments, unless the argument syntax makes this unreasonable." So we should definitely default to this behavior, allowing options and values to be mixed just like the default behavior of getopt_long does.

Notes:

1. Treating - (single dash) as a value was one of the missing features that developers needed; it can be implemented without conflicting with the GNU standard or the getopt corner cases.

Yes, that could be implemented separately from my PR, quite easily. I fixed it in my PR because it was very easy to do, but if you want to reject my PR, then allowing - as a value should still be done.

2. AutoHelp=false gives developers the freedom to provide their own help text and to decide whether to use -h/--help for displaying help.

My PR honors AutoHelp=false. Or at least, it should; if there's any part of my code that fails to honor AutoHelp=false, that's a bug and I'll fix it.

3. If -- is allowed as a value, that should be declared as a setting in ParserSettings, although it can be passed as """--""" without change.

I don't understand what you mean by "it can be passed as """--""" without change", so I'll have to skip commenting on that part of this point. As for the rest, using -- as a value after an option (or after one occurrence of -- treated specially) is the normal behavior of getopt_long, just as with any other value that starts with -, or indeed any text whatsoever. Since the normal behavior of CLP is intended to mimic getopt as closely as possible, I don't think it should be a setting in ParserSettings to follow getopt's normal behavior. (I'd actually like to make the EnableDashDash option the default, so that getopt is mimicked by default, but that would be a breaking change so it should be reserved for a release that does a major-version bump, e.g. version 3 of CLP.)

4. Allowing an option to be a value, as in example 3, should be avoided.

I would VERY strongly disagree. The getopt behavior is that any text (no matter what it is) that follows after a value-consuming option should be consumed. If it didn't work that way, then there are two scenarios that would be impossible or very difficult:

  1. I'm writing a program that runs another program, and I have an --extra-args option that the user can pass to tell me extra arguments for the other program. E.g., outerprog --verbose would call innerprog --foo, but outerprog --extra-args --bar --verbose would call innerprog --foo --bar. (Note that in this example, --bar is not a valid option to outerprog). Without the ability to use any text as the value to a string-consuming option, the end user calling outerprog would be puzzled why the --extra-args option wasn't working right. If your note 3 was implemented, the user could add a -- before the --bar like outerprog --extra-args -- --bar --verbose (or, wait, then --verbose would be a value so the user would have to rewrite that as outerprog --verbose --extra-args -- --bar -- which shows another problem with your note 3, because the whole point of getopt_long's default behavior of interspersing values and options is that users should not have to rewrite the order of their command-line options to satisfy the demands of the program). But what would you expect that particular command, outerprog --verbose --extra-args -- --bar, to do? What I think you'd expect that to do is that --bar would become a value to --extra-args. But that's not what I (as someone who's used Linux since 1998) would have come to expect. I'd expect that the -- would be swallowed as the value of --extra-args. And even if it wasn't, I'd expect that --bar would be treated not as a value of an option, but as an extra argument to the program (what CLP calls a Value). And I'm not the only one who would expect -- to stop processing option values; DashDash (--) doesn't work properly with multi-value options #605 is based on that expectation as well.

  2. The other scenario is as follows. Let's say that --bar was not a valid option to outerprog, and we're following the suggestion in your note 4. So outerprog --extra-args --bar sees --bar, sees that it's not a valid option, and treats it as the value of --extra-args. But now the developer of outerprog adds a --bar option in a new release. Suddenly the same outerprog --extra-args --bar command line that used to work (and pass --bar to innerprog) is now failing with an error, saying that --extra-args needs a value. The end user will be baffled by this change in behavior. "What?" they'll say. "I'm already passing it a value: --bar is the value!" The fact that --bar became a valid option to outerprog will not make them expect that it would no longer be treated as a valid value after --extra-args. So by following getopt_long's normal behavior, i.e. swallowing the next argument as an option value no matter what it is, we achieve consistency between versions 1 and 2 of outerprog, because the treatment of --extra-args --bar will be the same no matter whether --bar is now a valid option to outerprog or not.
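Both scenarios come down to the value-taking option swallowing the next argument unconditionally. A short Python check, using getopt.gnu_getopt as a stand-in for getopt_long, with the hypothetical outerprog option sets from the scenarios:

```python
import os
from getopt import gnu_getopt

os.environ.pop("POSIXLY_CORRECT", None)  # default GNU (permuting) mode

# Hypothetical outerprog, version 1: no --bar option.
V1_LONG = ["extra-args=", "verbose"]
# Hypothetical outerprog, version 2: adds a --bar option.
V2_LONG = V1_LONG + ["bar"]

argv = ["--extra-args", "--bar", "--verbose"]
v1 = gnu_getopt(argv, "", V1_LONG)
v2 = gnu_getopt(argv, "", V2_LONG)
print(v1)        # --bar is the value of --extra-args; --verbose is still an option
print(v1 == v2)  # True: adding --bar in version 2 changes nothing
```

The two results are identical, so adding --bar in version 2 does not change how --extra-args --bar is parsed.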

What are your suggestions, based on the above behavior of getopt as used by gzip?

My suggestion is to mimic the default behavior of getopt_long exactly, because that's the one that's used in every Linux command-line program I've ever seen. (Except, of course, for programs like msbuild and dotnet which follow Windows command-line behavior and not Linux command-line behavior, but for that reason I don't consider them to be "Linux command-line programs"; they are Windows command-line programs that were ported to Linux).

Note that the examples you gave above do not, except for example 3, follow the behavior of getopt_long, so I'm very, VERY curious to know which Linux distro those examples came from, and what the output of gzip --version is on your system.

That means that:

  1. -S -- --help should have -- be the value of the -S option, and print help text unless AutoHelp is false.
  2. -S -a --help should have -a be the value of the -S option, and print help text unless AutoHelp is false.
  3. -S --help should have --help be the value of the -S option, and NOT print help text.

References:

* The Open Group Base Specifications Issue 7, 2018 edition, IEEE Std 1003.1-2017 [Revision of IEEE Std 1003.1-2008](https://pubs.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap12.html)

This is the POSIX standard, which is what getopt follows if you use the POSIXLY_CORRECT environment variable. Modern Linux programs all tend to follow the GNU standard, though, which takes the POSIX standard and expands on it.

* Linux Programmer's Manual   [GETOPT(3)](https://www.man7.org/linux/man-pages/man3/getopt.3.html)

This actually says that the default behavior of getopt, not just getopt_long, is to "permute[] the contents of argv as it scans, so that eventually all the nonoptions are at the end." This is the behavior I was expecting, where options and values can be mixed and processing doesn't stop at the first non-option value. So that's actually different than the getopt source code here, which is what I had been looking at as I wrote the earlier parts of this.

* gzip source using getopt: [L433](https://github.com/Distrotech/gzip/blob/distrotech-gzip/gzip.c#L443)

I see this is calling getopt_long and not passing a + as the first character, so the default behavior should have been what I was expecting, and NOT the behavior you got in your examples 1 and 2 above. Which, again, makes me wonder what version of gzip (and what version of Linux) you were running when you got those results.

Summary

Sorry about how long this was. The summary is: AFAICT, the version of gzip that you used to produce those examples is buggy, and the way getopt is supposed to work is to allow mixed options and values, and to "swallow" anything after a string-taking argument (even another option or the text --). I.e., exactly the way I wrote PR #607 to behave.

@rmunn
Contributor Author

rmunn commented Jun 9, 2020

@moh-hassan - I really do want to know what version of gzip you were using when you ran those command lines in your comment, and which Linux distribution it came from. Because if there are Linux distros out there whose standard tools do not allow interleaving options and values the way my gzip examples do (i.e., they stop processing options after the first value is encountered), then I should change the defaults on my PR. So if you got those examples from running a real gzip command, please let me know how to reproduce those results. (If you got them from reading the gzip source code and thinking that's how it would work, then I suspect you made a mistake about the defaults and that if you ran a real gzip command you would get the same results as me.) If I can reproduce the gzip results you got, I'll be better placed to judge whether my PR needs to change.

@moh-hassan
Collaborator

moh-hassan commented Jun 9, 2020

@rmunn

What version of gzip were you using when you ran those command lines in your comment?

This is from the gzip package, version 1.6, in Ubuntu 18.04.2 Linux, with POSIXLY_CORRECT enabled.

user1@ubuntu:~$ gzip --version
gzip 1.6
Copyright (C) 2007, 2010, 2011 Free Software Foundation, Inc.
Copyright (C) 1993 Jean-loup Gailly.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Jean-loup Gailly.

The summary is: AFAICT, the version of gzip that you used to produce those examples is buggy,

No, it's absolutely correct; gzip simply has no way to disable POSIXLY_CORRECT.

getopt (getopt_long) ignores validation of the next token, but CLP applies guards and validation (more than 16 validation rules) based on the Option class (including value and data type).

It's not logical to take the primitive behavior of getopt_xxx and force CLP to take the next token blindly.

See this example:

gzip -S -b file.txt

It will generate a file: file.txt-b.
Is it logical to take the command-line input as-is, without validation?
getopt_xxx treats the command line with good intent and assumes that the user will not make a mistake; if the user does make a mistake, they have to live with it, even if the file ends up named file.txt-b as in the example above.

CLP validates the token and raises an error, and this is controlled by parser settings.

For help, getopt_xxx is not aware of verbs or of how to call help in a verb scenario.

If you want to trigger help with --help or -h from any position (as getopt does), this can be done with a minor change in the HelpText class using a Regex, but again the syntax of help verbs must be taken into account.

(If you got them from reading the gzip source code and thinking that's how it would work, then I suspect you made a mistake about the defaults and that if you ran a real gzip command you would get the same results as me.)

Do you really think I would make that mistake and imagine the output from reading the source code?
If you enable POSIXLY_CORRECT and run a real gzip command, you will get the same results as me.

Summary
CLP can mimic getopt for help with a minor change in the HelpText class; getopt is not aware of help for verbs, or of verbs at all.
For --, CLP controls it with the EnableDashDash = true/false parser setting, plus other settings that control what to do with the next token, validate it, and apply guard rules to every token.

The goal of this library is not to mimic the behavior of getopt, but to apply the GNU standard for short/long options (as opposed to forward slashes), while controlling parser behavior, applying validation rules to tokens, and providing extra features.
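For comparison with the `EnableDashDash` setting mentioned in the summary, this minimal sketch shows how getopt itself treats `--` in the three gzip command lines from the top of this issue, using Python's `getopt.getopt` and an assumed gzip-like option spec. It illustrates getopt's rule only and says nothing about how CLP's `EnableDashDash` is implemented.

```python
import getopt

# Assumed gzip-like spec: -S/--suffix takes a value; -h/--help does not.
SHORTOPTS = "S:h"
LONGOPTS = ["suffix=", "help"]

def parse(argv):
    # A bare "--" ends option processing, unless it has already been
    # consumed as the value of an option that requires one (here -S).
    return getopt.getopt(argv, SHORTOPTS, LONGOPTS)

# 1. gzip -- --help: "--" ends options, so "--help" is a file name.
print(parse(["--", "--help"]))              # ([], ['--help'])
# 2. gzip -S -- --help: "--" is the suffix, so "--help" is still an option.
print(parse(["-S", "--", "--help"]))        # ([('-S', '--'), ('--help', '')], [])
# 3. gzip -S -- -- --help: the first "--" is the suffix, the second ends options.
print(parse(["-S", "--", "--", "--help"]))  # ([('-S', '--')], ['--help'])
```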

@rmunn
Contributor Author

rmunn commented Jun 10, 2020

Ah, so you were using POSIXLY_CORRECT for those examples. I didn't understand that, since you didn't show it in the gzip command lines you posted, and I don't know anyone who has it set by default in their .bashrc, because the default getopt behavior is so much more useful than the POSIX standard behavior.

And you're arguing that CLP should mimic the POSIX standard by default, whereas I'm arguing that it should mimic getopt's default (non-POSIX) behavior by default.

Actually, it will be pretty easy to allow both; I'll tweak PR #607 to add a ParserSettings option called PosixlyCorrect that turns on the POSIX behavior (stop processing options after the first non-option argument), and I'll also make it honor the POSIXLY_CORRECT environment variable so that end users who expect that behavior can make it happen. (After doing a bit of Googling on the subject myself, I've come to the conclusion that sometimes POSIXLY_CORRECT is what you want, but most of the time it's not, since most people write Bash scripts assuming that getopt's default mixed-options-and-values behavior is what they're going to get. So allowing for both behaviors is definitely the right thing to do here.) I'll leave it defaulting to mixed, since that seems to be what most people expect, but there will be a ParserSettings option to change that (like putting a + in front of the options string of getopt).
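The two behaviors can be compared with Python's `getopt.gnu_getopt`, which permutes options and operands by default and switches to POSIX behavior when the option string starts with `+` or when `POSIXLY_CORRECT` is set. In this sketch the `posixly_correct` parameter is a hypothetical stand-in for the proposed `PosixlyCorrect` setting (not CLP's actual API), and the option spec is assumed.

```python
import getopt
import os
from contextlib import contextmanager

# Assumed spec: only -S/--suffix is modelled; it takes a value.
SHORTOPTS = "S:"
LONGOPTS = ["suffix="]
ARGV = ["README.md", "-S", ".z"]  # an operand first, then an option

def parse(argv, posixly_correct=False):
    """posixly_correct is a hypothetical stand-in for the proposed setting.
    A leading '+' asks gnu_getopt to stop at the first non-option argument;
    gnu_getopt also does this on its own when POSIXLY_CORRECT is set."""
    shortopts = ("+" if posixly_correct else "") + SHORTOPTS
    return getopt.gnu_getopt(argv, shortopts, LONGOPTS)

@contextmanager
def posixly_correct_env(value):
    """Temporarily set POSIXLY_CORRECT (or unset it if value is None)."""
    saved = os.environ.get("POSIXLY_CORRECT")
    if value is None:
        os.environ.pop("POSIXLY_CORRECT", None)
    else:
        os.environ["POSIXLY_CORRECT"] = value
    try:
        yield
    finally:
        if saved is None:
            os.environ.pop("POSIXLY_CORRECT", None)
        else:
            os.environ["POSIXLY_CORRECT"] = saved

with posixly_correct_env(None):
    mixed = parse(ARGV)                       # default: interleaving allowed
    plus = parse(ARGV, posixly_correct=True)  # like "+S:" in C getopt
with posixly_correct_env("1"):
    from_env = parse(ARGV)                    # end user opted in via the env

print(mixed)     # ([('-S', '.z')], ['README.md'])
print(plus)      # ([], ['README.md', '-S', '.z'])
print(from_env)  # ([], ['README.md', '-S', '.z'])
```

Either switch alone is enough to get the POSIX behavior, which matches the plan of a setting that defaults to mixed plus an environment override for end users.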

As for the question of validation of option values, I am firmly convinced that CLP should do exactly as much validation as is needed to validate the types of the options, and nothing more. I.e., if -s is a string option and -n is a number (say an int) option, then -n foo should be rejected, but -n -1 should be accepted and put the value -1 (negative one) into the Number property. And -s foo should be accepted, and so should -s -1, because CLP cannot know the end user's intent. What if the end user preferred having tarballs with a .tar-gz extension instead of .tar.gz? If getopt worked the way CLP currently does, gzip -S -gz file.tar would throw an error, instead of producing the file.tar-gz file that the user wanted. But since opinion clearly does differ on this subject, I'll put in another ParserSettings option to change that, and forbid string values starting with a - (except for the bare - value which means "stdin/stdout", and should always be allowable as a string value). I have a feeling that most people will want to permit string values that start with -, so I think that most people will want to turn that particular option off, but in deference to CLP's current behavior I'll default that one to on so that the "no options that start with -" validation is kept by default.

AFAICT, the changes I made to the parser don't change the validation of ints or other types: -n foo will still produce a parser error when it tries to convert "foo" to an integer. So I only really need to care about this for string values, because integer values in particular need to be able to allow -1 and the like.
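The policy described in the last two paragraphs (type conversion as the only check for ints, a switchable check on string values starting with `-`, and a bare `-` always allowed) can be sketched on top of Python's `getopt`. `parse`, `forbid_leading_dash`, and the `-s`/`-n` option letters are hypothetical names for illustration, not CLP's API or the actual ParserSettings option in PR #607.

```python
import getopt

# Hypothetical options: -s is a string option, -n is an int option.
SHORTOPTS = "s:n:"

def parse(argv, forbid_leading_dash=True):
    """Collect raw values with getopt, then validate per type.

    forbid_leading_dash is a hypothetical stand-in for the proposed setting
    (on by default, matching CLP's current behavior). A bare "-" (stdin/stdout)
    is always accepted as a string value.
    """
    opts, operands = getopt.getopt(argv, SHORTOPTS)
    result = {}
    for name, raw in opts:
        if name == "-n":
            # Type conversion is the only check: "-1" -> -1, "foo" -> ValueError.
            result["number"] = int(raw)
        elif name == "-s":
            if forbid_leading_dash and raw.startswith("-") and raw != "-":
                raise ValueError(f"value {raw!r} for -s starts with '-'")
            result["string"] = raw
    return result, operands

print(parse(["-s", "-gz", "-n", "-1", "file.tar"], forbid_leading_dash=False))
# ({'string': '-gz', 'number': -1}, ['file.tar'])
print(parse(["-s", "-", "-n", "-1"]))
# ({'string': '-', 'number': -1}, [])
```

With the default setting, `parse(["-s", "-gz", "file.tar"])` raises `ValueError`, and `parse(["-n", "foo"])` fails in the `int` conversion regardless of the setting.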
