Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Argparse add_argument treats - in flags differently to - in positional arguments. #95100

Closed
CSDUMMI opened this issue Jul 21, 2022 · 24 comments
Closed
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@CSDUMMI
Copy link

CSDUMMI commented Jul 21, 2022

Bug report

If you add an optional argument to a parser with argparse containing dashes,
those are converted to _ automatically in the resulting Namespace object.

But if you add a positional argument containing a -, this is not converted and the resulting error message suggests the argument name containing the - instead of the _. Which is of course not possible (without getattr), because it's not a valid variable name in Python.

This behaviour is misleading and undocumented and I'd suggest to convert - to _ in positional arguments too.

Reproduction code:

import argparse

parser = argparse.ArgumentParser("example")

parser.add_argument("foo-bar", type=str)

args = parser.parse_args()

print("getattr", getattr(args, "foo-bar"))

print("- replaced by _", args.foo_bar)

Results in:

$ python3 main.py abc
getattr aoe
Traceback (most recent call last):
  File "/tmp/main.py", line 11, in <module>
    print("- replaced by _", args.foo_bar)
AttributeError: 'Namespace' object has no attribute 'foo_bar'. Did you mean: 'foo-bar'?

Compounding this issue is the fact, that you are prevented from using the dest option on add_argument to overwrite the name in the Namespace.

Your environment

  • CPython versions tested on: Python 3.10.5
  • Operating system and architecture: Linux
@CSDUMMI CSDUMMI added the type-bug An unexpected behavior, bug, or error label Jul 21, 2022
@ericvsmith
Copy link
Member

I’ve been bitten by this before, and I think it would be a nice feature. But wouldn’t it break code currently using getattr?

@CSDUMMI
Copy link
Author

CSDUMMI commented Jul 21, 2022

You are right.

Is that happening a lot?

In that case, a work-around could be added to make it possible to use getattr with the dash name.

But that's not a very nice, right?

I'd expect that this is really not a lot of work and that removing a work around and replacing it with the convention in existing code should benefit the code quality of any project currently using it.

@ericvsmith
Copy link
Member

I guess you could make it available under both names, but I'm not sure it's worth the hassle.

I don't think we should break existing code, even if it would be better off with the change.

@CSDUMMI
Copy link
Author

CSDUMMI commented Jul 22, 2022

PEP 387 lays out the rules for backward compatibility in Python.

Given a set of arguments, the return value, side effects, and raised exceptions of a function. This does not preclude changes from reasonable bug fixes.

I would consider this a reasonable bug fix, because this behavior is undocumented and diverges from the convention explicity documented for optional arguments in argparse.

@ericvsmith
Copy link
Member

In my years working on Python, I've learned that every change will break something.

@CSDUMMI
Copy link
Author

CSDUMMI commented Jul 22, 2022

In that case, breaking something should not be an argument against changing something.

@CSDUMMI
Copy link
Author

CSDUMMI commented Jul 22, 2022

I fear that adding a work around for supporting code using getattr and dash names, could lead to other problems.

For example, if you iterate over the resulting namespace, use vars or dir on it, a workaround to allow for both the - and _ version of a positional argument name would lead to the same argument being contained twice within the namespace.

With the work around enabled, this would happen:

iimport argparse

parser = argparse.ArgumentParser("example")

parser.add_argument("foo-bar", type=str)

args = parser.parse_args(["abc"])

print(vars(args)) # { "foo-bar" :  "abc", "foo_bar" : "abc" }

And that is yet another undocumented, unintuitive behavior with potential for breaking existing code.

@ericvsmith
Copy link
Member

In that case, breaking something should not be an argument against changing something.

I don’t think that follows.

But my point is the bar for change is high, and I don’t think this meets the criteria.

@CSDUMMI
Copy link
Author

CSDUMMI commented Jul 23, 2022

I consider it an undocumented, unintuitive and not obvious behavior.

Behavior that is not documented, is also behavior that is not guaranteed.

And I would also wager, that it was not intentional.

Thus this should be considered as a reasonable bug fix.

It must be either fixed or at the very least be documented in the argparse documentation. Not everybody will be familiar with getattr and know that it can be used to work around this bug.

@ericvsmith
Copy link
Member

I agree it should at least be documented.

@CSDUMMI
Copy link
Author

CSDUMMI commented Jul 23, 2022

So:

If your positional argument name contains a -, you must use getattr(args, "foo-bar") instead of args.foo_bar, because this might break backwards compatiblity if fixed.

@CSDUMMI
Copy link
Author

CSDUMMI commented Jul 23, 2022

Explaining that note in the documentation might be a little hard.

@CSDUMMI
Copy link
Author

CSDUMMI commented Jul 23, 2022

The problem here is that the longer it is not fixed, the harder it will be to fix it eventually.

And who really wants to have to explain this behavior indefinitely to people new to argparse?

@ericvsmith
Copy link
Member

The problem here is that the longer it is not fixed, the harder it will be to fix it eventually.

That's always true for all such changes.

I'm not saying it shouldn't be done. I'm saying we'll probably break working code, and that's a very high bar for a change. I occasionally make changes to argparse, but here I'll wait for other core devs to weigh in. In the meantime, a doc PR would be welcomed.

@CSDUMMI
Copy link
Author

CSDUMMI commented Jul 24, 2022

I'm having trouble finding the right place to add a warning about this behavior.

@CSDUMMI
Copy link
Author

CSDUMMI commented Jul 26, 2022

To be more specific:

Should the warning + work around be added as inline documentation of argparse module or as part of Doc/library/argparse.rst?

@erlend-aasland erlend-aasland added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Jul 30, 2022
@hpaulj
Copy link

hpaulj commented Nov 4, 2022

For optionals, there is a POSIX convention of accepting dashes in the flag strings, so conversion to underscore makes sense.

For positionals, there isn't any good reason to use dashes - unless you want to make life difficult for yourself. You are free to use any 'dest' string, even ones that start with numbers and contain odd characters. Internally, argparse uses the dest with setattr/getattr, so it isn't bothered by odd characters.

And if you must have dashes in the usage or help, use the metavar.

During the debugging phase it's a good idea to include a 'print(args)' line, so you aren't surprised by changes, or nonchanges to the 'dest.

In _get_optional_kwargs, the dash replace is done only when it is inferred from the long option string. It is not done when you provide an explicit 'dest' parameter.

For a positional, _get_positional_kwargs gets the first (and only) string as the 'dest'. It does not do any checking or replacement.

I think the 'dest' documentation is clear enough. "For positional argument actions, dest is normally supplied as the first argument ". The dash conversion is clearly identified as an optionals feature.

@zackw
Copy link

zackw commented Jun 27, 2024

  1. Yes, there is a good reason to want to use dashes for positional arguments: consistency with optional arguments. It looks bad, unprofessional, like I didn't bother copyediting my documentation, to have both "input_file" and "--output-file" in the --help output.

  2. The dash conversion is not clearly identified as optionals-only. The actual text of the documentation is

    For positional argument actions, dest is normally supplied as the first argument to add_argument(). For optional argument actions, the value of dest is normally inferred from the option strings.
    [two irrelevant sentences about the inference process]
    Any internal - characters will be converted to _ characters to make sure the string is a valid attribute name.

    I suppose it's possible to read the "Any internal - characters" sentence as part of the process "for optional argument actions", but it is equally possible to read it as applying to all argument actions, and it makes more sense for it to be applied to all arguments, because the rationale -- "to make sure the string is a valid attribute name" -- applies to all arguments.

  3. Aggravating the problem, the documentation, together with the behavior of argparse if you give both a first non-keyword argument and a dest= argument to add_argument for a positional (i.e. throwing an error) makes it seem like it is impossible to manually make the attribute and visible name of a positional be different. It is actually possible, but the only way to do it is by giving add_argument no positional arguments and specifying both dest= and metavar=, which I only realized was a valid thing to do when I found issue argparse: fix inconsistency in add_argument() API when using positional argument with dest= parameter #117834. I was actually about to start monkey-patching argparse to work around this issue before I thought to check for bug reports.

  4. I believe backward compatibility can be ensured by having ap.add_argument("input-file", ...) set both input-file and input_file in the namespace object, in the absence of an explicit dest= parameter.

@ericvsmith
Copy link
Member

4. I believe backward compatibility can be ensured by having ap.add_argument("input-file", ...) set both input-file and input_file in the namespace object, in the absence of an explicit dest= parameter.

Unless someone is iterating over the namespace and doesn't expect to find the same thing twice.

I'm not familiar with your point # 3. I'll have to read up on it.

@hpaulj
Copy link

hpaulj commented Jun 27, 2024 via email

@hpaulj
Copy link

hpaulj commented Jun 27, 2024 via email

@zackw
Copy link

zackw commented Jun 27, 2024

To me it seems obvious that if dest is specified explicitly then it should win over anything inferred from the names list, whether or not the argument is an option.

Since .add_argument("this-thing", dest="this_thing") currently throws an exception there is no backward compatibility cost to permitting it. I acknowledge that my point #‌4 may not work out in practice; I hadn't thought of iterating over the namespace.

@rindeal
Copy link
Contributor

rindeal commented Jun 28, 2024

IMO key normalization on access makes much more sense than storing duplicated values

--- a/Lib/argparse.py
+++ b/Lib/argparse.py
@@ -35,8 +35,32 @@ class Namespace(_AttributeHolder):
         return vars(self) == vars(other)
 
     def __contains__(self, key):
-        return key in self.__dict__
+        return self._get_normalized_key(key) is not None
+
+    def __getattr__(self, name):
+        normalized_key = self._get_normalized_key(name)
+        if normalized_key is not None:
+            return self.__dict__[normalized_key]
+        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
+
+    def __getitem__(self, key):
+        return self.__getattr__(key)
+
+    def _get_normalized_key(self, key):
+        if key in self.__dict__:
+            return key
+        alt_key = key.replace('-', '_') if '-' in key else key.replace('_', '-')
+        return alt_key if alt_key in self.__dict__ else None

@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented Sep 28, 2024

This is a duplicate of an older issue #59330, which also has patches and PR. All proposed solutions have flaws and drawbacks.

@serhiy-storchaka serhiy-storchaka closed this as not planned Won't fix, can't repro, duplicate, stale Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
Status: Doc issues
Development

No branches or pull requests

8 participants