Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterating over ArgMatches #1206

Closed
neysofu opened this issue Mar 9, 2018 · 54 comments · Fixed by #4080
Closed

Iterating over ArgMatches #1206

neysofu opened this issue Mar 9, 2018 · 54 comments · Fixed by #4080
Labels
A-parsing Area: Parser's logic and needs it changed somehow. C-enhancement Category: Raise on the bar on expectations E-medium Call for participation: Experience needed to fix: Medium / intermediate M-breaking-change Meta: Implementing or merging this will introduce a breaking change.
Milestone

Comments

@neysofu
Copy link

neysofu commented Mar 9, 2018

Maintainer's notes


Hi, I am using Clap in a little side project and I would like to suggest a minor feature. I am ready to implement it myself and send a PR if needed, I just need a few pointers. The feature at hand is an ordered iterator over ArgMatches: as far as my understanding goes, Clap doesn't allow to freely iterate over the parsed arguments. What if my application APIs are dependent upon the order of the arguments? GNU find comes to my mind. Calling index_of for every argument can be expensive if you have a lot of them.

If this feature were to be implemented, what would the return type and name of the method be?

@kbknapp
Copy link
Member

kbknapp commented Mar 19, 2018

Sorry for the wait, I've been on vacation with my kids this past week :)

Do you mean iterating over only positional values, or over all args that have been used? Also, do you envision this iterating over non-matched args as well?

If you'd like to take a stab at implementing I'd be happy to mentor you! This is how I'd attack it:

  • Draw out how you see this working, including the questions I listed above (including any edge cases you can think of)
  • I start with some usage code (such as in a test) to make sure the ergonomics are correct/comfortable
  • Write some tests in tests/matches.rs for the various uses you've drawn up in step one
  • I then just test that file alone with cargo test --test matches or can further test just a single function in that file with cargo test --test matches -- name_of_function
  • Start the implementation (this will vary greatly depending on how you've designed it)

For the implementation once we have a good design I can give you some pointers as to where the changes you'll want to make are located / any particulars associated with those changes.

Once all your new tests pass, I'd run a full test suite cargo test --features "yaml vec_map wrap_help unstable" and make sure nothing broke.

If you need to see the debug output while you're building testing a single function use cargo test --test matches --features debug -- name_of_function

@neysofu
Copy link
Author

neysofu commented Mar 19, 2018

First of all, thanks for taking the time to address this! I hope you enjoyed your vacation.

The API I had in mind wouldn't be constrained only to positional arguments, but to all the arguments that have been used. Currently ArgMatches allows you to know which arguments have been used and their index in respect to all other arguments, yet it lacks a way to actually iterate over them.

I mean adding an API to access the order in which options are given. Ex.:

cmd --abc --xyz
cmd --xyz --abc

Let's say my command's behavior is specific to the order in which the options abc and xyz are given. As long as it's only two arguments I can get away with calling index_of for one of the two and figuring out which comes first. But what if I provide many another options? My idea is offer a method over ArgMatches that gives an iterator over the arguments in an ordered fashion:

arg_matches_1.iter_raw() == vec!["abc", "xyz"];
arg_matches_2.iter_raw() == vec!["xyz", "abc"];

This example is just off the top of my head and I am sure it could be improved.

Please correct me if I talked nonsense at any point because I am not very familiar with the API.

@kbknapp
Copy link
Member

kbknapp commented Mar 19, 2018

To make sure it's not an XY problem, how does your CLI work with the order of args? I've always seen conflicts/overrides as the main driving factor for when order matters. Both of which handled in the parsing logic.

The only other form I've seen is for things like include/exclude logic where there aren't really hard conflicts/overrides at the parsing level, but further up the execution stack there are. I've only seen this between a handlful of args at the most, and hence iterating over the indices isn't an issue.

This is of course not to say my list is exhaustive, I'm sure there are other use cases out there!

This actually wouldn't be a huge change for clap to implement this, but it would introduce an additional dep (such as indexmap ) and thus would need to be behind a non-default cargo feature.

The change you'd make is using cfg statements to change the hashmap to an indexmap and provide an ArgMatches method to simply return an Iterator of the requested info, perhaps as a tuple? (Such as (name, &[vals])?) This could be similar to how Arg::values_of works.

For the iterator itself, it'd fine to use the Values impl as a guide.

@kbknapp
Copy link
Member

kbknapp commented Mar 19, 2018

Another more lengthy option (but arguably more correct) is instead of returning an Iterator of some custom data type tuple, returning the MatchedArg itself. I say more lengthy because you'd have to document MatchedArg and provide the appropriate methods to get at it's internals such as values, occurrences, and indices. However, this is more flexible in the long run as you wouldn't need to add more ArgMatches methods in order to deal with different types of iteration...you'd just let the calling code deal with it has it wants.

@neysofu
Copy link
Author

neysofu commented Mar 20, 2018

My application has a custom version subcommand that works this way:

// Prints human readable version if no options are given
$ cmd version
Cmd 2.42.1-rc1 (fj23o83b)

// Switches to line-by-line format if options are given, to facilitate machine readability. The requested data is printed in the specified order.
$ cmd version --tags --major
rc1
2

However there are many other use cases. Tasks execution order in a Gulp-like application, apply filters in a specific order in GNU find, ecc.

@neysofu
Copy link
Author

neysofu commented Mar 20, 2018

This actually wouldn't be a huge change for clap to implement this, but it would introduce an additional dep (such as indexmap ) and thus would need to be behind a non-default cargo feature.

That's a pity. I thought it would be as simple as adding a method to ArgMatches, with no fundamental changes to the actual program.

@kbknapp
Copy link
Member

kbknapp commented Mar 21, 2018

Right now, clap uses a HashMap to hold the matches, which is inherently not in order. Switching to an IndexMap would be basically a drop in replacement, but because it's another dep I'd want people to have to opt-in to it.

So it basically would be just a cfg statement which adds the method on the matches struct and uses an indexmap instead of hashmap. And I'd create a struct to act as the iterator, and impl Iterator for it.

Something like (details left out for brevity):

#[cfg(not(feature = "indexmap"))]
struct ArgMatches {
    args: HashMap
}

#[cfg(feature = "indexmap")]
struct ArgMatches {
    args: IndexMap
}

impl ArgMatches {
    #[cfg(feature = "indexmap")]
    fn iter(&self) -> MatchIter {
        MatchIter { iter: self.args.iter() }
    }
}

#[cfg(feature = "indexmap")]
struct MatchIter {
    iter: // whatever IndexMap::iter returns
}

#[cfg(feature = "indexmap")]
impl Iterator for MatchIter {
    type Item = // whatever args.iter().next() returns

    fn next(&mut self) -> Self::Item {
        self.iter().next()
    }
}

So it's a pretty straight forward change.

@neysofu
Copy link
Author

neysofu commented Mar 21, 2018

Thanks for the tips, I'll start working on a prototype then. 👍

@neysofu
Copy link
Author

neysofu commented Mar 24, 2018

I feel hesitant adding an iter method to ArgMatches; it feels like a duplicate of indexmap's functionality. ArgMatches.args: IndexMap is already public, so what's wrong with the user doing matches.args.iter()?

@neysofu
Copy link
Author

neysofu commented Mar 24, 2018

You can take a look at the prototype here. There is some more work to do, specifically:

  • write documentation for both the feature and the iter method;
  • remove the feature from the default set (it was a quick hack to make it build every time without passing arguments);
  • add some more tests.

Please let me know how you like the API and if you want any changes. Congrats on the codebase BTW, I know this is a small change but still, it is a pleasure to dive in.

@kbknapp
Copy link
Member

kbknapp commented Mar 25, 2018

I feel hesitant adding an iter method to ArgMatches; it feels like a duplicate of indexmap's functionality. ArgMatches.args: IndexMap is already public, so what's wrong with the user doing matches.args.iter()?

My current policy is public, but undocumented internals are effectively private and should not be relied on. Sure, I can't stop anyone from using them directly, but there's the caveat that if I change them without warning it's considered a breaking change. Hence the #[doc(hidden)]

Most of the time public but undocumented is because it needs to be public because other mods (or eventually crates) need to touch it, but those mods (or crates) are maintained by myself or other clap members and thus it's essentially just a private API.

@kbknapp
Copy link
Member

kbknapp commented Mar 25, 2018

Looks like a good start! It's a small nit, but let's change the feature name to iter_matches because iter is somewhat ambiguous and may conflict with other types of iteration we may do in the future.

Let me know if you have any other questions 👍

@neysofu
Copy link
Author

neysofu commented Mar 25, 2018

I'll create a PR for discussing further details, sounds good?

@kbknapp
Copy link
Member

kbknapp commented Mar 25, 2018

Yep, I'd actually prefer it that way because it's easier to comment and review 😄 👍

@XX
Copy link

XX commented Apr 14, 2018

I also need this functionality

@bradwood
Copy link

Hi all, I'm also looking to use something like this. I'd like to process my args in a functional fashion. The PR, however, seems to have died a death. Given that 3.0 is on the horizon how much of a bad idea is it for me to just use matches.args.iter() in 2.x? Also, will this make it into 3.0?

PS, I don't need ordering.

@CreepySkeleton
Copy link
Contributor

@bradwood Not as bad as you'd think: 2.x branch is frozen. I'd be really surprised if we change something regarding publicity of those fields because there is plenty of code out there relying on those, ugh. So, if you aren't planning on switching to 3.0 and you're OK with the fact that every time you use private API god kills a kitten, you can do that, yes.

Beware: those will be truly private in 3.0.

I suspect it won't make it into 3.0. The main question is: what are you going to iterate through? You can't have &Arg from ArgMatches because it stores Ids, basically a hash of those. On doesn't simply restore the Arg from the hash.

@bradwood
Copy link

i am trying to call map() on each arg I find to invoke the appropriate handler function based on the argument passed...

And, as regards the kittens, I'm cool with that! :)

@CreepySkeleton
Copy link
Contributor

clap has Arg::validator that is called on every value passed in. And of course, you can always iterate over std::env::args() and map it as you please; you don't really need clap for that.

@pksunkara
Copy link
Member

I think I can see a certain use case for this where people want to do dependency injection pattern for evaluating their args.

@kbknapp
Copy link
Member

kbknapp commented Apr 27, 2020

[I] am trying to call map() on each arg I find to invoke the appropriate handler function based on the argument passed...

For a while I was floating around the idea of adding filter/map capabilities to arguments themselves, in a similar manner to the current validator. In essence it's just a chance for people to call a function when a particular argument/value is used and do something with its value persistently. Some use cases were, "I want my CLI to accept options yes|no but I want the value held my context struct to be some concrete type."

But others also wanted more a "handler" type functionality (which is something some python libraries offer). This can be done already by (hackily) using the validator functionality.

Currently, the mapping of values can be done pretty easily after the parsing in user code manually, but not during parsing.

I'm not 100% sure where I fall on the thinking of this one though, as is the added complexity and public API space worth the small benefit of doing something at parse time instead of manually afterwards?

Also perhaps more importantly, have users thought through what happens if a particular map/filter/handler whatever we'd call it has side affects, but parsing ultimately fails? Maybe that's just a, "hey probably best you don't have side affects, but if you do be aware it can have XYZ consequences." I mean you can already (mis)use the validators for handlers, so I'd be less inclined to include that functionality.

@CreepySkeleton
Copy link
Contributor

I want my CLI to accept options yes|no but I want the value held my context struct to be some concrete type

Resolved by derive (and partially by value_of_t* methods).

I'm not sure I get your "handlers" idea, I imagine it's something akin #1185:

a single Arg::custom_parse which accepts a function that takes the arg/value currently being parsed, and returns an Option where None is skip this value (i.e. parse it as a different arg), or Some(T) is "save this to the matches" it could be the value as passed in, or it could be something different entirely (i.e. a map).

Some thoughts:

  • When is it ran, exactly? Before default values/environment variables were populated or after? What about required/conflicts stuff? How does it interact with old-style validators? With groups? Possible values?

    I'm not sure I'm ready to answer these questions.

  • We've already got the flush-related bug. The problem here that clap exits with sys::exit() which implies that no destructors are being accounted for, no finallizers are being run, the text still awaiting to be processed by IO facilities is lost, etc... When I'm reading the description above, I'm given impression that "heavy" side-effect functions are OK, so will be users.

In nutshell: I think this kind of tool is too vague and too powerful for a command line parsing library to provide. I don't think clap should be trying to accommodate every single CLI in existence, the number of well-supported designs is quite enough already.

@bradwood
Copy link

FWIW, I'm not needing these map()s to run during arg parsing, I'm happy to pass an iterator of matched arguments into a function pipeline. I'll take a look at the validator, as I've not really looked properly at that. It might work for my use case.

@CreepySkeleton
Copy link
Contributor

CreepySkeleton commented Apr 27, 2020

@bradwood Out of morbid curiosity, what is the core problem you're trying to solve? Why do you think map is a good way to approach this kind of problem?

I suspect you're just "doing it wrong".

@kbknapp
Copy link
Member

kbknapp commented Apr 27, 2020

Resolved by derive (and partially by value_of_t* methods).

Exactly. I'm describing a problem/issue that existed prior to derive macros. The "handlers" was just a common theme that popped up every so often, especially when people were moving from Python based CLIs which offer a "when this argument is used, run this function." Although that's a totally different way to build/design a CLI and thus I don't think it's really applicable to how clap runs.

I don't think clap should be trying to accommodate every single CLI in existence, the number of well-supported designs is quite enough already.

100% agree. I'm just providing some historic context 😉


@bradwood I agree with @CreepySkeleton it sounds like you may be approaching the problem from a particular design angle and we may be seeing it as an XY problem.

Some of the principal issues with a feature mapping over arguments (and their values) is:

  • For us (clap) to provide it, it needs to be general enough to fit all (or most) use cases, and with feature in particular there are a lot of things to consider and edge cases
  • If we're talking about true map (i.e. T->U), that can essentially already be done easily as @CreepySkeleton mentioned
  • If we're talking about handlers (i.e. perform some action for each argument); unless these actions are only blind side affects they:
    • need some sort of context passed to/from them (other args, maybe application context, etc.)
    • cannot rely on deterministic order or hierarchy
    • most likely need the concept of fallibility (what happens on failure)

Those final three bullets are some of what making this feature a general clap feature difficult. If you don't have any of those requirements and are truly only wanting blind side affects, I'd question if there may be a better way to represent the CLI.

If you can give some examples of what you're trying to do, we may be able to either point you in a direction or give you a way to accomplish that task with existing features.

@epage epage added the S-triage Status: New; needs maintainer attention. label Dec 9, 2021
@epage epage mentioned this issue Jan 12, 2022
2 tasks
@antonshkurenko
Copy link

Writing this comment because of #3286: absence of iterator blocks our update from 2x to 3x

@epage
Copy link
Member

epage commented Jan 18, 2022

@tonyshkurenko can you expand on the use case for why this blocks updating?

@antonshkurenko
Copy link

I mean we literally used field

pub args: HashMap<&'a str, MatchedArg>,

Part of our app was based on this. Now it's not available (correct me, if I'm wrong) without any alternative. That's why we can't update

@epage
Copy link
Member

epage commented Jan 25, 2022

Yes, I am asking for your use case or why you use the field. What problem are you trying to solve by accessing args?

@fosskers
Copy link

I was attempting this about a year ago, and my particular use case was either:

  • to be able to filter back out matched flags under certain circumstances
  • to be able to convert all the matched flags back into their original string forms

@epage
Copy link
Member

epage commented Feb 1, 2022

I was looking at exposing a present_arg_ids() -> impl Iterator but it looks like this is not something we'd fix in clap3 but have to wait for clap4.

We only store an Id for args, so this is the most of what we can return. We can make it so users can check that its a specific arg. The problem is then using this to do any dynamic operations on ArgMatches. The problem is that ArgMatches functions take a impl Key. To properly initialize a new Id from the Key, we have to add a impl Display for Id. Id only has a reasonable Display in debug mode. If we expose Id and impl Display, there is a good chance users will leverage the Display and being able to catch that not working depends on how thorough they test. Alternatively, we could look to drop the Display bounds in release modes and only impl Display for Id in debug mode.

In general, resolving this is getting hairy and something we are unlikely to want to take on. Our current plan is to wait around 6 months at minimum between major breaking releases (which this change would probably be part of), so we are looking around June for resolving this.

@epage epage added this to the 4.0 milestone Feb 1, 2022
@epage epage added M-breaking-change Meta: Implementing or merging this will introduce a breaking change. E-medium Call for participation: Experience needed to fix: Medium / intermediate and removed S-triage Status: New; needs maintainer attention. labels Feb 1, 2022
@epage
Copy link
Member

epage commented Feb 1, 2022

Also, when implementing this, my expectation is that we'll provide something like:

pub fn present_arg_ids(&self) -> Iterator<Item=&Id>
  • The order will be undefined. If at the time, we are still using IndexMap internally, we should intentionally mix up the values via a HashSet in debug builds.
    • Use of IndexMap is an implementation detail and we are considering removing it to reduce our dependencies
    • We aren't using IndexMap correctly for ensuring ordering is preserved, so seeing it there is misleading
    • Ordering of the first occurrence of each arg id is only meaningful if multiple occurrences is not used but in most schemes where ordering matters, multiple occurrences will likely be used. Providing ordering is a trap for people to not consider this
  • I suggest the function's example creates a Vec<(Id, &str)> to show how to order all occurrences of all args by index.

@omarabid
Copy link

omarabid commented Feb 6, 2022

So if I want to iterate over matched arguments (flags) how should I approach this today?

@epage
Copy link
Member

epage commented Feb 7, 2022

You'll need to define a slice of all of you argument IDs (names) and iterate through that.

@omarabid
Copy link

omarabid commented Feb 7, 2022

You'll need to define a slice of all of you argument IDs (names) and iterate through that.

Does that mean I have to check if the flag was set. Because I have dozens of flags, so it seems to me to be a waste to check for all flags when I only want to know which flags were set.

@epage
Copy link
Member

epage commented Feb 7, 2022

Yes. This is just a workaround. Like I said, we cannot fix this until clap 4.0.

nyurik added a commit to nyurik/viperus that referenced this issue May 29, 2022
A few minor cleanups to make it easier to submit PRs:
* migrate to 2021 edition
* run `cargo fmt`
* bump some dependencies to their latest
  * Note that Clap is locked to 2.x until clap-rs/clap#1206 is fixed in v4 (?)
* spelling fixes

TODO for another PR:
* fix many clippy [from_over_into](https://rust-lang.github.io/rust-clippy/master/index.html#from_over_into) warnings in the src/map/map_value.rs file.
* README.md still has a few bugs. Uncomment a section in the lib.rs to show them during the test run.
@nyurik
Copy link
Contributor

nyurik commented Jun 3, 2022

@epage thx for all the hard work on this! Is there a PR or some other work being done in this space? Per #2763 (comment), it seems all layered configuration projects need to be able to enumerate over ArgMatches to support Clap3+derive. Thanks!

@epage
Copy link
Member

epage commented Jun 3, 2022

From my note I added in the issue

This is blocked on #1041 / #2150 where we move away from opaque, non-public Ids for arguments to something the user can programmatically act on

We cannot make these changes in clap v3 and but this should be available in clap v4.0.0. My expectation is that I'm going to soon wrap up the work I'm doing for clap 3.2, give time to collect and handle any feedback, and then start on v4.0.0. v3.2 is a big change, see #3732. My intention is to make this big change right before 4.0 so we don't have to live a long time with any mistakes identified through feedback.

epage added a commit to epage/clap that referenced this issue Aug 15, 2022
For now, we are focusing only on iterating over the argument ids and not
the values.

This provides a building block for more obscure use cases like iterating
over argument values, in order.  We are not providing it out of the box
at the moment both to not overly incentize a less common case, because
it would abstract away a performance hit, and because we want to let
people experiment with this and if a common path emerges we can consider
it then if there is enough users.

Fixes clap-rs#1206
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parsing Area: Parser's logic and needs it changed somehow. C-enhancement Category: Raise on the bar on expectations E-medium Call for participation: Experience needed to fix: Medium / intermediate M-breaking-change Meta: Implementing or merging this will introduce a breaking change.
Projects
None yet
Development

Successfully merging a pull request may close this issue.