-
Notifications
You must be signed in to change notification settings - Fork 83
very slow on large argument string #195
Comments
This is not the cause of the issue, but the parser becomes confused by the lack of a blank line between the "Usage:" lines and the "Options:" lines. This in turn causes the parser to interpret every line in the documentation as list of options/commands. Perhaps worth opening a separate issue for (or not, it's pretty minor, but it is different from the reference implementation). |
Nope, I was wrong, this is the same behavior as in the reference implementation: Example |
Regarding the actual issue: From what I can tell, for n repeats of one argument and m repeats of another, the method Some options from the top of my head:
Below is the final output from
As you can see, we try every possibility before we check the validity, which is why an iterator would probably help. In this code:
If |
@anka-213 Thank you for the analysis. :-) To give my thoughts here: I'd like to scrap the parser and rewrite it from the ground up. I am personally not a big fan of parser frameworks like Sorry I don't have the time right now to do any more digging with you, but it sounds like you've got it figured out. :-) |
By the way, I'm curious about the next few lines in the code. They seem very redundant:
Why not just call |
Ok. Sounds like a good idea. Might be a lot of work though, especially since there is no formal specification. Just curious, why do you dislike big parser frameworks? You do get a lot for free. |
In this specific case, we have 185* |
Here is a benchmark that shows the issue. Each additional letter added doubles the running time:
|
I just realized that the example that makes backtracking necessary is not supported by the original script. I vote for removing backtracking completely. People should put mandatory positional arguments before repeatable positional arguments. Are there any software that uses the feature? Alternatively, we could keep track of how many mandatory positional arguments are left and not consume more than that. That way we can keep But that will probably not work in more complex cases. On the other hand, more complex cases are often not very useful anyways, since the structure gets lost. (But that example doesn't need backtracking either). Here is how to do it for the examples mentioned by keleshev:
A hybrid approach would be to keep backtracking, but only do it when explicitly needed by future patterns (eg. command above). This is more complicated and can't guarantee linear time. Regardless, if we want to keep backtracking, we should only do it for positional arguments. Flags are inherently unordered and it doesn't make sense to enforce multiple copies of the same flag, while at the same time having the flag be repeatable. TL;DR I don't think backtracking is ever necessary, at least not if we limit ourselves to a sensible subset of commands. |
Correct.
Yes. I think your analysis about doing backtracking only when necessary (e.g., only on positional arguments) might be worthwhile, but does certainly seem like a pretty complex endeavor. |
Actually, thinking a bit more, I wouldn't call the algorithm backtracking. It's more like brute force, since we enumerate (and even save!) every single possibility. (We would need a lazy list like in Haskell for this to be backtracking.) Changing it to do actual backtracking should make the best (and hopefully average) case linear, but I doubt we will ever get rid of an exponential (more specifically n^k, where n is number of supplied arguments and k is the number of repeatable options) time in the worst case if we want to support any possible combination.
However, ignoring all that, the fix for this particular issue is very simple. There is already a special case for flags inside optional. If we just expand it to support repeated flags inside optional as well, this particular issue goes away completely. I'll submit a pull request shortly. |
@anka-213 Awesome! Thank you much. You have no idea how much I appreciate your analysis on this! |
So, I guess we could close this issue now and report back at rust-lang/rust-bindgen#46? |
Yup! Sorry I forgot about this. Looks like @Yamakaky gave them a heads up. |
Rust program:
Argv string:
Not only does Docopt take a very long time, but it consumes a lot of memory as well.
Originally reported in: rust-lang/rust-bindgen#46
The text was updated successfully, but these errors were encountered: