Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: EXPERIMENT: Move from &'b str to Cow<'static, str> #1728

Closed
wants to merge 3 commits into from

Conversation

CreepySkeleton
Copy link
Contributor

@CreepySkeleton CreepySkeleton commented Mar 5, 2020

OK, this is the very early stage of the ambitious "get rid of lifetimes" plan.

As @pksunkara and I have discussed, it should be possible to remove lifetimes entirely by making use of Cow<'static, str/OsStr> instead of plain references. This would be very useful for those who generate their data at runtime, as described in #1041.

The plan is simple: turn fn method(self, data: &'b str) interface into fn method<T: Into<Cow<'static, str>>>(self, data: T). Since all of &str, &OsStr, String, OsString implement Into<Cow>, the transition should be more or less painless for our users.

There are three ways people could possibly build clap::App:

  • Use statically borrowed data (&'static str), explicitly written as app.help("some string"). Those would be turned into Cow::Borrowed, involving no clones.
  • Use dynamically owned data (String), the data is not needed anywhere else. Those would be turned into Cow::Owned(s) by moving the string into the app, no cloning would be involved either.
  • Use dynamically owned data (String), the data is needed somewhere else. The "out of luck" case, users would need to explicitly clone the data. For what it's worth, I've never seen such a usage in the wild.

Well, let's make it build and see the benches.

Closes #1041


List of significant changes I've made:

  • Turn slice-based API (&[&str]) into iterator based

List of TODOs:

  • Sanitize std::borrow::Cow

@kbknapp
Copy link
Member

kbknapp commented Mar 5, 2020

I agree the owned the data which is needed elsewhere is a rare case (and kind of goes against Rust's ownership/borrowing principals anyways), so it shouldn't be a priority. I'd be fine saying this case needs to clone the data.

I'm trying to remember off the top of my head, but I believe the problem I ran into last time I tried moving to Cow was that it wasn't very ergonomic to pass them around throughout the clap parsing process. However, this was a while back and Rust has moved a lot since then, so the story may have changed for the better.

My only concern would be using 'static as the lifetime as it precludes any shorter lifetime in it's place. There was/is an open issue for dynamically generating --no-* flags automatically. I.e. if I make a --skip-foo flag, there may be times where I also want a --no-skip-foo flag to negate it. This is something clap should be able to do easily, but due to lifetimes and ownership it's always been out of reach. Moving to Cow would solve this, but only if we ended up keeping a supertype of 'static. Because anything generated at runtime cannot be 'static if the user has `'static' help data there will be complaints about the dynamic data not living long enough or lifetime mismatches.

Just some thoughts 😄

@Dylan-DPC-zz
Copy link

I agree with Kevin here, Cow would make the API slightly trickier to use (from previous experience of using Cow on another unrelated project). I'd like to take a try at this and see if there's a different way, but post 3.0 release

@pksunkara
Copy link
Member

My only concern would be using 'static as the lifetime as it precludes any shorter lifetime in it's place. There was/is an open issue for dynamically generating --no-* flags automatically. I.e. if I make a --skip-foo flag, there may be times where I also want a --no-skip-foo flag to negate it. This is something clap should be able to do easily, but due to lifetimes and ownership it's always been out of reach. Moving to Cow would solve this, but only if we ended up keeping a supertype of 'static. Because anything generated at runtime cannot be 'static if the user has `'static' help data there will be complaints about the dynamic data not living long enough or lifetime mismatches.

So, we will still need one lifetime at the least and use it as Cow<'a, str>?

I think we can achieve this transition. AFAIK this will improve performance. I don't see why we want to do this post 3.0

@CreepySkeleton
Copy link
Contributor Author

I'm trying to remember off the top of my head, but I believe the problem I ran into last time I tried moving to Cow was that it wasn't very ergonomic to pass them around throughout the clap parsing process.

Yeah, I'm having certain difficulties transiting the internal machinery to Cow. For example, some parts of code work with &[&str], and one can't simply turn a slice over &str into &[Cow<'a, str>]. Those are irritating, but not fundamental. It's just I'll to have to adjust more code than I initially hoped I would 🤷‍♂ .

My only concern would be using 'static as the lifetime as it precludes any shorter lifetime in it's place. There was/is an open issue for dynamically generating --no-* flags automatically. I.e. if I make a --skip-foo flag, there may be times where I also want a --no-skip-foo flag to negate it.

So, we will still need one lifetime at the least and use it as Cow<'a, str>?

(This problem actually relates to TeXitoi/structopt#114)

No, we won't need shorter lifetimes. I've looked over #815 and I don't understand why you think we will.

I'm given impression that you're thinking about preceding the stored arguments with "--no-", like arg.long.insert(0, "--no-"); no, we won't need that, and it wouldn't be the best way to do that. Even if we chose to go this way, our code would look like this:

// somewhere in app._build()
for arg in self.args {
    if arg.ovveridable {
        let override = arg.clone();
        override.long.to_mut().insert_str(0, "--no-");
        self.args.push(override);
    }
}

It's not even bad or too-ineffective option. long names are short, and small allocations are fairly inexpensive.

No shorter lifetimes involved.

Alternative (and more sophisticated) way to implement it would be tweaking the parsing logic a bit:

// this code is placed just after the `opt starts with --` check
if opt.starts_with("no-") {
    let real_opt = parse_opt_name(opt[3..]);
    if let Some(flag) = flags.find(real_opt) {
        flag.set_to_off();
    } else {
        // resort back to normal parsing
    }
}

no- is only 3 bytes long, the check should be almost free.

Again, no shorter lifetimes in use.

(The TeXitoi/structopt#114 prefix problem is somewhat more interesting, but I believe we'll be able to process it just like --no- flags.)

Cow would make the API slightly trickier to use (from previous experience of using Cow on another unrelated project)

Hm, I assume the Into<Cow> (emphasis on the into part) mitigates the cumbersomeness of Cow, e.g users would still be able to use "str" literals with our API. Am I wrong?

@Dylan-DPC Would you kindly be more descriptive regarding to your experience with Cow so we could anticipate the possible issues?

but post 3.0 release

In any way, the change is breaking, so we either land it before 3.0.0, or not at all.

@pksunkara
Copy link
Member

Hm, I assume the Into (emphasis on the into part) mitigates the cumbersomeness of Cow, e.g users would still be able to use "str" literals with our API. Am I wrong?

Yup, using Into definitely helps.

@CreepySkeleton
Copy link
Contributor Author

The diff is going to be quite large and most of it will be trivial changes not worthy to waste reviewers' time on, so I'm going to mark really significant parts, the code I'll have had to actually rewrite, with // NEED_REVIEW. Once it's ready for review, please, pay some extra attention to those places.

@CreepySkeleton
Copy link
Contributor Author

CreepySkeleton commented Mar 9, 2020

let name = &self.usage[self.start..self.pos];
if self.prev == UsageToken::Unknown {
debugln!("UsageParser::name: setting name...{}", name);
arg.id = name.key();
arg.name = name;

I must confess, the progress has stalled. I encountered a snippet I can't easily deal with.

When Arg is being build from a usage string, the code tends to take subslices of the usage string and put them into the resulting arg (the snippet highlights one of those places).

But, if it would take Cow as input, it wouldn't be able to take a subslice of it and put the subslice into the resulting arg. Arg expects cow, but there's no clear way to "transform a subslice of cow into another cow" if the donor cow is Cow::Owned. What to do in this case?

I can think of a few options:

  • Limit UsageParser to &'stratic str so it would always result in Cow::Borrowed.
  • Always reallocate the subslice, effectively cloning this particular part of owned Cow.
  • Leak the owned data, transforming it into &'static str

So I ask of you, the wisest of clap, which path a humble contributor shall walk?

@pksunkara
Copy link
Member

Always reallocate the subslice, effectively cloning this particular part of owned Cow

This would be the safer way, wouldn't it?

@CreepySkeleton
Copy link
Contributor Author

Yes, but also the most inefficient. I would go for the first one if it was just up to me (I've never seen Arg::from_usage("...") calls with non-static strings).

@pksunkara
Copy link
Member

Wouldn't loading from YAML do that?

@CreepySkeleton
Copy link
Contributor Author

YAML loader has nothing to do with UsageParser AFAIK.

@pickfire
Copy link
Contributor

Always reallocate the subslice, effectively cloning this particular part of owned Cow

This would be the safer way, wouldn't it?

If the owned Cow never changed, why not just let the borrowed Cow to point to the same part of memory?

None
};
pub fn arg<A: Into<Arg>>(mut self, a: A) -> Self {
let help_heading: Option<Cow<'static, str>> =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be good to tag as_deref to be used for rust 1.40.0 later on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what as_deref has to do with the bit of code you highlighted. Could you please elaborate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

         let help_heading: Option<Cow<'static, str>> = self.help_headings.last().map(Deref::deref);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, thanks!

pub fn long<T: Into<Cow<'static, str>>>(mut self, l: T) -> Self {
// NEED_REVIEW
self.long = Some(match l.into() {
Cow::Borrowed(s) => Cow::Borrowed(s.trim_start_matches(|c| c == '-')),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just directly trim "--" from the start since long must be -- IIRC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the deal: you give me the working code that does exactly what you said, and I put your credentials to the codebase, accompanied by praises and immodest compliments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @pickfire means s.trim_start_matches("--")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CreepySkeleton credentials? Wow! Looks interesting but I don't get what you meant.

@pksunkara Correct, s.trim_start_matches("--"), just wondering why not this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering why not this one?

Because we would like to strip all the dashes. Your variant will leave one dash in case there's odd number of them.

credentials? Wow! Looks interesting but I don't get what you meant.

Your nickname, email, name, whatever you'd prefer to be adressed with; I recommend something in the spirit of "The Emperor of Programming and Button-pressing in general".

@CreepySkeleton
Copy link
Contributor Author

If the owned Cow never changed, why not just let the borrowed Cow to point to the same part of memory?

I've got a challenge for you: try to make this code compile. That should give you the understanding of the problem.

@pickfire
Copy link
Contributor

pickfire commented Mar 16, 2020

If the owned Cow never changed, why not just let the borrowed Cow to point to the same part of memory?

I've got a challenge for you: try to make this code compile. That should give you the understanding of the problem.

Looks hard. Sorry, now I understand why. But I did it by changing the parameter https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=799567fdeee5beaa9c997fe4d8c5b6f3

@CreepySkeleton
Copy link
Contributor Author

Looks hard. Sorry, now I understand why. But I did it by changing the parameter https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=799567fdeee5beaa9c997fe4d8c5b6f3

Unfortunately, we can't afford &'static self because we need to make and drop a UsageParser on the fly. The problem can be demonstrated even shorter. To put it simply: you can't borrow from an owned cow and then drop it.

I believe the solutions I presented above are the only solutions we can work with, and each of them has its own drawbacks:

  • Restricting ourselves to &'stratic str is safe and effective, but it is also the least flexible option since it would restrict the API.
  • Reallocating and copying is safe and flexible, but it would penalize owned Cows (no penalty to borrowed ones!).
  • Leaking the owned cow, effectively turning them into &'static str. Flexible and effective, but it is also a powderkeg in a sense that, if user calls Arg::from("...") in a loop, they have a potentially unlimited memory leak.

We just need to decide which penalty we can live with, and both @pksunkara and I think that copying is the best bet.

@TeXitoi
Copy link
Contributor

TeXitoi commented Mar 17, 2020

I vote copying, if you want perf, it's simple enough to avoid this particular feature.

@pickfire
Copy link
Contributor

@CreepySkeleton Isn't cloning instead of copying?

@CreepySkeleton
Copy link
Contributor Author

I used copying in the general sense, the meaning is that you copy bytes in memory, and the number of bytes depends on input.

@pickfire
Copy link
Contributor

@CreepySkeleton Yes, but copying are usually much cheaper than cloning.

@pickfire
Copy link
Contributor

@CreepySkeleton Would it be good to also have support for https://github.com/maciejhirsz/beef to have faster and smaller alternative to Cow?

@pickfire
Copy link
Contributor

Or if we want to reduce memory usage to prevent too many strings of the same version, we could use string interner. https://matklad.github.io/2020/03/22/fast-simple-rust-interner.html

@CreepySkeleton
Copy link
Contributor Author

@pickfire I wasn't using "copying" in Rust "Copy vs Clone" sense, I was using it in the common computer science "copy bytes from one location in memory to another location" meaning. I wasn't speaking Rust specific terminology which has apparently confused you.

Well, maybe beef is worthy looking into, but memory consumption is not our biggest concern right now. Remember - make it work, make it correct, make it fast; order is important. Also, I would like to keep std::borrow::Cow in the public API, so beef::Cow would be an implementation detail.

Strings interning is heavily specialized for "there are lots of copies of the same sting, and all we ever need is to check if two strings are equal". The former is not the case at all here.

Anyway, the bad streak in my life is almost over; I'm going to get back to coding on April, 2. But I feel like I should be striving for bugfixing first and pushing the new release.

@pksunkara pksunkara marked this pull request as draft April 20, 2020 09:15
@CreepySkeleton
Copy link
Contributor Author

This is long abandoned. Maybe I'll make more successful attempt one day.

@ldm0
Copy link
Member

ldm0 commented Sep 17, 2020

If the owned Cow never changed, why not just let the borrowed Cow to point to the same part of memory?

I've got a challenge for you: try to make this code compile. That should give you the understanding of the problem.

Accidently drive by, would code like this helps?

@CreepySkeleton
Copy link
Contributor Author

Yes, this is the most straightforward solution, and the one I would probably go with. my point was that we can't really share owned variants, even if they never change, we have to reallocate and copy. And this is what you do.

@ldm0
Copy link
Member

ldm0 commented Sep 17, 2020

my point was that we can't really share owned variants, even if they never change, we have to reallocate and copy.

"they never change" doesn't mean "they never drop" right? To share such owned variants, additional `'b' lifetime will be introduced, which is exactly what we want to forbid. So reallocating and copying is unforbiddable I guess?

So is there any other road-blocker?

@CreepySkeleton
Copy link
Contributor Author

"they never change" doesn't mean "they never drop" right? To share such owned variants, additional `'b' lifetime will be introduced, which is exactly what we want to forbid. So reallocating and copying is unforbiddable I guess?

Yeah, right. It's not a big deal anyway.

So is there any other road-blocker?

Apart from that the fact this is going to be a very big amount of very mundane work? None.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lifetimes of App, Arg, ArgGroup: removing or relaxing
7 participants