Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lifetimes — refs into StrInput<'i> are bound by &'i #172

Merged
merged 0 commits into from
Jan 18, 2018
Merged

Add lifetimes — refs into StrInput<'i> are bound by &'i #172

merged 0 commits into from
Jan 18, 2018

Conversation

kivikakk
Copy link
Contributor

@kivikakk kivikakk commented Nov 19, 2017

Here's a preliminary attempt at resolving #141. We bind the input Input by a lifetime; for StrInput, that's the lifetime of the string it references.

It works, happily! If you have pair: Pair<'i, Rule, StrInput<'i>> , then pair.as_str() correctly returns a &'i str (rather than &str bound by pair's lifetime). There isn't actually a test case added yet that demonstrates this, but I'd add one if we were merging. (Right now testing against a local library that I'm using pest with.)

There's one problem I haven't resolved and without which this cannot be merged: what to do about StringInput? Right now I've added two hacky transmute calls just so it'd compile and I could get on with the work ([1], [2]), but this needs to be resolved as it's currently super-unsound.

Thoughts welcome! If this isn't the direction you'd like to go (SO MANY LIFETIME REFERENCES), that's of course understandable; I just wanted to give this a hack, and no hard feelings if you don't merge!

(Fixes #141. Closes #6.)

/cc @sunng87 @dragostis

@kivikakk
Copy link
Contributor Author

(I thought of just dropping StringInput entirely, but then still have to deal with FileInput … I feel like carrying the memory of the input string around is something we could claim isn't part of the parser's responsibility? But it's too convenient to drop, probably.)

@sunng87
Copy link
Contributor

sunng87 commented Nov 19, 2017

This is exact what I wanted for #141 after adding StrInput.

I feel like carrying the memory of the input string around is something we could claim isn't part of the parser's responsibility?

Agreed.

@kivikakk
Copy link
Contributor Author

kivikakk commented Nov 19, 2017

I feel like carrying the memory of the input string around is something we could claim isn't part of the parser's responsibility?

Agreed.

Likewise, while it's cute that pest can do the "work" of reading an AsRef<Path> in for us, I feel like it's specific enough that it's not that interesting? (we're not saving that much work compared to someone doing the read in themselves, and most people will have needs that mean that can't use FileInput anyway, or that make doing so less efficient.)

@kivikakk
Copy link
Contributor Author

469bd8d represents what it'd look like if we did drop StringInput and FileInput. We don't have to do that, of course, but we'd need to work out what to do with StringInput's lifetime thing if not.

@kivikakk
Copy link
Contributor Author

Test added in 41224d0 which also demonstrates the reference returned from StrInput lives for as long as the reference it was initialised with, not just the reference to the StrInput itself.

The same test refuses to compile on master:

error[E0597]: `input2` does not live long enough
   --> src/inputs/string_input.rs:360:9
    |
358 |             unsafe { input2.slice(1, 3) }
    |                      ------ borrow occurs here
359 |
360 |         };
    |         ^ `input2` dropped here while still borrowed
...
363 |     }
    |     - borrowed value needs to live until here

@dragostis
Copy link
Contributor

I'm generally in favor of this change. The reason why I didn't begin with this was because I was unsure what the final API would look like and I was afraid it would be hindered by the huge amount of lifetimes, but seeing it now, it doesn't seem bad at all.

One big question remains, however. Should we drop Input? The original idea behind this was to have a batteries-included way to parse both simple strings and buffered files, but I see how this very design makes things pretty complicated. Dropping Input would mean that the pairs data structures could simply keep a reference to the &str they're parsing.

@kivikakk
Copy link
Contributor Author

One big question remains, however. Should we drop Input?

Super good question! I'll give it a shot (might not be until tomorrow) so we can see how much simpler it'd end up in practice.

@dragostis
Copy link
Contributor

@kivikakk Sounds great! Thank you so much for your time. I'm really excited for the future of the project.

@dragostis dragostis closed this Nov 19, 2017
@dragostis dragostis reopened this Nov 19, 2017
@kivikakk
Copy link
Contributor Author

kivikakk commented Nov 20, 2017

06bdd8f 93fb73c has the Input trait removed entirely, as well as most I type parameters. In most places where we pass <I> in master, we now pass <'i> instead as the lifetime of the input string reference.

@dragostis
Copy link
Contributor

This looks really exciting! I've been thinking about this for a bit and I'm wondering what exact the input for the parser should be. Here are the few solutions that I have in mind:

  • a simple &str. This means that all the unsafe code found in StrInput would probably move to Position/Span/Pair, with the exception of the line/col functions. These could come in the form of a separate trait implementing them in a similar spirit to what we have now.
  • a Chars-like Iterator over u8. (iterating over chars implies a UTF-8 boundary penalty, I'm afraid) This would probably require a proof-of-concept implementation just to make sure that performance is in check. (for example, the current version performs some basic vectorization through memcmps) This could keep track of lines and columns on-the-fly with what I hope are small setbacks in terms of performance. However, this would change the time characteristic of the line/col functions from O(n) to O(1) and also be easily pluggable in terms of files and buffered files.
  • keep StrInput.

@sunng87
Copy link
Contributor

sunng87 commented Nov 23, 2017

a Chars-like Iterator over u8

If the input is an iterator, can we generate the result slice from it (without overhead)?

@dragostis
Copy link
Contributor

@sunng87, we could have a trait that extends Iterator and does exactly that.

@dragostis
Copy link
Contributor

I'm still not convinced whether there is any strong use case of pest outside &strs. Removing all the extra abstraction would simplify the API and implementation substantially, with all the logic from StrInput moved to Position, Span, and Error. The single major downside this has is when the string you want to parse does not fit in memory. Opinions?

@kivikakk
Copy link
Contributor Author

The more I think about this, the more I think just using &str on its own is fine. I haven't done this exploratory refactor yet as I haven't had the free time, but I may well this week! 🙂

@kivikakk
Copy link
Contributor Author

kivikakk commented Dec 1, 2017

The refactor is complete (226314f). I'll try to make codecov happy.

@dragostis
Copy link
Contributor

@kivikakk, nice work! I'll review the PR then.

@kivikakk
Copy link
Contributor Author

kivikakk commented Dec 2, 2017

All greens (finally!) 👍

Copy link
Contributor

@dragostis dragostis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splendid work! 😃 Only a few papercuts in the code, but there seems to be a small performance regression against master in the pest_grammar json bench.


use pest::inputs::{Input, Position};
use pest::inputs::Position;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would be more appropriate to rename this to input now. Maybe even something else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see Position and Span as so related any more. The former does matching operations and transformations; the latter is more of a useful container. I think I'll pull them out of a module, as there's not really one concept that binds them together.

@@ -7,14 +7,8 @@

//! A `mod` containing the `Input`-related constructs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be amended to its new name/purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -4,48 +4,44 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.

#[allow(unused_imports)] use std::ascii::AsciiExt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the attribute until the next version of Rust is stable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

pub struct Position<'i> {
input: &'i str,
pos: usize,
__phantom: ::std::marker::PhantomData<&'i str>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.

use super::span;
use super::super::util::hash_str;

/// A `struct` containing a position that is tied to an `Input` which provides useful methods to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All documentation mentioning Input should be fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

start: usize,
end: usize
end: usize,
__phantom: ::std::marker::PhantomData<&'i str>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.

start: usize
input: &'i str,
start: usize,
__phantom: ::std::marker::PhantomData<&'i str>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.

start: usize,
end: usize
end: usize,
__phantom: ::std::marker::PhantomData<&'i str>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.

index: usize,
start: usize,
end: usize
end: usize,
__phantom: ::std::marker::PhantomData<&'i str>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.

@@ -146,7 +145,7 @@ impl<R: RuleType> PrecClimber<R> {
/// let primary = |pair| {
/// consume(pair, climber)
/// };
/// let infix = |lhs: i32, op: Pair<Rule, StringInput>, rhs: i32| {
/// let infix = |lhs: i32, op: Pair<Rule, StrInput>, rhs: i32| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to use the new 'i signature. Best solution would probably be to grep after Input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@kivikakk
Copy link
Contributor Author

kivikakk commented Dec 4, 2017

I do note the performance regression as well; this surprises me, particularly as the net effect is removing a layer of indirection!

@kivikakk
Copy link
Contributor Author

kivikakk commented Dec 4, 2017

ashqjfkldechrtushxshuocrcsn;jh code coverage

@kivikakk
Copy link
Contributor Author

kivikakk commented Dec 4, 2017

lifetimes branch master branch
screen shot 2017-12-04 at 3 35 20 pm screen shot 2017-12-04 at 3 35 43 pm

One thing I'm noticing, drilling down into this, is that parse::rules::escape is the number 1 item (37.53%) in the lifetimes branch perf report, whereas it doesn't appear at all in the master branch perf report.

The entire diff between master...kivikakk:lifetimes in the pest_grammars subtree is essentially trivial. Continuing to look into it.

@kivikakk
Copy link
Contributor Author

kivikakk commented Dec 4, 2017

Oh, there's not a parse::rules::escape symbol in the master version of the binary. I guess it gets inlined or somehow optimised out.

@kivikakk
Copy link
Contributor Author

kivikakk commented Dec 4, 2017

Bisecting shows that we take our first perf hit when removing the Input trait in a71f15e, and then another big chunk when taking out StrInput in eb75d36 …?!

@kivikakk
Copy link
Contributor Author

kivikakk commented Dec 4, 2017

I'm pretty mystified by this! Any direction from anyone with a clue would be appreciated.

@dragostis
Copy link
Contributor

dragostis commented Dec 7, 2017

The inlining seems particularly weird; if anything, inlining should be easier now that the API is simpler. One particular culprit could be the fact that, unlike Rc, &str is a fat pointer. How big is the first performance hit?

In the meantime, I'll try to figure out whether the fat pointer is impacting performance in any possible way.

@dragostis
Copy link
Contributor

dragostis commented Dec 7, 2017

I think that the key here would be to add a few microbenchmarks in pest that measure Position (creation, matching, repetition, etc.) & Pair (creation, conversion) performance. That way it would be easier to triage the exact issue. Lemme know if you need any help with this.

@emoon
Copy link
Contributor

emoon commented Dec 7, 2017

The inlining seems particularly weird; if anything, inlining should be easier now that the API is simpler. One particular culprit could be the fact that, unlike Rc, &str is a fat pointer. How big is the first performance hit?

While I haven't looked that much at Rust performance (I have done it a lot for C/C++) I'm not sure why &str would be a fat pointer here? A fat pointer should only happen for Trait Objects afaik but &str should be a concrete type here. Please correct me if this isn't true for some reason :)

@dragostis
Copy link
Contributor

@emoon &str is a fat pointer because it also contains size information.

@emoon
Copy link
Contributor

emoon commented Dec 7, 2017

@emoon &str is a fat pointer because it also contains size information.

Ah right. Still I find it hard to think that is the issue here but thank for clarifying that :)

@Keats
Copy link
Contributor

Keats commented Jan 11, 2018

Posted it on gitter but on my laptop with rustc 1.25.0-nightly (b5392f545 2018-01-08):

  • master: 106,789 ns/iter (+/- 3,082)
  • this pr: 81,645 ns/iter (+/- 1,589)

so it might be worth trying the bench again, maybe it was a perf regression in the nightly used at the time to run the benches.

@dragostis
Copy link
Contributor

@Keats Seems to be a regression in rustc.

@kivikakk
Copy link
Contributor Author

This is exciting news! 😄

@dragostis
Copy link
Contributor

dragostis commented Jan 12, 2018

I've dug a little bit. The performance regression of this PR seems to be caused by some extra heap allocations. Simplest way to see the difference is to revert to nightly-2017-12-04 (or with latest nightly with one single codegen unit) and to run string rule on a sample string in the benchmark instead of the JSON file.

callgrind on master:

Total Instructions...1,206,891

125,776 (10.4%) dl-lookup.c:do_lookup_x
-----------------------------------------------------------------------
58,961 (4.9%) dl-lookup.c:_dl_lookup_symbol_x
-----------------------------------------------------------------------
56,298 (4.7%) vfscanf.c:_IO_vfscanf
-----------------------------------------------------------------------
55,550 (4.6%) arena.c:je_arena_tcache_fill_small
-----------------------------------------------------------------------
55,065 (4.6%) strcmp.S:strcmp'2
-----------------------------------------------------------------------
47,514 (3.9%) dl-machine.h:_dl_relocate_object
-----------------------------------------------------------------------
33,502 (2.8%) arena.c:je_arena_dalloc_bin_junked_locked
-----------------------------------------------------------------------
29,160 (2.4%) bitmap.h:je_arena_tcache_fill_small
-----------------------------------------------------------------------
28,996 (2.4%) dl-misc.c:_dl_name_match_p
-----------------------------------------------------------------------
27,664 (2.3%) arena.h:je_arena_tcache_fill_small

vs. lifetimes:

Total Instructions...1,389,901

125,776 (9.0%) dl-lookup.c:do_lookup_x
-----------------------------------------------------------------------
61,202 (4.4%) arena.c:je_arena_tcache_fill_small
-----------------------------------------------------------------------
58,961 (4.2%) dl-lookup.c:_dl_lookup_symbol_x
-----------------------------------------------------------------------
55,179 (4.0%) vfscanf.c:_IO_vfscanf
-----------------------------------------------------------------------
55,065 (4.0%) strcmp.S:strcmp'2
-----------------------------------------------------------------------
47,514 (3.4%) dl-machine.h:_dl_relocate_object
-----------------------------------------------------------------------
43,502 (3.1%) arena.c:je_arena_dalloc_bin_junked_locked
-----------------------------------------------------------------------
36,720 (2.6%) bitmap.h:je_arena_tcache_fill_small
-----------------------------------------------------------------------
31,791 (2.3%) arena.h:je_arena_dalloc_bin_junked_locked
-----------------------------------------------------------------------
30,446 (2.2%) arena.h:je_arena_tcache_fill_small

The difference is small, but it's probably the main issue. By checking the code it's very hard to see exactly where this difference might be coming from, since there are no apparent extra allocations. A next step would probably be to try and run this experiment with all the debug information present, provided the results will show a similar difference.

@Keats
Copy link
Contributor

Keats commented Jan 16, 2018

Looking at callgrind, I get the following as heavy costs for the data bench:

2018-01-16-180258_1914x1029_scrot

This basically just confirms what @kivikakk found that escape doesn't get inlined for some reasons.

The total instruction fetch cost is roughly similar there.

Running it for the benchmark only on strings:

#[bench]
fn string(b: &mut Bencher) {
    b.iter(|| {
         // parse_str -> parse on lifetimes branch
        JsonParser::parse_str(Rule::string, r#""hello world""#).unwrap().next().unwrap()
    });
}

I get:
2018-01-16-182844_1912x1031_scrot

All rules seem to allocate a bit more in the lifetimes branch except the <>::parse::rules::string::{{closure}}::{{closure}}::{{closure}}::{{closure}}::{{closure}}::{{closure}} for some reason

@daboross
Copy link

Could the inlining differences be because a lot of code which used to be generic is now concrete? If I recall correctly, rustc will include the source for all generic code (stuff which was all parameterized around <I>), and let it be inlined more.

With this commit, all of that is now concrete, and only generic over a lifetime - which no longer requires source to be included.

If this is the case, maybe the regressions could be fixed with #[inline] markers on things that used to be generic?

@Keats
Copy link
Contributor

Keats commented Jan 17, 2018

We tried that yesterday, adding #[inline(always)] (just #[inline] does nothing) to all generated rules do speed things up but doing the same on master does the same speedup so there is still a gap:

// lifetimes branch
test data   ... bench:      60,454 ns/iter (+/- 3,940)

// master
test data ... bench:      42,066 ns/iter (+/- 4,044)

@dragostis
Copy link
Contributor

I've managed to fix the performance issue.

Big, big shoutout to @kivikakk for the amazing work done! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lifetime issue with span.as_str() Add buffered file input.
6 participants