Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CompleteByteSlice/CompleteStr are invasive and make parsers not reusable #795

Closed
RalfJung opened this issue Jun 17, 2018 · 12 comments
Closed
Milestone

Comments

@RalfJung
Copy link

Prerequisites

Here are a few things you should provide to help me understand the issue:

  • Rust version : 2018-06-15
  • nom version : 4.0.0
  • nom compilation features used: none

Test case

Say you have a parser like this

named!(doubles_and_int<&[u8],(f64,i64)>, do_parse!(
    tag!("DI") >>
    d: double >>
    space >>
    n: digit >>
    (d, str::from_utf8(n).unwrap().parse().unwrap())
));

And (for example because of #790) you want to change that parser to use CompleteByteSlice. It turns out that is quite hard, because (a) the code at the end of do_parse also needs changing to (un)wrap the CompleteByteSlice everywhere, and (b) much worse, double works for &[u8] only. Here's the solution I came up with eventually:

fn result_map<'a, T, I1, I2, F: FnOnce(I1) -> I2>(x: IResult<I1, T>, f: F) -> IResult<I2, T> {
    use nom::Context::Code;
    use nom::Err::*;
    match x {
        Ok((rest, t)) => Ok((f(rest), t)),
        Err(Incomplete(n)) => Err(Incomplete(n)),
        Err(Error(Code(i, k))) => Err(Error(Code(f(i), k))),
        Err(Failure(Code(i, k))) => Err(Failure(Code(f(i), k))),
    }
}
fn double_complete(i: CompleteByteSlice) -> IResult<CompleteByteSlice, f64> {
    result_map(double(i.0), CompleteByteSlice)
}
named!(doubles_and_int<CompleteByteSlice,(f64,i64)>, do_parse!(
    tag!("DI") >>
    d: double_complete >>
    space >>
    n: digit >>
    (d, str::from_utf8(n.0).unwrap().parse().unwrap())
));

This is clearly bad. It would help if there was a variant of double that works on CompleteByteSlice shipped with nom, but the problem actually goes deeper: I may have both streaming and non-streaming parsing in my project, and I may want to re-use some of the parsers. E.g., imagine doubles_and_int was used both on a stream, and in another parser combinator that involves many0!. Right now, the only way I found to make that work is to literally duplicate doubles_and_int, have one version that works on CompleteByteSlice and one that works on &[u8]. I tried wrapping the &[u8] one similar to how I wrapped double above, but that doesn't actually work: For combinators like take_until_and_consume! (and, I presume, tag! though I did not test that), they have to be used directly on a CompleteByteSlice or they will not behave correctly.

I get the impression that using the type to encode the EOF behavior was the wrong choice (though there may be a trick I am missing here). Whether more data can come later is a property of the value (the byte stream) I am parsing; the parsers themselves should mostly be generic and usable on both prefixes and entire inputs.

@RalfJung RalfJung changed the title CompleteByteSlice/CompleteStr are invasive and not reusable CompleteByteSlice/CompleteStr are invasive and make parsers not reusable Jun 17, 2018
@djc
Copy link
Contributor

djc commented Jun 17, 2018

Even if the type is the right way to do it, I was left wondering if it could be made ergonomic. Having to start specifying the input type for all the named! parsers seems unnecessarily invasive. What about having a trait that abstracts over CompleteByteSlice and &[u8] which is used by default?

@RReverser
Copy link
Contributor

@djc Personally I just ended up creating own macro which, in turns, calls named! with the types I need. That seems to work pretty well and straightforward.

@RalfJung
Copy link
Author

RalfJung commented Jul 2, 2018

@RReverser What do you do about combinators like double that are tied to the non-complete types?

@ian-p-cooke
Copy link

ian-p-cooke commented Jul 6, 2018

I noticed this issue while researching #809. @RalfJung you could use my floating_point parser instead of double. Note also that the trait T could be pulled out to a trait alias and then used for your own parsers that need to accept &[u8], &str, CompleteByteSlice, or CompleteStr. I determined T's constraints by experimentation so I know they work for floating_point but fewer or more constraints might be required for other parsers.

@ian-p-cooke
Copy link

here's a gist that shows what I meant in my previous comment. It's based on your doubles_and_int parser example.

@RalfJung
Copy link
Author

RalfJung commented Jul 8, 2018

@ian-p-cooke So I couldn't use named! but would instead write generic methods myself?

I'm also a little surprised about these nom::ParseTo<f64> + nom::ParseTo<i64> bounds. I expected floating_point would take care of both recognizing a float and parsing the input to a float, seems like related problems to me.

@ian-p-cooke
Copy link

@RalfJung yes, that was my idea... I think ideally the trait that abstracts over all the input types would be defined in nom:: somewhere and then named! would use that trait itself... then you could go back to using named!

I agree, the Trait bounds are weird. I didn't see another way to satisfy the nom::ParseTo requirement of the implementation of floating_point (which uses parse_to!). Maybe someone else has a better way.

@alexbool
Copy link

alexbool commented Jul 9, 2018

This issue was driving me so mad that I switched to combine even though it’s a bit slower for me. I will be very glad if nom’s mainteners find some way to make CompleteByteSlice more ergonomic

@ahmedcharles
Copy link
Contributor

The nom::types::Input type should allow implementing streaming and non-streaming input (note, at some point, streaming parsers have to reach eof for an arbitrary parser to complete). But Input doesn't implement InputTakeAtPosition.

It should also be possible to implement a generic parser over &[u8] or CompleteByteSlice, which is what I tried before I realized I could use Input, but that requires that FindToken<T> has impl's where T: AsChar.

Though, it seems that you can write one using where clauses like: InputIter<Item=u8>. And eventually it will work if you add enough clauses. It'd be nice to have a variation of named! that was generic and just allowed this to 'work'.

akshayknarayan added a commit to ccp-project/portus that referenced this issue Sep 4, 2018
As a workaround to rust-bakery/nom#544, migrate to
nom 4 to ensure that the verbose-errors feature becomes additive and
therefore portus compiles when used as a dependency.

There were two classes of changes to the parser structure:
1. Error handling, as predicted and outlined here:
https://github.com/Geal/nom/blob/master/doc/upgrading_to_nom_4.md#replacing-parser-result-matchers

2. Migration to "CompleteByteSlice".
This was predicted in the migration notes:
https://github.com/Geal/nom/blob/master/doc/upgrading_to_nom_4.md#dealing-with-incomplete-usage
but caused underlying errors as reported here:
rust-bakery/nom#790 (comment)

To address this, shadow nom's `named!` macro with our own,
`named_complete!`, which replaces the types appropriately. This is the
solution proposed here:
rust-bakery/nom#795 (comment)
akshayknarayan added a commit to ccp-project/portus that referenced this issue Sep 6, 2018
As a workaround to rust-bakery/nom#544, migrate to
nom 4 to ensure that the verbose-errors feature becomes additive and
therefore portus compiles when used as a dependency.

There were two classes of changes to the parser structure:
1. Error handling, as predicted and outlined here:
https://github.com/Geal/nom/blob/master/doc/upgrading_to_nom_4.md#replacing-parser-result-matchers

2. Migration to "CompleteByteSlice".
This was predicted in the migration notes:
https://github.com/Geal/nom/blob/master/doc/upgrading_to_nom_4.md#dealing-with-incomplete-usage
but caused underlying errors as reported here:
rust-bakery/nom#790 (comment)

To address this, shadow nom's `named!` macro with our own,
`named_complete!`, which replaces the types appropriately. This is the
solution proposed here:
rust-bakery/nom#795 (comment)
akshayknarayan added a commit to ccp-project/portus that referenced this issue Sep 7, 2018
As a workaround to rust-bakery/nom#544, migrate to
nom 4 to ensure that the verbose-errors feature becomes additive and
therefore portus compiles when used as a dependency.

There were two classes of changes to the parser structure:
1. Error handling, as predicted and outlined here:
https://github.com/Geal/nom/blob/master/doc/upgrading_to_nom_4.md#replacing-parser-result-matchers

2. Migration to "CompleteByteSlice".
This was predicted in the migration notes:
https://github.com/Geal/nom/blob/master/doc/upgrading_to_nom_4.md#dealing-with-incomplete-usage
but caused underlying errors as reported here:
rust-bakery/nom#790 (comment)

To address this, shadow nom's `named!` macro with our own,
`named_complete!`, which replaces the types appropriately. This is the
solution proposed here:
rust-bakery/nom#795 (comment)
@NamsooCho
Copy link
Contributor

NamsooCho commented Jan 8, 2019

Maybe does it help to fix this problem?

https://github.com/NamsooCho/nom3_test.git

nom3 is working well with many_m_n!().

https://github.com/NamsooCho/nom4_test.git

same code fails with nom4.

Because I used trait, It is not easy to change to CompleteBytesSlice.

@Geal
Copy link
Collaborator

Geal commented May 5, 2019

CompleteByteSlice and CompleteStr are removed in nom 5. Now there are complete/streaming versions of some parsers that make things much easier

@Geal Geal added this to the 5.0 milestone May 5, 2019
@Geal
Copy link
Collaborator

Geal commented Jun 17, 2019

nom 5's release is imminent, closing this now

@Geal Geal closed this as completed Jun 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants