CompleteByteSlice/CompleteStr are invasive and make parsers not reusable #795

RalfJung · 2018-06-17T15:00:42Z

Prerequisites

Here are a few things you should provide to help me understand the issue:

Rust version : 2018-06-15
nom version : 4.0.0
nom compilation features used: none

Test case

Say you have a parser like this

named!(doubles_and_int<&[u8],(f64,i64)>, do_parse!(
    tag!("DI") >>
    d: double >>
    space >>
    n: digit >>
    (d, str::from_utf8(n).unwrap().parse().unwrap())
));

And (for example because of #790) you want to change that parser to use CompleteByteSlice. It turns out that is quite hard, because (a) the code at the end of do_parse also needs changing to (un)wrap the CompleteByteSlice everywhere, and (b) much worse, double works for &[u8] only. Here's the solution I came up with eventually:

fn result_map<'a, T, I1, I2, F: FnOnce(I1) -> I2>(x: IResult<I1, T>, f: F) -> IResult<I2, T> {
    use nom::Context::Code;
    use nom::Err::*;
    match x {
        Ok((rest, t)) => Ok((f(rest), t)),
        Err(Incomplete(n)) => Err(Incomplete(n)),
        Err(Error(Code(i, k))) => Err(Error(Code(f(i), k))),
        Err(Failure(Code(i, k))) => Err(Failure(Code(f(i), k))),
    }
}
fn double_complete(i: CompleteByteSlice) -> IResult<CompleteByteSlice, f64> {
    result_map(double(i.0), CompleteByteSlice)
}
named!(doubles_and_int<CompleteByteSlice,(f64,i64)>, do_parse!(
    tag!("DI") >>
    d: double_complete >>
    space >>
    n: digit >>
    (d, str::from_utf8(n.0).unwrap().parse().unwrap())
));

This is clearly bad. It would help if there was a variant of double that works on CompleteByteSlice shipped with nom, but the problem actually goes deeper: I may have both streaming and non-streaming parsing in my project, and I may want to re-use some of the parsers. E.g., imagine doubles_and_int was used both on a stream, and in another parser combinator that involves many0!. Right now, the only way I found to make that work is to literally duplicate doubles_and_int, have one version that works on CompleteByteSlice and one that works on &[u8]. I tried wrapping the &[u8] one similar to how I wrapped double above, but that doesn't actually work: For combinators like take_until_and_consume! (and, I presume, tag! though I did not test that), they have to be used directly on a CompleteByteSlice or they will not behave correctly.

I get the impression that using the type to encode the EOF behavior was the wrong choice (though there may be a trick I am missing here). Whether more data can come later is a property of the value (the byte stream) I am parsing; the parsers themselves should mostly be generic and usable on both prefixes and entire inputs.

The text was updated successfully, but these errors were encountered:

djc · 2018-06-17T19:44:28Z

Even if the type is the right way to do it, I was left wondering if it could be made ergonomic. Having to start specifying the input type for all the named! parsers seems unnecessarily invasive. What about having a trait that abstracts over CompleteByteSlice and &[u8] which is used by default?

RReverser · 2018-06-19T15:57:47Z

@djc Personally I just ended up creating own macro which, in turns, calls named! with the types I need. That seems to work pretty well and straightforward.

RalfJung · 2018-07-02T07:39:34Z

@RReverser What do you do about combinators like double that are tied to the non-complete types?

ian-p-cooke · 2018-07-06T12:28:11Z

I noticed this issue while researching #809. @RalfJung you could use my floating_point parser instead of double. Note also that the trait T could be pulled out to a trait alias and then used for your own parsers that need to accept &[u8], &str, CompleteByteSlice, or CompleteStr. I determined T's constraints by experimentation so I know they work for floating_point but fewer or more constraints might be required for other parsers.

ian-p-cooke · 2018-07-06T14:21:36Z

here's a gist that shows what I meant in my previous comment. It's based on your doubles_and_int parser example.

RalfJung · 2018-07-08T16:23:27Z

@ian-p-cooke So I couldn't use named! but would instead write generic methods myself?

I'm also a little surprised about these nom::ParseTo<f64> + nom::ParseTo<i64> bounds. I expected floating_point would take care of both recognizing a float and parsing the input to a float, seems like related problems to me.

ian-p-cooke · 2018-07-09T11:22:52Z

@RalfJung yes, that was my idea... I think ideally the trait that abstracts over all the input types would be defined in nom:: somewhere and then named! would use that trait itself... then you could go back to using named!

I agree, the Trait bounds are weird. I didn't see another way to satisfy the nom::ParseTo requirement of the implementation of floating_point (which uses parse_to!). Maybe someone else has a better way.

alexbool · 2018-07-09T15:50:27Z

This issue was driving me so mad that I switched to combine even though it’s a bit slower for me. I will be very glad if nom’s mainteners find some way to make CompleteByteSlice more ergonomic

ahmedcharles · 2018-08-12T06:40:00Z

The nom::types::Input type should allow implementing streaming and non-streaming input (note, at some point, streaming parsers have to reach eof for an arbitrary parser to complete). But Input doesn't implement InputTakeAtPosition.

It should also be possible to implement a generic parser over &[u8] or CompleteByteSlice, which is what I tried before I realized I could use Input, but that requires that FindToken<T> has impl's where T: AsChar.

Though, it seems that you can write one using where clauses like: InputIter<Item=u8>. And eventually it will work if you add enough clauses. It'd be nice to have a variation of named! that was generic and just allowed this to 'work'.

As a workaround to rust-bakery/nom#544, migrate to nom 4 to ensure that the verbose-errors feature becomes additive and therefore portus compiles when used as a dependency. There were two classes of changes to the parser structure: 1. Error handling, as predicted and outlined here: https://github.com/Geal/nom/blob/master/doc/upgrading_to_nom_4.md#replacing-parser-result-matchers 2. Migration to "CompleteByteSlice". This was predicted in the migration notes: https://github.com/Geal/nom/blob/master/doc/upgrading_to_nom_4.md#dealing-with-incomplete-usage but caused underlying errors as reported here: rust-bakery/nom#790 (comment) To address this, shadow nom's `named!` macro with our own, `named_complete!`, which replaces the types appropriately. This is the solution proposed here: rust-bakery/nom#795 (comment)

NamsooCho · 2019-01-08T22:50:29Z

Maybe does it help to fix this problem?

https://github.com/NamsooCho/nom3_test.git

nom3 is working well with many_m_n!().

https://github.com/NamsooCho/nom4_test.git

same code fails with nom4.

Because I used trait, It is not easy to change to CompleteBytesSlice.

Geal · 2019-05-05T10:48:48Z

CompleteByteSlice and CompleteStr are removed in nom 5. Now there are complete/streaming versions of some parsers that make things much easier

Geal · 2019-06-17T14:38:15Z

nom 5's release is imminent, closing this now

RalfJung mentioned this issue Jun 17, 2018

many0 returns Incomplete in weird cases #790

Closed

RalfJung changed the title ~~CompleteByteSlice/CompleteStr are invasive and not reusable~~ CompleteByteSlice/CompleteStr are invasive and make parsers not reusable Jun 17, 2018

dkharlan mentioned this issue Jul 24, 2018

Replace the reader with a more flexible implementation dkharlan/schematic#11

Open

Geal mentioned this issue Oct 6, 2018

usability: address changes introduced with Complete* and many* combinators #839

Closed

Geal added this to the 5.0 milestone May 5, 2019

Geal closed this as completed Jun 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CompleteByteSlice/CompleteStr are invasive and make parsers not reusable #795

CompleteByteSlice/CompleteStr are invasive and make parsers not reusable #795

RalfJung commented Jun 17, 2018

djc commented Jun 17, 2018

RReverser commented Jun 19, 2018

RalfJung commented Jul 2, 2018

ian-p-cooke commented Jul 6, 2018 •

edited

Loading

ian-p-cooke commented Jul 6, 2018

RalfJung commented Jul 8, 2018

ian-p-cooke commented Jul 9, 2018

alexbool commented Jul 9, 2018

ahmedcharles commented Aug 12, 2018

NamsooCho commented Jan 8, 2019 •

edited

Loading

Geal commented May 5, 2019

Geal commented Jun 17, 2019

CompleteByteSlice/CompleteStr are invasive and make parsers not reusable #795

CompleteByteSlice/CompleteStr are invasive and make parsers not reusable #795

Comments

RalfJung commented Jun 17, 2018

Prerequisites

Test case

djc commented Jun 17, 2018

RReverser commented Jun 19, 2018

RalfJung commented Jul 2, 2018

ian-p-cooke commented Jul 6, 2018 • edited Loading

ian-p-cooke commented Jul 6, 2018

RalfJung commented Jul 8, 2018

ian-p-cooke commented Jul 9, 2018

alexbool commented Jul 9, 2018

ahmedcharles commented Aug 12, 2018

NamsooCho commented Jan 8, 2019 • edited Loading

Geal commented May 5, 2019

Geal commented Jun 17, 2019

ian-p-cooke commented Jul 6, 2018 •

edited

Loading

NamsooCho commented Jan 8, 2019 •

edited

Loading