take_until but parses the input it takes #199

Lythenas · 2018-10-02T13:31:14Z

Is it possible to parse the input consumed by take_until with another parser instead of just accumulating it in a String? Specifically, I want to parse something that looks like:

#+BEGIN_name
... some content
#+END_name

Since name is dynamic and has to be the same in the start and end lines, I can't use something like between. I also don't really want the parser that's responsible for the content to know about name.

Currently, I'm thinking of using and_then to take_until the end line, collect the content in a String, create a new stream from that string, and parse it.

But I'm wondering if there is a better way of doing this.
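A minimal sketch of the approach described above (collect the content as a string, then hand it to a separate parser that knows nothing about name). It works on plain &str rather than a combine stream, and split_block and parse_content are hypothetical helpers for illustration, not combine API.

```rust
/// Split a `#+BEGIN_name ... #+END_name` block into (name, content, rest).
/// Hypothetical helper: plain `&str` processing, not a combine parser.
fn split_block(input: &str) -> Option<(&str, &str, &str)> {
    // The first line must be `#+BEGIN_<name>`.
    let (first, body) = input.split_once('\n')?;
    let name = first.trim_end().strip_prefix("#+BEGIN_")?;
    if name.is_empty() {
        return None;
    }
    // The end marker is built from the name we just parsed.
    let end_marker = format!("#+END_{}", name);
    let mut offset = 0;
    for line in body.split_inclusive('\n') {
        // Surrounding whitespace on the end line is ignored.
        if line.trim() == end_marker {
            let content = &body[..offset];
            let rest = &body[offset + line.len()..];
            return Some((name, content, rest));
        }
        offset += line.len();
    }
    None // no matching end line
}

/// Stand-in for the separate content parser: it never sees `name`
/// and simply returns the non-empty lines.
fn parse_content(content: &str) -> Vec<&str> {
    content.lines().filter(|l| !l.trim().is_empty()).collect()
}
```

The cost of this design is that the content is scanned twice (once to find the end line, once by the content parser), which is usually acceptable for small blocks.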


Marwes · 2018-10-02T14:19:39Z

I'd use BEGIN_name.then(|b| (content, END_name(b))) (see https://docs.rs/combine/3.5.2/combine/trait.Parser.html#method.then).
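To illustrate what then buys here, a toy sketch in plain Rust: the end parser is constructed from the value the begin parser returned. Parser, then, literal, begin, content and block are all hypothetical stand-ins written for this example; combine's real then works on its Parser trait. Note that the content parser here is assumed to stop on its own at any line starting with "#+".

```rust
/// Toy parser: on success, returns a value and the number of bytes consumed.
/// Hypothetical stand-in for combine's `Parser` trait, for illustration only.
type Parser<T> = Box<dyn Fn(&str) -> Option<(T, usize)>>;

/// Run `p`, then build the next parser from its output (the idea behind `then`).
fn then<A, B, F>(p: Parser<A>, f: F) -> Parser<B>
where
    A: 'static,
    B: 'static,
    F: Fn(A) -> Parser<B> + 'static,
{
    Box::new(move |input: &str| {
        let (a, used) = p(input)?;
        let (b, more) = f(a)(&input[used..])?;
        Some((b, used + more))
    })
}

/// Match an exact string at the start of the input.
fn literal(expected: String) -> Parser<()> {
    Box::new(move |input: &str| {
        if input.starts_with(expected.as_str()) {
            Some(((), expected.len()))
        } else {
            None
        }
    })
}

/// Parse `#+BEGIN_<name>\n` and return `<name>`.
fn begin() -> Parser<String> {
    Box::new(|input: &str| {
        let rest = input.strip_prefix("#+BEGIN_")?;
        let len = rest.find('\n')?;
        let name = &rest[..len];
        if name.is_empty() {
            return None;
        }
        Some((name.to_string(), "#+BEGIN_".len() + len + 1))
    })
}

/// Hypothetical content parser: consumes lines until one starts with "#+".
fn content() -> Parser<Vec<String>> {
    Box::new(|input: &str| {
        let mut lines = Vec::new();
        let mut used = 0;
        for line in input.split_inclusive('\n') {
            if line.starts_with("#+") {
                break;
            }
            lines.push(line.trim_end().to_string());
            used += line.len();
        }
        Some((lines, used))
    })
}

/// BEGIN_name.then(|name| (content, END_name(name))), spelled out by hand.
fn block() -> Parser<(String, Vec<String>)> {
    then(begin(), |name: String| {
        let body = content();
        // The end parser only exists once `name` is known.
        let end = literal(format!("#+END_{}", name));
        let p: Parser<(String, Vec<String>)> = Box::new(move |input: &str| {
            let (lines, used) = body(input)?;
            let ((), used_end) = end(&input[used..])?;
            Some(((name.clone(), lines), used + used_end))
        });
        p
    })
}
```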

Lythenas · 2018-10-02T18:32:54Z

I just realized the content doesn't need to be parsed; I just need it as a string. That makes things a lot easier.

But in general, what if I wanted to parse the content further without the content parser knowing anything about when to stop (say it parses a list of lines)? Would a parser that does the following be possible:

1. look ahead to find the end position of the content to be parsed
2. restrict the content parser to parse between the current position and the end position
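The two steps above can be sketched in plain Rust as follows. parse_restricted and list_items are hypothetical names, not combine functions, and the look-ahead here is a plain substring search (str::find) rather than a line-anchored match, which is a simplification.

```rust
/// Look ahead for `end_marker`, run `sub` on the text before it only,
/// and report how many bytes were consumed (content plus marker).
fn parse_restricted<T>(
    input: &str,
    end_marker: &str,
    sub: impl Fn(&str) -> Result<T, String>,
) -> Result<(T, usize), String> {
    // Step 1: look ahead to find where the content ends.
    let end = input
        .find(end_marker)
        .ok_or_else(|| format!("missing end marker {:?}", end_marker))?;
    // Step 2: the sub-parser only ever sees `input[..end]`,
    // so it needs no knowledge of the end marker.
    let value = sub(&input[..end])?;
    Ok((value, end + end_marker.len()))
}

/// Hypothetical "list of lines" parser: every non-empty line must start with "- ".
fn list_items(content: &str) -> Result<Vec<String>, String> {
    content
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| {
            l.strip_prefix("- ")
                .map(str::to_string)
                .ok_or_else(|| format!("expected \"- \" at {:?}", l))
        })
        .collect()
}
```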

Marwes · 2018-10-02T19:23:10Z

It is possible using https://docs.rs/combine/3.5.2/combine/trait.Parser.html#method.flat_map. The only real problem is that the reported error position will point into the sub-input, so it would need to be fixed if that is an issue (probably possible by using https://docs.rs/combine/3.5.2/combine/fn.position.html to get the position before the sub-input).
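The error-position fix amounts to adding the offset where the sub-input starts to any position the sub-parser reports. A plain-Rust sketch of that translation, assuming a hypothetical ParseError type and sub-parser (combine's own errors and positions are richer than this):

```rust
/// Hypothetical error type: byte offset plus a message.
#[derive(Debug, PartialEq)]
struct ParseError {
    offset: usize,
    message: String,
}

/// Hypothetical sub-parser: accepts only lowercase ASCII letters and newlines.
/// Offsets it reports are relative to the slice it was given.
fn letters_only(content: &str) -> Result<usize, ParseError> {
    for (i, c) in content.char_indices() {
        if !(c.is_ascii_lowercase() || c == '\n') {
            return Err(ParseError { offset: i, message: format!("unexpected {:?}", c) });
        }
    }
    Ok(content.len())
}

/// Run `sub` on `input[start..end]` and translate any error offset
/// back into the coordinates of the full input.
fn run_sub_at<T>(
    input: &str,
    start: usize,
    end: usize,
    sub: impl Fn(&str) -> Result<T, ParseError>,
) -> Result<T, ParseError> {
    sub(&input[start..end]).map_err(|mut e| {
        e.offset += start; // relative to the sub-input -> absolute
        e
    })
}
```

Here start plays the role of the value obtained from position() before the sub-input.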

Lythenas · 2018-10-02T22:09:01Z

This took me a lot of fiddling around, but it works now:

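captures(&*RE_START)
    // Capture group 2 of RE_START holds the block name.
    .map(|vec: Vec<&str>| vec[2].to_string())
    .then(|name| {
        // Build the end-line regex for this particular name.
        let re =
            Regex::new(&format!(r"([ \t]*)#\+END_{}\n?", regex::escape(&name))).unwrap();
        (
            value(name),
            // Remember where the content starts so errors get absolute positions.
            position(),
            // The content as a &str, up to the end line.
            recognize(skip_until(find(re.clone()))),
        )
            .flat_map(|(name, position, content_str): (String, usize, &str)| {
                use combine::stream::state::{IndexPositioner, State};
                // Parse the content as its own input, starting at `position`.
                let input = State::with_positioner(
                    content_str,
                    IndexPositioner::new_with_position(position),
                );
                content_data()
                    .easy_parse(input)
                    .map(|(content_data, _rest)| (name, content_data))
            })
            // Finally consume the end line itself.
            .skip(find(re))
    }),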

I even got the correct position to work. The only thing wrong with the error is that it contains both "end of input" and "unexpected token #". But I think that is OK, since the content does end unexpectedly.

Err(
    Errors {
        position: 18,
        errors: [
            Unexpected(
                Borrowed(
                    "end of input"
                )
            ),
            Expected(
                Token(
                    'x'
                )
            ),
            Unexpected(
                Token(
                    '#'
                )
            )
        ]
    }
)

Lythenas · 2018-10-02T22:09:05Z

Btw, do you think it would be faster to use the regex above or something like

(spaces(), range(format!("#+BEGIN_{}", name)))

Marwes · 2018-10-03T09:28:10Z

> Btw do you think it would be faster to use the regex above or use something like

I'd expect that to be faster. Compiling a regex is fairly expensive (compared to matching against a single string), so regexes should generally be compiled once and used many times to be efficient.
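A rough sketch of the literal-matching alternative: it accepts the same shape as the per-name regex ([ \t]*)#\+END_<name>\n?, but anchored at the current position (as (spaces(), range(...)) would be), with no regex compilation and no escaping of name. match_end_line is a hypothetical helper on plain &str, not a combine parser.

```rust
/// Literal equivalent of `([ \t]*)#\+END_<name>\n?` anchored at the start
/// of `input`: returns the number of bytes matched, or None.
/// Nothing is compiled; the only per-name cost is building one String.
fn match_end_line(input: &str, name: &str) -> Option<usize> {
    // `[ \t]*`: skip leading spaces and tabs.
    let indent = input.len()
        - input.trim_start_matches(|c: char| c == ' ' || c == '\t').len();
    let marker = format!("#+END_{}", name);
    let rest = &input[indent..];
    if !rest.starts_with(marker.as_str()) {
        return None;
    }
    // `\n?`: optionally consume one trailing newline.
    let newline = usize::from(rest[marker.len()..].starts_with('\n'));
    Some(indent + marker.len() + newline)
}
```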

Marwes · 2018-10-03T09:34:00Z

> I even got the correct position to work. The only thing wrong with error is that it both contains: "end of input" and "unexpected token #". But I think that is OK since it is an unexpected end of the content.

Since you are explicitly using easy::Errors, you could always filter that error out of the Vec if you want. I can't really think of a better way for combine itself to handle it automatically in this exact use case.
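A minimal sketch of that filtering with Vec::retain. The Info and Error enums below are simplified hypothetical stand-ins shaped like the debug output above; combine's real easy::Error and Info types are generic over token and range types, so the match pattern would need adjusting.

```rust
/// Simplified stand-ins shaped like combine's `easy::Info` / `easy::Error`
/// (hypothetical; not the real generic types).
#[derive(Debug, PartialEq)]
enum Info {
    Token(char),
    Borrowed(&'static str),
}

#[derive(Debug, PartialEq)]
enum Error {
    Unexpected(Info),
    Expected(Info),
}

/// Drop the "unexpected end of input" entry caused by the sub-input
/// ending early, keeping every other error.
fn drop_end_of_input(errors: &mut Vec<Error>) {
    errors.retain(|e| !matches!(e, Error::Unexpected(Info::Borrowed("end of input"))));
}
```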

Lythenas · 2018-10-03T10:40:04Z

Yes, this is fine. Thanks for your help.

Lythenas closed this as completed Oct 3, 2018

ckiee mentioned this issue on Feb 21, 2022 in: Native/abstracted sub-parsers #340 (open)
