take_until but parses the input it takes #199

Closed
Lythenas opened this issue Oct 2, 2018 · 8 comments

Comments


Lythenas commented Oct 2, 2018

Is it possible to parse the input consumed by take_until with another parser instead of just accumulating it in a String? Specifically, I want to parse something that looks like:

#+BEGIN_name
... some content
#+END_name

Since name is dynamic and has to be the same in the start and end lines, I can't use something like between. I also don't really want the parser that's responsible for the content to know about name.

Currently I'm thinking about using and_then: take_until the end line, collect the content in a String, create a new stream from the collected string, and parse that.

But I'm wondering if there is a better way of doing this.

Owner

Marwes commented Oct 2, 2018

I'd use BEGIN_name.then(|b| (content, END_name(b))). See https://docs.rs/combine/3.5.2/combine/trait.Parser.html#method.then
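
To make the idea concrete without depending on combine, here is a minimal standard-library sketch of the same dependency: the result of the begin-line parser decides what the end marker looks like. The function names `begin_line` and `block` are hypothetical, and the marker search is deliberately naive (it does not check that the marker starts a line or that nothing follows the name).

```rust
/// Parses `#+BEGIN_<name>\n` and returns the name plus the remaining input.
fn begin_line(input: &str) -> Option<(&str, &str)> {
    let rest = input.strip_prefix("#+BEGIN_")?;
    let end = rest.find('\n')?;
    Some((&rest[..end], &rest[end + 1..]))
}

/// Parses a whole block and returns (name, content, remaining input).
/// The end marker is built from the name that `begin_line` produced,
/// which is the role `then` plays in the combine version.
fn block(input: &str) -> Option<(&str, &str, &str)> {
    let (name, rest) = begin_line(input)?;
    let end_marker = format!("#+END_{}", name);
    // Naive search: the first occurrence of the marker ends the content.
    let end = rest.find(end_marker.as_str())?;
    let content = &rest[..end];
    let after = &rest[end + end_marker.len()..];
    Some((name, content, after))
}
```

The content is returned as a plain slice here; a real content parser would be run in place of the slicing step.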

Author

Lythenas commented Oct 2, 2018

I just realized the content doesn't need to be parsed; I just need it as a string. This makes things a lot easier.

But in general, suppose I wanted to parse the content further without the content parser knowing anything about when to stop, for example a parser that parses a list of lines. Would a parser that does the following be possible?

  1. look ahead to find the end position of the content to be parsed
  2. restrict the content parser to parse between the current position and the end position
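
The two steps above can be sketched with the standard library alone. This is only an illustration of the idea, not combine's API: `parse_until` and `lines` are hypothetical names, and the content parser is modelled as a plain function from a slice to an `Option`.

```rust
/// Runs `content_parser` on exactly the text before `end_marker`.
/// Returns the parsed value and the input that follows the marker.
fn parse_until<'a, T>(
    input: &'a str,
    end_marker: &str,
    content_parser: impl Fn(&'a str) -> Option<T>,
) -> Option<(T, &'a str)> {
    // Step 1: look ahead to find where the content ends.
    let end = input.find(end_marker)?;
    // Step 2: hand only the restricted slice to the content parser.
    let value = content_parser(&input[..end])?;
    Some((value, &input[end + end_marker.len()..]))
}

/// Example content parser: a list of lines. It knows nothing about the marker.
fn lines(input: &str) -> Option<Vec<&str>> {
    Some(input.lines().collect())
}
```

For example, `parse_until("a\nb\n#+END_list\nmore", "#+END_list", lines)` gives the lines `a` and `b` together with the text after the marker.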

Owner

Marwes commented Oct 2, 2018

It is possible using https://docs.rs/combine/3.5.2/combine/trait.Parser.html#method.flat_map. The only real problem is that the reported error position will point into the sub-input, so it would need to be fixed if that is an issue (probably possible using https://docs.rs/combine/3.5.2/combine/fn.position.html to get the position before the sub-input).
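
The position problem can be shown in isolation. The sketch below uses only the standard library, and `ParseError`, `digits` and `digits_in` are hypothetical names: the inner parser reports offsets relative to the slice it was given, and the wrapper shifts them by the position recorded before the sub-input.

```rust
/// A parse error carrying a byte offset.
#[derive(Debug, PartialEq)]
struct ParseError {
    position: usize,
    message: String,
}

/// Hypothetical content parser: accepts only ASCII digits and reports
/// error positions relative to the slice it was given.
fn digits(sub_input: &str) -> Result<u32, ParseError> {
    if let Some(offset) = sub_input.find(|c: char| !c.is_ascii_digit()) {
        return Err(ParseError {
            position: offset,
            message: "expected a digit".to_string(),
        });
    }
    sub_input.parse::<u32>().map_err(|_| ParseError {
        position: 0,
        message: "invalid number".to_string(),
    })
}

/// Runs `digits` on `input[start..end]` and shifts any error position by
/// `start`, so that it points into the original input again.
fn digits_in(input: &str, start: usize, end: usize) -> Result<u32, ParseError> {
    digits(&input[start..end]).map_err(|e| ParseError {
        position: e.position + start,
        ..e
    })
}
```

Without the shift, an error at the third byte of the sub-input would be reported as position 2 no matter where the sub-input starts in the full text.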

Author

Lythenas commented Oct 2, 2018

This took me a lot of fiddling around but it works now:

captures(&*RE_START)
    // Element 2 of the captures is used as the block name.
    .map(|vec: Vec<&str>| vec[2].to_string())
    .then(|name| {
        // The end line depends on the name, so its regex is built here.
        let re =
            Regex::new(&format!(r"([ \t]*)#\+END_{}\n?", regex::escape(&name))).unwrap();
        (
            value(name),
            // Position before the content, used to offset sub-input errors.
            position(),
            recognize(skip_until(find(re.clone()))),
        )
            .flat_map(|(name, position, content_str): (String, usize, &str)| {
                use combine::stream::state::{IndexPositioner, State};
                // Parse only the recognized content, starting at the saved position.
                let input = State::with_positioner(
                    content_str,
                    IndexPositioner::new_with_position(position),
                );
                content_data()
                    .easy_parse(input)
                    .map(|(content_data, _rest)| (name, content_data))
            })
            .skip(find(re))
    }),

I even got the correct position to work. The only thing wrong with the error is that it contains both "end of input" and "unexpected token #". But I think that is OK, since it is an unexpected end of the content.

Err(
    Errors {
        position: 18,
        errors: [
            Unexpected(
                Borrowed(
                    "end of input"
                )
            ),
            Expected(
                Token(
                    'x'
                )
            ),
            Unexpected(
                Token(
                    '#'
                )
            )
        ]
    }
)

Author

Lythenas commented Oct 2, 2018

Btw, do you think it would be faster to use the regex above or something like:

(spaces(), range(format!("#+BEGIN_{}", name)))

Owner

Marwes commented Oct 3, 2018

> Btw do you think it would be faster to use the regex above or use something like

I'd expect that to be faster. Compiling a regex is fairly expensive (compared to matching against a single string), so regexes should generally be compiled once and used many times to be efficient.
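
As a rough illustration of matching the end line against a plain string instead of a freshly compiled regex, here is a standard-library sketch. `is_end_line` is a hypothetical helper, and it is stricter than the regex in the snippet above: it checks one whole line and rejects anything after the name.

```rust
/// Returns true if `line` is an end line for `name`: optional leading
/// spaces or tabs, then `#+END_<name>`, then an optional trailing newline.
fn is_end_line(line: &str, name: &str) -> bool {
    let trimmed = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
    let trimmed = trimmed.strip_suffix('\n').unwrap_or(trimmed);
    // Compare the text after the fixed prefix with the block name.
    trimmed
        .strip_prefix("#+END_")
        .map_or(false, |rest| rest == name)
}
```

Nothing is compiled or allocated per block here, which is where the cost of building a new regex for every name would otherwise go.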

Owner

Marwes commented Oct 3, 2018

> I even got the correct position to work. The only thing wrong with the error is that it contains both "end of input" and "unexpected token #". But I think that is OK, since it is an unexpected end of the content.

Since you are explicitly using easy::Errors, you could always filter that out of the Vec if you want. I can't really think of a better way for combine itself to handle it automatically for this exact use.
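
A sketch of that filtering, using a simplified stand-in type rather than combine's own error enum: `ParseErrorKind` and `drop_end_of_input` are hypothetical, and the real entries in the errors Vec have more variants and different payload types.

```rust
/// Simplified stand-in for the entries of the errors Vec shown earlier.
#[derive(Debug, Clone, PartialEq)]
enum ParseErrorKind {
    Unexpected(String),
    Expected(String),
}

/// Removes every `Unexpected("end of input")` entry and keeps the rest
/// in their original order.
fn drop_end_of_input(errors: &mut Vec<ParseErrorKind>) {
    errors.retain(|e| {
        !matches!(e, ParseErrorKind::Unexpected(msg) if msg == "end of input")
    });
}
```

Applied to the three errors from the output above, this would leave the expected token `x` and the unexpected token `#`.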

Author

Lythenas commented Oct 3, 2018

Yes, this is fine. Thanks for your help.
