Runtime stack overflow when lexing certain strings #424

Open
Philogy opened this issue Sep 22, 2024 · 4 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed), question (Further information is requested)

Comments

Philogy commented Sep 22, 2024

I was trying to get C-style multiline comments working in Logos. The state machine for them is quite simple, but I can't get it to work as a Logos regex. The regex below seemed to work, but on certain inputs it crashes at runtime with a stack overflow.

Code to reproduce:

use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \t\n\f\r]+")] // Ignore this regex pattern between tokens
enum Token {
    // Tokens can be literal strings, of any length.
    #[token("#define")]
    Define,

    #[token("macro")]
    Macro,

    #[regex("//[^\n]*\n?", logos::skip)]
    LineComment,

    #[regex("/\\*([^\\*]*(\\*[^/])?)*\\*/", logos::skip)]
    MultiLineComment,

    // Or regular expressions.
    #[regex("0x[0-9a-fA-F]+")]
    HexLiteral,

    #[regex("[a-zA-Z_]\\w*:")]
    Label,

    #[regex("[a-zA-Z_]\\w*")]
    Ident,
}

fn main() {
    let src = "
/*wow amazing!!!!!*** /*  **/
// wow very nice
#define macro hi: very nice

";

    let mut lexer = Token::lexer(src);
    while let Some(token) = lexer.next() {
        println!("{:?} {}", token, lexer.slice());
    }
}

The error I'm getting:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
[1]    5609 abort      cargo run

Philogy changed the title from "Runtime panic when parsing lexing" to "Runtime panic when lexing based on certain regex" on Sep 22, 2024
Philogy changed the title from "Runtime panic when lexing based on certain regex" to "Runtime stack overflow when lexing certain strings" on Sep 22, 2024
jeertmans added the "question" and "bug" labels on Sep 26, 2024

jeertmans (Collaborator) commented

Hello @Philogy, I don't have much time to invest in this issue at the moment, but I would recommend the same approach I suggest for all comment-style lexing: create a token that matches only the start of the comment, then process the rest of the comment in a callback. This is usually much more robust, because comments can contain almost any character, such as an escaped /, which makes it very hard to write a regex that handles every case.

See #421 (comment) for an example on XML comments, which is very similar to multiline strings.
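As a rough illustration of the callback approach, here is a minimal sketch of the scanning step only, written against the standard library so it runs on its own. The helper name `block_comment_len` is hypothetical and not part of Logos. In a real lexer one would presumably apply it to `lex.remainder()` inside the callback and then call `lex.bump(n)`; that wiring is an assumption and is not shown.

```rust
/// Hypothetical helper: given the input that follows an opening `/*`,
/// return how many bytes belong to the comment body plus the closing `*/`.
/// Returns `None` when the comment is never closed.
fn block_comment_len(rest: &str) -> Option<usize> {
    // C-style comments do not nest, so the first `*/` ends the comment.
    rest.find("*/").map(|end| end + "*/".len())
}

fn main() {
    // Input from the report, with the opening `/*` already consumed.
    let rest = "wow amazing!!!!!*** /*  **/\n// wow very nice\n";
    match block_comment_len(rest) {
        Some(n) => println!("skip {} bytes, continue at {:?}", n, &rest[n..]),
        None => println!("unterminated comment"),
    }
}
```

Because the scan is a plain substring search, sequences such as `***` or a stray `/*` inside the comment need no special handling.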

jeertmans (Collaborator) commented Sep 26, 2024

Looks like this is a duplicate of #400, so closing this anyway :-)

conradludgate commented Oct 22, 2024

> Looks like this is a duplicate of #400, so closing this anyway :-)

Looks different to me. #400 seems to error in the derive, whereas this errors at runtime.

For what it's worth, I also encountered a stack overflow/infinite loop at runtime with a small test case:

use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
enum TestToken {
    #[regex("c(a*b?)*c")]
    Token
}

#[cfg(test)]
mod logos_test {
    use logos::Logos;

    use crate::TestToken;

    #[test]
    fn overflow() {
        let _ = TestToken::lexer("c").next();
    }
}
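Both failing patterns, `c(a*b?)*c` here and `([^\*]*(\*[^/])?)*` in the original report, apply `*` to a body that can itself match the empty string. Whether that shape is what actually triggers the overflow in Logos is an assumption, but it is easy to check for. The sketch below uses a tiny hypothetical AST (`Re`), not Logos's own types, and treats a character class like a single character.

```rust
/// A tiny regex AST, just enough for the patterns in this thread.
/// (Illustrative only; not a Logos type.)
enum Re {
    Char(char),
    Concat(Vec<Re>),
    Opt(Box<Re>),
    Star(Box<Re>),
}

/// True when the expression can match the empty string.
fn nullable(re: &Re) -> bool {
    match re {
        Re::Char(_) => false,
        Re::Concat(parts) => parts.iter().all(nullable),
        Re::Opt(_) | Re::Star(_) => true,
    }
}

/// True when some `*` is applied to a body that can itself match nothing.
fn has_nullable_star(re: &Re) -> bool {
    match re {
        Re::Char(_) => false,
        Re::Concat(parts) => parts.iter().any(has_nullable_star),
        Re::Opt(inner) => has_nullable_star(inner),
        Re::Star(inner) => nullable(inner) || has_nullable_star(inner),
    }
}

/// Builds the AST for `c(a*b?)*c`.
fn small_repro() -> Re {
    Re::Concat(vec![
        Re::Char('c'),
        Re::Star(Box::new(Re::Concat(vec![
            Re::Star(Box::new(Re::Char('a'))),
            Re::Opt(Box::new(Re::Char('b'))),
        ]))),
        Re::Char('c'),
    ])
}

fn main() {
    println!("nullable star present: {}", has_nullable_star(&small_repro()));
}
```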


jeertmans reopened this on Nov 16, 2024
jeertmans added the "help wanted" label on Nov 16, 2024

Melyodas commented

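I experimented with adding a Mermaid output to the lexer.

Looking at the graph for the test above, the issue seems to be that the generated graph does not handle a loop between nodes along its miss edges (presumably the `--x` arrows): node 3 goes to node 6 on a miss, and node 6 goes back to node 3 on a miss.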

flowchart LR
1("::Token")
3("rope#3")
  3 -- "a" --> 5
  3 --x 6
4("rope#4")
  4 -- "b" --> 3
  4 --x 3
5("rope#5")
  5 -- "a" --> 5
  5 --x 4
6("fork#6")
  6 -- "b" --> 3
  6 -- "c" --> 1
  6 --x 3
8("Start")
  8 -- "c" --> 3
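
To make the loop concrete, here is a small sketch that follows only the miss edges transcribed from the flowchart above and reports the first node visited twice. It assumes each `--x` edge is taken without consuming input, which is a guess about the generated code; the function names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Miss edges (`--x`) transcribed from the flowchart: node -> node on miss.
fn miss_edges() -> HashMap<u32, u32> {
    vec![(3, 6), (4, 3), (5, 4), (6, 3)].into_iter().collect()
}

/// Follow miss edges from `start`; return the first node reached twice, if any.
fn find_miss_cycle(miss: &HashMap<u32, u32>, start: u32) -> Option<u32> {
    let mut seen = HashSet::new();
    let mut node = start;
    loop {
        // `insert` returns false when the node was already visited.
        if !seen.insert(node) {
            return Some(node);
        }
        match miss.get(&node) {
            Some(&next) => node = next,
            None => return None, // no miss edge: the walk ends here
        }
    }
}

fn main() {
    // For the input "c", Start consumes 'c' and enters node 3 with no input left.
    match find_miss_cycle(&miss_edges(), 3) {
        Some(n) => println!("miss edges loop back to node {}", n),
        None => println!("no miss cycle"),
    }
}
```

If each miss is compiled to a function call, a walk like this never returns, which would be consistent with the stack overflow reported above.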