Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflow at ~20,000 characters #384

Open
rscarson opened this issue Mar 29, 2024 · 5 comments
Open

Stack overflow at ~20,000 characters #384

rscarson opened this issue Mar 29, 2024 · 5 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@rscarson
Copy link

rscarson commented Mar 29, 2024

I have a StringLiteral token with the following expressions on it:

    #[regex(r#"(?:/(?:\\.|[^\\/])+/[a-zA-Z]*)"#)] // Regex literal
    #[regex(r#"(?:"(?:(?:[^"\\])|(?:\\.))*")"#)] // " string literal "
    #[regex(r#"(?:'(?:(?:[^'\\])|(?:\\.))*')"#)] // ' string literal '

And encountering a string literal above ~20,000 characters stack overflows within logos generated code
Is there an error in one of these expressions?

@maciejhirsz
Copy link
Owner

maciejhirsz commented Mar 31, 2024

Groups are never capturing in logos so you don't need ?:, and you don't need groups around |, so string literal can be rewritten as #[regex(r#""([^"\\]|\\.)*""#)]. If that still overflows try #[regex(r#""([^"\\]+|\\.)*""#)]. I believe it loops internally by doing tail call recursion so the greedily looping all non-escaped non-quote characters inside the group could help (but I have not tested it).

This is one of those cases that really needs the codegen rewrite (#291) into a loop with enum branching working as faux-goto instead of function calls.

@rscarson
Copy link
Author

I had already rewritten it as #[regex(r#""([^"\\]|\\.)*""#)] with no luck but #[regex(r#""([^"\\]+|\\.)*""#)] did the trick!
Thanks!

@rscarson
Copy link
Author

rscarson commented Mar 31, 2024

it still overflows at some point though, at a larger number

Should I leave this issue open for tracking?

@maciejhirsz
Copy link
Owner

Should I leave this issue open for tracking?

Ye, let's do that.

@maciejhirsz maciejhirsz reopened this Mar 31, 2024
@jeertmans jeertmans added bug Something isn't working help wanted Extra attention is needed labels Apr 8, 2024
mikkeldenker added a commit to StractOrg/stract that referenced this issue Jul 22, 2024
this should fix a reported stack overflow (might be related to maciejhirsz/logos#384) and should also make it easier to add additional scripts besides latin in the future
@0x2a-42
Copy link
Contributor

0x2a-42 commented Oct 17, 2024

You can also avoid a stack overflow by compiling with optimizations. Apparently opt-level 1 is enough for tail call optimization to occur.

[profile.dev]
opt-level = 1

There currently seems to be no way to selectively optimize the generated lex function. Maye rust-lang/rust#54882 can be used, if it is stabilized before the logos code generator is rewritten.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants