Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agda-style Lexing #203

Open
Ericson2314 opened this issue Jan 23, 2022 · 3 comments
Open

Agda-style Lexing #203

Ericson2314 opened this issue Jan 23, 2022 · 3 comments

Comments

@Ericson2314
Copy link
Collaborator

Ericson2314 commented Jan 23, 2022

I have been mulling this for a while, but the difficulties in fixing #197 made it feel more urgent.

As a (rare) user of Adga, I have been very fond of it's lexing, which seems very simple, and more concerned with the boundaries between tokens rather than the contents of tokens themselves. (You can seem me singing its praises in, e.g. ghc-proposals/ghc-proposals#444 (comment)).

I have a few questions on this.

  1. Do the people implementing Agda agree with this premise, that lexing in Agda is significantly different and/or simpler than that in other languages? Or am I reading to much into it as a user guessing how it works?

  2. If the premise is valid (per question 1), is there anything Alex might do to make this easier / a more obvious way to do things? I suppose I should study https://github.com/agda/agda/blob/master/src/full/Agda/Syntax/Parser/Lexer.x

  3. Should we transition Alex itself to lex more in this style, basically requiring more things to be space-separated?

CC @andreasabel who conveniently works on both Alex and Agda, and @int-index who spearheaded the similar left right lexing context rules for Haskell.

@andreasabel
Copy link
Member

In Agda, identifiers and operators need to be white-space separated.
The only tokens that need not be white-space separated are (, ), {, }, ; (maybe I am forgetting one).
However, if you are suggesting that Agda is doing some post-processing on tokens to e.g. split 2+3 into 2 + 3, this is not the case, so 2+3 is simply an identifier and has nothing to do with numbers or summation whatsoever.

Frankly, I do not understand what you are intending here, or how Alex should be changed. At its core Alex implements traditional 1960s style lexing (classic "formal languages and automata" stuff).

@Ericson2314
Copy link
Collaborator Author

@andreasabel Well, for example, does the Agda lexer use copious right contexts to to find those whitespace boundaries? The current Alex docs warn that right contexts can make things slow, but I suspect either the warning is overly pessimistic, or the situation can be improved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants