Faster parsing by changing source location representation #746

expipiplus1 · 2020-11-01T04:16:47Z

I noticed that the documentation for megaparsec says that getSourcePos "is not cheap, do not call it e.g. on matching of every token, that's a bad idea".

We currently call this twice for every expression. I wonder whether the parser could instead represent source positions as a token (character) offset, which is very cheap to obtain (getOffset). If the user has requested a parse without source positions, the offset is simply discarded; otherwise it is translated into the full line and column representation. With sufficient laziness this could even happen automatically: the translation would not be performed until, for example, a pretty error message needs to be printed.

This would also benefit downstream users who actually want the character offset and who currently have to reconstruct this from the line and column position!
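For illustration, here is a minimal sketch of what offset-based annotation could look like with megaparsec's `getOffset`. The `Span` and `Located` types and the `located` and `number` helpers are hypothetical names made up for this example, not hnix's actual types, and the parser runs over a plain `String` to keep the dependencies small.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, getOffset, many, parse, some)
import Text.Megaparsec.Char (digitChar, space)

type Parser = Parsec Void String

-- Hypothetical span type: two character offsets instead of two SourcePos values.
data Span = Span Int Int
  deriving (Eq, Show)

-- Hypothetical wrapper pairing a parsed value with its span.
data Located a = Located Span a
  deriving (Eq, Show)

-- Record the offset before and after running a parser.
-- getOffset only reads a counter from the parser state, so it is cheap.
located :: Parser a -> Parser (Located a)
located p = do
  begin <- getOffset
  x <- p
  end <- getOffset
  pure (Located (Span begin end) x)

-- Toy token parser: a decimal number followed by optional whitespace.
number :: Parser (Located Int)
number = located (read <$> some digitChar) <* space

main :: IO ()
main = print (parse (many number) "" "12 345")
```

A caller that does not need positions can drop the `Span`; one that does can translate the offsets to line and column afterwards.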

Defining getSourcePos = pure (SourcePos "" (mkPos 0) (mkPos 0)) does speed things up significantly:

layus · 2020-11-16T13:21:41Z

That's a neat improvement :-). How much work does this represent?

expipiplus1 · 2020-11-16T15:52:16Z

I suspect that given the very pleasing type of expressions in hnix it shouldn't be too much trouble at all.

Switching to character offsets is very easy: it only means changing a handful of lines in the parser.
Writing a function to convert these to a line/col representation would just be a recursive traversal of the expression tree, keeping track of where we are in the file.
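A rough sketch of such a conversion, using only `base`. The `Expr` type here is a toy stand-in for hnix's real expression type, and `LineCol`, `Cursor`, `advance`, `step` and `resolve` are hypothetical names. It assumes each node is annotated with its start offset only, so that a pre-order traversal visits offsets in non-decreasing order and the source is scanned once. Tab width is ignored.

```haskell
{-# LANGUAGE DeriveTraversable #-}
import Data.Traversable (mapAccumL)

-- Toy expression type (not hnix's); 'ann' is the per-node annotation.
data Expr ann
  = Lit ann Int
  | Add ann (Expr ann) (Expr ann)
  deriving (Show, Eq, Functor, Foldable, Traversable)

-- 1-based line and column.
data LineCol = LineCol Int Int
  deriving (Show, Eq)

-- Traversal state: offset reached so far, its position, and the unconsumed input.
data Cursor = Cursor Int LineCol String

-- Move the cursor forward to the target offset, one character at a time.
-- Offsets at or before the current one leave the cursor where it is.
advance :: Cursor -> Int -> Cursor
advance cur@(Cursor off pos rest) target
  | off >= target = cur
  | otherwise = case rest of
      []       -> cur
      (c : cs) -> advance (Cursor (off + 1) (step c pos) cs) target

-- Update line/column after consuming one character.
step :: Char -> LineCol -> LineCol
step '\n' (LineCol l _) = LineCol (l + 1) 1
step _    (LineCol l c) = LineCol l (c + 1)

-- Replace every start offset with its line/column, threading the cursor
-- through the tree in pre-order (annotation first, then children).
resolve :: String -> Expr Int -> Expr LineCol
resolve src = snd . mapAccumL go (Cursor 0 (LineCol 1 1) src)
  where
    go cur off =
      let cur'@(Cursor _ pos _) = advance cur off
      in (cur', pos)

main :: IO ()
main = print (resolve "1 +\n  23" (Add 0 (Lit 0 1) (Lit 6 23)))
```

Handling end offsets as well would need either a second pass or a lookup table of line-start offsets, since end offsets are not visited in increasing order.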

I think writing tests would probably be the most time-consuming part :D

I'm not sure whether there are performance penalties lurking somewhere I haven't thought about, though.

Anton-Latukha · 2022-01-12T17:49:41Z

In #1026 I also note that the current Pos type produces 10 orphan instances in Expr.Types, and the warnings about them are currently suppressed.

This is work towards: haskell-nix#1026 & haskell-nix#746.

This is groundwork at the type and type-boundary level, so that the #1026 & #746 design can be done in this release. `NPos` & `NSourcePos` can next be freely shaped into whatever comes out of #1026 & #746.

Anton-Latukha · 2022-01-21T21:16:44Z

What do you think about Data.Monoid.Instances.Positioned from monoid-subclasses? https://hackage.haskell.org/package/monoid-subclasses-1.1.2/docs/Data-Monoid-Instances-Positioned.html

Anton-Latukha · 2022-01-21T21:36:46Z

Or just use: https://hackage.haskell.org/package/megaparsec-9.0.1/docs/Text-Megaparsec.html#v:PosState

expipiplus1 · 2022-01-22T00:42:22Z

Yeah, the latter might be good, but it's still calculating line/col offsets strictly, which might be the slow bit.

Anton-Latukha mentioned this issue Jan 21, 2021

Increase the performance of parsing and evaluation speed #200

Open

This was referenced Jan 12, 2022

Please, use other/create new {SourcePos, Pos} datatypes #940

Closed

Please, use new {Source,}Pos types #1026

Open

Anton-Latukha added a commit to Anton-Latukha/hnix that referenced this issue Jan 21, 2022

treewide: migrate to use of N{Pos,SourcePos}

936f8fc

This is work towards: haskell-nix#1026 & haskell-nix#746.

Anton-Latukha mentioned this issue Jan 21, 2022

Expr.Types: add & use NPos & NSourcePos #1038

Merged
