Eta is getting the Unicode "General Category" wrong. For example, Eta thinks '<' is a control character.
This matters because the lexer in haskell-src-exts uses 'isSymbol' during lexing, so it refuses to parse "\x -> x" because Eta miscategorizes '-' and '>'. As a result, many quasi-quotations don't work.
Description
Expected Behavior
isSymbol '<' == True
Actual Behavior
isSymbol '<' == False
Possible Fix
After a little debugging, it looks like Java has the following behavior:
As the code in https://github.com/typelead/eta/blob/master/libraries/base/GHC/Unicode.hs suggests, in Eta these come out as '-' -> CurrencySymbol and '<' -> Control.
Interestingly enough, both of these are exactly 7 away in the enum from their intended targets, DashPunctuation and MathSymbol.
This line looks suspicious:
generalCategory c = toEnum $ fromIntegral $ wgencat $ fromIntegral $ ord c
Naively, we could shift the value coming from Java by 7. But what does that mean for the other 7 values? What does 25 really mean coming from Java? I do not know.
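One alternative to a fixed offset is an explicit translation table. The sketch below is only that: it assumes wgencat returns the raw result of java.lang.Character.getType, and the integer constants are the ones I believe java.lang.Character defines (for example MATH_SYMBOL = 25 and DASH_PUNCTUATION = 20), so they should be checked against the Javadoc before use. The name javaTypeToCategory is hypothetical and does not exist in Eta. If those constants are right, no single offset can work, because Java and Haskell order the categories differently (Java puts separators and control characters before punctuation and symbols).

```haskell
import Data.Char (GeneralCategory (..))

-- Hypothetical helper (not part of Eta): translate the Int assumed to come
-- from java.lang.Character.getType into Haskell's GeneralCategory.
-- The numeric constants are assumptions to verify against the Javadoc.
javaTypeToCategory :: Int -> GeneralCategory
javaTypeToCategory n = case n of
  0  -> NotAssigned           -- UNASSIGNED
  1  -> UppercaseLetter       -- UPPERCASE_LETTER
  2  -> LowercaseLetter       -- LOWERCASE_LETTER
  3  -> TitlecaseLetter       -- TITLECASE_LETTER
  4  -> ModifierLetter        -- MODIFIER_LETTER
  5  -> OtherLetter           -- OTHER_LETTER
  6  -> NonSpacingMark        -- NON_SPACING_MARK
  7  -> EnclosingMark         -- ENCLOSING_MARK
  8  -> SpacingCombiningMark  -- COMBINING_SPACING_MARK
  9  -> DecimalNumber         -- DECIMAL_DIGIT_NUMBER
  10 -> LetterNumber          -- LETTER_NUMBER
  11 -> OtherNumber           -- OTHER_NUMBER
  12 -> Space                 -- SPACE_SEPARATOR
  13 -> LineSeparator         -- LINE_SEPARATOR
  14 -> ParagraphSeparator    -- PARAGRAPH_SEPARATOR
  15 -> Control               -- CONTROL
  16 -> Format                -- FORMAT
  18 -> PrivateUse            -- PRIVATE_USE
  19 -> Surrogate             -- SURROGATE
  20 -> DashPunctuation       -- DASH_PUNCTUATION
  21 -> OpenPunctuation       -- START_PUNCTUATION
  22 -> ClosePunctuation      -- END_PUNCTUATION
  23 -> ConnectorPunctuation  -- CONNECTOR_PUNCTUATION
  24 -> OtherPunctuation      -- OTHER_PUNCTUATION
  25 -> MathSymbol            -- MATH_SYMBOL
  26 -> CurrencySymbol        -- CURRENCY_SYMBOL
  27 -> ModifierSymbol        -- MODIFIER_SYMBOL
  28 -> OtherSymbol           -- OTHER_SYMBOL
  29 -> InitialQuote          -- INITIAL_QUOTE_PUNCTUATION
  30 -> FinalQuote            -- FINAL_QUOTE_PUNCTUATION
  _  -> NotAssigned           -- anything unexpected, including the unused 17

main :: IO ()
main = do
  print (javaTypeToCategory 25)  -- prints MathSymbol
  print (javaTypeToCategory 20)  -- prints DashPunctuation
```

Under the same assumption, generalCategory could then be written as javaTypeToCategory applied to the result of wgencat instead of going through toEnum.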
Steps to Reproduce
import Data.Char (isSymbol)
main :: IO ()
main = print $ isSymbol '<'
Context
I cannot use any quasiquoters that touch Haskell code. String interpolation, for example, is now painful to use. See isHSymbol in https://github.com/haskell-suite/haskell-src-exts/blob/master/src/Language/Haskell/Exts/InternalLexer.hs for the actual usage.
Your Environment
Code is run with Java 8. The issue is present in the code on master/head, so I'm guessing all versions of Eta suffer from this.
The Unicode categorization implementation is untested, so thanks for testing this out!
As you noted, in GHC.Unicode, we call out to the Java methods. One way you could debug this is to make a test case that prints out the category for all the categories and run that code both in Eta and GHC and see the output difference and tweak the implementations accordingly.
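A minimal sketch of such a test case, assuming a plain dump of the ASCII range is enough to expose the mismatch (the range and the helper name categoryLine are my own choices, not anything from the Eta test suite). Compile and run the same file with both Eta and GHC, save each output to a file, and diff the two.

```haskell
import Data.Char (chr, generalCategory)

-- Hypothetical helper: one line of output per code point, formatted as
-- "<code point>\t<category>" so the two dumps are easy to diff.
categoryLine :: Int -> String
categoryLine n = show n ++ "\t" ++ show (generalCategory (chr n))

main :: IO ()
main = mapM_ (putStrLn . categoryLine) [0 .. 0x7F]
```

Widening the range past 0x7F would cover categories that do not occur in ASCII, such as TitlecaseLetter or Surrogate.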
Would you be interested in contributing a patch? Would be happy to guide if you get stuck anywhere.