The lexer currently handles ASCII and non-ASCII characters differently, even though Rust accepts UTF-8 source.

The main problem in our lexer is the mixed use of `skip_codepoint_input()` (which skips 1 to 4 bytes) and `skip_input(int n)` (which works in single bytes), and likewise of `peek_codepoint_input()` and `peek_input(int n)`. This makes it difficult to support Unicode identifiers and whitespace in the lexer in a simple way.

Related to #2287
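The gap between a byte-based skip and a codepoint-based skip can be illustrated with a minimal C++ sketch. `utf8_sequence_length` and `skip_one_codepoint` are hypothetical helpers, not gccrs functions; the sketch only classifies lead bytes and does not reject overlong encodings, surrogates, or values above U+10FFFF.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a gccrs function): how many bytes the UTF-8
// sequence starting with `lead` occupies, or 0 if `lead` cannot start a
// sequence (a 10xxxxxx continuation byte or an invalid byte 0xF8..0xFF).
int utf8_sequence_length (unsigned char lead)
{
  if (lead < 0x80)
    return 1; // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0)
    return 2; // 110xxxxx
  if ((lead & 0xF0) == 0xE0)
    return 3; // 1110xxxx
  if ((lead & 0xF8) == 0xF0)
    return 4; // 11110xxx
  return 0;   // continuation byte or invalid lead byte
}

// Hypothetical codepoint-based skip: move `pos` past one whole character.
// A stray byte is skipped on its own so the cursor always makes progress,
// and the result never runs past the end of `src`.
std::size_t skip_one_codepoint (const std::string &src, std::size_t pos)
{
  assert (pos < src.size ());
  int len = utf8_sequence_length (static_cast<unsigned char> (src[pos]));
  std::size_t step = len == 0 ? 1 : static_cast<std::size_t> (len);
  return std::min (pos + step, src.size ());
}
```

With the source `"\xC3\xA9" "x"` (U+00E9 followed by `x`), skipping one byte from offset 0 leaves the cursor on the continuation byte `0xA9`, in the middle of a character, whereas `skip_one_codepoint` moves it to offset 2, the `x`.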
To deal with this problem, we need to:

- modify `peek_input(int n)` and `skip_input(int n)` so that they return and skip a whole UTF-8 character,
- replace `peek_codepoint_input()` and `skip_codepoint_input()` with `peek_input` and `skip_input` respectively (Rust::Lexer #2347),
- remove `get_codepoint_input_length()` and the `current_char32` field in `Lexer` (Rust::Lexer #2347).
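Once `peek_input` and `skip_input` work on whole code points, callers should no longer need a separate per-character byte length. The following is a minimal C++ sketch of that idea, assuming the source is decoded eagerly into a `std::vector<char32_t>`. `CodepointInput`, `EOF_CODEPOINT` and `REPLACEMENT_CHAR` are hypothetical names, and the `peek_input`/`skip_input` signatures only mirror the names used in this issue; they are not gccrs's actual code. Malformed bytes are mapped to U+FFFD, and overlong encodings and surrogates are not rejected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sentinel for peeking past the end (not a valid code point).
constexpr char32_t EOF_CODEPOINT = 0xFFFFFFFF;
// U+FFFD REPLACEMENT CHARACTER, used here for malformed input bytes.
constexpr char32_t REPLACEMENT_CHAR = 0xFFFD;

// Hypothetical input buffer whose lookahead and skipping are in code points.
class CodepointInput
{
public:
  // Decode the whole UTF-8 source up front.
  explicit CodepointInput (const std::string &utf8)
  {
    std::size_t i = 0;
    while (i < utf8.size ())
      {
        unsigned char lead = static_cast<unsigned char> (utf8[i]);
        int len = lead < 0x80             ? 1
                  : (lead & 0xE0) == 0xC0 ? 2
                  : (lead & 0xF0) == 0xE0 ? 3
                  : (lead & 0xF8) == 0xF0 ? 4
                                          : 0;
        if (len == 0 || i + len > utf8.size ())
          {
            chars.push_back (REPLACEMENT_CHAR); // bad or truncated sequence
            i += 1;
            continue;
          }
        // Keep the payload bits of the lead byte, then append 6 bits per
        // continuation byte.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        bool ok = true;
        for (int k = 1; k < len; ++k)
          {
            unsigned char cont = static_cast<unsigned char> (utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
              {
                ok = false;
                break;
              }
            cp = (cp << 6) | (cont & 0x3F);
          }
        chars.push_back (ok ? cp : REPLACEMENT_CHAR);
        i += ok ? static_cast<std::size_t> (len) : 1;
      }
  }

  // Code point `n` positions ahead of the cursor (0 = current one).
  char32_t peek_input (int n = 0) const
  {
    assert (n >= 0);
    std::size_t idx = pos + static_cast<std::size_t> (n);
    return idx < chars.size () ? chars[idx] : EOF_CODEPOINT;
  }

  // Advance the cursor by `n` whole code points, never past the end.
  void skip_input (int n = 1)
  {
    assert (n >= 0);
    pos = std::min (pos + static_cast<std::size_t> (n), chars.size ());
  }

private:
  std::vector<char32_t> chars;
  std::size_t pos = 0;
};
```

Eager decoding costs four bytes per code point but gives constant-time lookahead of any distance; a real lexer could instead decode lazily into a small ring buffer while keeping the same codepoint-based interface.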