Use of byte index vs. character index #72

Closed
knu opened this issue Nov 2, 2020 · 3 comments

Comments

@knu

knu commented Nov 2, 2020

It seems the ts and te values are byte indices, not character indices, even when you feed a multibyte string to the parser. This makes the parser hard to use: you normally handle a regexp as multibyte text, so you end up converting index values back and forth.
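To illustrate the mismatch with plain Ruby (no parser involved): in a UTF-8 string every ASCII character is one byte, but a character such as `あ` takes three bytes, so byte offsets and character offsets diverge after the first multibyte character. The pattern source below is only an assumed example.

```ruby
# Example regexp source containing one multibyte (3-byte UTF-8) character.
source = "あ|b"

# Character offset of "b": String#index counts characters.
char_offset = source.index("b")

# Byte offset of "b": the byte size of everything before it.
byte_offset = source[0, char_offset].bytesize

puts "char offset: #{char_offset}" # prints "char offset: 2"
puts "byte offset: #{byte_offset}" # prints "byte offset: 4"

# Treating the byte offset as a character offset goes wrong:
p source[byte_offset, 1]           # => nil (past the end of the 3-character string)
# Only a byte-oriented method gives the intended result:
p source.byteslice(byte_offset, 1) # => "b"
```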

cf. rubocop/rubocop#8989

Is there any plan to optionally provide character indices in addition to, or instead of, byte indices? Thanks!

@jaynetics
Collaborator

@knu Thank you for pointing this out. As byte indices, ts and te are indeed not very useful. They are provided by Ragel, which this gem is based on. Ragel doesn't provide character indices, but we could calculate them in Ruby and make them available. I'll look into that shortly.
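A minimal sketch of calculating a character index in Ruby from a Ragel-style byte index, assuming the offsets point into a UTF-8 string and never fall inside a multibyte character. The helper name `byte_to_char_index` and the sample token offsets are hypothetical, not part of the gem's API.

```ruby
# Hypothetical helper: converts a byte offset into a character offset by
# counting the characters contained in the bytes that precede it.
def byte_to_char_index(string, byte_index)
  # byteslice keeps the string's encoding, so #length counts characters.
  string.byteslice(0, byte_index).length
end

source = "あ|b"
# Byte-based start/end of the token "b", as a byte-oriented scanner would report them.
ts = 4
te = 5

char_ts = byte_to_char_index(source, ts)
char_te = byte_to_char_index(source, te)

p [char_ts, char_te]          # => [2, 3]
p source[char_ts...char_te]   # => "b"
```

Each call rescans the whole prefix, so converting every token of a long pattern this way costs time proportional to the pattern length per token.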

jaynetics added a commit that referenced this issue Nov 11, 2020
Ragel runs with byte-based indices (ts, te). These are of little value to end-users, so I suggest we keep track of char-based indices and emit those instead.

cf. #72
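The approach described in the commit message, keeping track of char-based indices while tokens are emitted, can be sketched with a small tracker that only scans the bytes added since the previous token. This is an illustration, not the gem's actual implementation; the `CharIndexTracker` class and the token list are hypothetical, and the sketch assumes offsets arrive in ascending order and fall on character boundaries.

```ruby
# Hypothetical tracker that converts byte-based offsets into char-based ones
# incrementally. Assumes offsets are requested in ascending order and that
# each one falls on a character boundary of a UTF-8 string.
class CharIndexTracker
  def initialize(source)
    @source = source
    @byte_pos = 0 # last byte offset already converted
    @char_pos = 0 # character offset corresponding to @byte_pos
  end

  # Returns the character offset for +byte_index+, scanning only the bytes
  # between the previously converted offset and this one.
  def char_index(byte_index)
    @char_pos += @source.byteslice(@byte_pos, byte_index - @byte_pos).length
    @byte_pos = byte_index
    @char_pos
  end
end

source = "あ|bü"
# Byte-based tokens as a byte-oriented scanner might report them: [text, ts, te].
byte_tokens = [["あ", 0, 3], ["|", 3, 4], ["b", 4, 5], ["ü", 5, 7]]

tracker = CharIndexTracker.new(source)
char_tokens = byte_tokens.map do |text, ts, te|
  [text, tracker.char_index(ts), tracker.char_index(te)]
end

p char_tokens.map { |_text, ts, te| [ts, te] } # => [[0, 1], [1, 2], [2, 3], [3, 4]]
```

Because the tracker remembers where the previous conversion stopped, each byte is scanned once overall, and the emitted char-based indices can be used directly with ordinary string slicing such as `source[ts...te]`.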
@jaynetics
Collaborator

@knu I've just released v2.0.0, in which the indices are now character-based.

@knu
Author

knu commented Nov 25, 2020

@jaynetics That's great news! Thank you so much for your hard work!
