Use of byte index vs. character index #72

knu · 2020-11-02T18:31:28Z

It seems the ts and te values are byte indices, not character indices, even when you feed a multibyte string to the parser. Having to convert index values back and forth makes the parser hard to use, because a regexp is normally handled as multibyte text.

cf. rubocop/rubocop#8989

Is there any plan to optionally provide character index in addition to or instead of byte index? Thanks!
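
To illustrate the mismatch described above, here is a minimal Ruby sketch that uses only core String methods. The source string and the token being looked up are made-up examples, not output from the parser. The point is only that a byte offset and a character offset diverge as soon as a multibyte character precedes the token.

```ruby
# Hypothetical regexp source: "あ+b" written with an escape so the
# string is UTF-8 regardless of the file's encoding.
source = "\u3042+b"

# Character-based offset of "b": counts characters.
char_index = source.index("b")

# Byte-based offset of "b": search a binary copy of the string.
# "あ" occupies 3 bytes in UTF-8 and "+" occupies 1.
byte_index = source.b.index("b".b)

puts "char index: #{char_index}, byte index: #{byte_index}"

# Using a byte offset as a character offset points past the end here,
# so ordinary slicing returns nil ...
p source[byte_index, 1]

# ... while byteslice, which expects byte offsets, finds the token.
p source.byteslice(byte_index, 1)
```

With a pure-ASCII source both offsets coincide, which is why the problem only shows up with multibyte input.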


jaynetics · 2020-11-05T10:52:28Z

@knu thank you for pointing this out. ts and te are indeed not very useful. these byte indexes are provided by Ragel, which this gem is based on. Ragel doesn't provide the char index, but we could calculate it in Ruby and make it available. i'll look into that shortly.
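
One way such a calculation could look in plain Ruby is sketched below. This is an assumption for illustration only, not necessarily how the gem does it; the helper name byte_to_char_index is hypothetical. It relies on String#byteslice and String#length from core Ruby and assumes the byte offset falls on a character boundary, as a token boundary would.

```ruby
# Hypothetical helper: convert a byte offset into a character offset.
# Assumes byte_offset lies on a character boundary of the string.
def byte_to_char_index(string, byte_offset)
  # Take the prefix up to the byte offset and count its characters.
  string.byteslice(0, byte_offset).length
end

source = "\u3042+b" # "あ+b": 3 bytes + 1 byte + 1 byte

# Byte-based start/end pairs for the made-up tokens "あ", "+" and "b".
byte_spans = [[0, 3], [3, 4], [4, 5]]

# The same spans expressed in characters.
char_spans = byte_spans.map do |ts, te|
  [byte_to_char_index(source, ts), byte_to_char_index(source, te)]
end

p char_spans
```

Slicing the prefix for every token is quadratic in the worst case; keeping a running character count while scanning would avoid that, at the cost of slightly more bookkeeping.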


jaynetics · 2020-11-25T12:32:37Z

@knu i've just released v2.0.0 where the indices are now character-based

knu · 2020-11-25T13:30:29Z

@jaynetics That's great news! Thank you so much for your hard work!

jaynetics added a commit that referenced this issue Nov 11, 2020

Provide character- instead of byte-based indices ...

808368b

Ragel runs with byte-based indices (ts, te). These are of little value to end-users, so I suggest we keep track of char-based indices and emit those instead. c.f. #72

jaynetics mentioned this issue Nov 15, 2020

Provide character- instead of byte-based indices ... #73

Merged

jaynetics closed this as completed Nov 25, 2020

jaynetics mentioned this issue Nov 25, 2020

incompatible character encodings when calling .to_s on tree parsed from single regex #74

Closed

knu mentioned this issue Nov 26, 2020

Upgrade regexp_parser to 2.0 rubocop/rubocop#9102

Merged

8 tasks
