Use byte indices in Rust core #21

bminixhofer · 2021-02-06T12:32:00Z

It would be more idiomatic to Rust to use byte indices everywhere internally (and everywhere in the public Rust API) and only convert to char indices at the boundary to Python.
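
As a rough illustration of the proposal (not nlprule's actual code), conversion at the boundary could look like the sketch below. The helper names `byte_to_char_idx` and `char_to_byte_idx` are hypothetical and exist only for this example. Everything inside the Rust core would work with byte offsets, and only the outermost layer would call these helpers.

```rust
/// Hypothetical helper: convert a byte offset into `text` to a char offset.
/// Panics if `byte_idx` does not lie on a char boundary (or is out of range),
/// because slicing a `&str` requires valid UTF-8 boundaries.
fn byte_to_char_idx(text: &str, byte_idx: usize) -> usize {
    text[..byte_idx].chars().count()
}

/// Hypothetical helper for the reverse direction: convert a char offset to
/// the byte offset at which that char starts. A char offset at or past the
/// end of the string maps to `text.len()`.
fn char_to_byte_idx(text: &str, char_idx: usize) -> usize {
    text.char_indices()
        .map(|(byte_idx, _)| byte_idx)
        .nth(char_idx)
        .unwrap_or(text.len())
}
```

For `"héllo wörld"`, `text.find("wörld")` returns the byte index 7, which corresponds to char index 6 because `é` takes two bytes in UTF-8. Both helpers are O(n) in the length of the text, which is one reason to do the conversion once at the boundary rather than repeatedly inside the core.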

drahnr · 2021-02-11T14:45:52Z

I'd be in favour of providing both index bases, if they are already available internally. This avoids re-parsing bytes to characters on the user side.

bminixhofer · 2021-02-11T16:06:06Z

Do you actually use character indices or byte indices in cargo-spellcheck? Would you have to convert from byte indices to char indices if the Suggestion .start and .end indices were byte indices?

> if they are already available internally.

For internal nlprule computation char indices are never needed (I think) so converting from char to byte as early as possible (i.e. when building the binaries) is possible. Using byte indices everywhere in Rust and only converting from byte to char at the boundary to Python made sense to me.

But you're right, it's worth thinking about providing both in the public API. I agree that that would make it nicer for a user. But for computation in nlprule I would like to be consistent in whether byte or char indices are used and ideally use bytes.

drahnr · 2021-02-11T17:09:14Z

> Do you actually use character indices or byte indices in cargo-spellcheck? Would you have to convert from byte indices to char indices if the Suggestion .start and .end indices were byte indices?

Yes, it's converted early on into character indices, and soon™ will be grapheme-aware as well, but that can be a layer on top of characters. It simplifies iteration significantly and is also required to properly align to the spans provided by syn and ra, iirc.

> For internal nlprule computation char indices are never needed (I think) so converting from char to byte as early as possible (i.e. when building the binaries) is possible. Using byte indices everywhere in Rust and only converting from byte to char at the boundary to Python made sense to me.

I am just saying that having character-based APIs is a nice feature, since they won't break on simple emojis, which are multibyte characters.

> But you're right, it's worth thinking about providing both in the public API. I agree that that would make it nicer for a user. But for computation in nlprule I would like to be consistent in whether byte or char indices are used and ideally use bytes.

👍 makes sense
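
One way the "both index bases" idea from this thread might be sketched: a span type that stores the byte range used for computation alongside a char range derived from it once. The `Span` type and `from_byte_range` constructor below are hypothetical names for illustration and are not taken from nlprule's API.

```rust
use std::ops::Range;

/// Hypothetical span carrying both index bases discussed in this thread.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Span {
    /// Byte offsets into the UTF-8 text (what the Rust core would use).
    bytes: Range<usize>,
    /// Char offsets into the same text (convenient for Python or char-based users).
    chars: Range<usize>,
}

impl Span {
    /// Builds a span from a byte range, counting chars once up front.
    /// Panics if either end of `bytes` is not on a char boundary of `text`.
    fn from_byte_range(text: &str, bytes: Range<usize>) -> Self {
        let char_start = text[..bytes.start].chars().count();
        let char_len = text[bytes.clone()].chars().count();
        Span {
            bytes,
            chars: char_start..char_start + char_len,
        }
    }
}
```

With the text `"I 💖 Rust"`, the word `Rust` starts at byte 7 but at char 4, since the emoji occupies four bytes. A span built from the byte range `7..11` would therefore carry the char range `4..8`, so a caller who works in chars would not need to re-scan the text.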

bminixhofer added the enhancement (New feature or request) and P3 (Low Priority) labels Feb 6, 2021

bminixhofer self-assigned this Feb 6, 2021

bminixhofer mentioned this issue Feb 26, 2021

Quality of the core #44

Open

8 tasks
