This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Made substring kernel on utf8 take chars into account. #568

Merged
3 commits merged into jorgecarleitao:main on Nov 4, 2021

Conversation

ritchie46
Collaborator

The substring kernel sliced the raw [u8] data without taking UTF-8 char boundaries into account, so offsets were interpreted as bytes rather than chars. This PR fixes that.
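To illustrate the difference, here is a minimal sketch of char-aware slicing on a single `&str`. It is not the kernel code from this PR: `substring_chars` is a hypothetical helper, and the convention that a negative `start` counts from the end of the string is an assumption made for the example. It only uses safe `std` APIs (`char_indices`, `str::get`).

```rust
/// Hypothetical helper (not the arrow2 kernel): returns the part of `s` that
/// starts at char index `start` and spans at most `length` chars.
/// Assumption for this sketch: a negative `start` counts from the end.
fn substring_chars(s: &str, start: i64, length: Option<usize>) -> &str {
    let n_chars = s.chars().count() as i64;
    // Resolve `start` to a char index clamped to [0, n_chars].
    let start_char = if start < 0 {
        (n_chars + start).max(0)
    } else {
        start.min(n_chars)
    };
    // Translate the char index into a byte offset that lies on a char boundary.
    let begin = s
        .char_indices()
        .nth(start_char as usize)
        .map(|(byte_idx, _)| byte_idx)
        .unwrap_or(s.len());
    let rest = &s[begin..];
    // Same translation for the end of the window, relative to `rest`.
    let end = match length {
        Some(len) => rest
            .char_indices()
            .nth(len)
            .map(|(byte_idx, _)| byte_idx)
            .unwrap_or(rest.len()),
        None => rest.len(),
    };
    &rest[..end]
}

fn main() {
    // 'é' (U+00E9) takes two bytes in UTF-8, so the string is 5 chars but 6 bytes.
    let s = "h\u{e9}llo";
    // A byte range of 0..2 ends in the middle of 'é', so it is not a valid str slice.
    assert_eq!(s.get(0..2), None);
    // Counting chars instead of bytes yields the expected prefix.
    assert_eq!(substring_chars(s, 0, Some(2)), "h\u{e9}");
    assert_eq!(substring_chars(s, -2, None), "lo");
}
```

Because both offsets come from `char_indices`, every slice lands on a char boundary and no `unsafe` is needed; the cost is a linear scan per value instead of constant-time byte arithmetic.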

@codecov

codecov bot commented Nov 3, 2021

Codecov Report

Merging #568 (4851f34) into main (ed8836f) will decrease coverage by 0.01%.
The diff coverage is 80.00%.


@@            Coverage Diff             @@
##             main     #568      +/-   ##
==========================================
- Coverage   78.90%   78.88%   -0.02%     
==========================================
  Files         395      395              
  Lines       24542    24680     +138     
==========================================
+ Hits        19364    19470     +106     
- Misses       5178     5210      +32     
Impacted Files Coverage Δ
src/compute/substring.rs 91.66% <80.00%> (-5.60%) ⬇️
tests/it/compute/substring.rs 89.35% <80.00%> (-0.23%) ⬇️
src/io/ipc/write/serialize.rs 64.08% <0.00%> (-16.30%) ⬇️
src/io/ipc/compression.rs 84.21% <0.00%> (-15.79%) ⬇️
src/compute/cast/dictionary_to.rs 25.00% <0.00%> (-2.03%) ⬇️
src/io/ipc/read/read_basic.rs 73.95% <0.00%> (-1.05%) ⬇️
tests/it/io/ipc/write/file.rs 100.00% <0.00%> (ø)
src/io/ipc/read/array/boolean.rs 50.00% <0.00%> (ø)
src/io/ipc/read/array/dictionary.rs 50.00% <0.00%> (ø)
tests/it/compute/cast.rs 99.35% <0.00%> (+0.05%) ⬆️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ed8836f...4851f34.

@jorgecarleitao
Owner

Thanks! PRed a proposal to your PR to remove unsafe :P

@ritchie46
Collaborator Author

> Thanks! PRed a proposal to your PR to remove unsafe :P

Way better! I've left some questions.

Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
@jorgecarleitao
Owner

CI is unrelated and being fixed by apache/arrow#11609.

@jorgecarleitao merged commit fab0bc1 into jorgecarleitao:main on Nov 4, 2021
@jorgecarleitao added the "bug" (Something isn't working) label on Nov 4, 2021
@jorgecarleitao changed the title from "make substring kernel work on utf8 data" to "Made substring kernel on utf8 take chars into account." on Nov 4, 2021