[stdlib] Add _count_utf8_continuation_bytes() #3529

martinvuyk
Contributor

Add _count_utf8_continuation_bytes()
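
For context, UTF-8 continuation bytes are the bytes of the form `0b10xxxxxx` that follow the lead byte of a multi-byte sequence. The following is a minimal Python sketch of what counting them means. It is an illustration only, not the Mojo implementation added in this PR; the function name `count_utf8_continuation_bytes` and its `bytes` parameter are assumptions.

```python
def count_utf8_continuation_bytes(data: bytes) -> int:
    """Count bytes of the form 0b10xxxxxx (UTF-8 continuation bytes)."""
    # A byte is a continuation byte when its top two bits are exactly 1 and 0.
    return sum(1 for b in data if (b & 0b1100_0000) == 0b1000_0000)


text = "héllo€"
encoded = text.encode("utf-8")
continuation = count_utf8_continuation_bytes(encoded)
# For valid UTF-8, code points = total bytes - continuation bytes.
code_points = len(encoded) - continuation
```

For valid UTF-8 input, subtracting this count from the byte length gives the number of code points, which is one likely use of such a helper.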

@martinvuyk martinvuyk requested a review from a team as a code owner September 23, 2024 01:20
@soraros
Contributor

soraros commented Sep 23, 2024

Can we have it take a byte Span?

@martinvuyk martinvuyk marked this pull request as draft September 23, 2024 02:45
@martinvuyk martinvuyk marked this pull request as ready for review September 24, 2024 13:40
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@martinvuyk martinvuyk force-pushed the add-count-utf8-continuation-bytes branch from 640916a to a3ddffb Compare September 24, 2024 13:52
martinvuyk and others added 5 commits September 25, 2024 16:13
Comment on lines -1338 to +1351
-    var utf8_sequence_lengths = List(5, 12, 9, 5, 7, 6, 5, 5, 2, 3, 12)
+    var items_amount_characters = List(5, 12, 9, 5, 7, 6, 5, 5, 2, 3, 12)
     for item_idx in range(len(items)):
         var item = items[item_idx]
-        var utf8_sequence_len = 0
+        var ptr = item.unsafe_ptr()
+        var amnt_characters = 0
         var byte_idx = 0
         for v in item:
             var byte_len = v.byte_length()
-            assert_equal(item[byte_idx : byte_idx + byte_len], v)
+            for i in range(byte_len):
+                assert_equal(ptr[byte_idx + i], v.unsafe_ptr()[i])
             byte_idx += byte_len
-            utf8_sequence_len += 1
-        assert_equal(utf8_sequence_len, utf8_sequence_lengths[item_idx])
+            amnt_characters += 1
+
+        assert_equal(amnt_characters, items_amount_characters[item_idx])
Contributor Author

This is unrelated to the main topic of this PR. I realized this is another place where indexing assumed byte offsets, so I fixed it and also renamed some variables whose names were unclear.
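
The distinction this comment refers to can be sketched in Python. This is an illustration only, not the Mojo test code, and the helper name `char_byte_ranges` is hypothetical. For non-ASCII text, a character's byte offset differs from its character index, so a test that walks characters has to track byte offsets separately and compare the raw bytes.

```python
def char_byte_ranges(text: str) -> list[tuple[int, int]]:
    """Return the (start, end) byte offsets of each code point in the UTF-8 encoding."""
    ranges = []
    byte_idx = 0
    for ch in text:
        byte_len = len(ch.encode("utf-8"))
        ranges.append((byte_idx, byte_idx + byte_len))
        byte_idx += byte_len
    return ranges


text = "aé€"
encoded = text.encode("utf-8")
ranges = char_byte_ranges(text)
# Compare the raw bytes at each byte offset against the character's own encoding.
for ch, (start, end) in zip(text, ranges):
    assert encoded[start:end] == ch.encode("utf-8")
```

Here the third character sits at character index 2 but starts at byte offset 3, which is the kind of mismatch the updated test avoids by comparing bytes through a pointer.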

@JoeLoser JoeLoser self-assigned this Oct 13, 2024
martinvuyk and others added 8 commits October 13, 2024 19:47
@JoeLoser
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Oct 14, 2024
@modularbot
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added the merged-internally Indicates that this pull request has been merged internally label Oct 14, 2024
@modularbot
Collaborator

Landed in 495fa9f! Thank you for your contribution 🎉

@modularbot modularbot added the merged-externally Merged externally in public mojo repo label Oct 16, 2024
modularbot added a commit that referenced this pull request Oct 16, 2024
[External] [stdlib] Add `_count_utf8_continuation_bytes()`

Add `_count_utf8_continuation_bytes()`

ORIGINAL_AUTHOR=martinvuyk <110240700+martinvuyk@users.noreply.github.com>
PUBLIC_PR_LINK=#3529

Co-authored-by: martinvuyk <110240700+martinvuyk@users.noreply.github.com>
Closes #3529
MODULAR_ORIG_COMMIT_REV_ID: 994f648ac650ccd29096946d29b290e855bce057
@modularbot modularbot closed this Oct 16, 2024
@martinvuyk martinvuyk deleted the add-count-utf8-continuation-bytes branch October 16, 2024 01:16