[stdlib] Move `StringRef` `find()` implementation to `Span` #3548

martinvuyk · 2024-09-25T03:09:53Z

Move StringRef find() implementation to Span

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

JoeLoser · 2024-10-02T15:17:03Z

stdlib/src/utils/span.mojo

@@ -335,3 +338,194 @@ struct Span[
        return Span[T, _lit_mut_cast[lifetime, False].result](
            unsafe_ptr=self._data, len=self._len
        )
+
+    fn find[


Question: It seems a bit odd from an API design perspective to have this `find` function in Span. Can you help me understand why we'd want this? It feels more coupled to string algorithms, and Span isn't just a view over string data.

martinvuyk · Oct 2, 2024

It can be used to find occurrences of a specific sequence of scalar values. My main concern is that we'll lose the ability to do that once StringSlice.find() is fixed to use Unicode codepoints, which introduces considerable overhead for algorithms that are faster on raw bytes. We could also make this private. WDYT?
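To illustrate what "finding a scalar value sequence" means independently of strings, here is a minimal Python sketch. It is not the Mojo implementation from this PR: `find_subsequence` is a hypothetical helper, the naive scan stands in for the vectorized routines discussed here, and the Python-style treatment of a negative `start` is an assumption.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def find_subsequence(haystack: Sequence[T], needle: Sequence[T], start: int = 0) -> int:
    """Return the first index >= start where needle occurs in haystack, or -1.

    Hypothetical helper for illustration only; a naive O(n*m) scan.
    """
    n, m = len(haystack), len(needle)
    # Assumption: a negative start counts from the end, as in Python.
    if start < 0:
        start = max(n + start, 0)
    if m == 0:
        return start if start <= n else -1
    for i in range(start, n - m + 1):
        # Compare element by element; works for bytes, ints, floats, ...
        if all(haystack[i + j] == needle[j] for j in range(m)):
            return i
    return -1


bytes_hit = find_subsequence(b"hello world", b"world")   # raw bytes
ints_hit = find_subsequence([3, 1, 4, 1, 5, 9], [1, 5])  # plain integers
miss = find_subsequence([3, 1, 4], [2, 7])                # not present
```

The same function serves both a byte buffer and a list of integers, which is the argument for hosting the search on a generic view type rather than only on string types.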

Also, once PR #3577 (DType.get_dtype()) is merged, List.index() can also delegate to this.

martinvuyk · Oct 10, 2024

@JoeLoser gentle ping, and one more thing to add: it seems we are moving toward unifying Python's bytes into Span[UInt8] in PR #3636, request #3634, and many other places. Python's bytes.find is actually a method, and many other things that we put in StringSlice and StringRef are actually operations on bytes.
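For reference, this is how the Python behaviour mentioned above looks in practice. The snippet only uses Python's built-in `str.find` and `bytes.find`; the sample text is made up for illustration.

```python
text = "h\u00e9llo w\u00f6rld"        # "héllo wörld", two non-ASCII characters
data = text.encode("utf-8")           # raw UTF-8 bytes

# str.find reports an offset in codepoints.
codepoint_offset = text.find("w\u00f6rld")

# bytes.find is a method on bytes and reports an offset in bytes.
byte_offset = data.find("w\u00f6rld".encode("utf-8"))

# bytes.find also accepts a single integer byte value (0x77 is "w").
single_byte_offset = data.find(0x77)
```

The two offsets differ because "é" occupies two bytes in UTF-8, so an algorithm that works on raw bytes needs the byte-level search rather than the codepoint-level one.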

msaelices · Oct 12, 2024

I think this is a great change for overall performance when manipulating slices. It opens a lot of possibilities.

martinvuyk · 2024-10-29T21:20:16Z

@JoeLoser CI is failing because of some bugs that I'll fix later.

I refactored normalize_index and used it here. I also refactored memrmem and memrchr to be vectorized like their counterparts. These algorithms are not limited to string data; you can use them to find anything. I had tried to write a similar algorithm for Array in my repo but failed to make it as efficient as these, so kudos to whoever implemented them. Can we please have this? They will be used a lot. I've also found that .replace() and .count() will use this function, in addition to .split(), .splitlines(), List.index(), and others.
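As a rough illustration of why higher-level operations can sit on top of a single `find` primitive, here is a Python sketch that builds counting and splitting from `bytes.find` alone. `count_occurrences` and `split_on` are hypothetical names, and this is not the code in the PR.

```python
def count_occurrences(haystack: bytes, needle: bytes) -> int:
    """Count non-overlapping occurrences of needle using only find()."""
    if not needle:
        raise ValueError("needle must not be empty")
    count = 0
    pos = haystack.find(needle)
    while pos != -1:
        count += 1
        # Resume the search just past the match that was found.
        pos = haystack.find(needle, pos + len(needle))
    return count


def split_on(haystack: bytes, sep: bytes) -> list:
    """Split haystack on sep using only find() and slicing."""
    if not sep:
        raise ValueError("separator must not be empty")
    parts = []
    start = 0
    pos = haystack.find(sep, start)
    while pos != -1:
        parts.append(haystack[start:pos])
        start = pos + len(sep)
        pos = haystack.find(sep, start)
    parts.append(haystack[start:])  # trailing piece after the last separator
    return parts


comma_count = count_occurrences(b"a,b,,c", b",")
pieces = split_on(b"a,b,,c", b",")
```

Both helpers spend essentially all of their time inside `find`, so any speedup in the search primitive carries over to them directly.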

martinvuyk · 2024-11-14T22:44:04Z

@JoeLoser would it be possible to land this before the next stable release? I'd like to use this to bypass String's .find() in algorithms that work on raw bytes. With that bypass in place, I'd like to make the breaking change of switching over to full Unicode:

String.__len__() should return the Unicode length and not the byte length
String.find() should work on Unicode offsets
String.__getitem__() should work on Unicode offsets

I would like to make the change as soon as possible because it will break a lot of projects, and many more are coming because the community is growing.
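To make the proposed distinction concrete, the following Python snippet contrasts codepoint-based and byte-based length and indexing. It shows Python's existing behaviour for `str` and `bytes`, not the current or future behaviour of Mojo's `String`; the sample string is made up.

```python
s = "na\u00efve \U0001F642"   # "naïve 🙂"
raw = s.encode("utf-8")

codepoint_len = len(s)    # counts Unicode codepoints
byte_len = len(raw)       # counts UTF-8 bytes

third_char = s[2]         # indexing by codepoint yields a whole character
third_byte = raw[2]       # indexing by byte yields the first byte of that character
```

Code that relies on byte lengths or byte offsets would break after such a switch unless it moves to a byte-level view, which is why a byte-level `find` is wanted beforehand.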

martinvuyk added 3 commits September 24, 2024 23:08

move StrRef find() implementation to Span

45f9bec

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix details

070ac79

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix details

9950729

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

martinvuyk marked this pull request as ready for review September 25, 2024 03:21

martinvuyk requested a review from a team as a code owner September 25, 2024 03:21

martinvuyk changed the title ~~[stdlib] Move StrRef find() implementation to Span~~ [stdlib] Move StringRef find() implementation to Span Sep 25, 2024

martinvuyk and others added 7 commits September 24, 2024 23:23

fix details

5a1cf5d

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix details

c0dac09

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix details

de6e116

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix details

d03ded0

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

Merge branch 'nightly' into move-strref-find-impl-to-span

a4c2a57

Merge branch 'nightly' into move-strref-find-impl-to-span

77a333f

Merge branch 'nightly' into move-strref-find-impl-to-span

f9a2eed

JoeLoser reviewed Oct 2, 2024

martinvuyk added 4 commits October 3, 2024 19:38

Merge remote-tracking branch 'upstream/nightly' into move-strref-find…

5e8fb25

…-impl-to-span

fix stringref find and add fixme for later

6ca281e

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix detail

6535e8b

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix detail

f6f9144

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

martinvuyk mentioned this pull request Oct 4, 2024

[Feature Request] Add unsafe_ptr and unsafe_cstr_ptr to StringRef and migrate the stdlib to use them #3601

Open

1 task

martinvuyk and others added 11 commits October 6, 2024 16:48

Merge branch 'nightly' into move-strref-find-impl-to-span

964f4d5

Merge branch 'nightly' into move-strref-find-impl-to-span

828490b

Merge remote-tracking branch 'upstream/nightly' into move-strref-find…

015e5d4

…-impl-to-span

fix detail

d84530e

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix detail

2f37b51

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix detail

f1cb5d1

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

Merge remote-tracking branch 'upstream/nightly' into move-strref-find…

702cb00

…-impl-to-span

remove fixme comment

5a105a2

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

Merge branch 'nightly' into move-strref-find-impl-to-span

e4b495e

Merge remote-tracking branch 'upstream/nightly' into move-strref-find…

650d63c

…-impl-to-span

fix detail

f9a7329

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

martinvuyk requested a review from JoeLoser October 29, 2024 21:12

martinvuyk added 11 commits October 29, 2024 19:13

fix details

a57d446

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix details

5f69209

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix details

feee988

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix details

ec78800

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix details

1d27970

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

Merge remote-tracking branch 'upstream/nightly' into move-strref-find…

de85d7e

…-impl-to-span

fix bugs in memrchr and memrmem

d311761

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

Merge remote-tracking branch 'upstream/nightly' into move-strref-find…

c5be607

…-impl-to-span

fix details

3f89314

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

update changelog

09d92b7

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

Merge remote-tracking branch 'upstream/nightly' into move-strref-find…

646bb90

…-impl-to-span

martinvuyk mentioned this pull request Nov 1, 2024

[stdlib] Fix String.split() implementations #3528

Draft

martinvuyk and others added 15 commits November 4, 2024 09:07

Merge remote-tracking branch 'upstream/nightly' into move-strref-find…

162332a

…-impl-to-span

update to use pack_bits

22857e2

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

add overloads

f9e8fda

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix detail

8cd3b56

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix detail

a619944

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

use var again

7213788

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

use var again

cfe5a5f

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix detail

c3c0cd3

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix detail

8b88b38

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix detail

3fd3529

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

fix detail

e323392

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

Merge branch 'nightly' into move-strref-find-impl-to-span

dca7728

Merge branch 'nightly' into move-strref-find-impl-to-span

8c29f12

fix unsafe ptr constructor

aa791cd

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>

Merge branch 'nightly' into move-strref-find-impl-to-span

de191aa
