Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update nextind, prevind #31321

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

eulerkochy
Copy link
Contributor

Fixes #31319

@StefanKarpinski
Copy link
Member

The pull request is much appreciated but I think this one is going to be quite performance-sensitive so it's going to take some careful work to make sure that this doesn't cause anything to regress.

@eulerkochy
Copy link
Contributor Author

I noticed that you had assigned it to yourself, so I close this PR? Or can you guide me?

@StefanKarpinski
Copy link
Member

I'm a bit confused by why your change modifies the bounds checking logic. Why did you change that?

This patch seems like a good and efficient way to implement this change:

diff --git a/base/strings/basic.jl b/base/strings/basic.jl
index d7c9cb279f..56e6623f2b 100644
--- a/base/strings/basic.jl
+++ b/base/strings/basic.jl
@@ -458,7 +458,7 @@ prevind(s::AbstractString, i::Int)                 = prevind(s, i, 1)
 function prevind(s::AbstractString, i::Int, n::Int)
     n < 0 && throw(ArgumentError("n cannot be negative: $n"))
     z = ncodeunits(s) + 1
-    @boundscheck 0 < i ≤ z || throw(BoundsError(s, i))
+    @boundscheck (n != 0) ≤ i ≤ z || throw(BoundsError(s, i))
     n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
     while n > 0 && 1 < i
         @inbounds n -= isvalid(s, i -= 1)
@@ -518,7 +518,7 @@ nextind(s::AbstractString, i::Int)                 = nextind(s, i, 1)
 function nextind(s::AbstractString, i::Int, n::Int)
     n < 0 && throw(ArgumentError("n cannot be negative: $n"))
     z = ncodeunits(s)
-    @boundscheck 0 ≤ i ≤ z || throw(BoundsError(s, i))
+    @boundscheck 0 ≤ i ≤ z + (n == 0) || throw(BoundsError(s, i))
     n == 0 && return thisind(s, i) == i ? i : string_index_err(s, i)
     while n > 0 && i < z
         @inbounds n -= isvalid(s, i += 1)

Features that make it appealing:

  • It has no effect when bounds checking is disabled;
  • It is branch-free, which tends to be more efficient for integer ops like this.

Additional work needed to make the change ready for merging:

  • Make sure Julia still builds and passes tests
  • Test that nextind(s, nextind(s, lastindex(s)), 0) == nextind(s, lastindex(s))
  • Test that prevind(s, prevind(s, firstindex(s)), 0) == prevind(s, firstindex(s))
  • Revert the code change part of allow chop to take an empty string #31312 but not the test part
  • Update the docs for nextind and prevind
  • Add a NEWS entry for this change

If you want to do all of that, that would be wonderful.

@eulerkochy
Copy link
Contributor Author

Reason for tweaking the logic, I wanted to make sure, nextind and prevind returns the answer whenever thisind for the string at that required index exists.

Basically, what you did, is what I planned to do, but I couldn't execute it. 😿

If you want to do all of that, that would be wonderful.

I would be glad to do that

@StefanKarpinski
Copy link
Member

Great! Thanks for taking it on.

@StefanKarpinski
Copy link
Member

Next step here: there are a bunch of test failures, they should be changed to match the new behavior (checking that each one is actually intended; from a quick perusal, they seem to be). While you're trying to get the tests to pass you probably want to run tests locally. You can do that by recompiling Julia and running julia test/runtests.jl strings.

@eulerkochy
Copy link
Contributor Author

Umm, how to resolve that chop("") thing? Should I add a @test_throws for handling that case?

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Mar 14, 2019

No, that shouldn't happen. The new issue is the prevind("", 0, 1) call which this change still doesn't allow. This is why I don't like these fussy bounds checks. If you don't check the bounds, then the simple chop implementation just works. I still like the model I originally implemented where strings behave like there are valid code point indexes forever before and after the beginning and end.

@eulerkochy
Copy link
Contributor Author

chop("") without #31312 is failing the test, which is inevitable (there is no prevind("", 0, 1)). How to resolve that after reverting the code of #31312, cuz apparently the tests for empty string were failing before that too.

@mbauman mbauman added the strings "Strings!" label Mar 14, 2019
@StefanKarpinski
Copy link
Member

It's very tempting to just relax the bounds checks to allow any index between 0 ≤ i ≤ n+1 for all of the index computation functions. The current bounds limits are based on the following reasoning: if you rewind from an index before the beginning of the string, that's likely to be an error; for the empty string, however, the last valid index is before the start of the string. I suppose we could replace the prevind(s, lastindex(s), tail) call with prevind(s, ncodeunits(s)+1, tail)-1 but that seems a bit nasty.

@eulerkochy
Copy link
Contributor Author

imo, intuitively prevind(s, lastindex(s), tail) and prevind(s, ncodeunits(s)+1, tail)-1 are very similar, so why can't we just change the implemenation to the latter, what's the bottleneck?

@eulerkochy
Copy link
Contributor Author

Why are the tests for Sockets failing?

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Mar 17, 2019

It's not a matter of a bottleneck—the compiler can clearly eliminate overhead from a bit of ±1 arithmetic. The point is that this is a case where prevind(s, 0, 1) is both natural and useful, indicating that the previous reasoning about how to bound i is imo too strict. I'll have to think about it a bit more, but I'm currently inclined to unconditionally accept 0 ≤ i ≤ ncodeunits(s)+1 for all string index arithmetic functions.

@eulerkochy
Copy link
Contributor Author

Is the changes good to go for this PR? What should be the NEWS.md update?

@StefanKarpinski
Copy link
Member

No, I need to think about this some more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants