add faster done for strings #18259

KristofferC · 2016-08-27T07:42:36Z

Uses @TotalVerb's suggestion in #18081 (comment)

nalimilan · 2016-08-27T07:55:16Z

Candidate for backport too?

KristofferC · 2016-08-27T08:53:52Z

there are not many string benchmarks but anyway... @nanosoldier runbenchmarks("string" && "problem", vs = ":master")

nanosoldier · 2016-08-27T10:42:01Z

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @jrevels

KristofferC · 2016-08-27T10:43:26Z

Oh.. Used the wrong invocation..

KristofferC · 2016-08-27T10:43:53Z

@nanosoldier runbenchmarks("string" || "problem", vs = ":master")

nanosoldier · 2016-08-27T12:33:43Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

KristofferC · 2016-08-28T09:26:38Z

I cannot reproduce the join regression and it has been noisy on many other benchmark runs. I can reproduce the performance gain in for example lstrip as in #18081.

tkelman · 2016-08-29T07:56:21Z

Does this do the right thing for unicode? Could really use more tests on the correctness here in corner cases.

KristofferC · 2016-08-29T08:06:21Z

https://github.com/JuliaLang/julia/blob/master/test/strings/basic.jl#L255?

tkelman · 2016-08-29T09:14:28Z

Won't endof(s) and endof(s.data) behave differently if the string ends in continuation characters?

julia/base/strings/string.jl

Lines 36 to 43 in 28f1306

    
           function endof(s::String) 
        
               d = s.data 
        
               i = length(d) 
        
               @inbounds while i > 0 && is_valid_continuation(d[i]) 
        
                   i -= 1 
        
               end 
        
               i 
        
           end

KristofferC · 2016-08-29T09:43:12Z

ok

julia> for x in String([0xcf, 0x83, 0x83, 0x83, 0x83])
             print(x)
             end
σ
------------------------------------------------------------------------------------------
UnicodeError                                            Stacktrace (most recent call last)
[#3] — anonymous
       ⌙ at <missing>:?

[#2] — next;
       ⌙ at string.jl:92 [inlined]

[#1] — slow_utf8_next(::Array{UInt8,1}, ::UInt8, ::Int64)
       ⌙ at string.jl:67

UnicodeError: invalid character index

KristofferC · 2016-08-29T10:18:23Z

Should have a fix.

KristofferC · 2016-08-29T10:27:34Z

I thought that an iterator that carried along the endof value would work but it seems that there are many places in base that assumes that start(::String) is an int. So let's revert + add the test that failed.

This reverts commit 30ee10b.

nalimilan · 2016-08-29T12:50:28Z

Looks like (part of) the issue is that in lstrip, endof is called on each iteration even if done is inlined. OTC, with a simple function, it's called only once (which means the compiler effectively does the hoisting you proposed to do manually):

function f(s::String)
    for c in s
        c == 'a' && return true
    end
    return false
end

We should investigate the reason for the differences. Maybe the compiler cannot guarantee that s isn't modified in complex functions? Could @pure help?

* Revert "add faster done for strings (#18259)" This reverts commit 30ee10b. * add test that failed with #18259

TotalVerb · 2016-08-29T17:42:46Z

Do we really need this to work on invalid UTF8? The added test has an overlong encoding, which is not permitted.

TotalVerb · 2016-08-29T18:01:03Z

Passing in additional information with next would probably be detrimental to performance. It's almost certainly better to go with this approach. The only requirement is that next return an index past the end of the underlying data, which the only sane implementations I can think of (including the current implementation) indeed do. I think we should make this guarantee (perhaps undocumented) and test it.

* Revert "add faster done for strings (JuliaLang#18259)" This reverts commit 30ee10b. * add test that failed with JuliaLang#18259

TotalVerb · 2016-09-29T02:07:00Z

@nalimilan It is not possible to force hoisting here because endof is not actually inlined, so LLVM can't do anything. The function is actually quite complex, involving vector dereference in a loop. We could try to force @inline it, but I don't know if that is profitable. In the interim, we should just merge #18280.

nalimilan · 2016-09-29T08:27:18Z

I'm not opposed to merging #18280, but I still think we should investigate that hoisting issue fpr other string types. I didn't say that endof was inlined, but that done was (or sometimes should), which allows the compiler to notice that endof(s) doesn't need to be called in each loop since it's constant (s is immutable).

TotalVerb · 2016-09-29T16:01:47Z

The problem is not related to done, which is inlined without issue; if endof is not inlined, then LLVM only sees call i64 @jlsys_endof_63780(%jl_value_t* %0), which it cannot hoist out of the loop because the effects are unpredictable. The only hope for loop hoisting would be for endof to be inlined first.

As far as I can tell, there is no loop hoisting in the example you gave,

function f(s::String)
    for c in s
        c == 'a' && return true
    end
    return false
end

as the LLVM code's loop condition contains a call to endof:

L2:                                               ; preds = %L22, %top
  %"#temp#.0" = phi i64 [ 1, %top ], [ %"#temp#1.sroa.4.0", %L22 ]
  %12 = call i64 @jlsys_endof_63780(%jl_value_t* %0)
  %13 = icmp slt i64 %12, %"#temp#.0"
  br i1 %13, label %L33, label %if

nalimilan · 2016-09-29T16:22:48Z

Woops, looks like I misread the generated code. I don't remember how, though.

The only hope for loop hoisting would be for endof to be inlined first.

Yet shouldn't @pure have this effect? Or maybe it doesn't work because the contents of String are actually mutable?

TotalVerb · 2016-09-29T16:46:30Z

As far as I know, LLVM does not get any information assuring that the function call is pure. I don't know if it supports callsite purity annotations, but even if does, it may be non-trivial to add them.

vtjnash · 2016-09-29T16:48:56Z

Yes, @pure could have this effect. LLVM has many callsite purity annotations and it's easy to add them (the most non-trivial part is probably picking the right ones :P). PRs are welcome.

nalimilan · 2016-09-29T18:37:27Z

Easier said than done (for me at least :-).

* Revert "Revert #182599 "add faster done for strings" (#18275)" This reverts commit 6d179b3. * Test that next(s, endof(s)) > endof(s.data)

* Revert "Revert #182599 "add faster done for strings" (#18275)" This reverts commit 6d179b3. * Test that next(s, endof(s)) > endof(s.data) (cherry picked from commit 62c3bfa)

add faster done for strings

28f1306

KristofferC added performance Must go faster backport pending 0.5 labels Aug 27, 2016

tkelman added the potential benchmark Could make a good benchmark in BaseBenchmarks label Aug 28, 2016

KristofferC merged commit 30ee10b into master Aug 29, 2016

KristofferC deleted the kc/string_done branch August 29, 2016 06:20

KristofferC removed the backport pending 0.5 label Aug 29, 2016

KristofferC added a commit that referenced this pull request Aug 29, 2016

Revert "add faster done for strings (#18259)"

0c8d70b

This reverts commit 30ee10b.

KristofferC added a commit that referenced this pull request Aug 29, 2016

add test that failed with #18259

3dc1493

ViralBShah pushed a commit that referenced this pull request Aug 29, 2016

Revert #182599 "add faster done for strings" (#18275)

6d179b3

* Revert "add faster done for strings (#18259)" This reverts commit 30ee10b. * add test that failed with #18259

nalimilan mentioned this pull request Aug 29, 2016

Unrevert #18259 #18280

Merged

mfasi pushed a commit to mfasi/julia that referenced this pull request Sep 5, 2016

add faster done for strings (JuliaLang#18259)

4710fed

mfasi pushed a commit to mfasi/julia that referenced this pull request Sep 5, 2016

Revert #182599 "add faster done for strings" (JuliaLang#18275)

382de0b

* Revert "add faster done for strings (JuliaLang#18259)" This reverts commit 30ee10b. * add test that failed with JuliaLang#18259

stevengj pushed a commit that referenced this pull request Dec 8, 2016

Unrevert #18259 (#18280)

62c3bfa

* Revert "Revert #182599 "add faster done for strings" (#18275)" This reverts commit 6d179b3. * Test that next(s, endof(s)) > endof(s.data)

tkelman pushed a commit that referenced this pull request Mar 1, 2017

Unrevert #18259 (#18280)

eac77de

* Revert "Revert #182599 "add faster done for strings" (#18275)" This reverts commit 6d179b3. * Test that next(s, endof(s)) > endof(s.data) (cherry picked from commit 62c3bfa)

KristofferC removed the potential benchmark Could make a good benchmark in BaseBenchmarks label Oct 31, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add faster done for strings #18259

add faster done for strings #18259

KristofferC commented Aug 27, 2016

nalimilan commented Aug 27, 2016

KristofferC commented Aug 27, 2016

nanosoldier commented Aug 27, 2016

KristofferC commented Aug 27, 2016 •

edited

Loading

KristofferC commented Aug 27, 2016

nanosoldier commented Aug 27, 2016

KristofferC commented Aug 28, 2016

tkelman commented Aug 29, 2016

KristofferC commented Aug 29, 2016

tkelman commented Aug 29, 2016

KristofferC commented Aug 29, 2016

KristofferC commented Aug 29, 2016

KristofferC commented Aug 29, 2016

nalimilan commented Aug 29, 2016

TotalVerb commented Aug 29, 2016 •

edited

Loading

TotalVerb commented Aug 29, 2016

TotalVerb commented Sep 29, 2016

nalimilan commented Sep 29, 2016

TotalVerb commented Sep 29, 2016 •

edited

Loading

nalimilan commented Sep 29, 2016

TotalVerb commented Sep 29, 2016

vtjnash commented Sep 29, 2016

nalimilan commented Sep 29, 2016

add faster done for strings #18259

add faster done for strings #18259

Conversation

KristofferC commented Aug 27, 2016

nalimilan commented Aug 27, 2016

KristofferC commented Aug 27, 2016

nanosoldier commented Aug 27, 2016

KristofferC commented Aug 27, 2016 • edited Loading

KristofferC commented Aug 27, 2016

nanosoldier commented Aug 27, 2016

KristofferC commented Aug 28, 2016

tkelman commented Aug 29, 2016

KristofferC commented Aug 29, 2016

tkelman commented Aug 29, 2016

KristofferC commented Aug 29, 2016

KristofferC commented Aug 29, 2016

KristofferC commented Aug 29, 2016

nalimilan commented Aug 29, 2016

TotalVerb commented Aug 29, 2016 • edited Loading

TotalVerb commented Aug 29, 2016

TotalVerb commented Sep 29, 2016

nalimilan commented Sep 29, 2016

TotalVerb commented Sep 29, 2016 • edited Loading

nalimilan commented Sep 29, 2016

TotalVerb commented Sep 29, 2016

vtjnash commented Sep 29, 2016

nalimilan commented Sep 29, 2016

KristofferC commented Aug 27, 2016 •

edited

Loading

TotalVerb commented Aug 29, 2016 •

edited

Loading

TotalVerb commented Sep 29, 2016 •

edited

Loading