Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Widen type signature of bytes2hex #39710

Merged
merged 6 commits into from
Apr 8, 2021
Merged

Conversation

Seelengrab
Copy link
Contributor

Fixes #39284

This only implements the change required for iterators with HasLength().

base/strings/util.jl Outdated Show resolved Hide resolved
@Seelengrab
Copy link
Contributor Author

The failures seem to be due to me messing up the copy & paste 🤦‍♂️

I've just noticed that hex2bytes has the same limitation on its input arguments, should this be relaxed as well?

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Feb 21, 2021

Ok, I've pushed a reimplementation of hex2bytes (which seems to be ever so slightly faster on my machine).

julia> for i in [100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000]
         old[i] = @benchmarkable hex2bytes(str) setup=(str=randstring("0123456789abcdefABCDEF", $i)) evals=1
         new[i] = @benchmarkable h2b(str) setup=(str=randstring("0123456789abcdefABCDEF", $i)) evals=1
       end

julia> new_res, old_res = run(new), run(old); for f in (minimum, mean, median, maximum)
        println("\n$f:"); display(judge(f(new_res), f(old_res)))
       end

minimum:
6-element BenchmarkTools.BenchmarkGroup:
  tags: []
  10000 => TrialJudgement(-0.30% => invariant)
  100000 => TrialJudgement(-0.99% => invariant)
  1000000 => TrialJudgement(-0.63% => invariant)
  10000000 => TrialJudgement(-0.39% => invariant)
  1000 => TrialJudgement(+0.00% => invariant)
  100 => TrialJudgement(+0.00% => invariant)

mean:
6-element BenchmarkTools.BenchmarkGroup:
  tags: []
  10000 => TrialJudgement(-0.60% => invariant)
  100000 => TrialJudgement(-1.60% => invariant)
  1000000 => TrialJudgement(-0.48% => invariant)
  10000000 => TrialJudgement(-1.08% => invariant)
  1000 => TrialJudgement(-1.04% => invariant)
  100 => TrialJudgement(+0.18% => invariant)

median:
6-element BenchmarkTools.BenchmarkGroup:
  tags: []
  10000 => TrialJudgement(-0.14% => invariant)
  100000 => TrialJudgement(-2.22% => invariant)
  1000000 => TrialJudgement(-0.19% => invariant)
  10000000 => TrialJudgement(-0.17% => invariant)
  1000 => TrialJudgement(+0.00% => invariant)
  100 => TrialJudgement(+0.00% => invariant)

maximum:
6-element BenchmarkTools.BenchmarkGroup:
  tags: []
  10000 => TrialJudgement(+3.41% => invariant)
  100000 => TrialJudgement(+113.45% => regression)
  1000000 => TrialJudgement(+5.30% => regression)
  10000000 => TrialJudgement(-3.38% => invariant)
  1000 => TrialJudgement(+82.57% => regression)
  100 => TrialJudgement(+16.59% => regression)

The maximum varies wildly between runs and sometimes even shows improvement, so I think that's just artifacts. The rest of the measurements are stable though.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Feb 21, 2021

I tried getting rid of the branches in number_from_hex by linearising, but it was always slower! If anyone cares to give a try, here's my attempt:

@inline nfh(c::Char) = nfh(UInt8(c))
@inline function nfh(c::UInt8)
    ret = (c >= UInt8('0')) & (c <= UInt8('9'))
    c |= 0b0100000
    ret |= (c > UInt8('`')) & (c <= UInt8('f'))
    !ret && throw(ArgumentError("byte is not an ASCII hexadecimal digit"))
    return c - (c & 0xf1) - (c & 0x10 ) * 0x0a
end

I suppose we could create a lookup table for this - in any case, branch predictors are a truly marvellous thing.

@Seelengrab
Copy link
Contributor Author

Unsure why tester_linuxaarch64 failed, some PipelineError? Seems unrelated to this PR 🤷‍♂️

base/strings/util.jl Outdated Show resolved Hide resolved
base/strings/util.jl Outdated Show resolved Hide resolved
base/strings/util.jl Outdated Show resolved Hide resolved
@Seelengrab
Copy link
Contributor Author

It's the little gains in functions like this that multiply if called often enough. Plus I just enjoy figuring out what is fast and why! This function had been optimized a little in the past (Ref. #23267), which is why I want to make sure we don't leave any performance on the table here :)

Copy link
Contributor

@kimikage kimikage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#39710 (comment) is all I have to say about my concerns. I think this is a good improvement in terms of both generality and performance. 👍

Edit: It might be helpful to have !!! compat notes in the docstrings.

@Seelengrab
Copy link
Contributor Author

Alright, thank you for the continued feedback! :)

I'll add the !!! compat notice and the other remaining things during the weekend.

@Seelengrab
Copy link
Contributor Author

Sorry, it's been a while! I think I've got everything now.

Not 100% sure about the decision to remove hex2bytes(s::AbstractString) = hex2bytes(String(s)) though, since ascii a little further down the file does the same thing 🤔

@Seelengrab
Copy link
Contributor Author

If I read the CI log correctly, strings/util (where the hex2bytes/bytes2hex tests are) passed and it seems to fail in SuiteSparse, so I think the failure is unrelated?

@Seelengrab
Copy link
Contributor Author

Can this be merged?

@Seelengrab
Copy link
Contributor Author

bump?

@DilumAluthge DilumAluthge requested review from JeffBezanson and removed request for JeffBezanson March 18, 2021 13:36
@Seelengrab
Copy link
Contributor Author

Force pushed for rebase & retriggering CI.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Apr 2, 2021

Failure is in Distributed, string tests passed, so failure is unrelated to this PR.

This has happened twice during the life of this PR now, is master not stable?

Regardless, I've added a NEWS.md entry and from my POV this PR is otherwise done, so should be mergeable.

@StefanKarpinski
Copy link
Member

Rerunning CI to see if this can be merged. Please ping if it passes and doesn't get merged.

@Seelengrab
Copy link
Contributor Author

Seems to have failed in SuiteSparse. strings/util (where the tests for this PR are located) passed though, so I think this is unrelated.

@vtjnash vtjnash merged commit 6117388 into JuliaLang:master Apr 8, 2021
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
and make it slightly faster!

Also improves error message on hex2bytes! when passing a non-ASCII string,
And ADD compat notice, fix implementation to be more generic in regards to AbstractString.
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request May 9, 2021
and make it slightly faster!

Also improves error message on hex2bytes! when passing a non-ASCII string,
And ADD compat notice, fix implementation to be more generic in regards to AbstractString.
johanmon pushed a commit to johanmon/julia that referenced this pull request Jul 5, 2021
and make it slightly faster!

Also improves error message on hex2bytes! when passing a non-ASCII string,
And ADD compat notice, fix implementation to be more generic in regards to AbstractString.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bytes2hex(::Union{NTuple{Any, UInt8}, AbstractArray{UInt8}}) is needlessly restrictive
5 participants