
Fix Format parsing in Printf (escaped %) #37807

Closed
wants to merge 6 commits

Conversation

petvana
Member

@petvana petvana commented Sep 29, 2020

This PR aims to fix #37784 by updating the Format(f::AbstractString) function. It moves format detection into a separate inlined function (pure refactoring, no behavior changes) for better readability, and adds new tests for the issue. Note that performance has not been properly benchmarked yet.
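For context, the failure mode from #37784 involves a format string in which an escaped `%` ("%%") is followed by an ordinary character. A minimal illustration of the intended behavior (the exact strings from the issue may differ):

```julia
using Printf

# Before the fix, a format where "%%" (an escaped '%') was followed by a
# character such as ']' threw an error, because the parser treated ']' as
# the start of an unsupported format specifier.
Printf.@sprintf("%%]")            # expected: "%]"
Printf.@sprintf(" %d %%] ", 1)    # expected: " 1 %] "
```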

@petvana
Member Author

petvana commented Sep 29, 2020

Sorry, no idea what caused the error on 32-bit during bootstrap in the new `Printf._detectFormat!` function.

```
...
Markdown  2.073647 seconds
error during bootstrap:
LoadError("sysimg.jl", 19, LoadError("/buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl", 3, LoadError("/buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.6/LibGit2/src/signature.jl", 58, LoadError("/buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.6/LibGit2/src/signature.jl", 62, MethodError(Printf._detectFormat!, ("%+03i:%02i", UInt8[0x25, 0x2b, 0x30, 0x33, 0x69, 0x3a, 0x25, 0x30, 0x32, 0x69], 10, 2, Printf.Spec[]), 0x000042a9)))))
jl_method_error_bare at /buildworker/worker/package_linux32/build/src/gf.c:1738
jl_method_error at /buildworker/worker/package_linux32/build/src/gf.c:1756
jl_lookup_generic_ at /buildworker/worker/package_linux32/build/src/gf.c:2326 [inlined]
...
```

petvana and others added 2 commits September 29, 2020 19:53
Fix 32-bit version

Co-authored-by: Simeon Schaub <simeondavidschaub99@gmail.com>
@quinnj
Member

quinnj commented Sep 29, 2020

Could you walk through the changes here? It's a bit hard to tell what changed with the refactoring. What was the core issue? What is the fix?

@petvana
Member Author

petvana commented Sep 29, 2020

> Could you walk-through the changes here? It's a bit hard to tell what changed with the refactoring. What was the core issue? What is the fix?

It's hard to follow mainly because GitHub's diff failed to detect that I moved ~80% of the code unchanged into a separate function `_detectFormat!` with different indentation. The idea behind the code is as follows. The problem occurred when "%%]" appeared in the format string: the former code treated `]` as an unsupported format specifier and threw an error. It would have been possible to fix the code without refactoring into a separate function, but that would have led to messy, hard-to-maintain code.

Now, the proposed code in Format(f::AbstractString) parses the format string until the end is reached (terminating the loop with break), while each unescaped % is processed by the `_detectFormat!` function.

The variables `start` and `pos` delimit the current segment being parsed, and the variable `escaped` indicates whether there is an odd number of % characters in a row - in that case a format specifier should be detected.
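The odd-run rule can be sketched independently of the PR code. This hypothetical helper (`specifier_starts` is my name for illustration, not part of the PR) scans a format string and reports where unescaped specifiers begin:

```julia
# Sketch of the escaping rule: a run of consecutive '%' characters of odd
# length means the last '%' starts a real format specifier; an even-length
# run is fully escaped and produces only literal '%' output.
function specifier_starts(fmt::AbstractString)
    b = codeunits(fmt)
    starts = Int[]
    pos = 1
    while pos <= length(b)
        if b[pos] == UInt8('%')
            run = 0
            while pos <= length(b) && b[pos] == UInt8('%')
                run += 1
                pos += 1
            end
            # odd run length: the final '%' begins a specifier
            isodd(run) && push!(starts, pos - 1)
        else
            pos += 1
        end
    end
    return starts
end

specifier_starts("%%]")        # -> Int[]  (escaped %, no specifier)
specifier_starts(" %d %%] ")   # -> [2]
```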

@StefanKarpinski
Member

> It's hard to follow mainly because diff at GitHub failed to detect that I have moved ~80% of code unchanged to a separate function _detectFormat! with a different indentation.

I would suggest viewing with the "ignore whitespace" box checked: https://github.com/JuliaLang/julia/pull/37807/files?diff=unified&w=1

@petvana
Member Author

petvana commented Sep 30, 2020

I have tried to create a micro-benchmark focused mainly on format parsing.

Computational time in nanoseconds measured by `@belapsed`; `NaN` means the case failed.

| Printf version | T1 | T2 | T3 | T4 | T5 | T6 |
|---|---:|---:|---:|---:|---:|---:|
| JuliaLang/julia/v1.5.2 | 51.2989 | 240.779 | 123.127 | 134.039 | 469.168 | 54.7898 |
| JuliaLang/julia/master | 22.6024 | 307.746 | 48.4565 | 343.642 | 580.486 | NaN |
| JuliaLang/julia/jq/37784 | 26.7329 | 418.95 | 48.2624 | 351.687 | 580.322 | NaN |
| petvana/julia/jq/37784 | 23.5562 | 418.623 | 48.2551 | 353.719 | 597.915 | 47.3559 |
| petvana/julia/Printf-no-unroll | 23.6084 | 418.216 | 48.3725 | 343.617 | 539.508 | 43.0071 |

The test instances were as follows:

```julia
# T1
Printf.@sprintf("short");
# T2
Printf.@sprintf("longlonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglong");
# T3
Printf.@sprintf(" %15d ", 10^6);
# T4
Printf.@sprintf(" %d %d %d %d %d ", 1, 2, 3, 4, 5);
# T5
Printf.@sprintf(" %f %f %f %f %f ", pi, pi, pi, pi, pi);
# T6
Printf.@sprintf(" %% %% %% %% %% %% %% ");
```

Surprisingly, cases T2, T4, and T5 are slower than in v1.5.2 (at least on my CPU). While playing around, I also found that the unrolling in the output function format(buf::Vector{UInt8}, pos::Integer, f::Format, args...) seems to have a negative influence on performance; the version without unrolling is in branch petvana/julia/Printf-no-unroll (DIFF). @quinnj Do you have any example in which the unrolling is superior?
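For readers unfamiliar with the technique under discussion: "unrolling" here means generating one statically typed statement per argument instead of looping over a heterogeneous args tuple at runtime. A generic sketch of the difference (not Printf's actual implementation):

```julia
# Runtime loop: each iteration may dynamically dispatch on typeof(a)
# when the argument types are mixed.
function sum_loop(args...)
    s = 0.0
    for a in args
        s += Float64(a)
    end
    return s
end

# Compile-time unrolling via a generated function: the compiler sees one
# statically typed term per argument, so mixed-type tuples stay fast.
@generated function sum_unrolled(args...)
    ex = :(0.0)
    for i in 1:length(args)
        ex = :($ex + Float64(args[$i]))
    end
    return ex
end

sum_loop(1, 2.5, 0x03)      # 6.5
sum_unrolled(1, 2.5, 0x03)  # 6.5
```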

All the branches have been tested on the same Julia binary:

```
Julia Version 1.6.0-DEV.1087
Commit 8a9666ae22 (2020-09-29 22:33 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-10.0.1 (ORCJIT, icelake-client)
```
Full Julia source code to replicate the results:

```julia
module BenchmarkPrintf

using BenchmarkTools
using DataFrames
using Test

# where the tested codes are stored
dir = "versions"

branches = [
    "JuliaLang/julia/v1.5.2",
    "JuliaLang/julia/master",
    "JuliaLang/julia/jq/37784",
    "petvana/julia/jq/37784",
    "petvana/julia/Printf-no-unroll",
]

#rm(dir, recursive=true, force=true)

data = DataFrame()
data.PrintfVersion = branches
# initialize one NaN column per test instance
for col in [:T1, :T2, :T3, :T4, :T5, :T6]
    data[!, col] = fill(NaN, length(branches))
end

for (idx, name) in enumerate(branches)
    @info "Benchmarking $(name)"
    actdir = dir * "/" * replace(name, "/" => "-")
    source_file = actdir * "/Printf.jl"
    mkpath(actdir)

    if name == "local"
        # cp(local_file, source_file, force=true)
    else
        url = "https://raw.githubusercontent.com/$(name)/stdlib/Printf/src/Printf.jl"
        if !isfile(source_file)
            @info "Downloading from $(url)"
            download(url, source_file)
            run(`sed -i 's/using Base.Grisu/using Grisu/g' $(source_file)`)
        end
    end

    include(source_file)
    @eval using .Printf   # `using` must run at top level, hence @eval

    instance = :T1
    try
        t = @belapsed Printf.@sprintf("short");
        data[idx, instance] = t * 10^9
    catch
    end

    instance = :T2
    try
        t = @belapsed Printf.@sprintf("longlonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglong");
        data[idx, instance] = t * 10^9
    catch
    end

    instance = :T3
    try
        t = @belapsed Printf.@sprintf(" %15d ", 10^6);
        data[idx, instance] = t * 10^9
    catch
    end

    instance = :T4
    try
        t = @belapsed Printf.@sprintf(" %d %d %d %d %d ", 1, 2, 3, 4, 5);
        data[idx, instance] = t * 10^9
    catch
    end

    instance = :T5
    try
        t = @belapsed Printf.@sprintf(" %f %f %f %f %f ", pi, pi, pi, pi, pi);
        data[idx, instance] = t * 10^9
    catch
    end

    instance = :T6
    try
        t = @belapsed Printf.@sprintf(" %% %% %% %% %% %% %% ");
        data[idx, instance] = t * 10^9
    catch
    end

    println(data)
end

println("")
show(stdout, MIME("text/html"), data)
println("")
println("")

end
```

Petr Vana and others added 2 commits October 4, 2020 14:11
Remove unnecessary unroll, and add comments to the code.
@petvana
Member Author

petvana commented Oct 4, 2020

To take it more seriously, I have prepared a benchmark closer to real usage. It prints 1 or 10^7 lines to /dev/null, each containing random float/decimal values. The results indicate that the proposed PR provides at least the same performance as the master branch. I have also removed the unrolling because it does not seem to give any performance gain, and it makes printing the first line much slower (up to 3.4 s for F4, roughly 10 times slower). I considered opening a separate PR for removing the unroll, but I didn't want to split the discussion into two threads. The source code to reproduce the results is in petvana/BenchmarkPrintf.

@quinnj What do you think about the PR and results?

Utilized formats:

```julia
# F1
" %f \n"
# F2
" %f %f \n"
# F3
" %f %f %f %f \n"
# F4
" %f %f %f %f %f %f %f %f \n"
# F5
" Influence %f [%%], data %f \n"
# F6
" Influence %f [%%], data %f, %f, %f, %f, %f, %f, %f, \n"
# F7
" Very very very very very very very very very very very very very very very very very very very very long text %f \n"

# D1
" %d \n"
# D2
" %d %d \n"
# D3
" %d %d %d %d \n"
# D4
" %d %d %d %d %d %d %d %d \n"
# D5
" Influence %d [%%], data %d \n"
# D6
" Influence %d [%%], data %d, %d, %d, %d, %d, %d, %d, \n"
# D7
" Very very very very very very very very very very very very very very very very very very very very long text %d \n"
```

Print 1 line with float value(s) [s]

| Printf version | F1 | F2 | F3 | F4 | F5 | F6 | F7 |
|---|---:|---:|---:|---:|---:|---:|---:|
| JuliaLang/julia/v1.5.2 | 0.898296 | 0.0825418 | 0.0801878 | 0.140242 | 0.0476642 | 0.137446 | 0.0301876 |
| JuliaLang/julia/master | 0.366507 | 0.531089 | 1.3461 | 3.41591 | 0.0173058 | 0.0233405 | 0.0144985 |
| JuliaLang/julia/jq/37784 | 0.311374 | 0.533566 | 1.26168 | 3.11948 | 0.0174601 | 0.0201481 | 0.0154846 |
| petvana/julia/jq/37784 | 0.308301 | 0.317726 | 0.318067 | 0.33422 | 0.0163931 | 0.0201942 | 0.014353 |

Print 10^7 lines with float value(s) [s]

| Printf version | F1 | F2 | F3 | F4 | F5 | F6 | F7 |
|---|---:|---:|---:|---:|---:|---:|---:|
| JuliaLang/julia/v1.5.2 | 5.74427 | 10.9207 | 19.5306 | 33.2879 | 10.8054 | 33.8352 | 4.74343 |
| JuliaLang/julia/master | 3.7729 | 7.68967 | 17.5296 | 25.8437 | 7.29169 | 23.3581 | 4.76526 |
| JuliaLang/julia/jq/37784 | 3.75447 | 7.66193 | 17.3082 | 25.8387 | 7.46661 | 23.0917 | 5.39984 |
| petvana/julia/jq/37784 | 4.09708 | 7.50588 | 16.7598 | 22.0186 | 7.78205 | 22.0153 | 5.62088 |

Print 1 line with integer value(s) [s]

| Printf version | D1 | D2 | D3 | D4 | D5 | D6 | D7 |
|---|---:|---:|---:|---:|---:|---:|---:|
| JuliaLang/julia/v1.5.2 | 0.0441159 | 0.0348119 | 0.0508356 | 0.0818782 | 0.0345189 | 0.0796283 | 0.0235023 |
| JuliaLang/julia/master | 0.131236 | 0.108799 | 0.195427 | 0.365146 | 0.0177598 | 0.0213297 | 0.0159883 |
| JuliaLang/julia/jq/37784 | 0.146931 | 0.123761 | 0.223365 | 0.399527 | 0.0193841 | 0.0232306 | 0.0163871 |
| petvana/julia/jq/37784 | 0.147011 | 0.0921646 | 0.11106 | 0.119831 | 0.0181344 | 0.0219818 | 0.0167515 |

Print 10^7 lines with integer value(s) [s]

| Printf version | D1 | D2 | D3 | D4 | D5 | D6 | D7 |
|---|---:|---:|---:|---:|---:|---:|---:|
| JuliaLang/julia/v1.5.2 | 6.8859 | 16.0583 | 28.9534 | 54.3109 | 15.5503 | 53.4028 | 7.74338 |
| JuliaLang/julia/master | 2.55479 | 4.60268 | 10.9064 | 17.2639 | 5.22935 | 17.1426 | 3.69654 |
| JuliaLang/julia/jq/37784 | 2.6154 | 4.87605 | 10.9827 | 17.555 | 5.31121 | 17.38 | 4.3217 |
| petvana/julia/jq/37784 | 3.04381 | 4.9267 | 11.0332 | 16.0263 | 5.01531 | 16.5788 | 4.00052 |

@quinnj
Member

quinnj commented Oct 6, 2020

Going to try and dig into this now; sorry for the delay. Those benchmarks are... a little weird, since they seem like very obscure/corner-case uses of printf. Most of the benchmarking I've done is just regular floats/ints/strings, and sometimes mixed. The unrolling code should be much faster in the mixed-type args case. The only benchmark you posted that really concerns me is T4, where printing 5 integers seems to be slower than in 1.5.2.

@quinnj
Member

quinnj commented Oct 6, 2020

Ok, I pushed another simple commit that fixes the rest of the escaping test cases (and adds those tests) (d2f65db).

I do appreciate the performance benchmarking you've done across all these cases, @petvana; I think that's a bit outside the scope of the specific fix for the original issue (the fix doesn't itself introduce any performance issues). So let's merge the fixes in my branch and maybe open a new issue for the performance problems you've found.

@petvana
Member Author

petvana commented Oct 6, 2020

Thank you for the fix @quinnj. I have gone through the code, and it seems to solve all possible cases.

I have also tested performance with mixed types, and I can confirm the unrolling code is much faster in such a scenario. It seems the compiler can optimize (unroll) the code automatically if all the types are the same.

Now, the only performance drop (10-13%) of your branch compared to master is for very long substring ranges (F7, D7), but I guess those are sporadic. An alternative would be to pre-process the format string and remove escaped '%' symbols during Format parsing (at compile time), but it would then be necessary to store the modified format as a copy.

The reason for the performance drop in T4 (compared to 1.5.2) is still unknown to me, but it may be related to the fact that the inputs are constants.

I will close this PR for now, as the bug is fixed.

@petvana petvana closed this Oct 6, 2020
@quinnj
Member

quinnj commented Oct 6, 2020

Thanks for the response @petvana; I looked a little into the performance of T4 last night, and I think it has to do with our creating a StringVector on each call of @sprintf (profiling shows most of the time is spent allocating the StringVector, then resize!-ing it at the end). The benchmarks are much better when comparing the Printf.format(buf, pos, fmt, args...) method and reusing the same buf each time. That's good because, for example, in CSV.write we would manage our own buf and use this method to avoid the extra allocations. So I'm not sure how big a deal it really is for the single-@sprintf case. The old Grisu/printf code, I think, had a bunch of global buffers that it reused (which led to at least a couple of bugs that I remember), so there's also a tradeoff there: we could reuse global buffers and probably get better performance, but it's a non-trivial amount of work and could introduce subtle bugs.
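The buffer-reuse pattern described above looks roughly like this with the redesigned Printf API (Julia 1.6+); the buffer size and format string are illustrative:

```julia
using Printf

fmt = Printf.Format(" %d %d %d %d %d ")   # parsed once, up front
buf = Vector{UInt8}(undef, 128)            # allocated once, reused per record

# Each call writes into buf starting at the given position and returns the
# next free position, so no per-call StringVector is allocated.
pos = Printf.format(buf, 1, fmt, 1, 2, 3, 4, 5)
String(buf[1:pos-1])    # " 1 2 3 4 5 "
```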
