
Fix Format parsing in Printf (escaped %) #37807

Closed
wants to merge 6 commits

Conversation

petvana
Member

@petvana petvana commented Sep 29, 2020

This PR aims to fix #37784 by updating the Format(f::AbstractString) function. It moves format detection into a separate inlined function (pure refactoring, no behavior changes) for better readability, and adds new tests for the issue. Note that performance has not been properly benchmarked yet.
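For context, the failure mode from #37784 involves a format string in which an escaped `%` ("%%") is followed by an ordinary character. A minimal illustration of the intended behavior (the exact strings from the issue may differ):

```julia
using Printf

# Before the fix, a format where "%%" (an escaped '%') was followed by a
# character such as ']' threw an error, because the parser treated ']' as
# the start of an unsupported format specifier.
Printf.@sprintf("%%]")            # expected: "%]"
Printf.@sprintf(" %d %%] ", 1)    # expected: " 1 %] "
```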

@petvana
Member Author

petvana commented Sep 29, 2020

Sorry, no idea what caused the error on 32-bit during bootstrap in the new `Printf._detectFormat!` function.

```
...
Markdown  2.073647 seconds
error during bootstrap:
LoadError("sysimg.jl", 19, LoadError("/buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl", 3, LoadError("/buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.6/LibGit2/src/signature.jl", 58, LoadError("/buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.6/LibGit2/src/signature.jl", 62, MethodError(Printf._detectFormat!, ("%+03i:%02i", UInt8[0x25, 0x2b, 0x30, 0x33, 0x69, 0x3a, 0x25, 0x30, 0x32, 0x69], 10, 2, Printf.Spec[]), 0x000042a9)))))
jl_method_error_bare at /buildworker/worker/package_linux32/build/src/gf.c:1738
jl_method_error at /buildworker/worker/package_linux32/build/src/gf.c:1756
jl_lookup_generic_ at /buildworker/worker/package_linux32/build/src/gf.c:2326 [inlined]
...
```

petvana and others added 2 commits September 29, 2020 19:53
Fix 32-bit version

Co-authored-by: Simeon Schaub <simeondavidschaub99@gmail.com>
@quinnj
Member

quinnj commented Sep 29, 2020

Could you walk through the changes here? It's a bit hard to tell what changed with the refactoring. What was the core issue? What is the fix?

@petvana
Member Author

petvana commented Sep 29, 2020

> Could you walk-through the changes here? It's a bit hard to tell what changed with the refactoring. What was the core issue? What is the fix?

It's hard to follow mainly because GitHub's diff failed to detect that I moved ~80% of the code unchanged into a separate function `_detectFormat!` with different indentation. The idea behind the code is as follows. The problem occurred when "%%]" appeared in the format string: the former code treated `]` as an unsupported format specifier and threw an error. It would have been possible to fix the code without refactoring into a separate function, but that would have led to messy, hard-to-maintain code.

Now, the proposed code in Format(f::AbstractString) parses the format string until the end is reached (terminating the loop with break), while each unescaped % is processed by the `_detectFormat!` function.

The variables `start` and `pos` delimit the current segment being parsed, and the variable `escaped` indicates whether there is an odd number of % characters in a row - in that case a format specifier should be detected.
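The odd-run rule can be sketched independently of the PR code. This hypothetical helper (`specifier_starts` is my name for illustration, not part of the PR) scans a format string and reports where unescaped specifiers begin:

```julia
# Sketch of the escaping rule: a run of consecutive '%' characters of odd
# length means the last '%' starts a real format specifier; an even-length
# run is fully escaped and produces only literal '%' output.
function specifier_starts(fmt::AbstractString)
    b = codeunits(fmt)
    starts = Int[]
    pos = 1
    while pos <= length(b)
        if b[pos] == UInt8('%')
            run = 0
            while pos <= length(b) && b[pos] == UInt8('%')
                run += 1
                pos += 1
            end
            # odd run length: the final '%' begins a specifier
            isodd(run) && push!(starts, pos - 1)
        else
            pos += 1
        end
    end
    return starts
end

specifier_starts("%%]")        # -> Int[]  (escaped %, no specifier)
specifier_starts(" %d %%] ")   # -> [2]
```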

@StefanKarpinski
Member

> It's hard to follow mainly because diff at GitHub failed to detect that I have moved ~80% of code unchanged to a separate function _detectFormat! with a different indentation.

I would suggest viewing with the "ignore whitespace" box checked: https://github.com/JuliaLang/julia/pull/37807/files?diff=unified&w=1

@petvana
Member Author

petvana commented Sep 30, 2020

I have tried to create a micro-benchmark focused mainly on format parsing.

Computational time in nanoseconds measured by `@belapsed`; `NaN` means the case failed.

| Printf version | T1 | T2 | T3 | T4 | T5 | T6 |
|---|---:|---:|---:|---:|---:|---:|
| JuliaLang/julia/v1.5.2 | 51.2989 | 240.779 | 123.127 | 134.039 | 469.168 | 54.7898 |
| JuliaLang/julia/master | 22.6024 | 307.746 | 48.4565 | 343.642 | 580.486 | NaN |
| JuliaLang/julia/jq/37784 | 26.7329 | 418.95 | 48.2624 | 351.687 | 580.322 | NaN |
| petvana/julia/jq/37784 | 23.5562 | 418.623 | 48.2551 | 353.719 | 597.915 | 47.3559 |
| petvana/julia/Printf-no-unroll | 23.6084 | 418.216 | 48.3725 | 343.617 | 539.508 | 43.0071 |

The test instances were as follows:

```julia
# T1
Printf.@sprintf("short");
# T2
Printf.@sprintf("longlonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglong");
# T3
Printf.@sprintf(" %15d ", 10^6);
# T4
Printf.@sprintf(" %d %d %d %d %d ", 1, 2, 3, 4, 5);
# T5
Printf.@sprintf(" %f %f %f %f %f ", pi, pi, pi, pi, pi);
# T6
Printf.@sprintf(" %% %% %% %% %% %% %% ");
```

Surprisingly, cases T2, T4, and T5 are slower than in v1.5.2 (at least on my CPU). While playing around, I also found that the unrolling in the output function format(buf::Vector{UInt8}, pos::Integer, f::Format, args...) seems to have a negative influence on performance; the version without unrolling is in branch petvana/julia/Printf-no-unroll (DIFF). @quinnj Do you have any example in which the unrolling is superior?
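For readers unfamiliar with the technique under discussion: "unrolling" here means generating one statically typed statement per argument instead of looping over a heterogeneous args tuple at runtime. A generic sketch of the difference (not Printf's actual implementation):

```julia
# Runtime loop: each iteration may dynamically dispatch on typeof(a)
# when the argument types are mixed.
function sum_loop(args...)
    s = 0.0
    for a in args
        s += Float64(a)
    end
    return s
end

# Compile-time unrolling via a generated function: the compiler sees one
# statically typed term per argument, so mixed-type tuples stay fast.
@generated function sum_unrolled(args...)
    ex = :(0.0)
    for i in 1:length(args)
        ex = :($ex + Float64(args[$i]))
    end
    return ex
end

sum_loop(1, 2.5, 0x03)      # 6.5
sum_unrolled(1, 2.5, 0x03)  # 6.5
```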

All the branches have been tested on the same Julia binary:

```
Julia Version 1.6.0-DEV.1087
Commit 8a9666ae22 (2020-09-29 22:33 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-10.0.1 (ORCJIT, icelake-client)
```
Full Julia source code to replicate the results:

```julia
module BenchmarkPrintf

using BenchmarkTools
using DataFrames
using Test

# where the tested codes are stored
dir = "versions"

branches = [
    "JuliaLang/julia/v1.5.2",
    "JuliaLang/julia/master",
    "JuliaLang/julia/jq/37784",
    "petvana/julia/jq/37784",
    "petvana/julia/Printf-no-unroll",
]

#rm(dir, recursive=true, force=true)

data = DataFrame()
data.PrintfVersion = branches
# initialize one NaN column per test instance
for col in [:T1, :T2, :T3, :T4, :T5, :T6]
    data[!, col] = fill(NaN, length(branches))
end

for (idx, name) in enumerate(branches)
    @info "Benchmarking $(name)"
    actdir = dir * "/" * replace(name, "/" => "-")
    source_file = actdir * "/Printf.jl"
    mkpath(actdir)

    if name == "local"
        # cp(local_file, source_file, force=true)
    else
        url = "https://raw.githubusercontent.com/$(name)/stdlib/Printf/src/Printf.jl"
        if !isfile(source_file)
            @info "Downloading from $(url)"
            download(url, source_file)
            run(`sed -i 's/using Base.Grisu/using Grisu/g' $(source_file)`)
        end
    end

    include(source_file)
    @eval using .Printf   # `using` must run at top level, hence @eval

    instance = :T1
    try
        t = @belapsed Printf.@sprintf("short");
        data[idx, instance] = t * 10^9
    catch
    end

    instance = :T2
    try
        t = @belapsed Printf.@sprintf("longlonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglong");
        data[idx, instance] = t * 10^9
    catch
    end

    instance = :T3
    try
        t = @belapsed Printf.@sprintf(" %15d ", 10^6);
        data[idx, instance] = t * 10^9
    catch
    end

    instance = :T4
    try
        t = @belapsed Printf.@sprintf(" %d %d %d %d %d ", 1, 2, 3, 4, 5);
        data[idx, instance] = t * 10^9
    catch
    end

    instance = :T5
    try
        t = @belapsed Printf.@sprintf(" %f %f %f %f %f ", pi, pi, pi, pi, pi);
        data[idx, instance] = t * 10^9
    catch
    end

    instance = :T6
    try
        t = @belapsed Printf.@sprintf(" %% %% %% %% %% %% %% ");
        data[idx, instance] = t * 10^9
    catch
    end

    println(data)
end

println("")
show(stdout, MIME("text/html"), data)
println("")
println("")

end
```

Petr Vana and others added 2 commits October 4, 2020 14:11
Remove unnecessary unroll, and add comments to the code.
@petvana
Member Author

petvana commented Oct 4, 2020

To take it more seriously, I have prepared a benchmark closer to real usage. It prints 1 or 10^7 lines to /dev/null, each containing random float/decimal values. The results indicate that the proposed PR provides at least the same performance as the master branch. I have also removed the unrolling because it does not seem to give any performance gain, and it makes printing the first line much slower (up to 3.4 s for F4, roughly 10 times slower). I considered opening a separate PR for removing the unroll, but I didn't want to split the discussion into two threads. The source code to reproduce the results is in petvana/BenchmarkPrintf.

@quinnj What do you think about the PR and results?

Utilized formats:

```julia
# F1
" %f \n"
# F2
" %f %f \n"
# F3
" %f %f %f %f \n"
# F4
" %f %f %f %f %f %f %f %f \n"
# F5
" Influence %f [%%], data %f \n"
# F6
" Influence %f [%%], data %f, %f, %f, %f, %f, %f, %f, \n"
# F7
" Very very very very very very very very very very very very very very very very very very very very long text %f \n"

# D1
" %d \n"
# D2
" %d %d \n"
# D3
" %d %d %d %d \n"
# D4
" %d %d %d %d %d %d %d %d \n"
# D5
" Influence %d [%%], data %d \n"
# D6
" Influence %d [%%], data %d, %d, %d, %d, %d, %d, %d, \n"
# D7
" Very very very very very very very very very very very very very very very very very very very very long text %d \n"
```

Print 1 line with float value(s) [s]

| Printf version | F1 | F2 | F3 | F4 | F5 | F6 | F7 |
|---|---:|---:|---:|---:|---:|---:|---:|
| JuliaLang/julia/v1.5.2 | 0.898296 | 0.0825418 | 0.0801878 | 0.140242 | 0.0476642 | 0.137446 | 0.0301876 |
| JuliaLang/julia/master | 0.366507 | 0.531089 | 1.3461 | 3.41591 | 0.0173058 | 0.0233405 | 0.0144985 |
| JuliaLang/julia/jq/37784 | 0.311374 | 0.533566 | 1.26168 | 3.11948 | 0.0174601 | 0.0201481 | 0.0154846 |
| petvana/julia/jq/37784 | 0.308301 | 0.317726 | 0.318067 | 0.33422 | 0.0163931 | 0.0201942 | 0.014353 |

Print 10^7 lines with float value(s) [s]

| Printf version | F1 | F2 | F3 | F4 | F5 | F6 | F7 |
|---|---:|---:|---:|---:|---:|---:|---:|
| JuliaLang/julia/v1.5.2 | 5.74427 | 10.9207 | 19.5306 | 33.2879 | 10.8054 | 33.8352 | 4.74343 |
| JuliaLang/julia/master | 3.7729 | 7.68967 | 17.5296 | 25.8437 | 7.29169 | 23.3581 | 4.76526 |
| JuliaLang/julia/jq/37784 | 3.75447 | 7.66193 | 17.3082 | 25.8387 | 7.46661 | 23.0917 | 5.39984 |
| petvana/julia/jq/37784 | 4.09708 | 7.50588 | 16.7598 | 22.0186 | 7.78205 | 22.0153 | 5.62088 |

Print 1 line with integer value(s) [s]

| Printf version | D1 | D2 | D3 | D4 | D5 | D6 | D7 |
|---|---:|---:|---:|---:|---:|---:|---:|
| JuliaLang/julia/v1.5.2 | 0.0441159 | 0.0348119 | 0.0508356 | 0.0818782 | 0.0345189 | 0.0796283 | 0.0235023 |
| JuliaLang/julia/master | 0.131236 | 0.108799 | 0.195427 | 0.365146 | 0.0177598 | 0.0213297 | 0.0159883 |
| JuliaLang/julia/jq/37784 | 0.146931 | 0.123761 | 0.223365 | 0.399527 | 0.0193841 | 0.0232306 | 0.0163871 |
| petvana/julia/jq/37784 | 0.147011 | 0.0921646 | 0.11106 | 0.119831 | 0.0181344 | 0.0219818 | 0.0167515 |

Print 10^7 lines with integer value(s) [s]

| Printf version | D1 | D2 | D3 | D4 | D5 | D6 | D7 |
|---|---:|---:|---:|---:|---:|---:|---:|
| JuliaLang/julia/v1.5.2 | 6.8859 | 16.0583 | 28.9534 | 54.3109 | 15.5503 | 53.4028 | 7.74338 |
| JuliaLang/julia/master | 2.55479 | 4.60268 | 10.9064 | 17.2639 | 5.22935 | 17.1426 | 3.69654 |
| JuliaLang/julia/jq/37784 | 2.6154 | 4.87605 | 10.9827 | 17.555 | 5.31121 | 17.38 | 4.3217 |
| petvana/julia/jq/37784 | 3.04381 | 4.9267 | 11.0332 | 16.0263 | 5.01531 | 16.5788 | 4.00052 |

@quinnj
Member

quinnj commented Oct 6, 2020

Going to try and dig into this now; sorry for the delay. Those benchmarks are... a little weird, since they seem like very obscure/corner-case uses of printf. Most of the benchmarking I've done is just regular floats/ints/strings, and sometimes mixed. The unrolling code should be much faster in the mixed-type args case. The only benchmark you posted that really concerns me is T4, where printing 5 integers seems to be slower than in 1.5.2.

@quinnj
Member

quinnj commented Oct 6, 2020

Ok, I pushed another simple commit that fixes the rest of the escaping test cases (and adds those tests) (d2f65db).

I do appreciate the performance benchmarking you've done across all these cases, @petvana; I think that's a bit outside the scope of the specific fix for the original issue (the fix doesn't itself introduce any performance issues). So let's merge the fixes in my branch and maybe open a new issue for the performance problems you've found.

@petvana
Member Author

petvana commented Oct 6, 2020

Thank you for the fix @quinnj. I have gone through the code, and it seems to solve all possible cases.

I have also tested performance with mixed types, and I can confirm the unrolling code is much faster in such a scenario. It seems the compiler can optimize (unroll) the code automatically if all the types are the same.

Now, the only performance drop (10-13%) of your branch compared to master is for very long substring ranges (F7, D7), but I guess those are sporadic. An alternative would be to pre-process the format string and remove escaped '%' symbols during Format parsing (at compile time), but it would then be necessary to store the modified format as a copy.

The reason for the performance drop in T4 (compared to 1.5.2) is still unknown to me, but it may be related to the fact that the inputs are constants.

I will close this PR for now, as the bug is fixed.

@petvana petvana closed this Oct 6, 2020
@quinnj
Member

quinnj commented Oct 6, 2020

Thanks for the response @petvana; I looked a little into the performance of T4 last night, and I think it has to do with our creating a StringVector on each call of @sprintf (profiling shows most of the time is spent allocating the StringVector, then resize!-ing it at the end). The benchmarks are much better when comparing the Printf.format(buf, pos, fmt, args...) method and reusing the same buf each time. That's good because, for example, in CSV.write we would manage our own buf and use this method to avoid the extra allocations. So I'm not sure how big a deal it really is for the single-@sprintf case. The old Grisu/printf code, I think, had a bunch of global buffers that it reused (which led to at least a couple of bugs that I remember), so there's also a tradeoff there: we could reuse global buffers and probably get better performance, but it's a non-trivial amount of work and could introduce subtle bugs.
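The buffer-reuse pattern described above looks roughly like this with the redesigned Printf API (Julia 1.6+); the buffer size and format string are illustrative:

```julia
using Printf

fmt = Printf.Format(" %d %d %d %d %d ")   # parsed once, up front
buf = Vector{UInt8}(undef, 128)            # allocated once, reused per record

# Each call writes into buf starting at the given position and returns the
# next free position, so no per-call StringVector is allocated.
pos = Printf.format(buf, 1, fmt, 1, 2, 3, 4, 5)
String(buf[1:pos-1])    # " 1 2 3 4 5 "
```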
