
SSA optimized gensyms #9729

Merged
merged 20 commits into from
Jan 30, 2015

Conversation


@vtjnash (Sponsor Member) commented Jan 11, 2015

this introduces a new symbol type (GenSym) as a compiler optimization for places where only a simple unnamed memory location is required. it is for use in places where the jl_varinfo_t is known by construction:

     int closureidx = -1
     bool isAssigned = 1
     bool isCaptured = 0
     bool isSA = 1
     bool isVolatile = 0
     bool isArgument = 0
     bool isGhost = 0/1
     bool hasGCRoot = 0/1
     bool escapes = 0
     bool usedUndef = 0
     bool used = 0/1

the vinfo flags for this symbol type should correspond to 18
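The value 18 can be read off the flag list above as a bitfield: isAssigned and isSA are the only flags fixed at 1. A minimal sketch, assuming hypothetical bit positions (the real front-end encoding may differ):

```julia
# Hedged sketch: these flag constants and bit positions are assumptions,
# chosen so that the two always-set flags above sum to 18.
const VINFO_CAPTURED = 1 << 0   # isCaptured
const VINFO_ASSIGNED = 1 << 1   # isAssigned
const VINFO_SA       = 1 << 4   # isSA (assigned exactly once)

# a GenSym is assigned once and never captured:
gensym_flags = VINFO_ASSIGNED | VINFO_SA   # 2 | 16 == 18
```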

TODO:

  • after expanding macros, run the jlgensym renumbering pass before doing code splicing
  • use (make-jlgensym) instead of (gensym) in julia-syntax generation passes
  • build a boxed value cache for GenSym

@JeffBezanson

@@ -132,10 +136,10 @@

(define (sym-dot? e)
  (and (length= e 3) (eq? (car e) '|.|)
-      (symbol? (cadr e))))
+      (or (symbol? (cadr e)) (jlgensym? (cadr e)))))
Sponsor Member

This should be factored into a function, e.g. symbol-like?

Sponsor Member Author

i had a feeling you would say that

Member

symboly?


vtjnash commented Jan 24, 2015

(the travis failures appear to be an unrelated issue with homebrew)

it's rare i can make a PR with such a clear across-the-board performance boost:

after:

jameson@julia:~/julia/test/perf$ make kernel
cons                 96.266  188.395  146.604   39.080
randn_zig            37.648   37.902   37.751    0.113
gk                  158.208  404.366  248.603   93.413
sparsemul            64.683   66.159   65.245    0.590
sparsemul2           98.068  160.006  112.229   26.794
sparserange          67.659   97.540   82.707   13.952
matvec               11.285   11.452   11.351    0.066
sortperm             15.733   15.777   15.758    0.016
stockcorr          6188.298 6223.274 6206.063   13.933
bench_eu_vec        145.298  191.906  155.934   20.140
bench_eu_devec       85.300   86.161   85.519    0.363
actorgraph          894.421  962.800  924.511   29.213
laplace_vec        1864.612 1941.088 1909.872   36.892
laplace_devec       162.415  164.409  163.389    0.795
go_benchmark       1704.725 1729.694 1722.933   10.303
simplex              56.799   57.757   57.386    0.353
raytracer          2549.374 2601.338 2569.424   26.118
funarg              173.549  227.696  195.820   28.767
vectorize           254.572  297.089  266.823   17.276
splitline            75.648  117.342  108.291   18.256
json                 32.894   78.704   42.742   20.109
add1                 91.500  127.591  119.742   15.796
devec_add1           13.797   14.003   13.889    0.096
add1_logical         85.611  126.161  116.746   17.494
devec_add1_logical   16.508   16.776   16.598    0.104

fib                   0.097    0.109    0.099    0.005
parse_int             0.359    0.398    0.377    0.015
mandel                0.341    0.342    0.341    0.000
quicksort             0.716    0.804    0.736    0.038
pi_sum               55.044   55.148   55.081    0.051
rand_mat_stat        25.393   64.662   41.392   18.443
rand_mat_mul         50.424   57.288   52.698    2.648
printfd              46.103   46.433   46.228    0.145

jameson@julia:~/julia/usr/lib/julia$ ls -l
total 68492
-rw-r--r-- 1 jameson jameson 17493118 Jan 23 12:22 sys0.ji
-rw-rw-r-- 1 jameson jameson 10592232 Jan 23 12:22 sys0.o
-rwxrwxr-x 1 jameson jameson  6233935 Jan 23 12:22 sys0.so
-rw-r--r-- 1 jameson jameson 17253308 Jan 24 01:09 sys.ji
-rw-rw-r-- 1 jameson jameson 11638456 Jan 24 01:09 sys.o
-rwxrwxr-x 1 jameson jameson  6916030 Jan 24 01:09 sys.so

before (master 4ff8145):

cons                113.650  185.712  162.388   28.246
randn_zig            37.793   38.043   37.881    0.102
gk                  162.818  389.811  256.085   83.340
sparsemul            66.235   72.378   68.198    2.465
sparsemul2          105.976  165.930  120.403   25.555
sparserange          63.644   87.102   76.334   11.494
matvec               10.831   11.518   11.193    0.333
sortperm             19.072   19.652   19.419    0.256
stockcorr           649.165  688.175  672.672   14.498
bench_eu_vec        150.354  182.641  157.916   13.903
bench_eu_devec       87.556   88.203   87.734    0.269
actorgraph          879.091  921.767  896.532   20.941
laplace_vec        1914.931 1938.558 1927.866    9.597
laplace_devec       183.827  192.356  187.885    3.728
go_benchmark       1736.351 1762.885 1746.600   10.829
simplex              58.539   58.821   58.668    0.127
raytracer          2528.574 2590.574 2553.886   25.791
funarg              173.349  228.377  196.373   28.769
vectorize           265.148  300.115  272.828   15.271
splitline            76.232  119.487  109.484   18.624
json                 33.350   73.516   41.494   17.901
add1                 85.856  130.709  120.766   19.532
devec_add1           13.943   14.258   14.098    0.122
add1_logical         92.527  144.523  126.769   19.875
devec_add1_logical   19.457   19.476   19.470    0.007

fib                   0.134    0.134    0.134    0.000
parse_int             0.393    0.601    0.499    0.086
mandel                0.333    0.356    0.338    0.010
quicksort             0.703    0.723    0.712    0.008
pi_sum               55.052   55.188   55.101    0.055
rand_mat_stat        25.068   55.838   37.501   15.788
rand_mat_mul         51.397   53.816   52.445    0.982
printfd              46.717   47.567   46.956    0.354

jameson@julia:~/julia-reference/usr/lib/julia$ ls -l
total 73424
-rw-r--r-- 1 jameson jameson 19802905 Jan 24 01:19 sys0.ji
-rw-rw-r-- 1 jameson jameson 10465920 Jan 24 01:19 sys0.o
-rwxrwxr-x 1 jameson jameson  6304820 Jan 24 01:19 sys0.so
-rw-r--r-- 1 jameson jameson 19854553 Jan 24 01:22 sys.ji
-rw-rw-r-- 1 jameson jameson 11696896 Jan 24 01:22 sys.o
-rwxrwxr-x 1 jameson jameson  7048188 Jan 24 01:22 sys.so

@vtjnash changed the title from "WIP: SSA optimized gensyms" to "SSA optimized gensyms" on Jan 24, 2015

IainNZ commented Jan 24, 2015

Here are the percentages (new times / old times), using the data in the comments above:

                NAME     MIN     MAX    MEAN
                cons   84.70  101.44   90.28
           randn_zig   99.62   99.63   99.66
                  gk   97.17  103.73   97.08
           sparsemul   97.66   91.41   95.67
          sparsemul2   92.54   96.43   93.21
         sparserange  106.31  111.98  108.35
              matvec  104.19   99.43  101.41
            sortperm   82.49   80.28   81.15
           stockcorr  953.27  904.32  922.60
        bench_eu_vec   96.64  105.07   98.74
      bench_eu_devec   97.42   97.68   97.48
          actorgraph  101.74  104.45  103.12
         laplace_vec   97.37  100.13   99.07
       laplace_devec   88.35   85.47   86.96
        go_benchmark   98.18   98.12   98.64
             simplex   97.03   98.19   97.81
           raytracer  100.82  100.42  100.61
              funarg  100.12   99.70   99.72
           vectorize   96.01   98.99   97.80
           splitline   99.23   98.20   98.91
                json   98.63  107.06  103.01
                add1  106.57   97.61   99.15
          devec_add1   98.95   98.21   98.52
        add1_logical   92.53   87.29   92.09
  devec_add1_logical   84.84   86.14   85.25
                 fib   72.39   81.34   73.88
           parse_int   91.35   66.22   75.55
              mandel  102.40   96.07  100.89
           quicksort  101.85  111.20  103.37
              pi_sum   99.99   99.93   99.96
       rand_mat_stat  101.30  115.80  110.38
        rand_mat_mul   98.11  106.45  100.48
             printfd   98.69   97.62   98.45
median(% for MIN) => 98.17859407458513
median(% for MAX) => 98.99171984072773
median(% for MEAN) => 98.74490235314978
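The table is just new/old expressed as a percentage. A minimal sketch reproducing the MEAN column for the first three rows from the timing data in the two comments above:

```julia
using Statistics   # median

# MEAN timings (ms) for cons, randn_zig, gk, taken from the
# after/before benchmark outputs posted above
new_means = [146.604, 37.751, 248.603]
old_means = [162.388, 37.881, 256.085]

pct = 100 .* new_means ./ old_means      # new/old as a percentage
# rounds to 90.28, 99.66, 97.08, matching the MEAN column above
println(round.(pct, digits=2))
```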

What's up with stockcorr?


timholy commented Jan 24, 2015

What's the deal with the x.ji (large binary file) that's part of this commit?


vtjnash commented Jan 24, 2015

What's up with stockcorr?

good catch, i had missed that one. turns out it was a fix on base that i hadn't incorporated into my test results

What's the deal with the x.ji (large binary file) that's part of this commit?

hm, how did that sneak back in? I rebased to get rid of it, and added a gitignore entry for it. (it's the same as sys.ji, I just like to make copies of it)

@JeffBezanson

Wow, is all that improvement just from having fewer symbol objects??


vtjnash commented Jan 24, 2015

it's only a few %, but that would seem to be true

@JeffBezanson

Quite nice to drop 2.5MB of just symbols from the system image. Yikes.


vtjnash commented Jan 24, 2015

to shrink the image a bit further, we might also try to intern strings more often. for example, we end up with a lot of copies of strings like "unrecognized keyword argument", and every LineNode seems to include a full copy of the filename string.


vtjnash commented Jan 24, 2015

CI is green, so I'll go ahead and merge as soon as you are OK with it

@JeffBezanson

I thought the line nodes used symbols for filenames.

end
end

function GenSym(sv::StaticVarInfo, typ)
Sponsor Member

I don't think it's good style to have a constructor mutate something. I'd call this something like newvar!(sv, typ).
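The suggested rename could look something like this. A minimal sketch; the real StaticVarInfo has more fields, so the struct here is a hypothetical stand-in, kept only to illustrate the `!` naming convention for the mutation:

```julia
struct GenSym
    id::Int                     # index into the per-function type table
end

# hypothetical stand-in for the real StaticVarInfo
mutable struct StaticVarInfo
    gensymtypes::Vector{Any}    # one inferred-type slot per GenSym
end

# newvar! makes the mutation explicit with the `!` suffix, instead of
# hiding it inside a GenSym constructor.
function newvar!(sv::StaticVarInfo, typ)
    push!(sv.gensymtypes, typ)
    return GenSym(length(sv.gensymtypes) - 1)   # GenSym ids start at 0
end
```

For example, `newvar!(StaticVarInfo(Any[]), Int)` would yield `GenSym(0)`.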


vtjnash commented Jan 24, 2015

I thought the line nodes used symbols for filenames.

i thought so too, but the text seems to show up too many times in the output of strings on sys.ji

@@ -1340,6 +1359,7 @@ static jl_arrayvar_t *arrayvar_for(jl_value_t *ex, jl_codectx_t *ctx)
if (aname && ctx->arrayvars->find(aname) != ctx->arrayvars->end()) {
return &(*ctx->arrayvars)[aname];
}
//TODO: gensym case
Sponsor Member

Is this still to be done?

Sponsor Member Author

yes, i haven't re-implemented this optimization for GenSym variables

@JeffBezanson

How do you feel about using the GenSym approach for all local variables? (In which case I would name it something like LocalVar.) This would have the following advantages:

  1. Every local has an index, so we can use arrays with fast lookup for all types and variable properties. We can get rid of the current clunky varinfo arrays.
  2. All local names become just a matter of debug info. This would automatically fix the problem we currently have of not reporting renamed variable names correctly in error messages.
  3. Type info during inference becomes just a matrix.
  4. Greater uniformity. Some people have complained, pretty reasonably, that when you see a symbol in an AST it is too hard to tell what it is.

Of course, we lose the advantage that GenSym means SSA. However it would be quite easy/fast to check isSSA(vinfo[x.id]).
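The indexed-locals idea can be sketched concretely. The field layout below is entirely hypothetical (the real varinfo representation differs), but it shows why points 1-3 and the isSSA check all reduce to cheap array lookups:

```julia
struct LocalVar
    id::Int                 # every local gets an index
end

# Hypothetical layout: with indexed locals, per-variable properties are
# plain arrays, and inference's type info is a matrix (program point x var).
struct VarTable
    names::Vector{Symbol}   # original names, needed only for debug info
    issa::BitVector         # single-assignment flag per local
    types::Matrix{Any}      # inferred type per (program point, local)
end

# the cheap check mentioned above: isSSA(vinfo[x.id])
is_ssa(vt::VarTable, x::LocalVar) = vt.issa[x.id]
```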


vtjnash commented Jan 24, 2015

it'll make an expanded AST much harder to read, although I can see the appeal

@StefanKarpinski

it'll make an expanded AST much harder to read, although I can see the appeal

Couldn't that be dealt with by printing the AST in a more human-readable form, i.e. with gensyms replaced by readable names wherever a name is available?

@JeffBezanson

Yes, it could be dealt with within show.

We know we have to fix the renamed-variables issue eventually anyway, and it seems quite unwieldy to do that by adding a symbol-to-symbol mapping to the AST, as opposed to switching to indexes for everything.


vtjnash commented Jan 24, 2015

perhaps, i'm more concerned about dealing with them in contexts where I don't have easy access to the vinfo (like jl_, and emit_expr).

fwiw, this change set also renames inlined variables in such a way that the original names now get included (i figured it was a reasonable tradeoff to make a few more unique gensym objects here in exchange for many fewer gensym objects elsewhere)

1) rename id to idx, since lldb doesn't like the name id
2) prevent back-propagation of Intrinsics.box type information
3) use SAvalue location for unboxed values as well as boxed values
4) mark a few more unboxed values (needed by 3)
5) make Type{()} a bitstype (!!!)
6) abstract the isGhost computation into type_is_ghost()
7) don't try to specsig a non-typeinferred function. emit_var may get confused by the presence of unmarked Symbols with known types (from args)
8) TODO: handle isGhost args correctly when their corresponding local variable is not a ghost
…x emission of variables in dead code (having type Union())
this previously could be an issue if specsig was allowed on non-type-inferred functions.
…syntax.scm handling of (jlgensym) objects, and use of (make-jlgensym) where possible, disable a test that this (temporarily) broke

julia-syntax optimistically assumes many variables will be jlgensym-compatible. it is the responsibility of branch assignment locations (such as if blocks) to handle this case appropriately, and only emit one assignment to the (jlgensym? dest) variable.

the type-inference information for GenSym variables now may be just the count of the number of GenSym variable slots present in the function
…ement position, and cleanup of s/gensym/gensy/

vtjnash commented Jan 25, 2015

note, if we assume that the test suite is a test of compile time, we see an improvement on every test in my measured subset from this PR:

after:

jameson@julia:~/julia/test$ make core arrayops subarray
    JULIA test/core
     * core                 in  11.26 seconds
    SUCCESS
    JULIA test/arrayops
     * arrayops             in  35.67 seconds
    SUCCESS
    JULIA test/subarray
     * subarray             in  66.37 seconds
    SUCCESS
jameson@julia:~/julia/test$ make linalg
    JULIA test/linalg
    From worker 6:       * linalg/lapack        in  10.99 seconds
    From worker 6:       * linalg/cholmod       in   0.21 seconds
    From worker 8:       * linalg/tridiag       in  13.26 seconds
    From worker 8:       * linalg/givens        in   3.60 seconds
    From worker 6:       * linalg/umfpack       in   6.48 seconds
    From worker 9:       * linalg/pinv          in  18.71 seconds
    From worker 4:       * linalg3              in  30.70 seconds
    From worker 5:       * linalg4              in  33.37 seconds
    From worker 3:       * linalg2              in  82.83 seconds
    From worker 2:       * linalg1              in  84.17 seconds
    From worker 7:       * linalg/triangular    in 112.19 seconds
    SUCCESS

before (master adc1c83):

jameson@julia:~/julia-reference$ cd test/
jameson@julia:~/julia-reference/test$ make core arrayops subarray
    JULIA test/core
     * core                 in  11.46 seconds
    SUCCESS
    JULIA test/arrayops
     * arrayops             in  39.88 seconds
    SUCCESS
    JULIA test/subarray
     * subarray             in  70.74 seconds
    SUCCESS
jameson@julia:~/julia-reference/test$ make linalg
    JULIA test/linalg
    From worker 6:       * linalg/lapack        in  13.08 seconds
    From worker 6:       * linalg/cholmod       in   0.24 seconds
    From worker 8:       * linalg/tridiag       in  14.85 seconds
    From worker 8:       * linalg/givens        in   3.50 seconds
    From worker 9:       * linalg/pinv          in  19.76 seconds
    From worker 6:       * linalg/umfpack       in   6.89 seconds
    From worker 4:       * linalg3              in  34.84 seconds
    From worker 5:       * linalg4              in  40.50 seconds
    From worker 3:       * linalg2              in  85.28 seconds
    From worker 2:       * linalg1              in  93.01 seconds
    From worker 7:       * linalg/triangular    in 129.54 seconds
    SUCCESS

@ViralBShah

Looks like the linalg tests do not benefit and even slow down slightly. I always wish they would get faster.

@JeffBezanson

Look again; the "after" numbers are listed first. Looks like almost all the linalg tests are significantly faster. Only a couple are a hair slower; probably not statistically significant.

@JeffBezanson

Anyway @vtjnash back to type_goto. Fortunately there are tests for all the relevant issues, and they pass, so hopefully we're in good shape. Your algorithm looks at least as good as mine AFAICT.


vtjnash commented Jan 25, 2015

OK. my algorithm avoids computing tchanged as often, thus potentially using a slightly different termination condition. but otherwise, it isn't too much of a departure so it wasn't hard to make the test pass


vtjnash commented Jan 25, 2015

unfortunately, one of the torture tests i was hoping would be greatly improved by this PR (https://gist.github.com/vtjnash/27ff622b2cbed22b51dc) seems not to have been affected much, or to have moved in the wrong direction:

after:

julia> @time include("../slow_performance.jl");
elapsed time: 0.024892235 seconds (333 kB allocated)
elapsed time: 3.61572336 seconds (7 MB allocated, 1.15% gc time in 1 pauses with 0 full sweep)

julia> @time show(t)
elapsed time: 24.334687017 seconds (2050 MB allocated, 10.09% gc time in 26 pauses with 13 full sweep)
2-element Array{T1Type,1}:



julia> 

before:

julia> @time include("../slow_performance.jl");
elapsed time: 0.023578148 seconds (334 kB allocated)
elapsed time: 3.447290547 seconds (7 MB allocated, 1.16% gc time in 1 pauses with 0 full sweep)

julia> @time show(t)
elapsed time: 50.349472299 seconds (2153 MB allocated, 5.71% gc time in 31 pauses with 13 full sweep)
2-element Array{T1Type,1}:



julia>

@JeffBezanson

The first time seems to be within 10%. Isn't the second one 2x faster with the new code?

@ViralBShah

Yes - I misread the timings. Sorry for the noise.


vtjnash commented Jan 25, 2015

The first time seems to be within 10%. Isn't the second one 2x faster with the new code?

oh, i was just looking at parse (actually expand) times, not the type-inference pass. yes, it seems it did help that a bit.

@vtjnash merged commit d0fa2db into master on Jan 30, 2015
@tkelman deleted the jn/gensym2 branch on April 19, 2015
8 participants