LLVM compilation #548

Merged: 39 commits from sn6uv:LLVMCompile into mathics:master, Sep 23, 2016

Conversation

@sn6uv (Member) commented Sep 14, 2016

Allows compiling simple expressions with llvmlite.

At the moment nothing else is aware of CompiledFunction, so the performance benefits are marginal (most of the time is spent looking for patterns), but this has the potential to speed up plotting significantly.

In[1]:= cf = Compile[{{x, _Real}}, Sin[x + 2]]
Out[1]= CompiledFunction[{x}, Sin[2 + x], -CompiledCode-]

In[2]:= data = RandomReal[1, 100];
In[3]:= Map[cf, data]; // Timing
Out[3]= {0.084535, Null}

In[4]:= Map[Sin[#1 + 2]&, data]; // Timing
Out[4]= {0.120563, Null}

Requires a minor bugfix to llvmlite: numba/llvmlite#205.
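
For readers unfamiliar with llvmlite, here is a minimal standalone sketch (not the code from this PR) of the underlying technique: build LLVM IR for sin(x + 2) with llvmlite's ir module, JIT-compile it with MCJIT, and wrap the resulting native address in a ctypes callable. The intrinsic declaration is presumably the kind of call affected by the llvmlite fix linked above.

    import ctypes
    import math

    import llvmlite.binding as llvm
    from llvmlite import ir

    # One-time LLVM initialisation.
    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    # Build IR equivalent to: double f(double x) { return sin(x + 2.0); }
    dbl = ir.DoubleType()
    module = ir.Module(name="compile_demo")
    func = ir.Function(module, ir.FunctionType(dbl, [dbl]), name="f")
    builder = ir.IRBuilder(func.append_basic_block("entry"))
    sin_fn = module.declare_intrinsic("llvm.sin", [dbl])
    tmp = builder.fadd(func.args[0], ir.Constant(dbl, 2.0))
    builder.ret(builder.call(sin_fn, [tmp]))

    # JIT-compile the module and wrap the native entry point with ctypes.
    engine = llvm.create_mcjit_compiler(
        llvm.parse_assembly(str(module)),
        llvm.Target.from_default_triple().create_target_machine())
    engine.finalize_object()
    cf = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)(
        engine.get_function_address("f"))

    assert abs(cf(1.0) - math.sin(3.0)) < 1e-12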

@ghost commented Sep 14, 2016

Please just make it optional, so Mathics can be used without llvmlite; on PyPy it would probably only slow things down.

@sn6uv (Member, Author) commented Sep 14, 2016

@TiberiumPy, yes, the LLVM dependency is optional. I think even PyPy can be optimised with this (although cffi would be a better fit than ctypes there).

The example above is still faster on PyPy (after warming up the JIT):

In[46]:= data = RandomReal[1, 10000];
In[47]:= Map[f, data]; // Timing
Out[47]= {1.31894, Null}

In[48]:= Map[cf, data]; // Timing
Out[48]= {0.940861, Null}
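
As a hypothetical illustration of the cffi-over-ctypes point (reusing the engine object from the sketch above): cffi can cast the raw JITed address straight to a typed C function pointer, a call path PyPy's JIT handles far better than a ctypes wrapper.

    import cffi

    ffi = cffi.FFI()
    # Cast the integer address returned by MCJIT to a typed C function
    # pointer; on PyPy, calls through cffi pointers are JIT-friendly.
    cf_cffi = ffi.cast("double(*)(double)", engine.get_function_address("f"))
    print(cf_cffi(1.0))   # same result as the ctypes wrapper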

@sn6uv sn6uv force-pushed the LLVMCompile branch 2 times, most recently from 62c000f to 86a12da on September 14, 2016 at 13:14
@sn6uv (Member, Author) commented Sep 14, 2016

I've got some experimental plotting working (pushed to my CompilePlot branch). The speed-ups are significant:

With Compile (CompilePlot branch):

(venv_pypy) angus@thinkpad> python mathics/benchmark.py -s Plot
Plot
  'Plot[0, {x, -3, 3}]'
      100 loops, avg: 11.8 ms, best: 5.45 ms, median: 8.62 ms per loop
  'Plot[x^2 + x + 1, {x, -3, 3}]'
     1000 loops, avg: 5.44 ms, best: 3.12 ms, median: 4.39 ms per loop
  'Plot[Sin[Cos[x^2]], {x, -3, 3}]'
     1000 loops, avg: 6.07 ms, best:    5 ms, median: 5.33 ms per loop
  'Plot[Sin[100 x], {x, -3, 3}]'
     1000 loops, avg: 5.18 ms, best: 4.43 ms, median:  4.6 ms per loop

(venv_pypy) angus@thinkpad> python mathics/benchmark.py -s Plot3D
Plot3D
  'Plot3D[0, {x, -1, 1}, {y, -1, 1}]'
      100 loops, avg: 18.4 ms, best: 9.09 ms, median: 13.6 ms per loop
  'Plot3D[x + y^2, {x, -3, 3}, {y, -2, 2}]'
      100 loops, avg: 20.5 ms, best: 13.4 ms, median: 16.5 ms per loop
  'Plot3D[Sin[x + y^2], {x, -3, 3}, {y, -3, 3}]'
      100 loops, avg: 46.4 ms, best: 36.1 ms, median:   39 ms per loop
  'Plot3D[Sin[100 x + 100 y ^ 2], {x, 0, 1}, {y, 0, 1}]'
      100 loops, avg: 38.9 ms, best: 31.2 ms, median: 33.1 ms per loop

Without Compile (current master):

(venv_pypy) angus@thinkpad> python mathics/benchmark.py -s Plot
Plot
  'Plot[0, {x, -3, 3}]'
       10 loops, avg:  120 ms, best: 87.2 ms, median:  116 ms per loop
  'Plot[x^2 + x + 1, {x, -3, 3}]'
        5 loops, avg:  395 ms, best:  218 ms, median:  313 ms per loop
  'Plot[Sin[Cos[x^2]], {x, -3, 3}]'
        5 loops, avg:  499 ms, best:  328 ms, median:  445 ms per loop
  'Plot[Sin[100 x], {x, -3, 3}]'
        5 loops, avg:  245 ms, best:  164 ms, median:  216 ms per loop

(venv_pypy) angus@thinkpad> python mathics/benchmark.py -s Plot3D
Plot3D
  'Plot3D[0, {x, -1, 1}, {y, -1, 1}]'
       10 loops, avg:  141 ms, best: 68.2 ms, median:  138 ms per loop
  'Plot3D[x + y^2, {x, -3, 3}, {y, -2, 2}]'
        5 loops, avg:  277 ms, best:  162 ms, median:  258 ms per loop
  'Plot3D[Sin[x + y^2], {x, -3, 3}, {y, -3, 3}]'
        5 loops, avg:  366 ms, best:  272 ms, median:  339 ms per loop
  'Plot3D[Sin[100 x + 100 y ^ 2], {x, 0, 1}, {y, 0, 1}]'
        5 loops, avg:  378 ms, best:  323 ms, median:  351 ms per loop

@teaalltr (Contributor) commented:

What about adding packed array support? It would be very useful throughout Mathics. It could be implemented either with NumPy arrays or directly via SIMD-enabled LLVM compilation. It should of course be completely transparent to the user, and as transparent as possible to functions. Maybe automatic conversion could be done during idle time between evaluations.

teaalltr referenced this pull request in sn6uv/Mathics Sep 14, 2016
@sn6uv sn6uv force-pushed the LLVMCompile branch 6 times, most recently from 16c21be to fb79ede on September 15, 2016 at 08:04
@sn6uv (Member, Author) commented Sep 15, 2016

Packed arrays would be a useful addition from a performance perspective. As you mentioned, the important (and difficult) thing is to make them transparent to the user. I'm not sure LLVM is a good candidate for this, but NumPy might be.
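
As a rough sketch of the NumPy route (purely illustrative, not part of this PR): a packed list of machine reals would be backed by a contiguous array, so an element-wise operation becomes a single vectorized call instead of per-element evaluator round trips.

    import numpy as np

    # Hypothetical packed representation of RandomReal[1, 100]: one
    # contiguous buffer of doubles instead of 100 boxed expressions.
    data = np.random.rand(100)

    # Sin[# + 2]& over the packed array runs as one vectorized call.
    result = np.sin(data + 2.0)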

@poke1024 (Contributor) commented:

LLVM should handle SIMD optimizations for packed arrays beautifully: http://llvm.org/devmtg/2014-02/slides/golin-AutoVectorizationLLVM.pdf. NumPy will always be at a disadvantage, since it only ships fixed prepackaged operations and cannot optimize the specific problem at the instruction level. I'm really excited about this introduction of LLVM into Mathics.

@sn6uv (Member, Author) commented Sep 15, 2016

Some really interesting stuff on calling back into python from within llvm: http://eli.thegreenplace.net/2015/calling-back-into-python-from-llvmlite-jited-code/.

I've got a basic Print function working with a similar approach, and the callbacks are quite powerful: we can read/write the Mathics symbol table (Definitions) at LLVM runtime and evaluate arbitrary un-compilable expressions, à la MainEvaluate.
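
A minimal self-contained sketch of that callback technique, following the linked post (hypothetical names, not this PR's implementation): expose a Python function through ctypes, embed its raw address in the IR as a function pointer, and call it from the JITed code.

    import ctypes

    import llvmlite.binding as llvm
    from llvmlite import ir

    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    # A Python function the JITed code will call back into; this is where
    # a MainEvaluate-style hook into Definitions would live.
    CB = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)

    def py_callback(x):
        print("callback from LLVM with", x)
        return 2.0 * x

    cb = CB(py_callback)                              # must stay referenced
    cb_addr = ctypes.cast(cb, ctypes.c_void_p).value  # raw function address

    # Build IR for: double g(double x) { return py_callback(x) + 1.0; }
    # (assumes a 64-bit platform for the pointer-sized integer below)
    dbl = ir.DoubleType()
    module = ir.Module(name="callback_demo")
    func = ir.Function(module, ir.FunctionType(dbl, [dbl]), name="g")
    builder = ir.IRBuilder(func.append_basic_block("entry"))
    cb_ptr = builder.inttoptr(ir.Constant(ir.IntType(64), cb_addr),
                              ir.FunctionType(dbl, [dbl]).as_pointer())
    builder.ret(builder.fadd(builder.call(cb_ptr, [func.args[0]]),
                             ir.Constant(dbl, 1.0)))

    engine = llvm.create_mcjit_compiler(
        llvm.parse_assembly(str(module)),
        llvm.Target.from_default_triple().create_target_machine())
    engine.finalize_object()
    g = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)(
        engine.get_function_address("g"))
    print(g(3.0))   # prints the callback message, then 7.0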

@wolfv (Member) commented Sep 19, 2016

I am wondering whether compiling to LLVM is better than taking the resulting SymPy nodes and using SymPy's code generation to produce C++, then compiling and importing that into Python, e.g. with cppimport (https://github.com/tbenthompson/cppimport). Just a thought :)
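
For reference, a minimal sketch of that route using SymPy's codegen utility (the compile-and-import step via cppimport or similar is omitted):

    import sympy as sp
    from sympy.utilities.codegen import codegen

    x = sp.symbols('x')
    expr = sp.sin(x + 2)

    # Emit C source for the expression; compiling and importing it into
    # Python (e.g. with cppimport) would be a separate step.
    (c_name, c_code), (h_name, h_code) = codegen(("f", expr), "C99")
    print(c_code)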

@sn6uv (Member, Author) commented Sep 21, 2016

I've rebased on master to fix conflicts in setup.py. I've fixed the llvmlite intrinsic issue, so I think this can be merged soon.

@sn6uv sn6uv added this to the 1.0 milestone Sep 21, 2016
@sn6uv sn6uv merged commit 6d9c95b into mathics:master Sep 23, 2016
@sn6uv sn6uv deleted the LLVMCompile branch September 23, 2016 01:38