<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Compiler Flags</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<meta name="author" content="serge-sans-paille and other pythraners">
<!-- Le styles -->
<link rel="stylesheet" href="./theme/css/bootstrap.min.css" type="text/css" />
<style type="text/css">
body {
padding-top: 60px;
padding-bottom: 40px;
}
.sidebar-nav {
padding: 9px 0;
}
.tag-1 {
font-size: 13pt;
}
.tag-2 {
font-size: 10pt;
}
.tag-3 {
font-size: 8pt;
}
.tag-4 {
font-size: 6pt;
}
</style>
<link href="./theme/css/bootstrap-responsive.min.css" rel="stylesheet">
<link href="./theme/css/font-awesome.css" rel="stylesheet">
<link href="./theme/css/pygments.css" rel="stylesheet">
<!-- Le HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="//html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<!-- Le fav and touch icons -->
<link rel="shortcut icon" href="./theme/images/favicon.ico">
<link rel="apple-touch-icon" href="./theme/images/apple-touch-icon.png">
<link rel="apple-touch-icon" sizes="72x72" href="./theme/images/apple-touch-icon-72x72.png">
<link rel="apple-touch-icon" sizes="114x114" href="./theme/images/apple-touch-icon-114x114.png">
<link href="./" type="application/atom+xml" rel="alternate" title="Pythran stories ATOM Feed" />
</head>
<body>
<div class="navbar navbar-fixed-top">
<div class="navbar-inner">
<div class="container-fluid">
<a class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</a>
<a class="brand" href="./index.html">Pythran stories </a>
<div class="nav-collapse">
<ul class="nav">
<li class="divider-vertical"></li>
<li >
<a href="./category/benchmark.html">
<i class="icon-folder-open icon-large"></i>benchmark
</a>
</li>
<li class="active">
<a href="./category/compilation.html">
<i class="icon-folder-open icon-large"></i>compilation
</a>
</li>
<li >
<a href="./category/cython.html">
<i class="icon-folder-open icon-large"></i>cython
</a>
</li>
<li >
<a href="./category/engineering.html">
<i class="icon-folder-open icon-large"></i>engineering
</a>
</li>
<li >
<a href="./category/examples.html">
<i class="icon-folder-open icon-large"></i>examples
</a>
</li>
<li >
<a href="./category/optimisation.html">
<i class="icon-folder-open icon-large"></i>optimisation
</a>
</li>
<li >
<a href="./category/release.html">
<i class="icon-folder-open icon-large"></i>release
</a>
</li>
<ul class="nav pull-right">
<li><a href="./archives.html"><i class="icon-th-list"></i>Archives</a></li>
</ul>
</ul>
<!--<p class="navbar-text pull-right">Logged in as <a href="#">username</a></p>-->
</div><!--/.nav-collapse -->
</div>
</div>
</div>
<div class="container-fluid">
<div class="row">
<div class="span9" id="content">
<section id="content">
<article>
<header>
<h1>
<a href=""
rel="bookmark"
title="Permalink to Compiler Flags">
Compiler Flags
</a>
</h1>
</header>
<div class="entry-content">
<div class="well">
<footer class="post-info">
<span class="label">Date</span>
<abbr class="published" title="2016-03-29T00:00:00+02:00">
<i class="icon-calendar"></i>Tue 29 March 2016
</abbr>
<span class="label">By</span>
<a href="./author/serge-sans-paille.html"><i class="icon-user"></i>serge-sans-paille</a>
<span class="label">Category</span>
<a href="./category/compilation.html"><i class="icon-folder-open"></i>compilation</a>.
</footer><!-- /.post-info --> </div>
<div class="section" id="when-size-matters">
<h2>When Size Matters</h2>
<p>Everything started a few days ago with a Pythran user complaining about the
size of the binaries generated by Pythran. In essence, take the following code
<cite>cda.py</cite>:</p>
<div class="highlight"><pre><span></span><span class="c1">#pythran export closest_distance_arrays(float, float, float[], float[])</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">math</span>
<span class="k">def</span> <span class="nf">closest_distance_arrays</span> <span class="p">(</span><span class="n">lat1</span><span class="p">,</span> <span class="n">long1</span><span class="p">,</span> <span class="n">latitudes</span><span class="p">,</span> <span class="n">longitudes</span><span class="p">):</span>
<span class="n">degrees_to_radians</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">pi</span><span class="o">/</span><span class="mf">180.0</span>
<span class="n">phi1</span> <span class="o">=</span> <span class="p">(</span><span class="mf">90.0</span> <span class="o">-</span> <span class="n">lat1</span><span class="p">)</span><span class="o">*</span><span class="n">degrees_to_radians</span>
<span class="n">phi2</span> <span class="o">=</span> <span class="p">(</span><span class="mf">90.0</span> <span class="o">-</span> <span class="n">latitudes</span><span class="p">)</span><span class="o">*</span><span class="n">degrees_to_radians</span>
<span class="n">theta1</span> <span class="o">=</span> <span class="n">long1</span><span class="o">*</span><span class="n">degrees_to_radians</span>
<span class="n">theta2</span> <span class="o">=</span> <span class="n">longitudes</span><span class="o">*</span><span class="n">degrees_to_radians</span>
<span class="n">cos</span> <span class="o">=</span> <span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">sin</span><span class="p">(</span><span class="n">phi1</span><span class="p">)</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">sin</span><span class="p">(</span><span class="n">phi2</span><span class="p">)</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">cos</span><span class="p">(</span><span class="n">theta1</span> <span class="o">-</span> <span class="n">theta2</span><span class="p">)</span> <span class="o">+</span>
<span class="n">math</span><span class="o">.</span><span class="n">cos</span><span class="p">(</span><span class="n">phi1</span><span class="p">)</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">cos</span><span class="p">(</span><span class="n">phi2</span><span class="p">))</span>
<span class="n">arc</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arccos</span><span class="p">(</span> <span class="n">cos</span> <span class="p">)</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">argmin</span><span class="p">(</span><span class="n">arc</span><span class="p">),</span> <span class="n">arc</span><span class="o">.</span><span class="n">min</span><span class="p">()</span>
</pre></div>
<p>It doesn't even weigh a kilobyte, and when benchmarked, it runs in a few milliseconds:</p>
<div class="highlight"><pre><span></span>> python -m timeit -s <span class="s1">'import numpy as np; n = 20000 ; lat, lon = np.random.rand(n), np.random.rand(n); x,y = np.random.rand(), np.random.rand(); from cda import closest_distance_arrays'</span> <span class="s1">'closest_distance_arrays(x,y,lat, lon)'</span>
<span class="m">100</span> loops, best of <span class="m">3</span>: <span class="m">1</span>.95 msec per loop
</pre></div>
<p>Thanks to the <tt class="docutils literal">#pythran export</tt> annotation, Pythran can turn it into a native
library that runs slightly faster than the Python version:</p>
<div class="highlight"><pre><span></span>> pythran cda.py
> python -m timeit -s <span class="s1">'import numpy as np; n = 20000 ; lat, lon = np.random.rand(n), np.random.rand(n); x,y = np.random.rand(), np.random.rand(); from cda import closest_distance_arrays'</span> <span class="s1">'closest_distance_arrays(x,y,lat, lon)'</span>
<span class="m">1000</span> loops, best of <span class="m">3</span>: <span class="m">1</span>.17 msec per loop
</pre></div>
<p>It is, however, a very big binary:</p>
<div class="highlight"><pre><span></span>> ls -lh cda.so
-rwxr-xr-x <span class="m">1</span> sguelton sguelton <span class="m">1</span>.3M Mar <span class="m">29</span> <span class="m">18</span>:10 cda.so*
</pre></div>
<p>Who wants to multiply the binary size by <tt class="docutils literal">2e3</tt> to get less than a <tt class="docutils literal">x2</tt> speedup?</p>
</div>
<div class="section" id="the-culprits-debug-informations">
<h2>The Culprits: Debug Information</h2>
<p>One can call Pythran with the <tt class="docutils literal"><span class="pre">-v</span></tt> flag to inspect part of its internals,
especially the C++ compiler call that performs object code generation and
linking:</p>
<div class="highlight"><pre><span></span>> pythran cda.py -v
running build_ext
running build_src
build_src
building extension <span class="s2">"cda"</span> sources
build_src: building npy-pkg config files
new_compiler returns distutils.unixccompiler.UnixCCompiler
INFO customize UnixCCompiler
customize UnixCCompiler using build_ext
********************************************************************************
distutils.unixccompiler.UnixCCompiler
<span class="nv">linker_exe</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'gcc'</span><span class="o">]</span>
<span class="nv">compiler_so</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'gcc'</span>, <span class="s1">'-DNDEBUG'</span>, <span class="s1">'-g'</span>, <span class="s1">'-fwrapv'</span>, <span class="s1">'-O2'</span>, <span class="s1">'-Wall'</span>, <span class="s1">'-Wstrict-prototypes'</span>, <span class="s1">'-fno-strict-aliasing'</span>, <span class="s1">'-g'</span>, <span class="s1">'-O2'</span>, <span class="s1">'-fPIC'</span><span class="o">]</span>
<span class="nv">archiver</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'x86_64-linux-gnu-gcc-ar'</span>, <span class="s1">'rc'</span><span class="o">]</span>
<span class="nv">preprocessor</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'gcc'</span>, <span class="s1">'-E'</span><span class="o">]</span>
<span class="nv">linker_so</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'x86_64-linux-gnu-gcc'</span>, <span class="s1">'-pthread'</span>, <span class="s1">'-shared'</span>, <span class="s1">'-Wl,-O1'</span>, <span class="s1">'-Wl,-Bsymbolic-functions'</span>, <span class="s1">'-Wl,-z,relro'</span>, <span class="s1">'-fno-strict-aliasing'</span>, <span class="s1">'-DNDEBUG'</span>, <span class="s1">'-g'</span>, <span class="s1">'-fwrapv'</span>, <span class="s1">'-O2'</span>, <span class="s1">'-Wall'</span>, <span class="s1">'-Wstrict-prototypes'</span>, <span class="s1">'-Wdate-time'</span>, <span class="s1">'-D_FORTIFY_SOURCE=2'</span>, <span class="s1">'-g'</span>, <span class="s1">'-fstack-protector-strong'</span>, <span class="s1">'-Wformat'</span>, <span class="s1">'-Werror=format-security'</span>, <span class="s1">'-Wl,-z,relro'</span>, <span class="s1">'-g'</span>, <span class="s1">'-O2'</span><span class="o">]</span>
<span class="nv">compiler_cxx</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'g++'</span><span class="o">]</span>
<span class="nv">ranlib</span> <span class="o">=</span> None
<span class="nv">compiler</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'gcc'</span>, <span class="s1">'-DNDEBUG'</span>, <span class="s1">'-g'</span>, <span class="s1">'-fwrapv'</span>, <span class="s1">'-O2'</span>, <span class="s1">'-Wall'</span>, <span class="s1">'-Wstrict-prototypes'</span>, <span class="s1">'-fno-strict-aliasing'</span>, <span class="s1">'-g'</span>, <span class="s1">'-O2'</span><span class="o">]</span>
<span class="nv">libraries</span> <span class="o">=</span> <span class="o">[]</span>
<span class="nv">library_dirs</span> <span class="o">=</span> <span class="o">[]</span>
<span class="nv">include_dirs</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'/usr/include/python2.7'</span><span class="o">]</span>
<span class="o">[</span>...<span class="o">]</span>
INFO Generated module: cda
INFO Output: /home/sguelton/sources/pythran/cda.so
</pre></div>
<p>That's a pretty long trace, but that's what verbose mode is for. The
enlightened reader noticed that we use <tt class="docutils literal">distutils</tt> under the hood to abstract
the compiler calls, and that's why we're getting some funky compiler flags like
<tt class="docutils literal"><span class="pre">-g</span> <span class="pre">-fwrapv</span> <span class="pre">-O2</span> <span class="pre">-Wall</span> <span class="pre">-fno-strict-aliasing</span> <span class="pre">-g</span> <span class="pre">-O2</span> <span class="pre">-fPIC</span></tt> or even funkier
<tt class="docutils literal"><span class="pre">-fstack-protector-strong</span> <span class="pre">-Wformat</span> <span class="pre">-Werror=format-security</span> <span class="pre">-Wl,-z,relro</span></tt>.
That's the default for native Python extensions on my distribution. Funnily enough, the
last ones are hardening flags used to improve the security of the binary, and I
wrote a (fascinating) article about them for Quarkslab <a class="footnote-reference" href="#id4" id="id1">[0]</a>.</p>
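<p>To see where such defaults come from without going through a full build, one can ask the running interpreter which flags it recorded at build time. The following is a minimal sketch: the helper name <tt class="docutils literal">default_build_flags</tt> is made up for illustration, and the values are typically strings on Unix-like systems but may be <tt class="docutils literal">None</tt> elsewhere.</p>

```python
import sysconfig


def default_build_flags():
    """Return some of the build settings recorded by the running Python.

    distutils reads variables like these when it customizes the compiler.
    Values may be None on platforms that do not record them.
    """
    names = ("CC", "CFLAGS", "OPT", "LDSHARED")
    return {name: sysconfig.get_config_var(name) for name in names}


for name, value in default_build_flags().items():
    print(name, "=", value)
```

<p>On a distribution like the one used here, the <tt class="docutils literal">CFLAGS</tt> and <tt class="docutils literal">OPT</tt> entries are likely to contain the <tt class="docutils literal"><span class="pre">-g</span></tt> and <tt class="docutils literal"><span class="pre">-O2</span></tt> seen in the trace.</p>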
<p>It turns out <tt class="docutils literal"><span class="pre">-g</span></tt> (and C++) is responsible for the fat binary: if we simply
strip the binary, we get back to a decent size:</p>
<div class="highlight"><pre><span></span>> strip cda.so
> ls -lh cda.so
-rwxr-xr-x <span class="m">1</span> sguelton sguelton 151K Mar <span class="m">29</span> <span class="m">18</span>:26 cda.so
</pre></div>
<p>As Pythran users generally don't want the debug info on the generated native
code, we chose to strip it by default, using the linker flag
<tt class="docutils literal"><span class="pre">-Wl,-strip-all</span></tt> that removes all symbol information, including debug
symbols.</p>
</div>
<div class="section" id="a-step-further-default-symbol-visibility">
<h2>A Step Further: Default Symbol Visibility</h2>
<p>While we're at it, let's call <tt class="docutils literal">nm</tt> to check if any symbol remains in the
binary. After all, the Python interpreter still needs some of them to load the
native extension!</p>
<div class="highlight"><pre><span></span>> nm -C -D cda.so
<span class="o">[</span>...<span class="o">]</span> skipping > <span class="m">900</span> entries
000000000001ed00 u nt2::ext::implement<nt2::tag::rem_pio2_ <span class="o">(</span>boost::dispatch::meta::scalar_<boost::dispatch::meta::double_<double> >, boost::dispatch::meta::scalar_<boost::dispatch::meta::double_<double> >, boost::dispatch::meta::scalar_<boost::dispatch::meta::double_<double> ><span class="o">)</span>, boost::dispatch::tag::cpu_, void>::__kernel_rem_pio2<span class="o">(</span>double*, double*, int, int, int, int const*<span class="o">)</span>::PIo2
000000000001edc0 u nt2::ext::implement<nt2::tag::rem_pio2_ <span class="o">(</span>boost::dispatch::meta::scalar_<boost::dispatch::meta::double_<double> >, boost::dispatch::meta::scalar_<boost::dispatch::meta::double_<double> >, boost::dispatch::meta::scalar_<boost::dispatch::meta::double_<double> ><span class="o">)</span>, boost::dispatch::tag::cpu_, void>::__ieee754_rem_pio2<span class="o">(</span>double, double*<span class="o">)</span>::two_over_pi
000000000001ed40 u nt2::ext::implement<nt2::tag::rem_pio2_ <span class="o">(</span>boost::dispatch::meta::scalar_<boost::dispatch::meta::double_<double> >, boost::dispatch::meta::scalar_<boost::dispatch::meta::double_<double> >, boost::dispatch::meta::scalar_<boost::dispatch::meta::double_<double> ><span class="o">)</span>, boost::dispatch::tag::cpu_, void>::__ieee754_rem_pio2<span class="o">(</span>double, double*<span class="o">)</span>::npio2_hw
</pre></div>
<p>I can tell you Python is <em>not</em> using the nt2 dispatch mechanism to load native
extensions. Again, the default compiler settings are responsible for this
noise, and the relevant compiler flag is <tt class="docutils literal"><span class="pre">-fvisibility=hidden</span></tt>, which tells the
compiler that only the functions flagged with a special attribute are part of
the external ABI; the other ones are not exported. As Python uses a single
entry point to load Pythran modules, namely <tt class="docutils literal">PyInit_cda</tt> for Python3 modules
and <tt class="docutils literal">initcda</tt> for Python2 modules <a class="footnote-reference" href="#id5" id="id2">[1]</a>, one can add the <tt class="docutils literal">__attribute__
<span class="pre">((visibility("default")))</span></tt> attribute on this symbol and it will be the only exported
one. This slightly reduces the code size, may decrease loading time and
may give the compiler more optimization opportunities, but nothing
significant there (131K), apart from the pleasure of generating cleaner binaries.
That's also going to be the default for the next Pythran version.</p>
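<p>A quick way to check the effect of the visibility flag is to count the symbols that are still defined in the dynamic symbol table. The sketch below assumes GNU binutils' <tt class="docutils literal">nm</tt> is on the <tt class="docutils literal">PATH</tt>; the helper names <tt class="docutils literal">defined_symbols</tt> and <tt class="docutils literal">exported_symbols</tt> are hypothetical, and the parsing only covers the usual "address type name" layout.</p>

```python
import string
import subprocess


def defined_symbols(nm_output):
    """Extract the names of defined symbols from `nm -D` style output.

    Defined symbols read "address type name"; undefined ones carry no
    address, so they are skipped.
    """
    names = []
    for line in nm_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        address, kind, name = parts
        is_address = len(address) >= 8 and all(c in string.hexdigits for c in address)
        if is_address and len(kind) == 1:
            names.append(name.strip())
    return names


def exported_symbols(library_path):
    """Run nm on a shared library and return its defined dynamic symbols."""
    result = subprocess.run(
        ["nm", "-C", "-D", "--defined-only", library_path],
        capture_output=True, text=True, check=True,
    )
    return defined_symbols(result.stdout)
```

<p>Calling <tt class="docutils literal"><span class="pre">len(exported_symbols("cda.so"))</span></tt> before and after adding the flag should show the count dropping from hundreds of entries to a handful.</p>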
</div>
<div class="section" id="out-of-chance-getting-faster-binaries">
<h2>By Chance: Getting Faster Binaries</h2>
<p>In the (huge) info pages of GCC, near the doc of <tt class="docutils literal"><span class="pre">-fvisibility=hidden</span></tt>,
there's this (GCC only) compiler flag, <tt class="docutils literal"><span class="pre">-fwhole-program</span></tt>, that implements some
kind of Link Time Optimization, in the sense that it tells the compiler to
consider the current compilation unit (or code) as a whole program. As
specified in the GCC man page, "All public functions and variables with the
exception of "main" and those merged by attribute "externally_visible" become
static functions and in effect are optimized more aggressively by
interprocedural optimizers.", which basically means that every function is
considered static except for "main" and the ones that are explicitly told not
to be. This allows the compiler, for instance, to remove functions that are
always inlined, and thus save space. So we flag the <tt class="docutils literal">initcda</tt> function with
<tt class="docutils literal">__attribute__ ((externally_visible))</tt>. That sounds a bit redundant to me
with the visibility attribute, but it turns out this triggers a bunch of
different optimization paths that give us a significantly smaller binary that
runs slightly faster:</p>
<div class="highlight"><pre><span></span>> pythran cda.py -fvisibility<span class="o">=</span>hidden -fwhole-program -Wl,-strip-all
> ls -lh cda.so
-rwxr-xr-x <span class="m">1</span> sguelton sguelton 31K Mar <span class="m">29</span> <span class="m">18</span>:52 cda.so*
> python -m timeit -s <span class="s1">'import numpy as np; n = 20000 ; lat, lon = np.random.rand(n), np.random.rand(n); x,y = np.random.rand(), np.random.rand(); from cda import closest_distance_arrays'</span> <span class="s1">'closest_distance_arrays(x,y,lat, lon)'</span>
<span class="m">1000</span> loops, best of <span class="m">3</span>: <span class="m">1</span>.15 msec per loop
</pre></div>
<p>All these flags are now the default on Linux.</p>
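<p>To put the successive size reductions side by side, here is a small sketch that converts the <tt class="docutils literal">ls <span class="pre">-lh</span></tt> figures quoted in this article to bytes and prints the reduction factor relative to the first build. The helper <tt class="docutils literal">parse_size</tt> is made up for illustration and assumes the 1024-based units that <tt class="docutils literal">ls <span class="pre">-lh</span></tt> uses by default.</p>

```python
def parse_size(text):
    """Convert an `ls -lh` style size such as '1.3M' or '151K' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text[-1] in units:
        return float(text[:-1]) * units[text[-1]]
    return float(text)


# Sizes of cda.so reported in this article, in order of appearance.
sizes = [
    ("default flags", "1.3M"),
    ("stripped", "151K"),
    ("+ hidden visibility", "131K"),
    ("+ whole program", "31K"),
]

baseline = parse_size(sizes[0][1])
for label, size in sizes:
    factor = baseline / parse_size(size)
    print(f"{label:20} {size:>5}  {factor:5.1f}x smaller")
```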
</div>
<div class="section" id="playing-with-the-optimization-flags-too">
<h2>Playing with the optimization flags too</h2>
<p>The default optimization flag is <tt class="docutils literal"><span class="pre">-O2</span></tt>, and that's generally a decent choice.
On <tt class="docutils literal">cda.py</tt>, using <tt class="docutils literal"><span class="pre">-O3</span></tt> does not change much (gcc 4.9):</p>
<div class="highlight"><pre><span></span>> pythran cda.py -fvisibility<span class="o">=</span>hidden -fwhole-program -Wl,-strip-all -O3
> python -m timeit <span class="o">[</span>...<span class="o">]</span>
<span class="m">1000</span> loops, best of <span class="m">3</span>: <span class="m">1</span>.14 msec per loop
</pre></div>
<p>Asking for code specific to my CPU using <tt class="docutils literal"><span class="pre">-march=native</span></tt> actually gives some improvements:</p>
<div class="highlight"><pre><span></span>> pythran cda.py -fvisibility<span class="o">=</span>hidden -fwhole-program -Wl,-strip-all -O3 -march<span class="o">=</span>native
> python -m timeit <span class="o">[</span>...<span class="o">]</span>
<span class="m">1000</span> loops, best of <span class="m">3</span>: <span class="m">1</span>.11 msec per loop
</pre></div>
<p>But the best speedup has a price: relaxing standard compliance with <tt class="docutils literal"><span class="pre">-Ofast</span></tt>
can be beneficial if you're not using denormalized numbers, infinity and the
monstrosity that lies with <tt class="docutils literal">NaN</tt>:</p>
<div class="highlight"><pre><span></span>> pythran cda.py -fvisibility<span class="o">=</span>hidden -fwhole-program -Wl,-strip-all -Ofast -march<span class="o">=</span>native
> python -m timeit <span class="o">[</span>...<span class="o">]</span>
<span class="m">1000</span> loops, best of <span class="m">3</span>: <span class="m">1</span>.02 msec per loop
</pre></div>
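<p>Why does <tt class="docutils literal">NaN</tt> matter here? With GCC, <tt class="docutils literal"><span class="pre">-Ofast</span></tt> turns on <tt class="docutils literal"><span class="pre">-ffast-math</span></tt>, which lets the compiler assume that no <tt class="docutils literal">NaN</tt> or infinity ever shows up. The sketch below is plain Python, so it is not affected by those flags; it only illustrates the IEEE 754 behaviour that such a build is allowed to ignore. The helper <tt class="docutils literal">safe_min</tt> is a hypothetical name.</p>

```python
import math

nan = float("nan")

# NaN is the only float that compares unequal to itself...
print(nan == nan)

# ...and every ordering comparison with NaN is false, so the result of
# min() depends on where the NaN sits in the input.
print(min([nan, 1.0]))
print(min([1.0, nan]))


def safe_min(values):
    """Minimum that filters NaN out explicitly instead of relying on comparisons."""
    finite = [v for v in values if not math.isnan(v)]
    return min(finite)


print(safe_min([nan, 2.0, 1.0]))
```

<p>If the inputs of <tt class="docutils literal">closest_distance_arrays</tt> can contain such values, code compiled under these assumptions may behave differently from the NumPy version.</p>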
<p>If you're really into compiler flags tuning, you can try out <tt class="docutils literal"><span class="pre">-funroll-loops</span></tt>
or try to tune the <tt class="docutils literal"><span class="pre">-finline-limit=N</span></tt> parameter (that actually gets me down
to <tt class="docutils literal">1ms per loop</tt>) but that's going a bit too far :-)</p>
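<p>When trying several flag combinations, it can help to generate the command lines instead of retyping them. This is only a sketch: the names <tt class="docutils literal">BASE_FLAGS</tt>, <tt class="docutils literal">VARIANTS</tt> and <tt class="docutils literal">pythran_command</tt> are made up, and the flags are just the ones used in this section.</p>

```python
import shlex

# Flags shared by every build in this section.
BASE_FLAGS = ["-fvisibility=hidden", "-fwhole-program", "-Wl,-strip-all"]

# Optimization variants tried above.
VARIANTS = {
    "O3": ["-O3"],
    "O3 native": ["-O3", "-march=native"],
    "Ofast native": ["-Ofast", "-march=native"],
}


def pythran_command(source, extra_flags):
    """Build the argument list for one pythran invocation."""
    return ["pythran", source] + BASE_FLAGS + list(extra_flags)


for label, flags in VARIANTS.items():
    print(label, "->", shlex.join(pythran_command("cda.py", flags)))
```

<p>Each list can be handed to <tt class="docutils literal">subprocess.run</tt> as is, followed by the <tt class="docutils literal">timeit</tt> call used throughout this article.</p>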
</div>
<div class="section" id="don-t-forget-vectorization">
<h2>Don't forget Vectorization</h2>
<p>Combining <tt class="docutils literal"><span class="pre">-O3</span></tt> and <tt class="docutils literal"><span class="pre">-march=native</span></tt> triggers compiler auto-vectorization <a class="footnote-reference" href="#id6">[2]</a>,
but that did not help much in our case. Indeed, automatic vectorization, as
in « I am using the multimedia instruction set of my CPU », is still a difficult
task for compilers. Fortunately Pythran helps here, and passing the
not-so-experimental-anymore-but-still-not-default flag <tt class="docutils literal"><span class="pre">-DUSE_BOOST_SIMD</span></tt>
triggers some hard-coded vectorization based on <tt class="docutils literal">boost.simd</tt> <a class="footnote-reference" href="#id7" id="id3">[3]</a>, and that
<strong>did</strong> help:</p>
<div class="highlight"><pre><span></span>> <span class="c1"># esod mumixam</span>
> python -m pythran.run cda.cpp -fvisibility<span class="o">=</span>hidden -fwhole-program -Wl,-strip-all -Ofast -march<span class="o">=</span>native -funroll-loops -finline-limit<span class="o">=</span><span class="m">100000000</span> -DUSE_BOOST_SIMD
> python -m timeit <span class="o">[</span>...<span class="o">]</span>
<span class="m">1000</span> loops, best of <span class="m">3</span>: <span class="m">462</span> usec per loop
</pre></div>
<p>And that's worth 63 kilobytes :-)</p>
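<p>For a recap, the timings quoted throughout this article can be turned into speedups over the plain NumPy run. The labels below are shorthand chosen for this sketch, and the helper <tt class="docutils literal">speedups</tt> is hypothetical; only the numbers come from the measurements above.</p>

```python
# Best-of-3 timings reported in this article, in milliseconds per loop.
timings_msec = {
    "numpy (no Pythran)": 1.95,
    "pythran, default flags": 1.17,
    "+ hidden visibility, whole program, strip": 1.15,
    "+ -O3": 1.14,
    "+ -O3 -march=native": 1.11,
    "+ -Ofast -march=native": 1.02,
    "+ unroll, inline limit, boost.simd": 0.462,
}


def speedups(timings, baseline_key):
    """Return the speedup of every entry relative to the chosen baseline."""
    baseline = timings[baseline_key]
    return {label: baseline / t for label, t in timings.items()}


for label, factor in speedups(timings_msec, "numpy (no Pythran)").items():
    print(f"{factor:5.2f}x  {label}")
```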
</div>
<div class="section" id="concluding-remarks">
<h2>Concluding Remarks</h2>
<p>Source-to-source compilers <em>do</em> generate ugly intermediate code, and Pythran is
not an exception. One benefit, though, is that you get full control over
the <em>backend</em> compiler, which means you can tune it to your needs. Given some
knowledge and benchmarking effort, it can get you closer to your goal without
changing the original code.</p>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id1">[0]</a></td><td>And I am shamelessly advertising it :-) <a class="reference external" href="http://blog.quarkslab.com/clang-hardening-cheat-sheet.html">http://blog.quarkslab.com/clang-hardening-cheat-sheet.html</a></td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id5" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id2">[1]</a></td><td>If you really want to inspect the intermediate C++ code generated by Pythran, use the <tt class="docutils literal"><span class="pre">-E</span></tt> flag and a <tt class="docutils literal">cda.cpp</tt> will be generated.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id6" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label">[2]</td><td>Only GCC needs this; clang turns on vectorisation at <tt class="docutils literal"><span class="pre">-O2</span></tt>. <tt class="docutils literal"><span class="pre">-march=native</span></tt> allows it to use a more recent instruction set if available.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id3">[3]</a></td><td>Thanks Numscale <a class="reference external" href="https://www.numscale.com/boost-simd/">https://www.numscale.com/boost-simd/</a></td></tr>
</tbody>
</table>
</div>
</div><!-- /.entry-content -->
</article>
</section>
</div><!--/span-->
<div class="span3 well sidebar-nav" id="sidebar">
<ul class="nav nav-list">
<li class="nav-header"><h4><i class="icon-external-link"></i>blogroll</h4></li>
<li><a href="http://pythonhosted.org/pythran"><i class="icon-external-link"></i>Pythran Doc</a></li>
<li><a href="https://pypi.python.org/pypi/pythran"><i class="icon-external-link"></i>Pythran on PyPI</a></li>
<li class="nav-header"><h4><i class="icon-home icon-large"></i> social</h4></li>
<li><a href="./feeds/all.atom.xml" rel="alternate"><i class="icon-bookmark icon-large"></i>atom feed</a></li>
<li><a href="https://github.com/serge-sans-paille/pythran"><i class="icon-github-sign icon-large"></i>github</a></li>
<li class="nav-header"><h4><i class="icon-folder-close icon-large"></i>Categories</h4></li>
<li>
<a href="./category/benchmark.html">
<i class="icon-folder-open icon-large"></i>benchmark
</a>
</li>
<li>
<a href="./category/compilation.html">
<i class="icon-folder-open icon-large"></i>compilation
</a>
</li>
<li>
<a href="./category/cython.html">
<i class="icon-folder-open icon-large"></i>cython
</a>
</li>
<li>
<a href="./category/engineering.html">
<i class="icon-folder-open icon-large"></i>engineering
</a>
</li>
<li>
<a href="./category/examples.html">
<i class="icon-folder-open icon-large"></i>examples
</a>
</li>
<li>
<a href="./category/optimisation.html">
<i class="icon-folder-open icon-large"></i>optimisation
</a>
</li>
<li>
<a href="./category/release.html">
<i class="icon-folder-open icon-large"></i>release
</a>
</li>
<li class="nav-header"><h4><i class="icon-tags icon-large"></i>Tags</h4></li>
</ul> </div><!--/.well -->
</div><!--/row-->
<hr>
<footer>
<address id="about">
Proudly powered by <a href="http://pelican.notmyidea.org/">Pelican <i class="icon-external-link"></i></a>,
which takes great advantage of <a href="http://python.org">Python <i class="icon-external-link"></i></a>.
</address><!-- /#about -->
<p>The theme is from <a href="http://twitter.github.com/bootstrap/">Bootstrap from Twitter <i class="icon-external-link"></i></a>,
and <a href="http://fortawesome.github.com/Font-Awesome/">Font-Awesome <i class="icon-external-link"></i></a>, thanks!</p>
</footer>
</div><!--/.fluid-container-->
<!-- Le javascript -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="./theme/js/jquery-1.7.2.min.js"></script>
<script src="./theme/js/bootstrap.min.js"></script>
</body>
</html>