The NumExpr package supplies routines for the fast evaluation of array expressions elementwise by using a vector-based virtual machine.
Using it is simple:
>>> import numpy as np
>>> import numexpr as ne
>>> a = np.arange(10)
>>> b = np.arange(0, 20, 2)
>>> c = ne.evaluate('2*a + 3*b')
>>> c
array([ 0,  8, 16, 24, 32, 40, 48, 56, 64, 72])
It is also possible to use NumExpr to validate an expression:
>>> ne.validate('2*a + 3*b')
This returns None on success or raises an exception on invalid input.
NumExpr can also re-evaluate the last expression with re_evaluate():
>>> b = np.arange(0, 40, 4)
>>> ne.re_evaluate()
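Re-evaluation is mainly useful when the same expression is applied repeatedly to new operands. The following is a minimal sketch; passing the operands explicitly through local_dict (instead of relying on frame lookup) is a choice made here for clarity, and the names first and second are illustrative only.

```python
import numpy as np
import numexpr as ne

a = np.arange(10)
b = np.arange(0, 20, 2)

# Compile and evaluate the expression once.
first = ne.evaluate('2*a + 3*b', local_dict={'a': a, 'b': b})

# Rebind b to new data with the same dtype and shape, then reuse the
# previously compiled expression without re-checking it.
b = np.arange(0, 40, 4)
second = ne.re_evaluate(local_dict={'a': a, 'b': b})
```

Because re_evaluate() skips the usual checks, the new operands should have the same types as those used in the preceding evaluate() call.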
NumExpr requires Python 3.7 or greater, and NumPy 1.13 or greater. It is built in the standard Python way:
$ pip install .
You must have a C compiler (e.g. MSVC Build Tools on Windows or GCC on Linux) installed.
Then change to a directory that is not the repository directory (e.g. /tmp) and test NumExpr with:
$ python -c "import numexpr; numexpr.test()"
Starting with release 1.2, NumExpr includes support for Intel's VML library. This allows for better performance on Intel architectures, mainly when evaluating transcendental functions (trigonometric, exponential, ...). It also enables NumExpr to use several CPU cores.
If you have Intel's MKL (the library that embeds VML), just copy the site.cfg.example that comes in the distribution to site.cfg and edit the latter giving proper directions on how to find your MKL libraries in your system. After doing this, you can proceed with the usual building instructions listed above. Pay attention to the messages during the building process in order to know whether MKL has been detected or not. Finally, you can check the speed-ups on your machine by running the bench/vml_timing.py script (you can play with different parameters to the set_vml_accuracy_mode() and set_vml_num_threads() functions in the script so as to see how it would affect performance).
Threads are spawned at import-time, with the number being set by the environment variable NUMEXPR_MAX_THREADS. The default maximum thread count is 64.
There is no advantage to spawning more threads than the number of virtual cores
available on the computing node. Practically NumExpr scales at large thread
count (> 8) only on very large matrices (> 2**22). Spawning large numbers
of threads is not free, and can increase import times for NumExpr or packages
that import it such as Pandas or PyTables.
If desired, the number of threads in the pool used can be adjusted via an environment variable, NUMEXPR_NUM_THREADS (preferred) or OMP_NUM_THREADS. Typically only setting NUMEXPR_MAX_THREADS is sufficient; the number of threads used can be adjusted dynamically via numexpr.set_num_threads(int). The number of threads can never exceed that set by NUMEXPR_MAX_THREADS.
If the user has not configured the environment prior to importing NumExpr, info logs will be generated, and the initial number of threads that are used will be set to the number of cores detected in the system or 8, whichever is less.
Usage:
import os
os.environ['NUMEXPR_MAX_THREADS'] = '16'
os.environ['NUMEXPR_NUM_THREADS'] = '8'
import numexpr as ne
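The thread count can also be changed after import. The sketch below temporarily drops to a single thread and then restores the earlier setting; the array x and the result name single_threaded are illustrative, and a value of 1 is used because it can never exceed the configured maximum.

```python
import numpy as np
import numexpr as ne

x = np.linspace(0.0, 1.0, 1000)

# set_num_threads() returns the previous setting, so it can be restored later.
previous = ne.set_num_threads(1)
single_threaded = ne.evaluate('2*x + 1', local_dict={'x': x})

# Restore the thread count that was in effect before.
ne.set_num_threads(previous)
```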
NumExpr's principal routine is:
evaluate(ex, local_dict=None, global_dict=None, optimization='aggressive', truediv='auto')
where ex is a string forming an expression, like "2*a+3*b". The values for a and b will by default be taken from the calling function's frame (through the use of sys._getframe()). Alternatively, they can be specified using the local_dict or global_dict arguments, or passed as keyword arguments.
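The two most common ways of supplying operands can be sketched as follows. The helper weighted_sum and the variable names are hypothetical and exist only for this illustration.

```python
import numpy as np
import numexpr as ne

def weighted_sum(a, b):
    # No dictionaries given: a and b are looked up in this function's frame.
    return ne.evaluate('2*a + 3*b')

x = np.arange(5)
y = np.arange(5, 10)

from_frame = weighted_sum(x, y)

# Explicit mapping from the names used in the expression to the operands.
from_dict = ne.evaluate('2*a + 3*b', local_dict={'a': x, 'b': y})
```

Passing local_dict makes the dependency on outside names explicit, which can be easier to follow than frame lookup when the expression string is built elsewhere.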
The optimization parameter can take the values 'moderate' or 'aggressive'. 'moderate' means that no optimization is made that can affect precision at all. 'aggressive' (the default) means that the expression can be rewritten in a way that precision could be affected, but normally very little. For example, in 'aggressive' mode, the transformation x**3 -> x*x*x is made, but not in 'moderate' mode.
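As a rough check of how little the two modes differ in practice, the sketch below evaluates the same power expression under both settings. The input values and the name max_diff are illustrative; the size of the difference depends on the data and platform.

```python
import numpy as np
import numexpr as ne

x = np.linspace(0.5, 2.0, 8)

# 'aggressive' (the default) may rewrite x**3 as x*x*x.
fast = ne.evaluate('x**3', local_dict={'x': x}, optimization='aggressive')

# 'moderate' avoids rewrites that could affect precision.
careful = ne.evaluate('x**3', local_dict={'x': x}, optimization='moderate')

# Largest absolute difference between the two results.
max_diff = float(np.max(np.abs(fast - careful)))
```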
The truediv parameter specifies whether the division is a 'floor division' (False) or a 'true division' (True). The default is the value of __future__.division in the interpreter. See PEP 238 for details.
Expressions are cached, so reuse is fast. Arrays or scalars are allowed for the variables, which must be of type 8-bit boolean (bool), 32-bit signed integer (int), 64-bit signed integer (long), double-precision floating point number (float), 2x64-bit, double-precision complex number (complex) or raw string of bytes (str). If they are not in the previous set of types, they will be upcast for internal use (the result will be affected as well). The arrays must all be the same size.
NumExpr operates internally only with the following types:
- 8-bit boolean (bool)
- 32-bit signed integer (int or int32)
- 64-bit signed integer (long or int64)
- 32-bit single-precision floating point number (float or float32)
- 64-bit, double-precision floating point number (double or float64)
- 2x64-bit, double-precision complex number (complex or complex128)
- Raw string of bytes (str in Python 2.7, bytes in Python 3+, numpy.str in both cases)
If the arrays in the expression do not match any of these types, they will be upcast to one of the above types (following the usual type inference rules, see below). Keep this in mind when estimating the memory consumption during the computation of your expressions.
Also, the types in NumExpr conditions are somewhat stricter than those of Python. For instance, the only valid constants for booleans are True and False, and they are never automatically cast to integers.
Casting rules in NumExpr follow closely those of NumPy. However, for implementation reasons, there are some known exceptions to this rule, namely:
- When an array with type int8, uint8, int16 or uint16 is used inside NumExpr, it is internally upcast to an int (or int32 in NumPy notation).
- When an array with type uint32 is used inside NumExpr, it is internally upcast to a long (or int64 in NumPy notation).
- A floating point function (e.g. sin) acting on int8 or int16 types returns a float64 type, instead of the float32 that is returned by NumPy functions. This is mainly due to the absence of native int8 or int16 types in NumExpr.
- In operations involving a scalar and an array, the normal rules of casting are used in NumExpr, in contrast with NumPy, where array types take priority. For example, if a is an array of type float32 and b is a scalar of type float64 (or Python float type, which is equivalent), then a*b returns a float64 in NumExpr, but a float32 in NumPy (i.e. array operands take priority in determining the result type). If you need to keep the result a float32, be sure you use a float32 scalar too.
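The casting exceptions listed above can be observed by inspecting the dtype of the results. This is a small sketch with illustrative variable names; the dtypes noted in the comments follow from the rules stated above.

```python
import numpy as np
import numexpr as ne

small = np.arange(5, dtype=np.int8)
# int8 operands are handled internally as 32-bit integers.
widened = ne.evaluate('small + small', local_dict={'small': small})

a = np.ones(4, dtype=np.float32)
# A Python float counts as a float64 scalar, so the result is promoted.
promoted = ne.evaluate('a * b', local_dict={'a': a, 'b': 2.0})
# A float32 scalar keeps the result in single precision.
kept = ne.evaluate('a * b', local_dict={'a': a, 'b': np.float32(2.0)})

result_dtypes = (widened.dtype, promoted.dtype, kept.dtype)
```

Checking result dtypes this way is a quick guard when memory use matters, since an unexpected promotion to float64 doubles the size of the output.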
NumExpr supports the set of operators listed below:
- Bitwise operators (and, or, not, xor): &, |, ~, ^
- Comparison operators: <, <=, ==, !=, >=, >
- Unary arithmetic operators: -
- Binary arithmetic operators: +, -, *, /, **, %, <<, >>
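A short sketch of how the bitwise operators combine comparisons into element-wise boolean conditions; the arrays and result names are illustrative.

```python
import numpy as np
import numexpr as ne

a = np.arange(10)
b = np.arange(0, 20, 2)

# & combines boolean conditions element-wise; the parentheses are needed
# because & binds more tightly than the comparison operators.
in_range = ne.evaluate('(a > 2) & (b < 14)', local_dict={'a': a, 'b': b})

# ~ negates a boolean result.
outside = ne.evaluate('~((a > 2) & (b < 14))', local_dict={'a': a, 'b': b})
```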
The following functions are currently supported:
- where(bool, number1, number2): number -- number1 if the bool condition is true, number2 otherwise.
- {sin,cos,tan}(float|complex): float|complex -- trigonometric sine, cosine or tangent.
- {arcsin,arccos,arctan}(float|complex): float|complex -- trigonometric inverse sine, cosine or tangent.
- arctan2(float1, float2): float -- trigonometric inverse tangent of float1/float2.
- {sinh,cosh,tanh}(float|complex): float|complex -- hyperbolic sine, cosine or tangent.
- {arcsinh,arccosh,arctanh}(float|complex): float|complex -- hyperbolic inverse sine, cosine or tangent.
- {log,log10,log1p}(float|complex): float|complex -- natural, base-10 and log(1+x) logarithms.
- {exp,expm1}(float|complex): float|complex -- exponential and exponential minus one.
- sqrt(float|complex): float|complex -- square root.
- abs(float|complex): float|complex -- absolute value.
- conj(complex): complex -- conjugate value.
- {real,imag}(complex): float -- real or imaginary part of complex.
- complex(float, float): complex -- complex from real and imaginary parts.
- contains(np.str, np.str): bool -- returns True for every string in op1 that contains op2.
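A few of the functions listed above, used together, might look like this. The input arrays and result names are illustrative only.

```python
import numpy as np
import numexpr as ne

x = np.linspace(-2.0, 2.0, 9)
y = np.linspace(1.0, 3.0, 9)
operands = {'x': x, 'y': y}

# where() selects element-wise: x where the condition holds, -x elsewhere.
magnitude = ne.evaluate('where(x >= 0, x, -x)', local_dict=operands)

# Two-argument inverse tangent.
angle = ne.evaluate('arctan2(x, y)', local_dict=operands)

# Functions and operators can be combined in a single expression.
combined = ne.evaluate('sqrt(x**2 + y**2) * exp(-y)', local_dict=operands)
```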
abs() for complex inputs returns a complex output too. This is a departure from NumPy where a float is returned instead. However, NumExpr is not flexible enough yet so as to allow this to happen. Meanwhile, if you want to mimic NumPy behaviour, you may want to select the real part via the real function (e.g. real(abs(cplx))) or via the real selector (e.g. abs(cplx).real).
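The workaround with the real function can be sketched as follows; the array z and the name modulus are illustrative.

```python
import numpy as np
import numexpr as ne

z = np.array([3 + 4j, 1 - 1j, -2j])

# Take the real part inside the expression so the output is a float array,
# matching what NumPy's abs() gives for complex input.
modulus = ne.evaluate('real(abs(z))', local_dict={'z': z})
```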
More functions can be added if you need them. Note however that NumExpr 2.6 is in maintenance mode and a new major revision is under development.
The following reduction operations are currently supported:
- sum(number, axis=None): Sum of array elements over a given axis. Negative axis values are not supported.
- prod(number, axis=None): Product of array elements over a given axis. Negative axis values are not supported.
Note: because of internal limitations, reduction operations must appear last in the stack. Otherwise, an error like the following is raised:
>>> ne.evaluate('sum(1)*(-1)')
RuntimeError: invalid program: reduction operations must occur last
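One way to work within this restriction is to keep the reduction as the outermost operation and do any further arithmetic in NumPy afterwards. A minimal sketch, with illustrative array and variable names:

```python
import numpy as np
import numexpr as ne

a = np.arange(5)
m = np.arange(6).reshape(2, 3)

# The reduction is the outermost operation, so this is accepted.
total = ne.evaluate('sum(a * a)', local_dict={'a': a})

# Reduce along one axis (only non-negative axis values are supported).
column_sums = ne.evaluate('sum(m, axis=0)', local_dict={'m': m})

# Arithmetic on the reduced value is done outside the expression.
negated = -1 * total
```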
evaluate(expression, local_dict=None, global_dict=None, optimization='aggressive', truediv='auto')
: Evaluate a simple array expression element-wise. See examples above.
re_evaluate(local_dict=None)
: Re-evaluate the last array expression without any check. This is meant for accelerating loops that re-evaluate the same expression repeatedly without changing anything other than the operands. If unsure, use evaluate(), which is safer.
test()
: Run all the tests in the test suite.
print_versions()
: Print the versions of software that numexpr relies on.
set_num_threads(nthreads)
: Sets the number of threads to be used in operations. Returns the previous setting for the number of threads. See the notes above on how the number of threads is set via environment variables.
If you are using VML, you may want to use set_vml_num_threads(nthreads) to perform the parallel job with VML instead. However, you should get very similar performance with VML-optimized functions, and VML's parallelizer cannot deal with common expressions like (x+1)*(x-2), while NumExpr's can.
detect_number_of_cores()
: Detects the number of cores on a system.
When compiled with Intel's VML (Vector Math Library), you will be able to use some additional functions for controlling its use. These are:
set_vml_accuracy_mode(mode)
: Set the accuracy for VML operations. The mode parameter can take the values:
- 'low': Equivalent to VML_LA - low accuracy VML functions are called
- 'high': Equivalent to VML_HA - high accuracy VML functions are called
- 'fast': Equivalent to VML_EP - enhanced performance VML functions are called
It returns the previous mode.
This call is equivalent to the vmlSetMode() in the VML library. See: http://www.intel.com/software/products/mkl/docs/webhelp/vml/vml_DataTypesAccuracyModes.html for more info on the accuracy modes.
set_vml_num_threads(nthreads)
: Suggests a maximum number of threads to be used in VML operations. This function is equivalent to the call mkl_domain_set_num_threads(nthreads, MKL_VML) in the MKL library. See the MKL documentation for more info about it.
get_vml_version()
: Get the VML/MKL library version.
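Since these functions only have an effect when NumExpr was built against VML, a script may want to guard their use. The sketch below relies on the use_vml attribute exposed by the numexpr module; the names previous_mode and vml_version are illustrative.

```python
import numexpr as ne

# ne.use_vml is truthy only when NumExpr was built with Intel's VML.
if ne.use_vml:
    # Switch to high accuracy, remembering the mode that was active before.
    previous_mode = ne.set_vml_accuracy_mode('high')
    ne.set_vml_num_threads(1)
    vml_version = ne.get_vml_version()
    # Restore the earlier accuracy mode.
    ne.set_vml_accuracy_mode(previous_mode)
else:
    vml_version = None
```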
NumExpr is distributed under the MIT license.