This repository was archived by the owner on Jan 25, 2023. It is now read-only.

Commit bcb9b90

reazulhoque and Deb, Diptorup authored and committed
Ms138 directives (numba#52)
* Fix bugs to make pranges work
* Update README
* Update README
* Only keep setitems to track buffers being modified inside kernel
1 parent 26e747a commit bcb9b90


5 files changed (+200, -63 lines changed)


README.rst

Lines changed: 52 additions & 53 deletions
@@ -1,78 +1,80 @@
-****************
-DPPY
-****************
-=========
+
+DPPY
+====
+
+========
 1. What?
-=========
+========

 DPPy proof-of-concept backend for NUMBA to support compilation for Intel CPU and
 GPU architectures. The present implementation of DPPy is based on OpenCL 2.1,
 but is likely to change in the future to rely on Sycl/DPC++ or Intel Level-0
 driver API.

-================
+===============
 2. Prequisites?
-================
-
-Bash : In the system and not as default Shell
-Tar : To extract files
-Git : To fetch required dependencies listed below
-C/C++ compiler : To build the dependencies
-Cmake : For managing build process of dependencies
-Python3 : Version 3 is required
-Conda or miniconda : Can be found at https://docs.conda.io/en/latest/miniconda.html
-
-OpenCL 2.1 driver : DPPy currently works for both Intel GPUs and CPUs is
-a correct OpenCL driver version is found on the system.
+===============

-Note. To use the GPU users should be added to "video"
-user group on Linux systems.
+- Bash : In the system and not as default Shell
+- Tar : To extract files
+- Git : To fetch required dependencies listed below
+- C/C++ compiler : To build the dependencies
+- Cmake : For managing build process of dependencies
+- Python3 : Version 3 is required
+- Conda or miniconda : Can be found at https://docs.conda.io/en/latest/miniconda.html
+- OpenCL 2.1 driver : DPPy currently works for both Intel GPUs and CPUs is a correct OpenCL driver version is found on the system.
+Note. To use the GPU users should be added to "video" user group on Linux systems.


-The following requisites will be installed by the install script provided with
-this package.
+The following requisites will need to be present in the system. Refer to next section for more details.
+*******************************************************************************************************

-NUMBA v0.48 : The DPPy backend has only been tested for NUMBA v0.48.
-The included install script downloads and applies the
-DDPy patch to the correct NUMBA version.
+- NUMBA v0.48 : The DPPy backend has only been tested for NUMBA v0.48. The included install script downloads and applies the DDPy patch to the correct NUMBA version.

-LLVM-SPIRV translator: Used for SPIRV generation from LLVM IR.
+- LLVM-SPIRV translator: Used for SPIRV generation from LLVM IR.

-SPIRV-Tools : Used internally for code-generation. The provided install
-script would handle downloading and installing the
-required version.
+- SPIRV-Tools : Used internally for code-generation.

-LLVMDEV : To support LLVM IR generation.
+- LLVMDEV : To support LLVM IR generation.

-Others : All existing dependecies for NUMBA, such as llvmlite,
-also apply to DPPy.
+- Others : All existing dependecies for NUMBA, such as llvmlite, also apply to DPPy.

 ==================
 3. How to install?
 ==================
+Install Pre-requisites
+*************************
+Make sure the dependencies of NUMBA-DPPY are installed in the system, for convenience
+and to make sure the dependencies are installed with consistent version of LLVM we provide
+installation script that will create a CONDA environment and install LLVM-SPIRV translator,
+SPIRV-Tools and llvmlite in that environment. **To use this CONDA has to be available in the system**.

-Extract the archive:
+The above mentioned installation script can be found `here <https://github.intel.com/SAT/numba-pvc-build-scripts>`_. Please follow the README to run the installation script.

-tar -zxvf NUMBA-PVC-offline.tar.gz
+After successful installation the following message should be displayed:

-Run the installer script:
+| #
+| # Use the following to activate the correct environment
+| #
+| # ` $ ``conda activate numba-dppy-env`` `
+| #
+| # Use the following to deactivate environment
+| #
+| # ` $ ``conda deactivate`` `

-./build_numba_dppy.sh --prefix $PATH_TO_INSTALL_NUMBA-DPPY
+The installer script creates a new conda environment called numba-dppy-env with
+all the needed dependencies already installed. **Please activate the numba-dppy-env before proceeding**.

-After successful installation the following message should be displayed:

-#
-# Use the following to activate the correct environment
-#
-# $ conda activate numba-dppy-env
-#
-# Use the following to deactivate environment
-#
-# $ conda deactivate
+Install DPPY backend
+***********************
+NUMBA-DPPY also depend on DPPY backend. It can be found `here <https://github.intel.com/SAT/dppy>`_. Please run
+`build_for_conda.sh` to install DPPY backend.

-The installer script creates a new conda environment called numba-dppy-env with
-all the needed dependencies already installed. To use the DPPy backend, please
-activate the numba-dppy-env
+Install NUMBA-DPPY
+*********************
+After all the dependencies are installed please run ``build_for_develop.sh`` to get a local installation of NUMBA-DPPY. **Both step 2 and 3 assumes CONDA environment with
+the dependencies of NUMBA-DPPY installed in it, was activated**.

 ================
 4. Running tests
@@ -81,12 +83,11 @@ activate the numba-dppy-env
 To make sure the installation was successful, try running the examples and the
 test suite:

-$PATH_TO_INSTALL_NUMBA-DPPY/numba/dppy/examples/
-$PATH_TO_INSTALL_NUMBA-DPPY/numba/dppy/tests/dppy/
+$PATH_TO_NUMBA-DPPY/numba/dppy/examples/

 To run the test suite execute the following:

-$ python -m numba.runtests numba.dppy.tests
+$ ``python -m numba.runtests numba.dppy.tests``

 ===========================
 5. How Tos and Known Issues
@@ -101,5 +102,3 @@ examples, supported functionalities, and known issues.
 ===================

 Please email diptorup.deb@intel.com to report issues and bugs.
-
-
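The README states that the DPPy backend has only been tested with NUMBA v0.48 and runs its test suite with `python -m numba.runtests numba.dppy.tests`. A small pre-flight script, sketched below under the assumption that it is run inside the activated `numba-dppy-env`, could flag an untested NUMBA version and print that same test command. The helper names (`is_tested_numba`, `test_suite_command`) are hypothetical and are not part of the repository.

```python
import sys

# Assumption for illustration: the README only says DPPy was tested with NUMBA v0.48.
TESTED_NUMBA = (0, 48)


def is_tested_numba(version: str) -> bool:
    """Return True if `version` (e.g. "0.48.0") has the tested major.minor."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Unparseable version strings are treated as untested.
        return False
    return (major, minor) == TESTED_NUMBA


def test_suite_command():
    """Build the README's test-suite command as an argv list."""
    return [sys.executable, "-m", "numba.runtests", "numba.dppy.tests"]


if __name__ == "__main__":
    try:
        import numba
    except ImportError:
        print("numba is not importable; activate numba-dppy-env first")
    else:
        ok = is_tested_numba(numba.__version__)
        print("numba", numba.__version__, "tested" if ok else "untested with DPPy")
        print("run:", " ".join(test_suite_command()))
```

Returning an argv list rather than a shell string keeps the command usable with `subprocess.run` without shell quoting.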

numba/dppy/dppy_host_fn_call_gen.py

Lines changed: 12 additions & 5 deletions
@@ -7,6 +7,8 @@
 import llvmlite.binding as lb
 from .. import types, cgutils

+from numba.ir_utils import legalize_names
+
 class DPPyHostFunctionCallsGenerator(object):
 def __init__(self, lowerer, cres, num_inputs):
 self.lowerer = lowerer
@@ -126,7 +128,7 @@ def _call_dppy_kernel_arg_fn(self, args):
 self.builder.store(self.builder.load(kernel_arg), dst)


-def process_kernel_arg(self, var, llvm_arg, arg_type, gu_sig, val_type, index):
+def process_kernel_arg(self, var, llvm_arg, arg_type, gu_sig, val_type, index, modified_arrays):

 if isinstance(arg_type, types.npytypes.Array):
 if llvm_arg is None:
@@ -176,6 +178,13 @@ def process_kernel_arg(self, var, llvm_arg, arg_type, gu_sig, val_type, index):
 buffer_ptr]
 self.builder.call(self.create_dppy_rw_mem_buffer, args)

+# names are replaces usig legalize names, we have to do the same for them to match
+legal_names = legalize_names([var])
+
+if legal_names[var] in modified_arrays:
+self.read_bufs_after_enqueue.append((buffer_ptr, total_size, data_member))
+
+# We really need to detect when an array needs to be copied over
 if index < self.num_inputs:
 args = [self.builder.inttoptr(self.gpu_device_int_const, self.void_ptr_t),
 self.builder.load(buffer_ptr),
@@ -185,8 +194,6 @@ def process_kernel_arg(self, var, llvm_arg, arg_type, gu_sig, val_type, index):
 self.builder.bitcast(self.builder.load(data_member), self.void_ptr_t)]

 self.builder.call(self.write_mem_buffer_to_device, args)
-else:
-self.read_bufs_after_enqueue.append((buffer_ptr, total_size, data_member))

 self.builder.call(self.create_dppy_kernel_arg_from_buffer, [buffer_ptr, kernel_arg])
 dst = self.builder.gep(self.kernel_arg_array, [self.context.get_constant(types.intp, self.cur_arg)])
@@ -215,7 +222,7 @@ def process_kernel_arg(self, var, llvm_arg, arg_type, gu_sig, val_type, index):
 for this_stride in range(arg_type.ndim):
 stride_entry = self.builder.gep(stride_member,
 [self.context.get_constant(types.int32, 0),
-self.context.get_constant(types.int32, this_dim)])
+self.context.get_constant(types.int32, this_stride)])

 args = [self.builder.bitcast(stride_entry, self.void_ptr_ptr_t),
 self.context.get_constant(types.uintp, self.sizeof_intp)]
@@ -275,4 +282,4 @@ def enqueue_kernel_and_read_back(self, loop_ranges):
 self.zero,
 self.builder.load(array_size_member),
 self.builder.bitcast(self.builder.load(data_member), self.void_ptr_t)]
-self.builder.call(self.read_mem_buffer_from_device, args)
+self.builder.call(self.read_mem_buffer_from_device, args)
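With this change, `process_kernel_arg` schedules a buffer for read-back after the kernel runs only when the argument's legalized name is in `modified_arrays`; before, every argument at or beyond `num_inputs` was read back and inputs never were. The sketch below contrasts the two policies in plain Python. `legalize_names_sketch` is a hypothetical stand-in for `numba.ir_utils.legalize_names`, and its rule (`$` and `.` become `_`) is an assumption made only to show why both sides must apply the same renaming before names can be compared.

```python
from typing import Dict, Iterable, List, Set


def legalize_names_sketch(varnames: Iterable[str]) -> Dict[str, str]:
    # Hypothetical stand-in for numba.ir_utils.legalize_names; the exact
    # rule Numba applies may differ. Here '$' and '.' are mapped to '_'.
    return {v: v.replace("$", "_").replace(".", "_") for v in varnames}


def read_back_old(kernel_args: List[str], num_inputs: int) -> List[str]:
    # Old policy: every argument after the inputs was copied back.
    return kernel_args[num_inputs:]


def read_back_new(kernel_args: List[str], modified_arrays: Set[str]) -> List[str]:
    # New policy: copy back only arrays written inside the kernel. The
    # kernel-side names are already legalized, so legalize before lookup.
    legal = legalize_names_sketch(kernel_args)
    return [v for v in kernel_args if legal[v] in modified_arrays]


if __name__ == "__main__":
    args = ["a", "b.1", "$out"]
    written = {"b_1"}  # e.g. the result of a setitem scan of the loop body
    print(read_back_old(args, num_inputs=2))  # ['$out']
    print(read_back_new(args, written))       # ['b.1']
```

In this example `b.1` is an input that the kernel writes to: the old policy never copies it back, while the new one does, and `$out` is no longer copied back because nothing writes to it.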

numba/dppy/dppy_lowerer.py

Lines changed: 37 additions & 5 deletions
@@ -50,6 +50,18 @@
 import dppy.core as driver


+def _print_block(block):
+for i, inst in enumerate(block.body):
+print(" ", i, inst)
+
+def _print_body(body_dict):
+'''Pretty-print a set of IR blocks.
+'''
+for label, block in body_dict.items():
+print("label: ", label)
+_print_block(block)
+
+
 # This loop scheduler is pretty basic, there is only
 # 3 dimension allowed in OpenCL, so to make the backend
 # functional we will schedule the first 3 dimensions
@@ -201,6 +213,20 @@ def to_scalar_from_0d(x):
 return x.dtype
 return x

+def find_setitems_block(setitems, block, typemap):
+for inst in block.body:
+if isinstance(inst, ir.StaticSetItem) or isinstance(inst, ir.SetItem):
+setitems.add(inst.target.name)
+elif isinstance(inst, parfor.Parfor):
+find_setitems_block(setitems, inst.init_block, typemap)
+find_setitems_body(setitems, inst.loop_body, typemap)
+
+def find_setitems_body(setitems, loop_body, typemap):
+"""
+Find the arrays that are written into (goes into setitems)
+"""
+for label, block in loop_body.items():
+find_setitems_block(setitems, block, typemap)

 def _create_gufunc_for_regular_parfor():
 #TODO
@@ -487,6 +513,9 @@ def print_arg_with_addrspaces(args):
 wrapped_blocks = wrap_loop_body(loop_body)
 #hoisted, not_hoisted = hoist(parfor_params, loop_body,
 # typemap, wrapped_blocks)
+setitems = set()
+find_setitems_body(setitems, loop_body, typemap)
+
 hoisted = []
 not_hoisted = []
@@ -599,7 +628,7 @@ def print_arg_with_addrspaces(args):
 if config.DEBUG_ARRAY_OPT:
 print("kernel_sig = ", kernel_sig)

-return kernel_func, parfor_args, kernel_sig, func_arg_types
+return kernel_func, parfor_args, kernel_sig, func_arg_types, setitems


 def _lower_parfor_gufunc(lowerer, parfor):
@@ -680,7 +709,7 @@ def _lower_parfor_gufunc(lowerer, parfor):
 numba.parfor.sequential_parfor_lowering = True
 loop_ranges = [(l.start, l.stop, l.step) for l in parfor.loop_nests]

-func, func_args, func_sig, func_arg_types =(
+func, func_args, func_sig, func_arg_types, modified_arrays =(
 _create_gufunc_for_parfor_body(
 lowerer,
 parfor,
@@ -731,7 +760,8 @@ def _lower_parfor_gufunc(lowerer, parfor):
 loop_ranges,
 parfor.init_block,
 index_var_typ,
-parfor.races)
+parfor.races,
+modified_arrays)

 if config.DEBUG_ARRAY_OPT:
 sys.stdout.flush()
@@ -820,7 +850,8 @@ def generate_dppy_host_wrapper(lowerer,
 loop_ranges,
 init_block,
 index_var_typ,
-races):
+races,
+modified_arrays):
 '''
 Adds the call to the gufunc function from the main function.
 '''
@@ -841,6 +872,7 @@ def generate_dppy_host_wrapper(lowerer,
 print("sin", sin)
 print("sout", sout)
 print("cres", cres, type(cres))
+print("modified_arrays", modified_arrays)
 # print("cres.library", cres.library, type(cres.library))
 # print("cres.fndesc", cres.fndesc, type(cres.fndesc))
@@ -908,7 +940,7 @@ def val_type_or_none(context, lowerer, x):
 "\n\tindex:", index)

 dppy_cpu_lowerer.process_kernel_arg(var, llvm_arg, arg_type, gu_sig,
-val_type, index)
+val_type, index, modified_arrays)
 # -----------------------------------------------------------------------

 # loadvars for loop_ranges
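The new `find_setitems_block` and `find_setitems_body` functions walk the parfor loop body and record the name of every array that is the target of a `SetItem` or `StaticSetItem`, recursing into the init block and loop body of nested parfors; the resulting set becomes `modified_arrays` on the host side. Below is a self-contained sketch of that traversal. The dataclasses are hypothetical stand-ins for `numba.ir` and `parfor.Parfor` that model only the attributes the walk reads, and the `typemap` parameter is omitted because the diff passes it through without using it.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


# Hypothetical stand-ins for numba.ir.Var, ir.SetItem, ir.StaticSetItem,
# ir.Block and parfor.Parfor; only the attributes used below are modelled.
@dataclass
class Var:
    name: str


@dataclass
class SetItem:
    target: Var


@dataclass
class StaticSetItem:
    target: Var


@dataclass
class Block:
    body: list = field(default_factory=list)


@dataclass
class Parfor:
    init_block: Block
    loop_body: Dict[int, Block]


def find_setitems_block(setitems: Set[str], block: Block) -> None:
    for inst in block.body:
        if isinstance(inst, (StaticSetItem, SetItem)):
            # The array being written to is the setitem's target.
            setitems.add(inst.target.name)
        elif isinstance(inst, Parfor):
            # Nested parfor: scan its init block and its loop body as well.
            find_setitems_block(setitems, inst.init_block)
            find_setitems_body(setitems, inst.loop_body)


def find_setitems_body(setitems: Set[str], loop_body: Dict[int, Block]) -> None:
    for block in loop_body.values():
        find_setitems_block(setitems, block)


if __name__ == "__main__":
    nested = Parfor(init_block=Block([StaticSetItem(Var("tmp"))]),
                    loop_body={2: Block([SetItem(Var("inner"))])})
    body = {0: Block([SetItem(Var("out")), "unrelated instruction"]),
            1: Block([nested])}
    found: Set[str] = set()
    find_setitems_body(found, body)
    print(sorted(found))  # ['inner', 'out', 'tmp']
```

Accumulating into a caller-owned set, as the diff does, lets the two mutually recursive helpers share one result without returning and merging sets at every level.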
