rocAL - Python and Pybind Changes #9

Closed · wants to merge 3 commits
11 changes: 7 additions & 4 deletions rocAL/rocAL_pybind/CMakeLists.txt
@@ -95,18 +95,21 @@ endif()
if(${BUILD_ROCAL_PYBIND})

link_directories(${ROCM_PATH}/lib)
include_directories(${AMDRPP_INCLUDE_DIRS})
- include_directories(../rocAL/include/api/
+ include_directories(../rocAL/include/
Collaborator: Please remove additional line at L98

../rocAL/include/api/
../rocAL/include/pipeline/
../rocAL/include/device/
third_party_lib/
../../amd_openvx/openvx/include/)

add_subdirectory(third_party_lib/pybind11)

pybind11_add_module(rocal_pybind rocal_pybind.cpp)
- target_link_libraries(rocal_pybind PRIVATE ${AMDRPP_LIBRARIES} vx_rpp rocal)
+ target_link_libraries(rocal_pybind PRIVATE rocal vx_rpp amd_rpp)
Collaborator: Please check if this change is intended

Collaborator: This should be the same as before, right?

message("-- ${White}rocal_pybind -- CMAKE_CXX_FLAGS:${CMAKE_CXX_FLAGS}${ColourReset}")
install(TARGETS rocal_pybind DESTINATION lib)
message("-- ${Green}ROCm Augmentation Library Python Binding - rocal_pybind module added ${ColourReset}")
message("-- ${Green}Radeon Augmentation Library Python Binding - rocal_pybind module added ${ColourReset}")
Collaborator: Please change Radeon to ROCm

else()
message("-- ${Red}WARNING: rocAL Pybind module excluded ${ColourReset}")
endif()
Collaborator: Is this file supposed to be included? Should the pycache files also be in the PR?

Binary file not shown.
Collaborator: Please check if this file needs to be in the PR

Binary file not shown.
10 changes: 5 additions & 5 deletions rocAL/rocAL_pybind/amd/rocal/decoders.py
@@ -43,9 +43,9 @@ def image(*inputs, user_feature_key_map=None, path='', file_root='', annotations
"shuffle": random_shuffle,
"loop": False,
"decode_size_policy": decode_size_policy,
"max_width": max_decoded_width,
"max_height": max_decoded_height,
"dec_type": decoder_type}
"max_width": max_decoded_width, # 1024
Collaborator: Please check if this comment is needed ROCm#1024

Collaborator: Please remove these comments on the 3 lines

"max_height": max_decoded_height, # 1024
"dec_type": decoder_type} # USER_GIVEN_SIZE_ORIG
decoded_image = b.COCO_ImageDecoderShard(Pipeline._current_pipeline._handle, *(kwargs_pybind.values()))

elif (reader == "TFRecordReaderClassification" or reader == "TFRecordReaderDetection"):
@@ -164,8 +164,6 @@ def image_random_crop(*inputs, user_feature_key_map=None, path='', file_root='',
"color_format": output_type,
"num_shards": num_shards,
'is_output': False,
"user_key_for_encoded": user_feature_key_map["image/encoded"],
Collaborator: Why are these lines removed?

"user_key_for_filename": user_feature_key_map["image/filename"],
"shuffle": random_shuffle,
"loop": False,
"decode_size_policy": decode_size_policy,
@@ -303,3 +301,5 @@ def image_slice(*inputs, file_root='', path='', annotations_file='', shard_id=0,
"max_height": max_decoded_height}
image_decoder_slice = b.FusedDecoderCropShard(Pipeline._current_pipeline._handle, *(kwargs_pybind.values()))
return (image_decoder_slice)
+
+
Collaborator: Please remove additional spaces at L304, L305

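A note on the pattern these decoder wrappers share: each one builds a plain dict of keyword arguments and then splats its values positionally into the pybind entry point, so the dict's insertion order must match the C++ signature exactly, and removing or reordering a key silently shifts every argument after it. A minimal, self-contained sketch of the mechanism (decoder_stub is a hypothetical stand-in, not the real rocal_pybind API):

# Sketch only: 'decoder_stub' stands in for a pybind call such as
# b.COCO_ImageDecoderShard; only the unpacking pattern is from the diff.
def decoder_stub(handle, source_path, color_format, shuffle):
    return (handle, source_path, color_format, shuffle)

kwargs_pybind = {"source_path": "/data/coco",  # insertion order must match
                 "color_format": 0,            # the bound C++ signature,
                 "shuffle": True}              # values are passed positionally
result = decoder_stub("pipeline_handle", *(kwargs_pybind.values()))

This is also why the reviewers scrutinize removed keys such as user_key_for_encoded above: dropping a key changes the positional argument list handed to the binding, not just the name.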
532 changes: 68 additions & 464 deletions rocAL/rocAL_pybind/amd/rocal/fn.py

Large diffs are not rendered by default.

Empty file.
47 changes: 43 additions & 4 deletions rocAL/rocAL_pybind/amd/rocal/pipeline.py
@@ -95,7 +95,7 @@ class Pipeline(object):
_handle = None
_current_pipeline = None

- def __init__(self, batch_size=-1, num_threads=-1, device_id=-1, seed=1,
+ def __init__(self, batch_size=-1, num_threads=-1, device_id=-1, seed=-1,
Collaborator: Is this seed=-1 intentional? If not, change it back to whatever is in TOT.

exec_pipelined=True, prefetch_queue_depth=2,
exec_async=True, bytes_per_sample=0,
rocal_cpu=False, max_streams=-1, default_cuda_stream_priority=0, tensor_layout = types.NCHW, reverse_channels = False, multiplier = [1.0,1.0,1.0], offset = [0.0, 0.0, 0.0], tensor_dtype=types.FLOAT):
@@ -137,6 +137,7 @@ def __init__(self, batch_size=-1, num_threads=-1, device_id=-1, seed=1,
self._name = None
self._anchors = None
self._BoxEncoder = None
+ self._BoxIOUMatcher = None
Collaborator: Please remove BoxIOUMatcher as part of this PR

self._encode_tensor = None
self._numOfClasses = None
self._oneHotEncoding = False
@@ -155,7 +156,7 @@ def build(self):
exit(0)
return self

- def run(self):
+ def rocalRun(self):
""" Run the pipeline using rocalRun call
"""
status = b.rocalRun(self._handle)
@@ -229,7 +230,7 @@ def GetOneHotEncodedLabels(self, array, device):

def set_outputs(self, *output_list):
self._output_list_length = len(output_list)
- b.setOutputImages(self._handle,len(output_list),output_list)
+ b.setOutputImages(self._handle, len(output_list), output_list)

def __enter__(self):
Pipeline._current_pipeline = self
@@ -329,6 +330,44 @@ def isEmpty(self):

def Timing_Info(self):
return b.getTimingInfo(self._handle)

def rocalGetImageLabels(self):
return b.rocalGetImageLabels(self._handle)

def rocalGetBoundingBoxLabel(self):
return b.rocalGetBoundingBoxLabel(self._handle)

def rocalGetBoundingBoxCords(self):
return b.rocalGetBoundingBoxCords(self._handle)

def rocalGetBoundingBoxCount(self):
return b.rocalGetBoundingBoxCount(self._handle)

def rocalGetMatchedIndices(self):
Collaborator: Please remove this as part of this PR

return b.rocalGetMatchedIndices(self._handle)

def copy_out_data_ptr(self, data_ptr):
return b.copy_data_ptr(self._handle, data_ptr)

def rocalGetOutputTensors(self):
return b.rocalGetOutputTensors(self._handle)

def run(self):
"""
It rises StopIteration if data set reached its end.
return:
:return:
A list of `rocalTensorList` objects for respective pipeline outputs.
"""
try:
print("getRemainingImages :", self.getRemainingImages())
if self.getRemainingImages() > 0:
self.rocalRun()
Collaborator: Please give a better name than rocalRun(); it is slightly misleading here

return b.rocalGetOutputTensors(self._handle)
except:
print("Raise stop iter")
Collaborator: Is print required here?

raise StopIteration


def _discriminate_args(func, **func_kwargs):
"""Split args on those applicable to Pipeline constructor and the decorated function."""
@@ -459,4 +498,4 @@ def create_pipeline(*args, **kwargs):
create_pipeline._is_pipeline_def = True
return create_pipeline

- return actual_decorator(fn) if fn else actual_decorator
+ return actual_decorator(fn) if fn else actual_decorator
\ No newline at end of file
Collaborator: Add new line at EOF

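Taken together, the review notes on this file point at one reshaped entry point: a thin, clearly named wrapper over b.rocalRun, no debug prints, and StopIteration raised from an explicit check rather than a bare except. A minimal sketch of that shape, assuming the method names below are free to choose (they are suggestions, not the merged API):

def process_run(self):
    # Hypothetical rename of rocalRun(): advances the pipeline one
    # iteration and returns the backend status code.
    return b.rocalRun(self._handle)

def run(self):
    # Returns the list of rocalTensorList objects for the pipeline
    # outputs; raises StopIteration once the dataset is exhausted.
    if self.getRemainingImages() <= 0 or self.process_run() != 0:
        raise StopIteration
    return b.rocalGetOutputTensors(self._handle)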
120 changes: 63 additions & 57 deletions rocAL/rocAL_pybind/amd/rocal/plugin/generic.py
@@ -22,33 +22,36 @@
import numpy as np
import rocal_pybind as b
import amd.rocal.types as types
import ctypes

class ROCALGenericImageIterator(object):
def __init__(self, pipeline):
self.loader = pipeline
self.w = b.getOutputWidth(self.loader._handle)
self.h = b.getOutputHeight(self.loader._handle)
self.n = b.getOutputImageCount(self.loader._handle)
color_format = b.getOutputColorFormat(self.loader._handle)
self.p = (1 if (color_format == int(types.GRAY)) else 3)
height = self.h*self.n
self.out_tensor = None
self.out_bbox = None
self.out_image = np.zeros((height, self.w, self.p), dtype = "uint8")
- self.bs = pipeline._batch_size
+ self.batch_size = pipeline._batch_size
self.out = None
Collaborator: Rename it to self.out_tensor, like before


def next(self):
return self.__next__()

def __next__(self):

Collaborator: Remove empty line here

if(self.loader.isEmpty()):
raise StopIteration

if self.loader.run() != 0:
raise StopIteration

- self.loader.copyImage(self.out_image)
- return self.out_image , self.out_tensor
+ else:
+ self.output_tensor_list = self.loader.rocalGetOutputTensors()
Collaborator: self.loader.run() is supposed to run and give the output tensors, right? Why are we calling that and then calling getOutputTensors again in the else part? I am not entirely following this if/else.

self.augmentation_count = len(self.output_tensor_list)
Collaborator: Why do we need augmentation count if it is not used?

Collaborator: Also, may I know why we are defining the other variables here?

self.w = self.output_tensor_list[0].batch_width() if self.w is None else self.w
Collaborator: I see that these variables are checked against None, but don't we need to initialize them to None before checking?

self.h = self.output_tensor_list[0].batch_height() if self.h is None else self.h
self.batch_size = self.output_tensor_list[0].batch_size() if self.batch_size is None else self.batch_size
self.color_format = self.output_tensor_list[0].color_format() if self.color_format is None else self.color_format
self.output_tensor_list[0].copy_data(ctypes.c_void_p(self.out.data_ptr()))

return self.out

def reset(self):
b.rocalResetLoaders(self.loader._handle)
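One way to resolve the None-check concern above is to initialize the lazily filled fields in __init__, so the first __next__ call can populate them from the output tensor list; a sketch under that assumption (editor's illustration, not part of the diff):

def __init__(self, pipeline):
    self.loader = pipeline
    self.batch_size = pipeline._batch_size
    # Fields below are filled on the first __next__ call, so the
    # 'is None' checks there are well defined from the start.
    self.w = None
    self.h = None
    self.color_format = None
    self.out = None
    self.output_tensor_list = None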
@@ -68,48 +71,13 @@ def __init__(self, pipeline, tensor_layout = types.NCHW, reverse_channels = Fals
self.reverse_channels = reverse_channels
self.tensor_dtype = tensor_dtype
self.display = display
self.w = b.getOutputWidth(self.loader._handle)
self.h = b.getOutputHeight(self.loader._handle)
self.n = b.getOutputImageCount(self.loader._handle)
self.bs = pipeline._batch_size
if self.loader._name is None:
self.loader._name= self.loader._reader
color_format = b.getOutputColorFormat(self.loader._handle)
self.p = (1 if (color_format == int(types.GRAY)) else 3)
self.labels_size = ((self.bs*self.loader._numOfClasses) if (self.loader._oneHotEncoding == True) else self.bs)
if tensor_layout == types.NCHW:
if self.device == "cpu":
if self.tensor_dtype == types.FLOAT:
self.out = np.empty((self.bs*self.n, self.p, int(self.h/self.bs), self.w,), dtype=np.float32)
elif self.tensor_dtype == types.FLOAT16:
self.out = np.empty((self.bs*self.n, self.p, int(self.h/self.bs), self.w,), dtype=np.float16)
self.labels = np.empty(self.labels_size, dtype = np.int32)

else:
with cp.cuda.Device(device=self.device_id):
if self.tensor_dtype == types.FLOAT:
self.out = cp.empty((self.bs*self.n, self.p, int(self.h/self.bs), self.w,), dtype=cp.float32)
elif self.tensor_dtype == types.FLOAT16:
self.out = cp.empty((self.bs*self.n, self.p, int(self.h/self.bs), self.w,), dtype=cp.float16)
self.labels = cp.empty(self.labels_size, dtype = cp.int32)

else: #NHWC
if self.device == "cpu":
if self.tensor_dtype == types.FLOAT:
self.out = np.empty((self.bs*self.n, int(self.h/self.bs), self.w, self.p), dtype=np.float32)
elif self.tensor_dtype == types.FLOAT16:
self.out = np.empty((self.bs*self.n, int(self.h/self.bs), self.w, self.p), dtype=np.float16)
self.labels = np.empty(self.labels_size, dtype = np.int32)

else:
with cp.cuda.Device(device=self.device_id):
if self.tensor_dtype == types.FLOAT:
self.out = cp.empty((self.bs*self.n, int(self.h/self.bs), self.w, self.p), dtype=cp.float32)
elif self.tensor_dtype == types.FLOAT16:
self.out = cp.empty((self.bs*self.n, int(self.h/self.bs), self.w, self.p), dtype=cp.float16)
self.labels = cp.empty(self.labels_size, dtype = cp.int32)


self.out = None
if self.bs != 0:
self.len = b.getRemainingImages(self.loader._handle)//self.bs
Collaborator: May I know why we are dividing by bs here but not in the else branch?

else:
@@ -119,27 +87,65 @@ def next(self):
return self.__next__()

def __next__(self):

Collaborator: Please remove additional empty line

if(b.isEmpty(self.loader._handle)):
raise StopIteration

if self.loader.run() != 0:
raise StopIteration

- if(types.NCHW == self.tensor_format):
- self.loader.copyToTensorNCHW(self.out, self.multiplier, self.offset, self.reverse_channels, int(self.tensor_dtype))
- else:
- self.loader.copyToTensorNHWC(self.out, self.multiplier, self.offset, self.reverse_channels, int(self.tensor_dtype))

+ self.output_tensor_list =
Collaborator: This statement seems incomplete


self.augmentation_count = len(self.output_tensor_list)
Collaborator: Augmentation count not used

self.w = self.output_tensor_list[0].batch_width() if self.w is None else self.w
self.h = self.output_tensor_list[0].batch_height() if self.h is None else self.h
self.batch_size = self.output_tensor_list[0].batch_size() if self.batch_size is None else self.batch_size
Collaborator: self.batch_size - Not used by this iterator

self.color_format = self.output_tensor_list[0].color_format() if self.color_format is None else self.color_format

if self.out is None:
if self.tensor_layout == types.NCHW:
if self.device == "cpu":
if self.tensor_dtype == types.FLOAT:
self.out = np.empty((self.bs*self.n, self.p, int(self.h/self.bs), self.w,), dtype=np.float32)
sampath1117 (Collaborator, Apr 17, 2023): Please remove the additional , after self.w. You can have the if statement for the datatype outside the NCHW and NHWC if condition, like below:

tensor_dtype = None
if self.tensor_dtype == types.FLOAT:
    tensor_dtype = np.float32
elif self.tensor_dtype == types.FLOAT16:
    tensor_dtype = np.float16

and use it as below:

self.out = np.empty((self.bs*self.n, self.p, int(self.h/self.bs), self.w), dtype=tensor_dtype)

This will reduce multiple branch statements.

elif self.tensor_dtype == types.FLOAT16:
self.out = np.empty((self.bs*self.n, self.p, int(self.h/self.bs), self.w,), dtype=np.float16)
self.labels = np.empty(self.labels_size, dtype = np.int32)

Collaborator: Please remove additional space

else:
with cp.cuda.Device(device=self.device_id):
if self.tensor_dtype == types.FLOAT:
self.out = cp.empty((self.bs*self.n, self.p, int(self.h/self.bs), self.w,), dtype=cp.float32)
elif self.tensor_dtype == types.FLOAT16:
self.out = cp.empty((self.bs*self.n, self.p, int(self.h/self.bs), self.w,), dtype=cp.float16)
self.labels = cp.empty(self.labels_size, dtype = cp.int32)

else: #NHWC
if self.device == "cpu":
if self.tensor_dtype == types.FLOAT:
self.out = np.empty((self.bs*self.n, int(self.h/self.bs), self.w, self.p), dtype=np.float32)
elif self.tensor_dtype == types.FLOAT16:
self.out = np.empty((self.bs*self.n, int(self.h/self.bs), self.w, self.p), dtype=np.float16)
self.labels = np.empty(self.labels_size, dtype = np.int32)

else:
with cp.cuda.Device(device=self.device_id):
if self.tensor_dtype == types.FLOAT:
self.out = cp.empty((self.bs*self.n, int(self.h/self.bs), self.w, self.p), dtype=cp.float32)
Collaborator: Please rename self.p to self.channels

elif self.tensor_dtype == types.FLOAT16:
self.out = cp.empty((self.bs*self.n, int(self.h/self.bs), self.w, self.p), dtype=cp.float16)
Collaborator: In all these instances you have created one unified tensor for all the outputs together, i.e. if outputs are set for 2 augmentations you create a single N*2, H, W, C tensor and use it. But in the tensor pipeline every output can have a different data type and different W and H values, so we need to accommodate that. self.output_tensor_list[0].copy_data only copies the first output set; self.output_tensor_list[1].copy_data would copy the second. So we may need to add a for loop based on the number of outputs.

self.labels = cp.empty(self.labels_size, dtype = cp.int32)

self.output_tensor_list[0].copy_data(ctypes.c_void_p(self.out.data_ptr()))
Collaborator: There is already a copy_data call at L52 (self.output_tensor_list[0].copy_data(ctypes.c_void_p(self.out.data_ptr()))); why is one more copy_data call present here?


if(self.loader._name == "labelReader"):
if(self.loader._oneHotEncoding == True):
self.loader.GetOneHotEncodedLabels(self.labels, self.device)
self.labels_tensor = self.labels.reshape(-1, self.bs, self.loader._numOfClasses)
print("Support for OneHotLabels not given yet")
Collaborator: In the external PR we can retain this API call and resolve the issue with OneHotImageLabels in the backend. @shobana-mcw, your thoughts please.

exit(0)
else:
if self.display:
for i in range(self.bs):
img = (self.out)
draw_patches(img[i], i, 0)
- self.loader.getImageLabels(self.labels)
+ self.labels = self.loader.rocalGetImageLabels()
if self.device == "cpu":
self.labels_tensor = self.labels.astype(dtype=np.int_)
else:
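Finally, a sketch of the per-output copy loop suggested in the review above, assuming each entry of output_tensor_list exposes the batch_width(), batch_height(), batch_size(), and copy_data() methods seen in this diff. Host numpy buffers are used here (hence buf.ctypes.data rather than the data_ptr() used for device arrays), and the 3 channels and float32 dtype are illustrative assumptions:

# Hypothetical helper: one host buffer per pipeline output, since each
# output may have its own width, height, and data type.
import ctypes
import numpy as np

def copy_all_outputs(output_tensor_list):
    outputs = []
    for tensor in output_tensor_list:
        buf = np.empty((tensor.batch_size(), tensor.batch_height(),
                        tensor.batch_width(), 3), dtype=np.float32)
        tensor.copy_data(ctypes.c_void_p(buf.ctypes.data))
        outputs.append(buf)
    return outputs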