Add Laplacian GPU operator #3644
Conversation
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
!build
CI MESSAGE: [3826306]: BUILD STARTED
CI MESSAGE: [3826306]: BUILD PASSED
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@@ -0,0 +1,125 @@
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
moved from laplacian_params.h
It's not a 1:1 copy though :P
In hindsight, it is the worst kind of moving code around.
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
!build
CI MESSAGE: [3832078]: BUILD STARTED
constexpr static const int maxWindowSize = 23;

template <typename T>
class LaplacianWindows {
moved to laplacian_windows.h
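For orientation, a rough, hypothetical sketch of what a windows-cache class like this might look like; apart from LaplacianWindows and maxWindowSize, the names are invented, and the real implementation lives in laplacian_windows.h:

#include <cassert>
#include <map>
#include <vector>

// Hypothetical sketch only -- the real class is in laplacian_windows.h.
template <typename T>
class LaplacianWindows {
 public:
  static constexpr int maxWindowSize = 23;

  // Returns the 1D window of the given odd size, computing and caching
  // it on first use so repeated requests are cheap.
  const std::vector<T>& GetWindow(int size) {
    assert(size >= 1 && size <= maxWindowSize && size % 2 == 1);
    auto it = cache_.find(size);
    if (it == cache_.end()) {
      // Placeholder coefficients; the real code computes derivative
      // and smoothing windows here.
      it = cache_.emplace(size, std::vector<T>(size, T{1})).first;
    }
    return it->second;
  }

 private:
  std::map<int, std::vector<T>> cache_;
};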
CI MESSAGE: [3832078]: BUILD PASSED
CI MESSAGE: [3840048]: BUILD STARTED
Looks OK, a few comments.
Please check the test running time and the compilation time.
@@ -29,13 +29,20 @@ namespace dali {
#define LAPLACIAN_CPU_SUPPORTED_TYPES \
  (uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, uint64_t, int64_t, float16, float)

// TODO(klecki): float16 support - it's not easily compatible with float window,
// need to introduce some cast in between and expose it in the kernels
// #define LAPLACIAN_GPU_SUPPORTED_TYPES (float)
There appears to be an old comment here.
I purposely copied that part, as it seems to apply here as well, since the op is based on the GPU convolution. I thought it would make the code easier to navigate, but maybe it is unnecessary or irrelevant?
Just the // #define LAPLACIAN_GPU_SUPPORTED_TYPES (float) part.
Oh, obviously. I did not notice it even when you pointed it out. Thanks :D
done
op_impl_uptr GetLaplacianGpuImpl(const OpSpec& spec,
    const DimDesc& dim_desc) {
Nitpick: weird formatting.
done
BOOL_SWITCH(dim_desc.is_channel_last(), HasChannels, (
  BOOL_SWITCH(dim_desc.is_sequence(), IsSeq, (
    using LaplacianImpl = LaplacianOpGpu<Out, In, Axes, HasChannels, IsSeq>;
    return std::make_unique<LaplacianImpl>(spec, dim_desc);
From my recollection, using make_unique in code that already has layers of templates resulted in very slow compilation times. That's why for the Gaussian and arithmetic ops I used:

std::unique_ptr<OpImplBase<GPUBackend>> result;
SWITCH(...) (
  result.reset(new OpGpu<...>(...));
);

Can you check it?
Okay, I remember wondering why the reset method was used there. I'll check it.
done
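For reference, a minimal sketch of how the suggested reset-based dispatch could look when combined with the BOOL_SWITCH above; the type and macro names are taken from the surrounding diff, and the authoritative version is the code in this PR:

// Sketch only; assumes the DALI names visible in the diff above.
template <typename Out, typename In, int Axes>
op_impl_uptr GetLaplacianGpuImpl(const OpSpec& spec, const DimDesc& dim_desc) {
  std::unique_ptr<OpImplBase<GPUBackend>> result;
  BOOL_SWITCH(dim_desc.is_channel_last(), HasChannels, (
    BOOL_SWITCH(dim_desc.is_sequence(), IsSeq, (
      using LaplacianImpl = LaplacianOpGpu<Out, In, Axes, HasChannels, IsSeq>;
      // reset(new ...) rather than make_unique: this avoids instantiating
      // make_unique inside every branch of the nested switches, which was
      // reported to noticeably slow down compilation in similar operators.
      result.reset(new LaplacianImpl(spec, dim_desc));
    ));
  ));
  return result;
}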
 * to allow for parallel compilation of underlying kernels.
 */
template <typename Out, typename In>
op_impl_uptr GetLaplacianGpuImpl(const OpSpec& spec,
As you are replicating a pattern from Gaussian Blur, you might want to take a look at the reasoning in #3472 for using GSG-style pass-by-pointer here. I'm just posting this; I'm not sure I agree with the reasoning.
done
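For readers unfamiliar with the convention mentioned above: the Google C++ Style Guide historically required parameters that a function mutates to be passed by pointer rather than by non-const reference, so the mutation is visible at the call site. A generic illustration with hypothetical names (not the actual DALI signatures):

#include <vector>

// Hypothetical example of GSG-style pass-by-pointer; not DALI code.
void FillIdentityWindow(int size, std::vector<float>* out) {
  out->assign(size, 0.0f);
  (*out)[size / 2] = 1.0f;  // centered identity window
}

int main() {
  std::vector<float> window;
  FillIdentityWindow(5, &window);  // the & makes the mutation explicit
}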
std::array<bool, axes> has_smoothing = uniform_array<axes>(false);
for (int sample_idx = 0; sample_idx < nsamples; sample_idx++) {
  const auto& window_sizes = args.GetWindowSizes(sample_idx);
Does LaplacianArgs prohibit smoothing one sample while not smoothing another, or is that allowed?
Can we have some weird case where the smoothing window is 1D for one sample and empty for another? Or is it still 1D with size {0}?
There are no empty windows, and the smallest window size is 1 (which corresponds to the window [1]). If, for a given partial derivative, no sample requires smoothing, the whole list of windows is empty. If some samples require smoothing and some don't, those that don't will be convolved with [1].
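To make those semantics concrete, a toy sketch (hypothetical helper, not the PR's code): a window size of 1 yields the identity window [1], so samples that need no smoothing can go through the same convolution path as samples that do.

#include <cassert>
#include <vector>

// Hypothetical illustration of the window-size semantics described above.
std::vector<float> MakeSmoothingWindow(int window_size) {
  assert(window_size >= 1);  // there are no empty windows
  if (window_size == 1)
    return {1.0f};  // identity window: convolution leaves the sample unchanged
  // Placeholder coefficients; the real operator uses proper smoothing windows.
  return std::vector<float>(window_size, 1.0f / window_size);
}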
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Force-pushed from bf9074d to 51c6bb4
!build
CI MESSAGE: [3862472]: BUILD STARTED
CI MESSAGE: [3862472]: BUILD PASSED
* Add Laplacian GPU operator
* Move LaplacianWindows to kernels
* Add slow attr to some of Laplacian Python tests

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Category:
New feature (non-breaking change which adds functionality)
Description:
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Don't be discouraged by the number of added lines; 300 of them are boilerplate for splitting the instantiation of the op impl into separate files.
Checklist
Tests
Documentation
DALI team only
Requirements
REQ IDs: LAPL.01 - LAPL.17
JIRA TASK: DALI-2438