Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kernel-wrapper around NPP debayer calls #4486

Merged
merged 10 commits into from
Dec 5, 2022

Conversation

stiepan
Copy link
Member

@stiepan stiepan commented Dec 1, 2022

Signed-off-by: Kamil Tokarski ktokarski@nvidia.com

Category:

New feature (non-breaking change which adds functionality)

Description:

Adds a simple kernel that wraps calls to NPP bayer -> RGB functions together with debayer-specific enums.
Exposes the enum with bayer pattern in python so that it can be specified per-sample/per-frame easily.

Additional information:

This PR does not contain cpp tests - the NPP like bilinear debayer is different form opencv's so it requires a separate implemenation for the baseline. I've done it in Python and it will be tested with the operator. I don't see much value in implementing the baseline twice for, essentialy, testing the NPP.
The cpp tests check is done on simple gradient samples - one that should yield similar results irrespecitve of the concrete debayer algorithm used.

Affected modules and functionalities:

  • npp.h - set of helpers for calling npp functions
  • remap kernel - the creation and updates of the npp context are moved to reusable npp.h helpers. It seems that the flags for the context were set incorrectly before the workspace stream was set to npp context, this PR changes that.

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-3149

Sorry, something went wrong.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6649042]: BUILD STARTED

@stiepan stiepan marked this pull request as draft December 1, 2022 16:00
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6649042]: BUILD PASSED

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan stiepan marked this pull request as ready for review December 1, 2022 17:16
@stiepan
Copy link
Member Author

stiepan commented Dec 1, 2022

I decided to drop the idea of per-sample pattern as strings, as those become deficult if used per-frame. So the enum is now exposed in python.

namespace kernels {
namespace debayer {

enum class DALIBayerAlgorithm {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
enum class DALIBayerAlgorithm {
enum class DALIDebayerAlgorithm {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

namespace debayer {

enum class DALIBayerAlgorithm {
DALI_BAYER_BILINEAR_NPP = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DALI_BAYER_BILINEAR_NPP = 0
DALI_DEBAYER_BILINEAR_NPP = 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -574,6 +578,7 @@ DALI_REGISTER_TYPE(string, DALI_STRING);
DALI_REGISTER_TYPE(DALIImageType, DALI_IMAGE_TYPE);
DALI_REGISTER_TYPE(DALIDataType, DALI_DATA_TYPE);
DALI_REGISTER_TYPE(DALIInterpType, DALI_INTERP_TYPE);
DALI_REGISTER_TYPE(DALIColorFilter, DALI_COLOR_FILTER);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really, really need this? I don't like adding stuff to types.h and, honestly, I'd very much love to remove other enums (like ImageType and InterpType) from here instead of adding more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is best we have for now. We need the ability to specify the one of four avialable options per-sample as tensor inputs. I've tried going around it with np.bytes_("bg") in external source, but 1. it requires "parsing" strings per-sample in each iteration, 2. is unfamiliar to users, 3. has limitations, for example, I don't see a way to create a tensor of strings that could be digested by DALI.

Copy link
Contributor

@mzient mzient Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the motivation for this is to allow the user to specify the pattern per sample, then the likeliest reason is so that you can demosaic a crop. In this case, dealing with blue (or red) position within a Bayer cell would be much easier to work with:

(blue coordinates in XY order)
color_filter=(0,0)  # BGGR
color_filter=(0,1)  # GRBG
color_filter=(1,0)  # GBRG
color_filter=(1,1)  # RGGB 

If you know your original layout, you can calculate the layout of the crop as:
(layout_x - crop_x)&1, (layout_y - crop_y)&1

Copy link
Member Author

@stiepan stiepan Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption I see in the requirements is more that the samples in a batch may come from different sources and thus have different patterns. I guess we could use a pair of integers to describe position within 2x2 tile if I get your idea correctly. However, both in case of opencv and npp those are enums, it seems like a more familiar way of doing so. For users who migrate from opencv, using the enum should be more straightforward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per slack disussion: Reverting the exposure of enum, because passing the enums per-sample is cumbersome (requires reinterpret op along the way),

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan
Copy link
Member Author

stiepan commented Dec 2, 2022

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6661813]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6661813]: BUILD FAILED

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan
Copy link
Member Author

stiepan commented Dec 2, 2022

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6662792]: BUILD STARTED

@@ -227,7 +213,7 @@ struct NppRemapKernel : public RemapKernel<Backend, T> {
}


NppStreamContext npp_ctx_{};
NppStreamContext npp_ctx_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This is quite risky and offers little, if any, performance benefits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has hidden a problem with the call CUDA_CALL(cudaStreamGetFlags(npp_ctx_.hStream, &npp_ctx_.nStreamFlags)); accessing hStream that has not been set. So I left a version that actually reported problems that stream is not set when it is not set.



protected:
NppStreamContext npp_ctx_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NppStreamContext npp_ctx_;
NppStreamContext npp_ctx_{};

or

Suggested change
NppStreamContext npp_ctx_;
NppStreamContext npp_ctx_{ cudaStream_t(-1), 0 };

if you want to make sure that the stream is invalid until reasonably initialized.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan stiepan force-pushed the debayer_npp_wrapper branch from 3a93bdd to 541b29d Compare December 2, 2022 16:23
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6662792]: BUILD PASSED

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan
Copy link
Member Author

stiepan commented Dec 5, 2022

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6681266]: BUILD STARTED

Comment on lines 105 to 111
for (int i = 0; i < 2; i++) {
for (int j = 0; j < 2; j++) {
bayer_sample.data[(h + i) * width + w + j] =
rgb_sample.data[(h + i) * width * num_channels + (w + j) * num_channels +
pattern2channel[static_cast<int>(pattern)][i][j]];
}
}
Copy link
Contributor

@mzient mzient Dec 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this is simpler? Your call.

Suggested change
for (int i = 0; i < 2; i++) {
for (int j = 0; j < 2; j++) {
bayer_sample.data[(h + i) * width + w + j] =
rgb_sample.data[(h + i) * width * num_channels + (w + j) * num_channels +
pattern2channel[static_cast<int>(pattern)][i][j]];
}
}
int i = y & 1;
int j = x & 1;
int c = pattern2channel[static_cast<int>(pattern)][i][j];
bayer_sample.data[h * width + w] =
rgb_sample.data[(h * width + w) * num_channels + c];

BTW, isn't num_channels always 3?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, it is simpler. As to the number of channels, it is. I made it static constexpr int num_channels = 3; to "track usages".

Copy link
Contributor

@mzient mzient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you can apply the suggestion if CI hasn't been run yet.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan
Copy link
Member Author

stiepan commented Dec 5, 2022

!build

Comment on lines 87 to 92
std::array<std::array<std::array<int, 2>, 2>, 4> pattern2channel{{
{{{0, 1}, {1, 2}}}, // bggr -> rggb -> 0112
{{{1, 0}, {2, 1}}}, // gbrg -> grbg -> 1021
{{{1, 2}, {0, 1}}}, // grbg -> gbrg -> 1201
{{{2, 1}, {1, 0}}} // rggb -> bggr -> 2110
}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the braces are mismatched. Plain C array would be easier to read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::array<std::array<std::array<int, 2>, 2>, 4> pattern2channel{{
{{{0, 1}, {1, 2}}}, // bggr -> rggb -> 0112
{{{1, 0}, {2, 1}}}, // gbrg -> grbg -> 1021
{{{1, 2}, {0, 1}}}, // grbg -> gbrg -> 1201
{{{2, 1}, {1, 0}}} // rggb -> bggr -> 2110
}};
int pattern2channel[4][2][2] = {
{{0, 1}, {1, 2}}, // bggr -> rggb -> 0112
{{1, 0}, {2, 1}}, // gbrg -> grbg -> 1021
{{1, 2}, {0, 1}}, // grbg -> gbrg -> 1201
{{2, 1}, {1, 0}} // rggb -> bggr -> 2110
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6681553]: BUILD STARTED

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan
Copy link
Member Author

stiepan commented Dec 5, 2022

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6681606]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [6681606]: BUILD PASSED

@stiepan stiepan merged commit b47f07c into NVIDIA:main Dec 5, 2022
@JanuszL JanuszL mentioned this pull request Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants