1198 efficient differentiable bilateral filter #1375

Merged: 57 commits, Dec 18, 2020

Commits
e382b3f
Initial cpu implementation
charliebudd Nov 11, 2020
a04bcc6
huge speed increase moving away from torch tensors
charliebudd Nov 11, 2020
80ced0e
allowing generic channel dimension size
charliebudd Nov 11, 2020
3b98676
allowing generic dimensionality
charliebudd Nov 12, 2020
ecdf753
manual testing script for bilateral filter
charliebudd Nov 12, 2020
98f9940
Initial commit of CRF using permutohedral lattice to optimise message…
charliebudd Oct 29, 2020
193c8d1
temporary example scripts demonstrating the usage of crf and phl, and…
charliebudd Oct 29, 2020
19e2c1c
alternate phl implementation using a more efficient cuda hashtable, c…
charliebudd Oct 29, 2020
eecd36c
c++ cpu permutohedral lattice implementation
charliebudd Nov 12, 2020
f534b52
comparison script
charliebudd Nov 12, 2020
abfb80f
REVERSE THIS COMMIT
charliebudd Nov 12, 2020
3c97af1
cleaning
charliebudd Nov 16, 2020
7bcce81
backwards pass
charliebudd Nov 16, 2020
ab85503
initial cuda brute force kernel
charliebudd Nov 18, 2020
b0097b4
reverting setup.py
charliebudd Nov 18, 2020
a52f496
working cuda kernel for brute force 2d kernel
charliebudd Nov 20, 2020
5f7304c
fixing unresolved symbols when compiling without cuda
charliebudd Nov 20, 2020
9c19f48
removing macros file
charliebudd Nov 23, 2020
a025f57
placeholder cuda phl implementation
charliebudd Nov 23, 2020
24a7b9b
changing test case color sigma
charliebudd Nov 23, 2020
32eb18b
Initial import of reference cuda phl implementation
charliebudd Nov 23, 2020
566c43f
fixing cuda kernel color weight error
charliebudd Nov 23, 2020
3a8add5
initial edits to cuda phl code
charliebudd Nov 24, 2020
3276ee6
fixing errors in cuda phl implementation
charliebudd Nov 25, 2020
72b8d35
removing logging from cuda phl
charliebudd Nov 25, 2020
c225505
using template data_ptr function
charliebudd Nov 26, 2020
cc6bd4a
fixed cpu phl implementation
charliebudd Nov 26, 2020
1d44988
updating cpu phl to run on arbitrary input dimensions and channels
charliebudd Nov 27, 2020
9c4a4a3
2d and 3d testing scripts
charliebudd Nov 27, 2020
409aafb
Generalising permutohedral implementation to remove independence on w…
charliebudd Dec 2, 2020
9b5948c
removing width and height from permutohedral filter function and impl…
charliebudd Dec 2, 2020
7c01d99
fixed typo
charliebudd Dec 2, 2020
d5b19a4
extending 3d testing script
charliebudd Dec 2, 2020
698dce1
generalising bruteforce cuda implementation
charliebudd Dec 2, 2020
a0e4975
updating testing scripts
charliebudd Dec 2, 2020
a395723
file organisation
charliebudd Dec 3, 2020
0ada661
fixing weight error in cpu bruteforce implementation, also inlining …
charliebudd Dec 3, 2020
0724e5c
some refactoring and introducing proper batch handling
charliebudd Dec 4, 2020
54d7a4a
fixing indexing error at border
charliebudd Dec 7, 2020
0b27227
fixing some artifacts in cuda phl for high color sigmas
charliebudd Dec 7, 2020
133ede9
fixing gaussian kernel function for cpu and cuda bruteforce
charliebudd Dec 8, 2020
77f2b4a
ensuring kernel is an odd numbered size
charliebudd Dec 9, 2020
523003b
adding tests for precise implementation
charliebudd Dec 9, 2020
2c8d1b7
adding approximate implementation test
charliebudd Dec 17, 2020
c5dfbd1
templating implementations based on scalar_type
charliebudd Dec 17, 2020
80677c8
cleaning up
charliebudd Dec 17, 2020
20af55b
Merge branch 'master' of https://github.com/charliebudd/MONAI into 11…
charliebudd Dec 17, 2020
e526ce8
code formatting
charliebudd Dec 17, 2020
1ca9762
removing half precision implementation for permutohedral due to error…
charliebudd Dec 18, 2020
e12c49f
skipping cuda tests if cuda missing
charliebudd Dec 18, 2020
3018c6a
Merge branch 'master' into 1198-efficient-differentiable-bilateral-fi…
charliebudd Dec 18, 2020
67baedd
reformatting
charliebudd Dec 18, 2020
3014534
adding unit test skip based on cpp extension availability
charliebudd Dec 18, 2020
83d1d03
removing unused import
charliebudd Dec 18, 2020
978a80a
changing to use of THCatomic add
charliebudd Dec 18, 2020
39b007e
adding missing licenses
charliebudd Dec 18, 2020
5b557bf
clang reformat
charliebudd Dec 18, 2020
5 changes: 5 additions & 0 deletions docs/source/networks.rst
@@ -183,6 +183,11 @@ Layers
~~~~~~~~~~~~~~~~
.. autoclass:: GaussianFilter
:members:

`BilateralFilter`
~~~~~~~~~~~~~~~~~
.. autoclass:: BilateralFilter
:members:

`HilbertTransform`
~~~~~~~~~~~~~~~~~~
5 changes: 5 additions & 0 deletions monai/csrc/ext.cpp
@@ -12,11 +12,16 @@ limitations under the License.
*/

#include <torch/extension.h>

#include "filtering/filtering.h"
#include "lltm/lltm.h"
#include "resample/pushpull.h"
#include "utils/resample_utils.h"

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
// filtering
m.def("bilateral_filter", &BilateralFilter, "Bilateral Filter");

// lltm
m.def("lltm_forward", &lltm_forward, "LLTM forward");
m.def("lltm_backward", &lltm_backward, "LLTM backward");
42 changes: 42 additions & 0 deletions monai/csrc/filtering/bilateral/bilateral.h
@@ -0,0 +1,42 @@
/*
Copyright 2020 MONAI Consortium
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

#pragma once

#include <torch/extension.h>
#include "utils/common_utils.h"

torch::Tensor BilateralFilterCpu(torch::Tensor input, float spatial_sigma, float color_sigma);
torch::Tensor BilateralFilterPHLCpu(torch::Tensor input, float spatial_sigma, float color_sigma);

#ifdef WITH_CUDA
torch::Tensor BilateralFilterCuda(torch::Tensor input, float spatial_sigma, float color_sigma);
torch::Tensor BilateralFilterPHLCuda(torch::Tensor input, float spatial_sigma, float color_sigma);
#endif

torch::Tensor BilateralFilter(torch::Tensor input, float spatial_sigma, float color_sigma, bool usePHL) {
torch::Tensor (*filterFunction)(torch::Tensor, float, float);

#ifdef WITH_CUDA
if (torch::cuda::is_available() && input.is_cuda()) {
CHECK_CONTIGUOUS_CUDA(input);
filterFunction = usePHL ? &BilateralFilterPHLCuda : &BilateralFilterCuda;
} else {
filterFunction = usePHL ? &BilateralFilterPHLCpu : &BilateralFilterCpu;
}
#else
filterFunction = usePHL ? &BilateralFilterPHLCpu : &BilateralFilterCpu;
#endif

return filterFunction(input, spatial_sigma, color_sigma);
}
167 changes: 167 additions & 0 deletions monai/csrc/filtering/bilateral/bilateralfilter_cpu.cpp
@@ -0,0 +1,167 @@
/*
Copyright 2020 MONAI Consortium
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

#include <math.h>
#include <torch/extension.h>

#include "utils/tensor_description.h"

struct Indexer {
public:
Indexer(int dimensions, int* sizes) {
m_dimensions = dimensions;
m_sizes = sizes;
m_index = new int[dimensions]{0};
}

~Indexer() {
delete[] m_index;
}

bool operator++(int) {
for (int i = 0; i < m_dimensions; i++) {
m_index[i] += 1;

if (m_index[i] < m_sizes[i]) {
return true;
} else {
m_index[i] = 0;
}
}

return false;
}

int& operator[](int dimensionIndex) {
return m_index[dimensionIndex];
}

private:
int m_dimensions;
int* m_sizes;
int* m_index;
};

template <typename scalar_t>
void BilateralFilterCpu(torch::Tensor inputTensor, torch::Tensor outputTensor, float spatialSigma, float colorSigma) {
// Getting tensor description.
TensorDescription desc = TensorDescription(inputTensor);

// Raw tensor data pointers.
scalar_t* inputTensorData = inputTensor.data_ptr<scalar_t>();
scalar_t* outputTensorData = outputTensor.data_ptr<scalar_t>();

// Pre-calculate common values
int windowSize = (int)ceil(5.0f * spatialSigma) | 1; // ORing last bit to ensure odd window size
int halfWindowSize = floor(0.5f * windowSize);
scalar_t spatialExpConstant = -1.0f / (2 * spatialSigma * spatialSigma);
scalar_t colorExpConstant = -1.0f / (2 * colorSigma * colorSigma);

// Kernel sizes.
int* kernelSizes = new int[desc.dimensions];

for (int i = 0; i < desc.dimensions; i++) {
kernelSizes[i] = windowSize;
}

// Pre-calculate gaussian kernel in 1D.
scalar_t* gaussianKernel = new scalar_t[windowSize];

for (int i = 0; i < windowSize; i++) {
int distance = i - halfWindowSize;
gaussianKernel[i] = exp(distance * distance * spatialExpConstant);
}

// Kernel aggregates used to calculate
// the output value.
scalar_t* valueSum = new scalar_t[desc.channelCount];
scalar_t weightSum = 0;

// Looping over the batches
for (int b = 0; b < desc.batchCount; b++) {
int batchOffset = b * desc.batchStride;

// Looping over all dimensions for the home element
Indexer homeIndex = Indexer(desc.dimensions, desc.sizes);
do // while(homeIndex++)
{
// Calculating indexing offset for the home element
int homeOffset = batchOffset;

for (int i = 0; i < desc.dimensions; i++) {
homeOffset += homeIndex[i] * desc.strides[i];
}

// Zero kernel aggregates.
for (int i = 0; i < desc.channelCount; i++) {
valueSum[i] = 0;
}

weightSum = 0.0f;

// Looping over all dimensions for the neighbour element
Indexer kernelIndex = Indexer(desc.dimensions, kernelSizes);
do // while(kernelIndex++)
{
// Calculating buffer offset for the neighbour element
// Index is clamped to the border in each dimension.
int neighbourOffset = batchOffset;

for (int i = 0; i < desc.dimensions; i++) {
int neighbourIndex = homeIndex[i] + kernelIndex[i] - halfWindowSize;
int neighbourIndexClamped = std::min(desc.sizes[i] - 1, std::max(0, neighbourIndex));
neighbourOffset += neighbourIndexClamped * desc.strides[i];
}

// Euclidean color distance.
scalar_t colorDistanceSquared = 0;

for (int i = 0; i < desc.channelCount; i++) {
scalar_t diff = inputTensorData[homeOffset + i * desc.channelStride] -
inputTensorData[neighbourOffset + i * desc.channelStride];
colorDistanceSquared += diff * diff;
}

// Calculating and combining the spatial
// and color weights.
scalar_t spatialWeight = 1;

for (int i = 0; i < desc.dimensions; i++) {
spatialWeight *= gaussianKernel[kernelIndex[i]];
}

scalar_t colorWeight = exp(colorDistanceSquared * colorExpConstant);
scalar_t totalWeight = spatialWeight * colorWeight;

// Aggregating values.
for (int i = 0; i < desc.channelCount; i++) {
valueSum[i] += inputTensorData[neighbourOffset + i * desc.channelStride] * totalWeight;
}

weightSum += totalWeight;
} while (kernelIndex++);

for (int i = 0; i < desc.channelCount; i++) {
outputTensorData[homeOffset + i * desc.channelStride] = valueSum[i] / weightSum;
}
} while (homeIndex++);
}

delete[] kernelSizes;
delete[] gaussianKernel;
delete[] valueSum;
}

torch::Tensor BilateralFilterCpu(torch::Tensor inputTensor, float spatialSigma, float colorSigma) {
// Preparing output tensor.
torch::Tensor outputTensor = torch::zeros_like(inputTensor);

AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputTensor.scalar_type(), "BilateralFilterCpu", ([&] {
BilateralFilterCpu<scalar_t>(
inputTensor, outputTensor, spatialSigma, colorSigma);
}));

return outputTensor;
}
89 changes: 89 additions & 0 deletions monai/csrc/filtering/bilateral/bilateralfilter_cpu_phl.cpp
@@ -0,0 +1,89 @@
/*
Copyright 2020 MONAI Consortium
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

#include <torch/extension.h>

#include "filtering/permutohedral/permutohedral.h"
#include "utils/tensor_description.h"

template <typename scalar_t>
void BilateralFilterPHLCpu(
torch::Tensor inputTensor,
torch::Tensor outputTensor,
float spatialSigma,
float colorSigma) {
// Getting tensor description.
TensorDescription desc = TensorDescription(inputTensor);

int featureChannels = desc.channelCount + desc.dimensions;

// Preparing memory
scalar_t* inputTensorData = inputTensor.data_ptr<scalar_t>();
scalar_t* outputTensorData = outputTensor.data_ptr<scalar_t>();
scalar_t* data = new scalar_t[desc.channelStride * desc.channelCount];
scalar_t* features = new scalar_t[desc.channelStride * featureChannels];

// Precalculating inverse sigmas
float invSpatialSigma = 1.0f / spatialSigma;
float invColorSigma = 1.0f / colorSigma;

// Looping over batches
for (int b = 0; b < desc.batchCount; b++) {
int batchOffset = b * desc.batchStride;

// Creating features (this also permutes the input data to channel-last layout;
// the permutohedral implementation should be changed to channel-first to avoid this)
for (int i = 0; i < desc.channelStride; i++) {
// Color features (and permutation)
for (int c = 0; c < desc.channelCount; c++) {
features[i * featureChannels + c] = invColorSigma * inputTensorData[batchOffset + i + c * desc.channelStride];
data[i * desc.channelCount + c] = inputTensorData[batchOffset + i + c * desc.channelStride];
}

// Spatial features
int offsetRemainder = i;

for (int d = 0; d < desc.dimensions; d++) {
int coord = offsetRemainder / desc.strides[d];
offsetRemainder -= coord * desc.strides[d];

features[i * featureChannels + desc.channelCount + d] = invSpatialSigma * coord;
}
}

// Filtering data with respect to the features.
scalar_t* output =
PermutohedralCPU<scalar_t>(data, features, desc.channelCount, featureChannels, desc.channelStride);

// Writing output tensor.
for (int i = 0; i < desc.channelStride; i++) {
for (int c = 0; c < desc.channelCount; c++) {
outputTensorData[batchOffset + i + c * desc.channelStride] = output[i * desc.channelCount + c];
}
}
}

delete[] data;
delete[] features;
}

// Chooses the templated implementation based on the tensor's runtime scalar type
torch::Tensor BilateralFilterPHLCpu(torch::Tensor inputTensor, float spatialSigma, float colorSigma) {
torch::Tensor outputTensor = torch::zeros_like(inputTensor);

AT_DISPATCH_FLOATING_TYPES(inputTensor.scalar_type(), "BilateralFilterPhlCpu", ([&] {
BilateralFilterPHLCpu<scalar_t>(inputTensor, outputTensor, spatialSigma, colorSigma);
}));

return outputTensor;
}