
Remove Cython dependency #74

Closed
carloshorn opened this issue Aug 6, 2020 · 7 comments · Fixed by #83

Comments

@carloshorn
Collaborator

carloshorn commented Aug 6, 2020

It would be nice to replace correct_tsm_issue.mean_filter with a pure Python implementation.

A possible solution is:

import numpy as np
import bottleneck as bn

def rolling_window(a, shape):  # rolling window for 2D array
    s = (a.shape[0] - shape[0] + 1,) + (a.shape[1] - shape[1] + 1,) + shape
    strides = a.strides + a.strides
    return np.lib.stride_tricks.as_strided(a, shape=s, strides=strides)

def std_filter(data, box_size):
    shape = (box_size, box_size)
    border = box_size // 2
    # need to surround the data with NaNs to calculate values at the boundary
    padded_data = np.pad(
        data, (border, border), 
        mode='constant', 
        constant_values=np.nan
    )
    windows = rolling_window(padded_data, shape)
    # flatten each window so nanstd can reduce over a single axis
    n3, n2, n1, n0 = windows.shape
    windows = windows.reshape((n3, n2, n1*n0))
    result = bn.nanstd(windows, axis=2)
    return result

Using bottleneck instead of the equivalent numpy function reduces the processing time enough to be competitive with Cython or numba.
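For cross-checking, the same windowing trick also works with plain numpy's nanstd (the slow variant referred to above); a minimal sketch, where the small example array is an assumption for illustration only:

```python
import numpy as np

def rolling_window(a, shape):
    # build a (rows, cols, box, box) strided view without copying data
    s = (a.shape[0] - shape[0] + 1, a.shape[1] - shape[1] + 1) + shape
    strides = a.strides + a.strides
    return np.lib.stride_tricks.as_strided(a, shape=s, strides=strides)

def std_filter_numpy(data, box_size):
    # reference version using np.nanstd instead of bottleneck
    border = box_size // 2
    padded = np.pad(data, border, mode='constant', constant_values=np.nan)
    windows = rolling_window(padded, (box_size, box_size))
    n3, n2, n1, n0 = windows.shape
    return np.nanstd(windows.reshape(n3, n2, n1 * n0), axis=2)

data = np.arange(16, dtype=float).reshape(4, 4)
result = std_filter_numpy(data, 3)
# each output pixel is the NaN-aware std of its 3x3 neighbourhood
```

Swapping np.nanstd for bn.nanstd gives the fast version; the results should be numerically identical.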

Another possibility would be an implementation with numba:

import numpy as np
import numba as nb

@nb.njit
def std_filter_2(data, box_size):
    N = box_size // 2
    nrows, ncols = data.shape
    std = np.empty_like(data)
    for j in range(nrows):
        for i in range(ncols):
            total = 0.0
            count = 0
            jj_min = max(0, j-N)
            jj_max = min(nrows, j+N+1)
            ii_min = max(0, i-N)
            ii_max = min(ncols, i+N+1)
            for jj in range(jj_min, jj_max):
                for ii in range(ii_min, ii_max):
                    value = data[jj, ii]
                    if np.isnan(value):
                        continue
                    total += value
                    count += 1
            if count == 0:
                std[j, i] = np.nan
                continue
            mean = total / count
            diff2 = 0.0
            for jj in range(jj_min, jj_max):
                for ii in range(ii_min, ii_max):
                    value = data[jj, ii]
                    if np.isnan(value):
                        continue
                    diff2 += (value - mean)**2
            std[j, i] = np.sqrt(diff2 / count)
    return std

Edit: Slightly cleaner numba implementation.
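The nested-loop algorithm above can also be sanity-checked without numba by dropping the @nb.njit decorator and vectorising the inner loops; a sketch of such a plain (much slower) reference version, with the toy input being an assumption:

```python
import numpy as np

def std_filter_loop(data, box_size):
    # same algorithm as the numba version, minus the @nb.njit decorator
    N = box_size // 2
    nrows, ncols = data.shape
    std = np.empty_like(data)
    for j in range(nrows):
        for i in range(ncols):
            # clip the box at the array borders
            jj_min, jj_max = max(0, j - N), min(nrows, j + N + 1)
            ii_min, ii_max = max(0, i - N), min(ncols, i + N + 1)
            box = data[jj_min:jj_max, ii_min:ii_max]
            valid = box[~np.isnan(box)]
            # population std over the valid neighbours, NaN if none
            std[j, i] = valid.std() if valid.size else np.nan
    return std

data = np.arange(9, dtype=float).reshape(3, 3)
data[0, 0] = np.nan  # exercise the NaN handling
result = std_filter_loop(data, 3)
```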

@mraspaud
Member

mraspaud commented Aug 6, 2020

I find the numpy/bottleneck implementation cleaner.
Could you provide some numbers to get an idea of the performance?

@carloshorn
Collaborator Author

[Screenshot: timing comparison of the three implementations]

Using pure numpy (1300 ms) is definitely too slow, because processing an entire file with pygac-run takes about 10 s. However, numpy + bottleneck (272 ms) is competitive with numba (115 ms), which should be similar in speed to a Cython implementation.
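Numbers like these can be reproduced with the standard library's timeit; a minimal harness sketch (the array shape and NaN fraction below are assumptions for the demo, not the setup used for the figures above):

```python
import timeit

import numpy as np

# hypothetical input; real GAC scenes are larger, this just keeps the demo fast
rng = np.random.default_rng(42)
data = rng.random((200, 200))
data[rng.random(data.shape) < 0.05] = np.nan  # ~5% missing values

def bench(func, *args, number=3, repeat=3):
    """Return the best time per call in milliseconds."""
    best = min(timeit.repeat(lambda: func(*args), number=number, repeat=repeat))
    return best / number * 1e3

# e.g. bench(std_filter, data, 3) vs. bench(std_filter_2, data, 3)
# for each candidate implementation defined earlier in the thread
```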

@mraspaud
Member

mraspaud commented Aug 6, 2020

That's really nice!

@sfinkens
Member

👍 for the numpy + bottleneck variant!

@carloshorn
Collaborator Author

A clear plus for numpy + bottleneck is that both packages have a major version greater than one. People could argue that a stable release should be a minimum requirement for production code.
Maybe in the future the numba stencil feature could be an elegant solution, but for today I would also vote for numpy + bottleneck.

@djhoese
Member

djhoese commented Aug 10, 2020

both packages have a major version greater than one.

This is not a good reason for choosing packages in the Python ecosystem, especially the scientific Python ecosystem. I've argued in the past on other forums that Cython needs to go to version 1.0, but they had their own timeline for things. Regardless, it looks like Cython is working on a 3.0 release (skipping 1.0 and 2.0): https://pypi.org/project/Cython/#history

@carloshorn
Collaborator Author

@djhoese, that's true; a good example of your point is the pandas library. It was the first choice for many things long before the release of 1.0 in January this year.
