Bandwidth selection for 2d data #81

JHWu92 · 2020-11-29T20:39:27Z

Hi,

How can I apply automatic bandwidth selection for 2d data? I tried 'silverman', 'scott', 'ISJ' and they all say "rule is only available for 1D data".

I also think about cross-validation but I don't know what and how to compute some error function for grid search. Any suggestion?

Best,
Jeff

tommyod · 2020-11-30T07:57:40Z

Hi Jeff,

This is not implemented, since it's not obvious (to me) how to generalize bandwidth to several dimensions for all kernels in an elegant way. In higher dimensions, the bandwidth is a matrix and not a number. That being said, you could scale the data and use bw=1, and it would be equivalent. Here is how it would work in 1D first:

from KDEpy.bw_selection import silvermans_rule
import KDEpy
import numpy as np

np.random.seed(1)

# 1D example
data = np.random.randn(100, 1)

# Compute the BW, scale, then scale back
bw = silvermans_rule(data[:, [0]])
data_scaled = data / np.array([bw])
x_scaled, y_scaled = KDEpy.FFTKDE(bw=1).fit(data_scaled).evaluate()
x = x_scaled * np.array([bw])
y = y_scaled / (bw)

# Use Silverman directly without scaling
x2, y2 = KDEpy.FFTKDE(bw="silverman").fit(data).evaluate()
plt.plot(x, y, label="scaling_method"); plt.plot(x2, y2, label="direct method");plt.legend();

If you are comfortable with a diagonal bandwidth matrix (scaling along each axis independently), you could generalize this to 2D like so:

from KDEpy.bw_selection import silvermans_rule
import KDEpy
import numpy as np

np.random.seed(1)

# 2D example
data = np.random.randn(100, 2)

# Compute the BW, scale, then scale back
bw1 = silvermans_rule(data[:, [0]])
bw2 = silvermans_rule(data[:, [1]])
data_scaled = data / np.array([bw1, bw2])
x_scaled, y_scaled = KDEpy.FFTKDE(bw=1).fit(data_scaled).evaluate((32, 32))
x = x_scaled * np.array([bw1, bw2])
y = y_scaled / (bw1 * bw2)

# Verify that the integral is 1
dx = (x[-1, 0] - x[0, 0] ) / (32 - 1) # 32 grid points by default in each direction
dy = (x[-1, 1] - x[0, 1] ) / (32 - 1)
np.sum(y * dx * dy) # 0.9999712583635577

If you want a full bandwidth matrix with off diagonal entries not equal to zero, you could scale and rotate the input data (using principal components), fit the bw=1 kernel, then scale and rotate back using the inverse transform. See this discussion for more details. Hopefully the above is good enough though.

I appreciate thoughts and questions on this issue. Basically multidimensional kernels either have the bandwidth matrix H equal to either (1) identity * h, (2) diagonal H (as seen above), or (3) non-diagonal H (PCA). Maybe I could implemented it in an elegant way, but I am not sure.

JHWu92 · 2020-11-30T20:29:53Z

Hi Tommy,

This is a nice trick! I know it would be hard to generalize to multiple dimensions. I don't even fully understand the mathematics of bandwidth in 1D, let alone higher dimension. I am working on geographical data, where x and y are projected to equal distance coordinate reference system. So I guess a diagonal bandwidth matrix should be fine, most certainly better than me assigning an arbitrary bandwidth.

By the way, improved_sheather_jones is the same as KDEpy.FFTKDE(bw="ISJ"), right? I tried your code with ISJ instead for 1d and the plot looks identical.

Thanks again for this highly efficient KDE package!

Best,
Jeff

tommyod · 2020-11-30T21:30:01Z

Yea, it should do the trick for practical purposes at least. I hope to implement some better support, but I don't want to implement too much magic either. At least tricks like these force the user the think explicitly about what they want. 👍

Yup, improved_sheather_jones is the same as ISJ :)

Kaiyangshi-Ito · 2023-04-24T20:55:51Z

In KDEpy 1.1.1, I didn't see any of such warning signs -- has this been implemented already? Thanks.

tommyod · 2023-04-25T07:02:50Z

@Kaiyangshi-Ito Not implemented. See

KDEpy/KDEpy/bw_selection.py

Line 152 in 73d421e

raise ValueError("ISJ is only available for 1D data.")

adhamenaya · 2024-11-06T14:20:35Z

@tommyod This workaround is so elegant!! thank you so much!!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bandwidth selection for 2d data #81

Bandwidth selection for 2d data #81

JHWu92 commented Nov 29, 2020

tommyod commented Nov 30, 2020 •

edited

Loading

JHWu92 commented Nov 30, 2020

tommyod commented Nov 30, 2020

Kaiyangshi-Ito commented Apr 24, 2023

tommyod commented Apr 25, 2023

adhamenaya commented Nov 6, 2024

Bandwidth selection for 2d data #81

Bandwidth selection for 2d data #81

Comments

JHWu92 commented Nov 29, 2020

tommyod commented Nov 30, 2020 • edited Loading

JHWu92 commented Nov 30, 2020

tommyod commented Nov 30, 2020

Kaiyangshi-Ito commented Apr 24, 2023

tommyod commented Apr 25, 2023

adhamenaya commented Nov 6, 2024

tommyod commented Nov 30, 2020 •

edited

Loading