Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bandwidth selection for 2d data #81

Open
JHWu92 opened this issue Nov 29, 2020 · 6 comments
Open

Bandwidth selection for 2d data #81

JHWu92 opened this issue Nov 29, 2020 · 6 comments

Comments

@JHWu92
Copy link

JHWu92 commented Nov 29, 2020

Hi,

How can I apply automatic bandwidth selection for 2d data? I tried 'silverman', 'scott', 'ISJ' and they all say "rule is only available for 1D data".

I also think about cross-validation but I don't know what and how to compute some error function for grid search. Any suggestion?

Best,
Jeff

@tommyod
Copy link
Owner

tommyod commented Nov 30, 2020

Hi Jeff,

This is not implemented, since it's not obvious (to me) how to generalize bandwidth to several dimensions for all kernels in an elegant way. In higher dimensions, the bandwidth is a matrix and not a number. That being said, you could scale the data and use bw=1, and it would be equivalent. Here is how it would work in 1D first:

from KDEpy.bw_selection import silvermans_rule
import KDEpy
import numpy as np

np.random.seed(1)

# 1D example
data = np.random.randn(100, 1)

# Compute the BW, scale, then scale back
bw = silvermans_rule(data[:, [0]])
data_scaled = data / np.array([bw])
x_scaled, y_scaled = KDEpy.FFTKDE(bw=1).fit(data_scaled).evaluate()
x = x_scaled * np.array([bw])
y = y_scaled / (bw)

# Use Silverman directly without scaling
x2, y2 = KDEpy.FFTKDE(bw="silverman").fit(data).evaluate()
plt.plot(x, y, label="scaling_method"); plt.plot(x2, y2, label="direct method");plt.legend();

image

If you are comfortable with a diagonal bandwidth matrix (scaling along each axis independently), you could generalize this to 2D like so:

from KDEpy.bw_selection import silvermans_rule
import KDEpy
import numpy as np

np.random.seed(1)

# 2D example
data = np.random.randn(100, 2)

# Compute the BW, scale, then scale back
bw1 = silvermans_rule(data[:, [0]])
bw2 = silvermans_rule(data[:, [1]])
data_scaled = data / np.array([bw1, bw2])
x_scaled, y_scaled = KDEpy.FFTKDE(bw=1).fit(data_scaled).evaluate((32, 32))
x = x_scaled * np.array([bw1, bw2])
y = y_scaled / (bw1 * bw2)

# Verify that the integral is 1
dx = (x[-1, 0] - x[0, 0] ) / (32 - 1) # 32 grid points by default in each direction
dy = (x[-1, 1] - x[0, 1] ) / (32 - 1)
np.sum(y * dx * dy) # 0.9999712583635577

If you want a full bandwidth matrix with off diagonal entries not equal to zero, you could scale and rotate the input data (using principal components), fit the bw=1 kernel, then scale and rotate back using the inverse transform. See this discussion for more details. Hopefully the above is good enough though.

I appreciate thoughts and questions on this issue. Basically multidimensional kernels either have the bandwidth matrix H equal to either (1) identity * h, (2) diagonal H (as seen above), or (3) non-diagonal H (PCA). Maybe I could implemented it in an elegant way, but I am not sure.

@JHWu92
Copy link
Author

JHWu92 commented Nov 30, 2020

Hi Tommy,

This is a nice trick! I know it would be hard to generalize to multiple dimensions. I don't even fully understand the mathematics of bandwidth in 1D, let alone higher dimension. I am working on geographical data, where x and y are projected to equal distance coordinate reference system. So I guess a diagonal bandwidth matrix should be fine, most certainly better than me assigning an arbitrary bandwidth.

By the way, improved_sheather_jones is the same as KDEpy.FFTKDE(bw="ISJ"), right? I tried your code with ISJ instead for 1d and the plot looks identical.

Thanks again for this highly efficient KDE package!

Best,
Jeff

@tommyod
Copy link
Owner

tommyod commented Nov 30, 2020

Yea, it should do the trick for practical purposes at least. I hope to implement some better support, but I don't want to implement too much magic either. At least tricks like these force the user the think explicitly about what they want. 👍

Yup, improved_sheather_jones is the same as ISJ :)

@Kaiyangshi-Ito
Copy link

In KDEpy 1.1.1, I didn't see any of such warning signs -- has this been implemented already? Thanks.

@tommyod
Copy link
Owner

tommyod commented Apr 25, 2023

@Kaiyangshi-Ito Not implemented. See

raise ValueError("ISJ is only available for 1D data.")

@adhamenaya
Copy link

@tommyod This workaround is so elegant!! thank you so much!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants