Add `alternative` argument for determining one-tailed or two-tailed permutation tests #199
After giving this a more thorough look, it appears that the local join count is choosing the lesser of the two p-values from greater or lesser. Is there a way to select either the lesser or greater alternative when calculating the p-values? Choosing the lesser of the two feels somewhat off to me.
Thanks for the report! This is the intended behavior of our two-sided permutation test. The LJC does this, as do all of our simulation p-values; picking only one of the greater or lesser tails amounts to a one-tailed permutation test. We haven't implemented a one-tailed version, but we're happy to implement this with an `alternative` argument.

To try to explain why we're doing this, maybe this will be helpful. For most autocorrelation statistics (especially local stats) we don't have a prior hypothesis about the direction of the test. So, it's reasonable to default to a "two-sided" test: we want to know how extreme a given test statistic is under a specific null hypothesis. Now, we need a way to assess extremity. We don't have anything "analytical" (equations and the like), so we synthesize replicates. We simulate from the "conditional randomization" null hypothesis, using permutation to build up a set of "fake" local test statistics. We just wrote Sauer et al. (2021) (preprint) with plenty of discussion on the topic, but the original outlines come from Anselin (1995).

Computationally, this means we have a target set of replicates and a test statistic. We want to figure out how "extreme" that test statistic is, given the distribution of replicates. So, we want the fraction of values that are on the "other side" of the test statistic from the rest of the distribution... but how do we define "other side"? There are three ways.
Where we can, we implement 1 and 3. In esda, for example:

```python
from esda.moran import Moran_Local
import numpy

numpy.random.seed(112112)
# guerry_ds / w: the Guerry example dataset and its spatial weights, defined elsewhere
lmo = Moran_Local(guerry_ds.Donatns, w)

# here, the local replicate matrix is the .rlisas attribute
folded_replicates = numpy.abs(lmo.rlisas - numpy.median(lmo.rlisas, axis=1, keepdims=True))
folded_p_sim = (folded_replicates >= numpy.abs(lmo.Is[:, None])).mean(axis=1)

print((folded_p_sim < .01).sum())      # 0 are clusters using folding
print((lmo.p_sim < .01).sum())         # 9 are clusters using "outside" counting
print((lmo.p_z_sim * 2 < .01).sum())   # 2 are clusters assuming a normal approximation
```

The nonparametric counting strategy was implemented by @sjsrey back in the early days of the library. I think this is a faithful interpretation of why he did it this way, but I don't think we've actually spoken about it explicitly 😮

Note: edits to code example for repro & correctness on the folding.
Thank you so much for the added color! Much appreciated.
We'll go ahead and close this then, and we can reopen if needed in the future.
I've had the time to investigate the differences here after ruminating on some email exchanges with @rsbivand, and I think that we may want to report the folded p-values instead. For justification, we can think about the folding strategy:

```python
import numpy

def folded_p(replicates, observed):
    n_extreme = (numpy.abs(replicates) >= numpy.abs(observed.reshape(-1, 1))).sum(axis=1) + 1
    return n_extreme / (replicates.shape[1] + 1)
```

Using this very simple method, we get a correlation between the p-values of 99%, but the "outside" counting gives smaller p-values. I think this may lead to an overstatement of the number of "significant" local statistics at any given significance threshold. This overstatement seems to be proportional to the number of significant observations, but this largely goes away when we use:
```python
from libpysal import weights, examples
from esda.moran import Moran_Local
import geopandas, numpy

data = geopandas.read_file(examples.get_path("election.shp"))
rook = weights.Rook.from_dataframe(data)
knn1 = weights.KNN.from_dataframe(data, k=1)
w = weights.attach_islands(rook, knn1)

swing = data.pct_dem_16 - data.pct_dem_12
lmo = Moran_Local(swing.values, w, permutations=9999)

# p-values from counting replicates at least as extreme in absolute value
p_abs = (numpy.abs(lmo.rlisas) > numpy.abs(lmo.Is[:, None])).sum(axis=1) + 1.0
p_abs /= lmo.permutations + 1

# compare with the "outside" counting p-values reported by esda
p_sim = lmo.p_sim
numpy.corrcoef(p_abs, p_sim)[0, 1]
```
@ljwolf, thank you very much for sharing! If you and Roger do end up doing a thorough write-up, please let me know here. I'm following quite intently!
@JosiahParry the determination of the pseudo p-values is done with the following (lines 222 to 224 at commit 7f83108), which is for the local statistics. This follows the logic we use for a global statistic, which is arguably easier to grok (lines 181 to 190 at commit 7f83108).
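As a rough sketch of that counting logic (my paraphrase, not the verbatim esda source at that commit): count the replicates in the tail containing the observed statistic, keep the smaller tail, and apply the (M + 1) / (R + 1) rule.

```python
import numpy

def pseudo_p(observed, replicates):
    replicates = numpy.asarray(replicates)
    permutations = replicates.shape[0]
    # count replicates at least as large as the observed statistic...
    larger = (replicates >= observed).sum()
    # ...and keep the smaller of the two tail counts; this is the
    # "lesser of the two p-values" behaviour discussed above
    if (permutations - larger) < larger:
        larger = permutations - larger
    # Hope-style pseudo p-value: (M + 1) / (R + 1)
    return (larger + 1.0) / (permutations + 1.0)
```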
Your issue makes us realize we need to surface this logic in the documentation. Thanks!
@sjsrey any perspective on the above?
@ljwolf the outside counting method we have is a one-tailed procedure that follows from logic in geoda. It does the selection of the tail for the user. I think we can surface this better in the docs. For the lisas, I'm not clear on the use of the absolute values? That would seem to suggest you want a two-tailed test, not a directional test?
Right, so this is my confusion: my understanding is that a "classical" one-tailed test requires you to specify a direction a priori (e.g. greater or lesser). Now, this is normally the case for directed and undirected tests. Does that make sense? Maybe I misunderstand.
I see what you are getting at, and I agree with the points you make. My interpretation is that the logic for the permutation-based inference was born in the "pre-local" era (and before geoda), where the focus was on a single test statistic. Things are more complicated for the local tests, since we now have a test statistic for every observation.

At first glance, I think a fix might be to add a kw argument to the global test. At the local level, we interpret all tests as one-tailed with the current logic. We could say the default is the current behavior. Does this make sense to you?

[1] Hope, A. C. A. (1968). A Simplified Monte Carlo Significance Test Procedure. Journal of the Royal Statistical Society. Series B (Methodological), 30(3), 582–598.
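A minimal sketch of what such a keyword could look like for a single global statistic; the function and option names here are illustrative, not esda's API:

```python
import numpy

def global_permutation_p(observed, replicates, alternative="greater"):
    """Hope-style pseudo p-value, (M + 1) / (R + 1), for one statistic."""
    replicates = numpy.asarray(replicates)
    n_perm = replicates.shape[0]
    upper = (replicates >= observed).sum()
    lower = (replicates <= observed).sum()
    if alternative == "greater":
        extreme = upper
    elif alternative == "lesser":
        extreme = lower
    elif alternative == "directed":
        # current behaviour: report whichever tail the statistic falls in
        extreme = min(upper, lower)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return (extreme + 1.0) / (n_perm + 1.0)
```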
I should say: I recognize the issue that specifying the direction of local tests one by one would be impossible... but we could provide similar "auto" behavior by picking the direction of the test using EIc? If the test statistic is above its EIc, we run "greater"; otherwise we run "lesser". Then, we have four alternative options.
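A rough sketch of the "auto" direction picking just described, assuming EIc denotes the conditional expectation of each local statistic under permutation (taken here as the mean of its replicates); the helper is hypothetical:

```python
import numpy

def auto_directed_p(observed, replicates):
    """Pick 'greater' or 'lesser' per site from EIc, then count that tail."""
    observed = numpy.asarray(observed)        # shape (n,)
    replicates = numpy.asarray(replicates)    # shape (n, permutations)
    expected = replicates.mean(axis=1)        # stand-in for EIc
    direction = numpy.where(observed > expected, "greater", "lesser")
    upper = (replicates >= observed[:, None]).sum(axis=1)
    lower = (replicates <= observed[:, None]).sum(axis=1)
    extreme = numpy.where(direction == "greater", upper, lower)
    return direction, (extreme + 1.0) / (replicates.shape[1] + 1.0)
```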
I agree on the new logic. I'm a bit hesitant to deprecate the current behavior as it is based on what geoda does. Might we consider keeping it available as one of the options?
Sounds like a great idea to me!
Very interesting discussion! I'll align spdep with your choices, probably with "hope" as default, once esda has been updated.
Hey folks! This issue has been brought up again in regard to spdep, in a private email to me. Has this been implemented yet?
It's not implemented yet, but we agreed above to implement the alternative options discussed earlier.
I also want to test whether a percentile rule will work (find the percentile p of the test statistic in the permutation distribution, then calculate the % of permutations outside of p and 1 - p). I think that doing this should be easy, and should be done in a general function like:

```python
def calculate_significance(test_stat, reference_distribution, method="two-sided"):
    ...
```

which we then just call within every statistic.
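A sketch of how that general function might be filled in, including the percentile rule; the option names and the exact set of methods are illustrative, not a settled API:

```python
import numpy

def calculate_significance(test_stat, reference_distribution, method="two-sided"):
    """Pseudo p-values for local statistics from a permutation distribution.

    test_stat: (n,) observed statistics; reference_distribution: (n, permutations)
    """
    reps = numpy.asarray(reference_distribution)
    stat = numpy.asarray(test_stat).reshape(-1, 1)
    n_perm = reps.shape[1]
    upper = (reps >= stat).sum(axis=1)
    lower = (reps <= stat).sum(axis=1)
    if method == "greater":
        extreme = upper
    elif method == "lesser":
        extreme = lower
    elif method == "directed":
        # current behaviour: report whichever tail the observed statistic falls in
        extreme = numpy.minimum(upper, lower)
    elif method == "two-sided":
        # percentile rule, approximately: count permutations at least as far out
        # in either tail as the observed statistic (outside of p and 1 - p)
        extreme = numpy.minimum(numpy.minimum(upper, lower) * 2, n_perm)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return (extreme + 1.0) / (n_perm + 1.0)

# usage: p = calculate_significance(lmo.Is, lmo.rlisas, method="two-sided")
```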
Sounds good! Should this issue be reopened or another made with less baggage?
I should say, @JosiahParry, if you have time to start defining this function, keep the discussion here, and I will reopen the issue.
It appears that when a p-value exceeds 0.5, a different method of p-value calculation is used rather than the simulation p-value formula (M + 1) / (R + 1). The below example manually calculates the p-value for each observation. Can the p-value calculation method be documented or corrected? Note that when the manually calculated p-value is greater than 0.5, the reported simulated p-value is less than 0.5.
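The example referenced above isn't reproduced in this thread; a sketch of that kind of manual check, using Moran_Local for concreteness (the data variables are placeholders):

```python
import numpy
from esda.moran import Moran_Local

# y and w are a variable and spatial weights defined elsewhere
lmo = Moran_Local(y, w, permutations=999)

# manual one-sided "greater" pseudo p-value, (M + 1) / (R + 1)
m_greater = (lmo.rlisas >= lmo.Is[:, None]).sum(axis=1)
p_greater = (m_greater + 1) / (lmo.permutations + 1)

# wherever the manual value exceeds 0.5, the reported p_sim switches to the
# other tail, so it stays at or below 0.5
print(numpy.column_stack([p_greater, lmo.p_sim])[p_greater > 0.5][:5])
```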