Feature Request for Shave #24

shambhu112 · 2021-06-07T15:15:53Z

It would be great to have a function that shaves off the rows and columns of poorly correlated variables, based on a threshold.

i.e., something like
shave(min = -0.2 , max = 0.2)

This would shave off (i.e., not show) variables whose correlations with the other variables fall within the range above.

r-link · 2021-06-09T14:50:28Z

This is actually pretty easy to implement. Basically, you'd subset the numeric columns of the dataset with something like data[ , sapply(seq_len(ncol(data)), function(i) max(abs(cor(data)[-i, i]))) > threshold], which keeps only the columns whose strongest absolute correlation with any other column exceeds the threshold.
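
Spelled out a bit more, the subsetting idea above could look like the sketch below. This is only an illustration under some assumptions: drop_weakly_correlated() is a hypothetical helper name (it is not part of corrmorant), a single symmetric threshold stands in for the requested min/max pair, and the correlation matrix is computed once with base R's cor() instead of once per column.

```r
# Hypothetical helper, NOT part of corrmorant: keep only the numeric columns
# whose strongest absolute correlation with any other column exceeds `threshold`.
drop_weakly_correlated <- function(data, threshold = 0.2) {
  # restrict to numeric columns, since cor() cannot handle other types
  num <- data[, vapply(data, is.numeric, logical(1)), drop = FALSE]
  # absolute pairwise correlations, computed once
  cm <- abs(cor(num, use = "pairwise.complete.obs"))
  # ignore each variable's correlation with itself
  diag(cm) <- NA
  # strongest absolute correlation of each column with any other column
  strongest <- apply(cm, 2, max, na.rm = TRUE)
  num[, strongest > threshold, drop = FALSE]
}

# Toy data: y is a multiple of x, z is uncorrelated with both
d <- data.frame(
  x = c(-2, -1, 0, 1, 2),
  y = c(-4, -2, 0, 2, 4),
  z = c(2, -1, -2, -1, 2)
)
kept <- drop_weakly_correlated(d, threshold = 0.2)
names(kept)  # x and y are kept, z is dropped
```

The reduced data frame can then be passed on for plotting like any other dataset; if you do this, the dropped variables should of course be reported.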

I am not sure if I really want to add such a feature because it does not really fit with the philosophy behind corrmorant. My idea was to provide a versatile tool for data inspection, but to make it extra complicated to use it for data dredging and p-hacking. If you ever wondered why there is no built-in function to add p-values to the correlations, that's the reason (you can do it with add_funtext(), but if you know enough R to find out how, you probably also know why that's not a good idea).

I see why shave() may be useful, but I am not really fond of the idea that people might use the shave function to remove the variables that are not strongly correlated with anything and then publish a paper based on the reduced dataset without mentioning it.
