findCorrelation_exact with problem in the variable selection #472

Closed
igorbraga13 opened this issue Oct 23, 2024 · 1 comment

Comments

@igorbraga13

Since commit f2ad13509ba3d6ab28069840c9631a66a9e7ecc8, line 38 of findCorrelation_exact computes the wrong mean absolute correlation for mn2. It calculates the mean of the whole matrix with row j removed, instead of the mean correlation of variable j against the other variables.

There are two possible fixes. The first is to remove the minus sign in front of j, so that two different rows of the matrix are compared. However, with verbose = TRUE the function prints "Compare row i and column j with corr...", and although the matrix is symmetric (row j and column j hold the same values), that message would then be inaccurate. The second option is to remove the minus sign and move j to the column position.

# before the commit
mean(x[i, -i]) > mean(x[-j, j]) # compares the means of two vectors
# after the commit
mn1 <- mean(x2[i, ], na.rm = TRUE) # mean of a vector (row i)
mn2 <- mean(x2[-j, ], na.rm = TRUE) # mean of a matrix (every row except j)
# possible solution 1
mn2 <- mean(x2[j, ], na.rm = TRUE) # mean of row j
# possible solution 2
mn2 <- mean(x2[, j], na.rm = TRUE) # mean of column j
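
To make the difference concrete, here is a minimal sketch on the iris correlation matrix. It assumes x2 is the absolute correlation matrix with the diagonal set to NA; that setup and the names mn2_current and mn2_column are illustrative assumptions, not taken from the caret source.

```r
# Assumed setup: absolute correlations with the diagonal masked out.
x2 <- abs(cor(as.matrix(datasets::iris[, 1:4])))
diag(x2) <- NA

i <- 3 # Petal.Length
j <- 4 # Petal.Width

# Row i against all other variables.
mn1 <- mean(x2[i, ], na.rm = TRUE)

# Expression as it stands after the commit: x2[-j, ] is still a matrix
# (all rows except j), so this averages over every remaining pair.
mn2_current <- mean(x2[-j, ], na.rm = TRUE)

# Proposed solution 2: column j only, i.e. variable j against the others.
mn2_column <- mean(x2[, j], na.rm = TRUE)

round(mn1, 3)         # 0.754
round(mn2_current, 3) # 0.554
round(mn2_column, 3)  # 0.716

# Because the matrix is symmetric, solution 1 (row j) gives the same number.
all.equal(mean(x2[j, ], na.rm = TRUE), mn2_column)
```

The 0.554 value matches the mean that findCorrelation prints in the verbose output of the reprex, while 0.716 matches the manual calculation.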

reprex:

require(tidyverse)
#> Loading required package: tidyverse

iris <- datasets::iris
cor_mat <- as.matrix(iris[,1:4]) %>% cor()

#Finding correlation manually ----
## Example with row 3 and column 4 ----
diag(cor_mat) <- NA
cor_mat <- abs(cor_mat)
i <- 3 # simulating the i-th iteration when i = 3
j <- 4 # simulating the j-th iteration when j = 4

mn1 <- mean(cor_mat[i,], na.rm = TRUE)
mn2 <- mean(cor_mat[,j], na.rm = TRUE)

cat("Compare row", i, " and column ", j,
    "with corr ", round(cor_mat[i,j], 3), "\n",
    "  Means: ", round(mn1, 3), "vs", round(mn2,3))
#> Compare row 3  and column  4 with corr  0.963 
#>    Means:  0.754 vs 0.716

## Example with row 4 and column 1 ----
cor_mat[3,] <- NA # removing row 3 and column 3 to simulate the next iteration of the function, based on the previous step
cor_mat[,3] <- NA

i <- 4 # simulating the i-th iteration when i = 4
j <- 1 # simulating the j-th iteration when j = 1

mn1 <- mean(cor_mat[i,], na.rm = TRUE)
mn2 <- mean(cor_mat[,j], na.rm = TRUE)

cat("Compare row", i, " and column ", j,
    "with corr ", round(cor_mat[i,j], 3), "\n",
    "  Means: ", round(mn1, 3), "vs", round(mn2,3))
#> Compare row 4  and column  1 with corr  0.818 
#>    Means:  0.592 vs 0.468

#Finding correlation with findCorrelation_exact ----
iris <- datasets::iris
cor_mat <- as.matrix(iris[,1:4]) %>% cor()

vars_cor <- caret::findCorrelation(
    cor_mat,
    cutoff = 0.7,
    names = TRUE,
    exact = TRUE,
    verbose = T)
#> Compare row 3  and column  4 with corr  0.963 
#>   Means:  0.754 vs 0.554 so flagging column 3 
#> Compare row 4  and column  1 with corr  0.818 
#>   Means:  0.592 vs 0.417 so flagging column 4 
#> All correlations <= 0.7

Created on 2024-10-23 with reprex v2.1.1

@jennybc
Member

jennybc commented Oct 23, 2024

I'm not sure which package this issue is meant to target, but it's not reprex. reprex is just a reporting tool, i.e. the thing that helps you render the code above nicely, for pasting into a GitHub issue. But it still needs to go into the GitHub issues of the package you're struggling with (caret maybe?). If I'm right, this is the place to go: https://github.com/topepo/caret/issues.

@jennybc jennybc closed this as not planned Oct 23, 2024