gdim
estimates graph dimension using cross-validated eigenvalues, via
the graph-splitting technique developed in
https://arxiv.org/abs/2108.03336. Theoretically, the method works by
computing a special type of cross-validated eigenvalue which follows a
simple central limit theorem. This allows users to perform hypothesis
tests on the rank of the graph.
You can install gdim
from CRAN with:
install.packages("gdim")
# to get the development version from GitHub:
install.packages("pak")
pak::pak("RoheLab/gdim")
eigcv()
is the main function in gdim
. The single required parameter
for the function is the maximum possible dimension, k_max
.
In the following example, we generate a random graph from the stochastic block model (SBM) with 1000 nodes and 5 blocks (as such, we would expect the estimated graph dimension to be 5).
library(fastRG)
#> Loading required package: Matrix
B <- matrix(0.1, 5, 5)
diag(B) <- 0.3
model <- sbm(
n = 1000,
k = 5,
B = B,
expected_degree = 40,
poisson_edges = FALSE,
allow_self_loops = FALSE
)
A <- sample_sparse(model)
Here, A
is the adjacency matrix.
Now, we call the eigcv()
function with k_max=10
to estimate graph
dimension.
library(gdim)
eigcv_result <- eigcv(A, k_max = 10)
#> 'as(<dsCMatrix>, "dgCMatrix")' is deprecated.
#> Use 'as(., "generalMatrix")' instead.
#> See help("Deprecated") and help("Matrix-deprecated").
eigcv_result
#> Estimated graph dimension: 5
#>
#> Number of bootstraps: 10
#> Edge splitting probabaility: 0.1
#> Significance level: 0.05
#>
#> ------------ Summary of Tests ------------
#> k z pvals padj
#> 1 59.7488180 2.220446e-16 2.220446e-15
#> 2 12.9094629 2.220446e-16 2.220446e-15
#> 3 11.8600427 2.220446e-16 2.220446e-15
#> 4 11.9412340 2.220446e-16 2.220446e-15
#> 5 9.0252520 2.220446e-16 2.220446e-15
#> 6 -0.8512008 8.026711e-01 1.000000e+00
#> 7 -0.8182195 7.933841e-01 1.000000e+00
#> 8 -0.9912649 8.392219e-01 1.000000e+00
#> 9 -0.9005808 8.160944e-01 1.000000e+00
#> 10 -1.1677953 8.785553e-01 1.000000e+00
In this example, eigcv()
suggests k=5
.
To visualize the result, use plot()
which returns a ggplot
object.
The function displays the test statistic (z score) for each hypothesized
graph dimension.
plot(eigcv_result)
Chen, Fan, Sebastien Roch, Karl Rohe, and Shuqi Yu. “Estimating Graph Dimension with Cross-Validated Eigenvalues.” ArXiv:2108.03336 [Cs, Math, Stat], August 6, 2021. https://arxiv.org/abs/2108.03336.