Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nb print method with large nb objects crashes R #160

Closed
JosiahParry opened this issue Jun 11, 2024 · 8 comments
Closed

nb print method with large nb objects crashes R #160

JosiahParry opened this issue Jun 11, 2024 · 8 comments

Comments

@JosiahParry
Copy link
Contributor

I'm unsure what specifically about the nb class print method is causing this, but I've noticed it previously and am encountering it again today.

With somewhat large neighbor matrices, spdep will hang in computing for quite some time leading to RStudio reporting that R is not responding.

housing <- read.csv("https://raw.githubusercontent.com/xj-liu/spatial_feature_incorporation/main/houses1990.csv") |> 
  sf::st_as_sf(coords = c("longitude", "latitude"), crs = 4326) |> 
  sf::st_jitter(0.0001)

# calculate the maximum distance for K = 1 as the distance band
upper_bound <- sfdep::critical_threshold(housing$geometry)

nb <- spdep::dnearneigh(
    housing$geometry,
    0, upper_bound
)

# printing here will cause R to hang
nb
@JosiahParry
Copy link
Contributor Author

I think it may be this part: {n} disjoint connected subgraphs

@rsbivand
Copy link
Member

https://github.com/r-spatial/spdep/blob/main/R%2Fsummary.nb.R#L65 was added to declare the subgraph count. This is done regularly in CARBayes and other downstream modelling packages because if the weights matrix is block diagonal, there could be multiple measures, one for each block. See also the end of https://r-spatial.org/book/14-Areal.html#contiguous-neighbours, where n.comp.nb is discussed. Maybe igraph might be faster. Don't trust RStudio about session non-responsiveness. I'll try to check how the function performs both with large n and with dense neighbours. Maybe run under debug to check the location of the slow-running line, or run n.comp.nb on the neighour object.

@rsbivand
Copy link
Member

The implementation by Nicholas Lewin-Koh from 2001 uses Depth First Search; https://en.wikipedia.org/wiki/Depth-first_search gives its time complexity as order of node count plus edge count. The time complexity for igraph::components seems to be the same: https://igraph.org/c/doc/igraph-Structural.html#igraph_connected_components. igraph::is_connected has the same time complexity, unfortunately.

In: , I added an igraph option to m.comp.nb, house has ~ 25000 points:

> library(spdep)
Loading required package: spData
Loading required package: sf
Linking to GEOS 3.12.2, GDAL 3.9.0, PROJ 9.4.1; sf_use_s2() is TRUE
> data(house)
> house <- sf::st_as_sf(house)
> dim(house)
[1] 25357    25
> k6 <- knn2nb(knearneigh(house, k=6))
> system.time(o <- n.comp.nb(k6, igraph=FALSE))
   user  system elapsed 
  1.617   0.004   1.634 
> system.time(o <- n.comp.nb(k6, igraph=TRUE))
   user  system elapsed 
  2.986   0.058   3.055 
> k60 <- knn2nb(knearneigh(house, k=60))
> system.time(o <- n.comp.nb(k60, igraph=FALSE))
   user  system elapsed 
 12.218   0.020  12.307 
> system.time(o <- n.comp.nb(k60, igraph=TRUE))
   user  system elapsed 
 16.574   0.089  16.727 
> k600 <- knn2nb(knearneigh(house, k=600))
> system.time(o <- n.comp.nb(k600, igraph=FALSE))
   user  system elapsed 
275.825   0.107 277.011 
> system.time(o <- n.comp.nb(k600, igraph=TRUE))
   user  system elapsed 
322.273   1.327 325.135 

with |V| always 25357, but here |E| k=6: 183834, k=60: 1776536; k=600: 18267908. So the edge count is driving most of the time complexity. However, the nb objects are made symmetric inside n.comp.nb, and make.sym.nb takes most of the time, which needs to be handled. I'm looking at that now.

@rsbivand
Copy link
Member

> library(spdep)
Loading required package: spData
Loading required package: sf
Linking to GEOS 3.12.2, GDAL 3.9.0, PROJ 9.4.1; sf_use_s2() is TRUE
> data(house)
> house <- sf::st_as_sf(house)
> k6 <- knn2nb(knearneigh(house, k=6))
> system.time(o <- n.comp.nb(k6, igraph=FALSE))
   user  system elapsed 
  1.603   0.002   1.618 
> system.time(o <- n.comp.nb(k6, igraph=TRUE))
   user  system elapsed 
  1.522   0.059   1.596 
> k60 <- knn2nb(knearneigh(house, k=60))
> system.time(o <- n.comp.nb(k60, igraph=FALSE))
   user  system elapsed 
 11.925   0.006  11.979 
> system.time(o <- n.comp.nb(k60, igraph=TRUE))
   user  system elapsed 
  6.589   0.128   6.736 
> k600 <- knn2nb(knearneigh(house, k=600))
> system.time(o <- n.comp.nb(k600, igraph=FALSE))
   user  system elapsed 
267.720   0.081 268.524 
> system.time(o <- n.comp.nb(k600, igraph=TRUE))
   user  system elapsed 
 73.950   1.633  75.786 

Commit (including some other correction of igraph use): 80262d0. Also added in the igraph=FALSE setting the possibility to interrupt the C function doing the depth-first search.

@rsbivand
Copy link
Member

I dropped the igraph= argument, and in commit 8e8fdbe detect igraph and spatialreg in n.comp.nb and use them if available. The symmetry attribute in the nb object is used. When nb objects are created, an "ncomp" attribute is added then used subsequently where needed, controlled by get.SubgraphOption (default TRUE) and get.SubgraphCeiling (default 10000) being less than |N|+|E|.

@JosiahParry
Copy link
Contributor Author

Brilliant thank you so much! I think this will be an improvement for users!

@rsbivand
Copy link
Member

Running reverse dependency checks now, should be OK, but I'll also contact packages maintainers using n.comp.nb to alert them to the changes, and to ask for input before moving forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants