Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Future Improvements- Returning a DataFrame of all source-destination nodes within a certain distance (and including the distance) #56

Open
stefancoe opened this issue Jan 30, 2017 · 24 comments

Comments

@stefancoe
Copy link

With an eye towards the release of the urbanaccess libary, I'd like to test using pandana to build accessibility isochrones. My idea is to associate parcels with nodes (as we are already doing in our Soundcast accessibility scripts: https://github.com/psrc/soundcast/blob/master/scripts/accessibility/accessibility.py) and then have pandana return all the nodes/parcels that are accessible within a certain distance/impedance to a passed in set of coordinates. I would then be able to build polygon isochrones using either the parcel point or polygon geometries from the returned ids. I don't think this functionality exists in pandana yet- there would be potentially too many nodes/parcels for the nearest pois method. It sounds like the Future Improvement listed in the title would do the trick and so I was wondering if there are plans to implement this feature? The end goal is to produce transit accessibility isochrones so if this functionality will exist in urbanaccess then I am happy to wait until its release. Thanks!

Stefan

@fscottfoti
Copy link
Contributor

FYI - this has existed in Pandana in C++ for a while, but did not make it into the version that was converted to use Pandas DataFrames.

@sablanchard
Copy link
Collaborator

Thanks Stefan,
We will place this on our planned list of improvements and can keep you posted on our progress once we commit to digging into the C++ code. We have some other items on the list that require this so likely will try to address all of them at once.

@semcogli
Copy link

Hi, Stefan

I think you could go the other way by returning your set of coordinates ( I assume those are transit stops) within predefined distance to parcel centroids. Then do a reversed query to get your isochrones. We have done similar analysis for transit access and we used 200+ transit stations. And pandana returned a table with 400+ columns successfully!

@fscottfoti
Copy link
Contributor

Ahh yes, that's a pretty great idea @semcogli very clever

@stefancoe
Copy link
Author

@semcogli-that is a great idea! Funny because we are doing that already in our accessibility script to get the distance from each parcel to nearest transit stop. Thanks also to @fscottfoti and @sablanchard for replying!

@bstabler
Copy link
Member

bstabler commented Aug 2, 2019

I would like to see this feature added as well. Are there any updates on this? Is it likely to be added anytime soon?

@knaaptime
Copy link

hey folks, is there still potential to expose this from the C++ side? I'm working on a new spatial weights constructor for libpysal that uses pandana to find the "neighbors" withn some network distance of each observation. It accomplishes the spirit of this issue, but does so really inefficiently by iterating over all o-d pairs of an input geodataframe and calling shortest_path_lengths before filtering out those beyond the threshold. If instead I could just call get_nodes_in_range (and it returned the distance in addition to the node id) it would be killer and presumably much faster

@fscottfoti
Copy link
Contributor

It looks like the get_nodes_in_range function is gone, but it's not really needed. Basically this returns a DistanceVec which is a node_id to distance map (so has what you want).

Also once you call precomputeRangeQueries, the info you want is just sitting there in memory here. Must be tantalizing ;)

It just needs to be added to the cython file (cyaccess.pyx) to get it back into python-land. Maybe you should just add it?

@knaaptime
Copy link

sweet, thanks @fscottfoti! i was hoping there would be a relatively simple path like this (and i was pretty sure precompute essentially calculated everything i needed...)

Tantalizing indeed! I'll read up on how these C/++ extensions work. Once complete, I'm guessing this would mean reducing to a workflow something like

net = pdna.Network(...)
net.precompute(3000)
neighbor_dist = net.get_distance_vec()

which would grab the data sitting in memory and return a scipy sparse matrix or a dataframe or something?

@fscottfoti
Copy link
Contributor

Yeah sounds right - might just return a dict of dicts of weights? The first key could be the origin node_id, the second key would be the destination node_id and the second value would be the weight?

@knaaptime
Copy link

perfect

@smmaurer
Copy link
Member

smmaurer commented Jan 5, 2021

I agree that this would be super useful! Thanks @fscottfoti for pointing out the relevant code. I can take a stab at revealing this on the Python side -- @knaaptime let's coordinate.

@knaaptime
Copy link

🎉🎉

@knaaptime
Copy link

(fwiw, when this is implemented it would also mean you could use all the same tooling and workflows, but estimate spatial models in place of OLS for the hedonics back upstream in UDST)

@smmaurer
Copy link
Member

smmaurer commented Jan 5, 2021

Ok, a first pass of this is working in the range-queries branch!

The initial Python function is network.nodes_in_range(node, radius), which returns results for a single source node. I'm thinking the Python API should support a single source node, a list of source nodes, and the entire graph.

Precomputing is optional -- if it's been done, we can return cached results, and otherwise they'll be computed when the function is called.

In C++, the data is stored as a vector of NodeID–distance pairs. The direct translation into Python is a list of tuples, for example the following if you ask for nodes within 4 impedance units of node 1:

[(1, 0.0), (0, 1.0), (908, 1.0), (21, 1.0), (907, 2.0), (787, 2.0), (423, 2.0), 
(22, 2.0), (20, 2.0), (513, 3.0), (19, 3.0), (786, 3.0), (462, 3.0), (214, 3.0), 
(679, 3.0), (420, 3.0), (123, 4.0), (124, 4.0), (17, 4.0), (1436, 4.0), 
(141, 4.0), (1313, 4.0), (212, 4.0), (903, 4.0), (511, 4.0)]

What output formats seem most useful, though? Probably a dataframe or sparse matrix? I'll try some preprocessing in C++ to optimize performance for generating those.

cc @janowicz

@fscottfoti
Copy link
Contributor

Nice! So cython knows to turn a vector of pairs into a list of tuples? That's pretty nice.

A list of namedtuples could be nice for the single node. Maybe a sparse matrix for the whole graph.

@knaaptime
Copy link

awesome!! thanks a ton for the quick work on this! I agree with @fscottfoti on the formats

@knaaptime
Copy link

i know you guys are busy with other parts of the package at the moment, but just wanted to drop by and see if there's anything i can do to test/otherwise help move this along :)

@smmaurer
Copy link
Member

Hi @knaaptime, thanks for following up about this!

Looking back at my work in progress, I think it's close to being mergeable. I'll find some time soon to finish it up, and some testing/input from your end would definitely be helpful too!

You can check things out in the range-queries branch, which includes a runnable example of the new functionality in examples/range_example.py. Does this seem workable for the use case you had in mind?

One to-do was that it returns results as a list of lists of node id–distance tuples, which is not a very convenient format if you have a lot of origin nodes. We could either leave this as-is for the initial release, or take a pass at conversion to something else -- probably multi-indexed DataFrame would be the easiest but I'm not sure.

@bstabler
Copy link
Member

I'm excited for this feature, thanks @smmaurer!

@knaaptime
Copy link

awesome @smmaurer, thank you! I'll play with the range-queries branch, but i agree this is pretty much complete for what i'm after. A dataframe would be manageable, but i think a scipy.sparse matrix might be the ideal output for my use case (generating a spatial weights matrix)

@ljwolf
Copy link
Contributor

ljwolf commented Jul 9, 2021

I'd love to see this get merged! Like @knaaptime, I'd love to have simple range-queries on networks with Pandana. I've been blown away by its efficiency!

I've tackled the return value thing plus docstrings in my fork, and I've sent a draft PR (#167). I opted there to work in the same manner as nearest_pois, using columns rather than indices in the output data frame. Happy to change that, but I was hoping to get the branch cleaned up & prepared to merge.

The main issue is that the branch currently returns distances greater than the limiting radius. I've implemented a fix in c++ (that's buggy) and a fix using pandas (that's wasteful).

I hope this is useful, and I'm game to do the legwork to push this through.

@knaaptime
Copy link

hey folks, just wanted to check in and see if anyone is game to merge this and cut a new release. Would love to use this new functionality, and would be more than happy to help tie any loose ends I can :)

@knaaptime
Copy link

sorry to bother folks, but just wanted to check back in and see if there's any hope to get this merged. If there's any work left to do, I'm happy to do the lifting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants