Future Improvements- Returning a DataFrame of all source-destination nodes within a certain distance (and including the distance) #56

stefancoe · 2017-01-30T18:19:42Z

With an eye towards the release of the urbanaccess libary, I'd like to test using pandana to build accessibility isochrones. My idea is to associate parcels with nodes (as we are already doing in our Soundcast accessibility scripts: https://github.com/psrc/soundcast/blob/master/scripts/accessibility/accessibility.py) and then have pandana return all the nodes/parcels that are accessible within a certain distance/impedance to a passed in set of coordinates. I would then be able to build polygon isochrones using either the parcel point or polygon geometries from the returned ids. I don't think this functionality exists in pandana yet- there would be potentially too many nodes/parcels for the nearest pois method. It sounds like the Future Improvement listed in the title would do the trick and so I was wondering if there are plans to implement this feature? The end goal is to produce transit accessibility isochrones so if this functionality will exist in urbanaccess then I am happy to wait until its release. Thanks!

Stefan

fscottfoti · 2017-01-30T18:42:09Z

FYI - this has existed in Pandana in C++ for a while, but did not make it into the version that was converted to use Pandas DataFrames.

sablanchard · 2017-01-30T19:46:03Z

Thanks Stefan,
We will place this on our planned list of improvements and can keep you posted on our progress once we commit to digging into the C++ code. We have some other items on the list that require this so likely will try to address all of them at once.

semcogli · 2017-01-30T19:48:57Z

Hi, Stefan

I think you could go the other way by returning your set of coordinates ( I assume those are transit stops) within predefined distance to parcel centroids. Then do a reversed query to get your isochrones. We have done similar analysis for transit access and we used 200+ transit stations. And pandana returned a table with 400+ columns successfully!

fscottfoti · 2017-01-30T20:00:38Z

Ahh yes, that's a pretty great idea @semcogli very clever

stefancoe · 2017-01-30T23:50:26Z

@semcogli-that is a great idea! Funny because we are doing that already in our accessibility script to get the distance from each parcel to nearest transit stop. Thanks also to @fscottfoti and @sablanchard for replying!

bstabler · 2019-08-02T22:27:21Z

I would like to see this feature added as well. Are there any updates on this? Is it likely to be added anytime soon?

knaaptime · 2020-12-28T01:33:36Z

hey folks, is there still potential to expose this from the C++ side? I'm working on a new spatial weights constructor for libpysal that uses pandana to find the "neighbors" withn some network distance of each observation. It accomplishes the spirit of this issue, but does so really inefficiently by iterating over all o-d pairs of an input geodataframe and calling shortest_path_lengths before filtering out those beyond the threshold. If instead I could just call get_nodes_in_range (and it returned the distance in addition to the node id) it would be killer and presumably much faster

fscottfoti · 2020-12-29T17:35:16Z

It looks like the get_nodes_in_range function is gone, but it's not really needed. Basically this returns a DistanceVec which is a node_id to distance map (so has what you want).

Also once you call precomputeRangeQueries, the info you want is just sitting there in memory here. Must be tantalizing ;)

It just needs to be added to the cython file (cyaccess.pyx) to get it back into python-land. Maybe you should just add it?

knaaptime · 2020-12-30T03:40:02Z

sweet, thanks @fscottfoti! i was hoping there would be a relatively simple path like this (and i was pretty sure precompute essentially calculated everything i needed...)

Tantalizing indeed! I'll read up on how these C/++ extensions work. Once complete, I'm guessing this would mean reducing to a workflow something like

net = pdna.Network(...)
net.precompute(3000)
neighbor_dist = net.get_distance_vec()

which would grab the data sitting in memory and return a scipy sparse matrix or a dataframe or something?

fscottfoti · 2020-12-30T15:55:55Z

Yeah sounds right - might just return a dict of dicts of weights? The first key could be the origin node_id, the second key would be the destination node_id and the second value would be the weight?

knaaptime · 2020-12-30T21:10:01Z

perfect

smmaurer · 2021-01-05T00:41:45Z

I agree that this would be super useful! Thanks @fscottfoti for pointing out the relevant code. I can take a stab at revealing this on the Python side -- @knaaptime let's coordinate.

knaaptime · 2021-01-05T00:58:49Z

🎉🎉

knaaptime · 2021-01-05T01:21:48Z

(fwiw, when this is implemented it would also mean you could use all the same tooling and workflows, but estimate spatial models in place of OLS for the hedonics back upstream in UDST)

smmaurer · 2021-01-05T23:58:48Z

Ok, a first pass of this is working in the range-queries branch!

The initial Python function is network.nodes_in_range(node, radius), which returns results for a single source node. I'm thinking the Python API should support a single source node, a list of source nodes, and the entire graph.

Precomputing is optional -- if it's been done, we can return cached results, and otherwise they'll be computed when the function is called.

In C++, the data is stored as a vector of NodeID–distance pairs. The direct translation into Python is a list of tuples, for example the following if you ask for nodes within 4 impedance units of node 1:

[(1, 0.0), (0, 1.0), (908, 1.0), (21, 1.0), (907, 2.0), (787, 2.0), (423, 2.0), 
(22, 2.0), (20, 2.0), (513, 3.0), (19, 3.0), (786, 3.0), (462, 3.0), (214, 3.0), 
(679, 3.0), (420, 3.0), (123, 4.0), (124, 4.0), (17, 4.0), (1436, 4.0), 
(141, 4.0), (1313, 4.0), (212, 4.0), (903, 4.0), (511, 4.0)]

What output formats seem most useful, though? Probably a dataframe or sparse matrix? I'll try some preprocessing in C++ to optimize performance for generating those.

cc @janowicz

fscottfoti · 2021-01-06T00:18:03Z

Nice! So cython knows to turn a vector of pairs into a list of tuples? That's pretty nice.

A list of namedtuples could be nice for the single node. Maybe a sparse matrix for the whole graph.

knaaptime · 2021-01-06T01:40:51Z

awesome!! thanks a ton for the quick work on this! I agree with @fscottfoti on the formats

knaaptime · 2021-05-07T21:36:48Z

i know you guys are busy with other parts of the package at the moment, but just wanted to drop by and see if there's anything i can do to test/otherwise help move this along :)

smmaurer · 2021-06-16T23:22:04Z

Hi @knaaptime, thanks for following up about this!

Looking back at my work in progress, I think it's close to being mergeable. I'll find some time soon to finish it up, and some testing/input from your end would definitely be helpful too!

You can check things out in the range-queries branch, which includes a runnable example of the new functionality in examples/range_example.py. Does this seem workable for the use case you had in mind?

One to-do was that it returns results as a list of lists of node id–distance tuples, which is not a very convenient format if you have a lot of origin nodes. We could either leave this as-is for the initial release, or take a pass at conversion to something else -- probably multi-indexed DataFrame would be the easiest but I'm not sure.

bstabler · 2021-06-17T15:52:40Z

I'm excited for this feature, thanks @smmaurer!

knaaptime · 2021-06-24T15:48:00Z

awesome @smmaurer, thank you! I'll play with the range-queries branch, but i agree this is pretty much complete for what i'm after. A dataframe would be manageable, but i think a scipy.sparse matrix might be the ideal output for my use case (generating a spatial weights matrix)

ljwolf · 2021-07-09T22:39:32Z

I'd love to see this get merged! Like @knaaptime, I'd love to have simple range-queries on networks with Pandana. I've been blown away by its efficiency!

I've tackled the return value thing plus docstrings in my fork, and I've sent a draft PR (#167). I opted there to work in the same manner as nearest_pois, using columns rather than indices in the output data frame. Happy to change that, but I was hoping to get the branch cleaned up & prepared to merge.

The main issue is that the branch currently returns distances greater than the limiting radius. I've implemented a fix in c++ (that's buggy) and a fix using pandas (that's wasteful).

I hope this is useful, and I'm game to do the legwork to push this through.

knaaptime · 2021-12-14T23:32:11Z

hey folks, just wanted to check in and see if anyone is game to merge this and cut a new release. Would love to use this new functionality, and would be more than happy to help tie any loose ends I can :)

knaaptime · 2022-08-01T15:52:58Z

sorry to bother folks, but just wanted to check back in and see if there's any hope to get this merged. If there's any work left to do, I'm happy to do the lifting

pksohn added the Type: enhancement label Apr 4, 2017

knaaptime mentioned this issue Oct 7, 2019

Expose shortest path functionality #39

Open

knaaptime mentioned this issue Aug 14, 2020

integrate_network returning floats instead of ints UDST/urbanaccess#64

Open

knaaptime mentioned this issue Jan 6, 2021

[WIP] initial draft of network weights pysal/libpysal#356

Closed

darribas mentioned this issue Mar 14, 2021

Consider adding isochrones to transport costs section darribas/gds4ae#6

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Future Improvements- Returning a DataFrame of all source-destination nodes within a certain distance (and including the distance) #56

Future Improvements- Returning a DataFrame of all source-destination nodes within a certain distance (and including the distance) #56

stefancoe commented Jan 30, 2017

fscottfoti commented Jan 30, 2017

sablanchard commented Jan 30, 2017

semcogli commented Jan 30, 2017

fscottfoti commented Jan 30, 2017

stefancoe commented Jan 30, 2017

bstabler commented Aug 2, 2019 •

edited

Loading

knaaptime commented Dec 28, 2020

fscottfoti commented Dec 29, 2020

knaaptime commented Dec 30, 2020

fscottfoti commented Dec 30, 2020

knaaptime commented Dec 30, 2020

smmaurer commented Jan 5, 2021

knaaptime commented Jan 5, 2021

knaaptime commented Jan 5, 2021

smmaurer commented Jan 5, 2021

fscottfoti commented Jan 6, 2021

knaaptime commented Jan 6, 2021

knaaptime commented May 7, 2021

smmaurer commented Jun 16, 2021

bstabler commented Jun 17, 2021

knaaptime commented Jun 24, 2021

ljwolf commented Jul 9, 2021

knaaptime commented Dec 14, 2021

knaaptime commented Aug 1, 2022

Future Improvements- Returning a DataFrame of all source-destination nodes within a certain distance (and including the distance) #56

Future Improvements- Returning a DataFrame of all source-destination nodes within a certain distance (and including the distance) #56

Comments

stefancoe commented Jan 30, 2017

fscottfoti commented Jan 30, 2017

sablanchard commented Jan 30, 2017

semcogli commented Jan 30, 2017

fscottfoti commented Jan 30, 2017

stefancoe commented Jan 30, 2017

bstabler commented Aug 2, 2019 • edited Loading

knaaptime commented Dec 28, 2020

fscottfoti commented Dec 29, 2020

knaaptime commented Dec 30, 2020

fscottfoti commented Dec 30, 2020

knaaptime commented Dec 30, 2020

smmaurer commented Jan 5, 2021

knaaptime commented Jan 5, 2021

knaaptime commented Jan 5, 2021

smmaurer commented Jan 5, 2021

fscottfoti commented Jan 6, 2021

knaaptime commented Jan 6, 2021

knaaptime commented May 7, 2021

smmaurer commented Jun 16, 2021

bstabler commented Jun 17, 2021

knaaptime commented Jun 24, 2021

ljwolf commented Jul 9, 2021

knaaptime commented Dec 14, 2021

knaaptime commented Aug 1, 2022

bstabler commented Aug 2, 2019 •

edited

Loading