Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Select link analysis #260

Closed
wants to merge 27 commits into from
Closed

Conversation

janzill
Copy link
Contributor

@janzill janzill commented Jun 12, 2021

Select link analysis via on-disk path files. This is slower than an implementation during assignment, however it has the advantage that it works for any link(s) as long as path files have been stored during assignment.

The main reason to go down this path is that path file saving is now available and therefore this implementation is a lot less work.

@janzill janzill added the WIP Work in Progress label Jun 12, 2021
@janzill
Copy link
Contributor Author

janzill commented Jun 12, 2021

@pedrocamargo basic functionality for MSA and FW is here. I need to

  1. turn my dev notebook stuff into proper tests
  2. handle matrices with several cores, atm it's all 2d
  3. implement the calculation CFW and BFW contributions per iteration
  4. implement turning network ids into compressed graph ids, atm you need to provide the latter

Pipfile Outdated Show resolved Hide resolved
aequilibrae/paths/assignment_paths.py Show resolved Hide resolved
aequilibrae/paths/assignment_paths.py Outdated Show resolved Hide resolved
# - some assertions as per constructor comments
# - demand matrix operations do not take cores into account (i.e. we assume we have a 2d demand matrix per class)
# - tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add a TODO here for later that is to have the code to find the matrices that were used from the assignment report.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I'd like that. at the moment we're passing in the demand, but doing this automatically would be much better

}
return select_link_matrices

def run_select_link_analysis(self, link_ids: List[int]) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the annotation be " --> AequilibraeMatrix"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a dict of dict (mode and link id) of np.array, but should probably make it an Aequlibraematrix instead of np.array

aequilibrae/paths/select_link.py Outdated Show resolved Hide resolved
@janzill
Copy link
Contributor Author

janzill commented Jun 17, 2021

Wouldn't it make much more sense to parallelize it over origins? I guess it would allow us to use a higher number of cores. Wouldn't it?

Sure, it's just not as straight forward with having to read in files, and the resulting path-link arrays and indexes not being fixed length. I didn't have much time at the end of the day (middle of the night really :) ) so I decided to go for the quickest win

@pedrocamargo
Copy link
Contributor

Wouldn't it make much more sense to parallelize it over origins? I guess it would allow us to use a higher number of cores. Wouldn't it?

Sure, it's just not as straight forward with having to read in files, and the resulting path-link arrays and indexes not being fixed length. I didn't have much time at the end of the day (middle of the night really :) ) so I decided to go for the quickest win

Do you mean that the reading would have to happen in each thread separately?

@janzill
Copy link
Contributor Author

janzill commented Dec 2, 2022

I think this could be handled much more efficiently with duckdb - just need to optimise the on-disk path file storage (currently many small parquet files because it is set up to save one per origin and traffic class and iteration - could be as simple as combining these) and then merge demand and query. I won't have time to do this anytime soon but will come back to it at some point.

@pedrocamargo
Copy link
Contributor

@jamiecook , I had forgotten about this PR Jan had prepared a long time ago. Let's discuss this. We either incorporate it scrap it

@janzill
Copy link
Contributor Author

janzill commented Feb 13, 2023

I'm happy for this to be deleted given that there is a proper implementation during assignment now. This was meant as a quick fix to enable the functionality without spending too much time on it. It's also old - if I'd do this today I'd just process the parquet files with duckdb and most of the code above would go away.

@Art-Ev
Copy link
Contributor

Art-Ev commented Nov 7, 2023

@pedrocamargo, don't know what has been decided for this PR but when calibrating a model, a solution to do link analysis without re-running a full assignment can save a lot of time (with a menu like flow bundle in VISUM) even with the needed storage. Being able to do both (save for later or direct link analysis during assignment like Jake commits) would be awesome!

If we keep both solutions, I would be happy to try to work on the use of path_file. But I don't think I'll be really relevant about the creation of path_file itslef (structure, duckdb or feather, ...). Maybe somebody could have a brillant idea for a quick-win based on this PR for path file savinf ? (@janzill, @Jake-Moss or maybe @djfrancesco ?)

@janzill
Copy link
Contributor Author

janzill commented Nov 7, 2023

@pedrocamargo, don't know what has been decided for this PR but when calibrating a model, a solution to do link analysis without re-running a full assignment can save a lot of time (with a menu like flow bundle in VISUM) even with the needed storage. Being able to do both (save for later or direct link analysis during assignment like Jake commits) would be awesome!

If we keep both solutions, I would be happy to try to work on the use of path_file. But I don't think I'll be really relevant about the creation of path_file itslef (structure, duckdb or feather, ...). Maybe somebody could have a brillant idea for a quick-win based on this PR for path file savinf ? (@janzill, @Jake-Moss or maybe @djfrancesco ?)

I think we already have all the pieces for this, but there are some improvements that would help make this more useable.

  1. Changes to path files. Currently, these are in feather format by default and we partition over origins by including the origin id in the file name. I suggest we switch to using parquet and hive-partitioning files over origins (i.e., origins are not part of the files or file names themselves but stored as the folder name where all data in a folder is understood to refer to one origin). Duckdb supports this if we want to go down that path.
  2. I never implemented path weights for CFW and BFW, but I think the logic to generate these might exist already in extract turning movements #358

@pedrocamargo pedrocamargo deleted the branch master February 24, 2024 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP Work in Progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants