
Fastest way to access skims #8

Closed
fscottfoti opened this issue Dec 12, 2014 · 17 comments

Comments

@fscottfoti
Contributor

@jiffyclub At some point pretty soon we'll want to diagnose the fastest way to access skims. Given that we store the skims in OMX format (we might want to consider packing multiple matrices into a single h5 for convenience?), the big question is how to store/access them in memory.

Given our recent history with the .loc command I'm guessing storing zone_ids directly is basically a non-starter. Fortunately, we're storing a dense matrix, so we can make sure every zone_id sits at the position one greater than its index (i.e. zone 1 is at index 0). That way we can either 1) have a DataFrame with a multi-index and call .take, or 2) have a 2-D NumPy array and access it directly, but only one column at a time. Do we think 1) is slower than 2)? Because 1) is definitely more attractive from a code perspective. I guess this is the "stacked" vs. "unstacked" format question.

At any rate, we should probably write a small abstraction to hide this from the user. Basically we pass in one of the formats above with dimension N and then pass in two series of "origin" and "destination" zone ids and get back the values.
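A minimal sketch of what such an abstraction might look like. This is hypothetical, not an actual ActivitySim API: the `Skim` class and its `lookup` method are invented names, and it hard-codes the zone_id = position + 1 convention assumed above.

```python
import numpy as np
import pandas as pd


class Skim:
    """Hypothetical skim wrapper: assumes zone_id = position + 1,
    i.e. zone 1 lives at row/column 0 of the dense matrix."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def lookup(self, orig, dest):
        # translate 1-based zone ids to 0-based positions, then pull
        # all origin/destination pairs in one vectorized fancy-index
        return self.data[np.asarray(orig) - 1, np.asarray(dest) - 1]


skim = Skim(np.arange(9, dtype=float).reshape(3, 3))
trips = pd.DataFrame({"origin": [1, 3, 2], "destination": [2, 1, 2]})
values = skim.lookup(trips["origin"], trips["destination"])
print(values)  # -> [1. 6. 4.]
```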

@jiffyclub
Contributor

We can definitely have more than one matrix per file. I imagine having two stores: one pandas.HDFStore for tables and Series, and another OMX HDF5 file for NumPy arrays.

@jiffyclub
Contributor

One thing to keep in mind about .take is that it works with location-based indexes, so you're doing essentially the same procedure you'd be doing with NumPy arrays. If you had zone IDs you'd still need to translate them into locations before using .take.
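A small illustration of that point (the labels and values here are made up):

```python
import numpy as np
import pandas as pd

# Arbitrary zone labels -- .take cannot use these directly
s = pd.Series([10.0, 20.0, 30.0], index=[101, 205, 309])

# Labels must first be translated to integer locations...
positions = s.index.get_indexer([309, 101])
assert list(positions) == [2, 0]

# ...and only then does .take (a positional operation) apply
assert list(s.take(positions)) == [30.0, 10.0]

# which is the same procedure as indexing the underlying NumPy array
assert list(s.to_numpy()[positions]) == [30.0, 10.0]
```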

@fscottfoti
Contributor Author

Right, I assume we have to make the assumption that zone ids have a deterministic relationship with indexes in order to make this work fast.

@jiffyclub
Contributor

With that assumption we're definitely going to get the best performance from NumPy arrays. Pandas is slower at pretty much any size, and indexing Pandas objects really doesn't scale well with size.

[benchmark screenshot, 2014-12-15: indexing timings for NumPy arrays vs. Pandas objects across sizes]
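The two layouts being compared can be sketched like this (the sizes and the row-major `orig * n + dest` stacking are assumptions). The sketch only checks that both paths return identical values; actual timing is left to `%timeit` or similar:

```python
import numpy as np
import pandas as pd

n = 1500  # roughly the current zone count
skim = np.arange(n * n, dtype=np.float64).reshape(n, n)

rng = np.random.default_rng(0)
orig = rng.integers(0, n, size=1_000_000)
dest = rng.integers(0, n, size=1_000_000)

# NumPy: one vectorized fancy-index pulls a million values at once
vals_np = skim[orig, dest]

# Pandas "stacked" layout: a flat Series where position = orig * n + dest
stacked = pd.Series(skim.ravel())
vals_pd = stacked.take(orig * n + dest).to_numpy()

assert np.array_equal(vals_np, vals_pd)
```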

@fscottfoti
Contributor Author

OK - we should create a small matrix object to wrap up the numpy methods we'll need. There won't be too many, at least at first.

@jiffyclub
Contributor

What are the approximate sizes of the things we're talking about? I think you've said skims will be roughly ~1000 x ~1000. How many numbers will we be pulling out of that at once?

@fscottfoti
Contributor Author

Right now it's about 1500x1500, but liable to go up significantly in the future. I would imagine it would have to fit in memory or it would be quite slow. We will pull on the order of several million numbers at once, if we can.

@e-lo
Collaborator

e-lo commented Dec 19, 2014

Apologies if this comment is off base...

Multiple tables/file
While this simplifies things for a specific situation, it is also a tad inflexible. At SFCTA, we have opted not to do this for several reasons we have encountered thus far:

  • If we want to distribute something to another machine or open a file, it is a lot better to be specific, for the sake of network I/O, RAM, and CPU.
  • Sending skim files to post-processing utilities, or giving them out to consultants or other users: the smaller and more specific, the better.
  • It is easier to see differences between runs, and to troubleshoot, when they are separate (i.e. by using a Git binary diff or similar).

There are probably other reasons to keep them separate (and plenty of reasons to keep them together), but that is my 2 cents.

@e-lo
Collaborator

e-lo commented Dec 19, 2014

Just another comment: any abstractions or other things that would be useful in OMX itself should go there instead of into ActivitySim :-)

@DavidOry

Two potentially helpful notes:

(1) Roughly speaking, in current practice we tend to keep square matrices with spatial units in the 5k to 10k range and move to non-square matrices with spatial units in the 10k to 40k range. In the Travel Model Two system, we have three spatial systems rather than one, with approximate sizes: 6,000 (auto movements), 5,000 (transit stops), and 40,000 (human movements).

(2) Many of these matrices are sparse. As you go, you'll learn that we often do not need to store zero values.
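As a sketch of point (2), a stacked long format lets you avoid storing the zero cells entirely. The sizes and sparsity here are made up, and in production something like `scipy.sparse` might be the better fit:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dense = rng.integers(0, 10, size=(200, 200)).astype(float)
dense[dense < 8] = 0.0  # force ~80% zeros, like a sparse transit skim

# stack the dense matrix into long format and drop the zero cells:
# one row per surviving (origin, destination) pair
nonzero = pd.DataFrame(dense).stack().loc[lambda s: s != 0.0]

print(f"dense cells: {dense.size}, stored rows: {len(nonzero)}")
assert len(nonzero) < 0.3 * dense.size
```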

@e-lo
Collaborator

e-lo commented Dec 24, 2014

@DavidOry - it might be helpful to clarify the units of your sizes (#rows/columns, cells, or MB)

@DavidOry

So in Travel Model Two, we have:

(1) Travel Analysis Zones, ~6,000. We skim automobile travel time, distance, cost, etc. So we end up with a whole bunch of 6,000 x 6,000 matrices.

(2) Transit Access Points, ~5,000. We skim transit stop-to-stop times, fares, transfers, etc. So we end up with a whole bunch of 5,000 x 5,000 matrices.

(3) Micro-Analysis Zones, ~30,000. We skim nearby automobile travel time, walk time, bicycle time, walk/drive to Transit Access Point time. So we have 30,000 rows by X column data files, where X is a function of distance thresholds, but is around ~5,000. So we have a bunch of 30,000 rows x ~5,000 column data files.

All of these are read into memory.

@e-lo : Does that help?

@e-lo
Collaborator

e-lo commented Dec 24, 2014

Exactly @DavidOry

And from our call last week it sounded like currently in CT-RAMP all of those are read into memory at once at the outset of each full iteration of the travel model. I'm sure there is probably a way to simplify this task, other than having to open them for each instance in the event that we parallelize the problem. Maybe? I hope?

@DavidOry

Currently CT-RAMP reads them in whenever it encounters a model that uses them. Because mode choice uses just about all of them, and one of the first tasks is to compute mode choice logsums, they are, for all intents and purposes, all read in and stored in memory at the beginning and updated at each global iteration. Importantly, a copy of the matrices is then made on each computer doing the parallel computations. So if the matrices have a memory footprint of, say, 50 GB, you need 50 GB of RAM on each computer you want to contribute to the effort. If your other model components have a footprint of, say, 75 GB, you end up needing several computers with 125 GB of RAM to do the computations. That's where we are now, and part of @danielsclint's frustration with not being able to distribute the tasks to smaller CPUs.

@fscottfoti
Contributor Author

Wanted to renew this thread as well. A few separate issues...

  • I'm not too concerned about the dense (square) vs. sparse matrices issue. Pandas has stack and unstack operations which essentially transform one into the other when used with the appropriate indexes: stack goes from dense to sparse and unstack does the opposite. For now we will use square matrices, keeping in mind that sparse matrices will be needed when the matrices get too large.
  • The issue of storing one or more matrices per file is not too big a deal either. hdf5 is a pretty lightweight file format - we can choose to put one matrix or multiple matrices in a file for different use cases. Generally, I can definitely see passing one matrix at a time from machine to machine, but if we end up using most of the matrices in the bottleneck of the model it might be convenient to store them together (an hdf5 file is essentially like a zip file, not really a format in itself).
  • As for the assumption that zone_ids can be used as indexes (zero-based and contiguous), it sounds like that is not guaranteed at all. From a computational perspective, though, it is required, so we will have a step at the beginning that converts labels to indexes; for the early stages of the project we will assume that zone ids are zero-based and contiguous (and accept that we'll have to fix that before we're through).
  • My biggest concern is simply how many skims will be necessary for which models, and whether there are any ways to reduce memory use or parallelize computation. Generally we should get this working functionally first and keep an eye out for inefficiencies as we go. These issues are likely to come up again and again.
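The stack/unstack round trip mentioned above, sketched on a toy 3x3 matrix (the zone labels are illustrative; real ones may be neither small nor contiguous):

```python
import numpy as np
import pandas as pd

zones = [1, 2, 3]
dense = pd.DataFrame(
    np.arange(9, dtype=float).reshape(3, 3), index=zones, columns=zones
)

# dense ("unstacked") -> long ("stacked"): one row per (origin, destination)
stacked = dense.stack()
assert stacked.loc[(2, 3)] == 5.0  # origin zone 2, destination zone 3

# and the inverse transformation recovers the square matrix
assert stacked.unstack().equals(dense)
```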

@guyrousseau

Currently @arc with CT-RAMP, in theory, the JPPF Driver application balances the load across the nodes so as to avoid the situation of machines sitting and waiting for others to finish. However, the balancing algorithm depends upon the configuration of a set of parameters specified in the jppf properties files. We have not had the opportunity to re-examine and fine-tune the settings since our new server was added to our cluster. Similarly, the thread number settings could also be further tweaked. So in our case, typically a single, best-performing server does a lot of the heavy lifting and probably handles 50% of the tasks. Thus, the workload being handled by each of the remaining machines (servers) probably does not justify the associated overhead relating to skim loading and task communications across the network.

@guyrousseau

Sorry, I did not mean to close this thread of conversation.

@guyrousseau guyrousseau reopened this Feb 10, 2015
@bstabler bstabler closed this as completed Apr 7, 2016
jpn-- added a commit that referenced this issue Jan 28, 2023
jpn-- added a commit that referenced this issue Jan 31, 2024
* initial commit

* Input checker runs, but log file is written incompletely

* updated input checker to write out full log file

* blacked input_checker.py

* data model integration initial commit

* initial pandera commit

* initial cleanup

* tested on full mtc sample with landuse and skim check

* lu types in enums & input cleanup

* initial commit of both pydantic & pandera

* formatting

* Getting into sync with ActivitySim/develop (#8)

* BayDAG Contributions (#657)

* code changes to add joint tour utilities in cdap

* add joint tour frequency composition component

* post process jtfc result to tours

* update joint tour participation

* update #inmtf to tm2 specs

* allow alternative id > 127

* set up testing infrastructure

* cdap test script

* cdap configs

* set up testing infrastructure

* joint tour test script

* joint tour configs

* set up testing infrastructure

* nm tour frequency test script

* nm tour frequency configs

* add jtfc alt table dictionary to yaml

* update jtfc yaml

* jtfc move coef values to coef.csv

* jtfc update preprocessor

* add jtfc alt table dictionary to yaml

* nmtf consolidate all -999 to one coef

* code changes to fix bug in parking location choice model

* Revert "code changes to fix bug in parking location choice model"

This reverts commit 5a38ebb.

* parking location choice bug fix

* move coefficients to csv file

* restore the original mandatory channels

* restore the original mandatory channels

* RSG Phase 7 Development (#49)

* school escorting initial commit

* bundles created

* added pure escort tours after non_mand_sched

* created escorting tours & trips

* integrated escort tours and trips

* Differentiate examples between quality and validity of example models (#571)

* change examples that are no longer tied to an agency to fictional place names

* change name of full example also

* add back missing output dir

* restore test output dir also

* more empty dirs that got lost

* clean up docs

* example_mtc -> prototype_mtc

* Prototype MTC extended

* add all the ignored files

* add test output dirs

* remove superfluous example_test

* prototype_sf

* prototype_arc

* prototype_marin

* move dirs

* psrc

* semcog

* sandag_xborder

* placeholder_sandag

* placeholder_multiple_zone

* no more coveralls

* repair docs

* clean up example names

* black and isort (#578)

* black and isort

* stop emitting output dir

it fails later tests

* trace files in nested directories

because windows

* swap files for xborder model

* repair ARC MP

* downstream model integration

* print checksum even when not used

* add hashes for sandag_xborder_full

* fix dtype in university hack

* fix persons to match tours

* repair ARC

* initial commit of flexible tour/trip ids

* pycodestyle

* black formatting

* Bump version: 1.0.4 → 1.1.0

* adding frequency alternatives configs required for test system

* added additional unit tests

* added setup function for test configs dir

* formatting

* handling missing alts files in examples

* error catching around extension probs

* passing tests

* still passing tests around missing config files

* accounting for missing mtf spec in marin example

* nm tour scheduling passing

* updating stop freq in tours table

* added check for mand tour overlap with pe tours

* num chauffeurs and escortees

* fixed defaults not getting used if file missing

* merging canonical ids from flexible id work

* setting escorting tour and trip ids

* remove unneeded nmtf settings file

* black formatting

* excluding escort trips from trip scheduling

* fixing bug where mand tours not getting assigned correct id

* adding school_escorting to mp model list

* Added mwcog small area

* missed one edit

* adding school escort tour flavors as own category

* reformatting

* updating timetable windows with pure escort tours

* additional logging

* Update README.MD

* call as module

* github actions tests

* adding non-mand destination changes

* pre commit hooks

* pyproject toml

* limit numpy

* docs for mamba instead of conda

* ignore generated files

* add nbmake to test env

* fix dupe line

* fixing bad tdd merge

* repair test multiple zones for github actions

* publish develop docs

* fix docbuild env

* merging tdd alts to all tours

* adding ride share threshold to unvailability for pure escort time overlap

* cleanup

* Update .travis.yml

* fixed testing files

* fixed testing files (again)

* fixed test script again

* mins per time bin

* black formatting

* black formatting

* fixing reindex import bug

* fixing missed import

* replacing trips test table

* inserting default setting if no models in config for tests

* publish docs to branch name

* adding setup function to tests to set configs_dir injectable

* documentation

* updated testing scripts (note shorter travis script for now)

* fixed slash (windows vs. linux testing issue)

* added output folders

* updated travis script to run all tests, should pass

* docs cleaning

* docs re-style

* rebuild

* dynamic versioning docs

* version switcher

* blacken

* fix switcher url

* fix conf

* switcher update

* master to main

* deployment actions

* actions

* build wheel

* fix for testpypi

* blacken

* manual switcher

* branch docs service [makedocs]

* syntax [makedocs]

* travis depends

* checkout v3, fix versioning in docs

* only build develop docs once

* failsafe version

* documentation repairs

* python-simplified-semver

* front cards

* end testing w travis

* add mwcog test to gh-actions

* add mwcog to docs

* fixed origin bug and missing outbound trip to work

* point to data not copy it

* sort dependencies

* req sh 2.2.4

* account for variance across platforms in trip dest est

* copy bike skims for sandag test

* proper cleanup when trips that get removed if schedule failed

* changed failed trip scheduling option for example

* updating regress table

* blacken

* extending nmtf and stop frequency to demonstrate flexible ids

* param on rtol

* fix sandag_2 test files

* fix test file names

* allowing multiple escort tours in the same period

* sandag 3-zone example fix

* fixing expression for missing escort participants sandag 3_zone

* cleanup

* blacken

* Update trip_destination.py

applying valid primary origin default value

* Update trip_destination.py

blacken

* fixing bug for inbound destination for second escortee

* blacken

* Disaggregate Accessibilities (#5)

* initial commit with basic working proto pop gen

* fixed bad dependency range

* fixed vary_on method to vary all with mapped fields after

* added check if already dataframe

* added dataframe check if the table passed is already a dataframe

* working proto pop gen. still debugging mandatory work/school model

* check all table is df

* workplace runs, next testing school

* began setting up to get all logsums, not just chosen ones

* restructured to run as either one-off or model step

* revert to last merge. Removed dataframe check in iterate, no longer needed to modify this file

* extracted logsums, need to inject into pipeline or save output?

* working standalone or step model

* extractable accessibilities for fixed and nonmandatory

* extractable accessibilities for fixed and nonmandatory

* cleanup and add documentation

* resolved duplicates of old 'example' folders

* model uses write_tables functionality

* disaggregate accessibility runs as model step or standalone. Runs as subprocess

* working model test. table initialization not yet working

* added logsum short circuit to get logsums and avoid drawing unneccesary samples

* working model plus major cleanup

* working model plus major cleanup

* override sample size

* fixed trace output files

* skip_choice default to False

* fixed empty logsums bug

* cleanup redundent parameters

* fixed list index instead of set

* coordinated model run files

* added default skip to false

* added multiprocessing line

* began setup for multiprocessing

* fixed sampling problem. Pipeline for multi processing, NOT WORKING YET

* deleted run file, not needed anymore

* update release instructions

* auto-build docs on release

* remind to update switcher

* need checkout

* added helper functions for handling suffix args

* to enable multiprocessing, overhauled pipeline method to include proto_ tables rather than overwriting main tables

* working multiprocess, but requires debugging. fails on other models

* working multiprocess, but requires debugging. fails on other models

* cleanup of mp table registering

* fixed tracing and slicing issue

* removed old 'run' model

* setup example 2 zones

* minimum working 2 zone

* multizone basic working

* fixed buggy settings

* multiprocessing working, but empty final tables

* fixed blank output bug and cleaned up settings to use verbose table names (not just 'disaggregate_accessibility')

* cleanup run scripts a bit

* fixed missing base path for full run

* fixed missing path

* fixed typo

* fixed 1 per taz sampling

* fixing person merge bug and filtering workers

* fixed duplicate tours!

* updated settings for initialize disagg

* removed obsolete file

* cleaned up file and moved initialize to tables/...

* added find nearest node function

* added initialize disaggregate_accessibility to initialize_households

* moved initialize steps to 'tables/disaggregate_accessibility'

* updated settings to include initialize disaggregate accessibilities as part of initialize_households

* moved initialize disagg accessibilities to tables. Created working merge step using naive bayes and 'hard' join

* add mp back in

* PEP formatting revisions

* fixed logsums merge on households instead of persons

* fixed _accessibility suffix

* fixed conflict with persons_merged

* updated yaml to use simpler join method

* PEP formatting fixes

* refreshed example folder from develop

* added missing line at end of file

* black fixes

* black fixes to disaggregate accessibility changes

* fixed missing column pipeline error

* merged disagg accessibilities into mtc_extended and added doc section

* ran black on disaggregate_accessibility.py

* updated dependencies

* removing sklearn imports

* blacken

* add if none catch

* fixed None suffix default

* moved order of get_table(persons_merged) to avoid pulling prematurely in xborder

* tested and cleaned up rng channels

* setup injectable suffixes to allow add_size_table as model step not just function

* removed accessibility output from test

* re blacken py scripts

* fixed tracing typo

* added variable index name suffix to pass optionally

* pipeline housekeeping to cleanup any tables, traceables, or channels that were touched during disagg

* added multiprocess testing

* blacken updates

* updated test scripts to include MP, problem with vehicle model in mp (even without disagg acc?)

* added improved origin sampling, resolved issue with merging with sample <100%. Need to address random seed for origins

* added sci-kit learn to test depends

* fixed person merging error causing pytest fail, uses inject method to ensure consistency

* cleanup comments

* fixed pytest to include accessibility table in regress

* setup for mp test, but needs debugging

* 'blacken'

* 'blacken'

* cleanup example folder

* fixed pipeline NoneType bug for disagg accessibility table

* fixed pytest fail on mp, due to exept:True on mp_simulate

* 'blacken'

* added weighted k-means method

* created run script for multiple sampling scenarios

* blacken changes

* blacken

* fixed copy script

* blacken sampling script

* fixed n_zone when integer

* fixed typo

* update sampling script

* 'blacken'

* fixed replacement sample bug

* updated documentation

* more flexible scenario testing

* blacken

Co-authored-by: Nick Fournier <nick.fournier@rsginc.com>
Co-authored-by: Jeffrey Newman <jeffnewman@camsys.com>
Co-authored-by: Nick Fournier <99497883+nick-fournier-rsg@users.noreply.github.com>

* Shadow Pricing Enhancements (#7)

* updated scripts to include simulation-based shadow pricing

* blacken

* Updated shadow_pricing.yaml for mtc example

* code cleanup

* more cleanup

* documentation and passing tests

* passing tests

* passing tests

* updated doc on shadow pricing

* 2nd Update model doc on shadow pricing

* more doc update on shadow pricing

* fixing pandas future warning

* blacken

* bug in trying to access shadow price settings when not running shadow pricing

* limiting pandas version

* always updating choices

* testing removal of lognormal for hh vot

* putting hh vot back in

* updating to match sharrow test versions

* raw person table for buffer instead of injectable

* adding segmentation, output by iteration, and external worker removal

* formatting & documentation

Co-authored-by: aletzdy <58451076+aletzdy@users.noreply.github.com>

* updating mtc_extended test files

* no sp for non-work or school, better logging, weighting options

* sample TAZ only if available MAZ when shadow pricing

* adding missed weight column option

* cleaning up comments

Co-authored-by: Jeffrey Newman <jeffnewman@camsys.com>
Co-authored-by: Andrew Rohne <andrew@siliconcreek.net>
Co-authored-by: Nick Fournier <nick.fournier@rsginc.com>
Co-authored-by: Nick Fournier <99497883+nick-fournier-rsg@users.noreply.github.com>
Co-authored-by: aletzdy <58451076+aletzdy@users.noreply.github.com>

* set up testing infrastructure (#40)

Co-authored-by: Lisa Zorn <lzorn@bayareametro.gov>

* BayDAG auto ownership configs and test (#41)

* set up testing infrastructure

* auto ownership test script

* auto ownership configs

* ao move coef values to coef.csv

Co-authored-by: Lisa Zorn <lzorn@bayareametro.gov>

* blacken

* more formatting

* blacken

* blacken

* blacken

* BayDAG parking location choice configs and test (#45)

* code changes to fix bug in parking location choice model

* parking location choice configs

* Revert "code changes to fix bug in parking location choice model"

This reverts commit 5a38ebb.

* parking location choice bug fix

* parking location choice config updates

move coefficients to csv file and activitysim variable naming consistency

* parking location choice summary

Co-authored-by: Lisa Zorn <lzorn@bayareametro.gov>

* blacken

* cdap bug fixes

* adding sklearn to dependencies

* Update activitysim-dev.yml

fixing bad reference to scikit-learn

* Update activitysim-test.yml

fixing bad reference to scikit-learn

* Update setup.cfg

fixing bad reference to scikit-learn

* Memory Bug Fix

* working with 2k no sp

* optimize nearest zone determination

* Changed data type of participant_id in candidates in joint_tour_participation_candidates() to unsigned 64-bit integer to prevent overflow

* better comments

* fix overflow in jtp participant id

* blacken

* forcing participation

* Trip Scheduling (#51)

* Differentiate examples between quality and validity of example models (#571)

* change examples that are no longer tied to an agency to fictional place names

* change name of full example also

* add back missing output dir

* restore test output dir also

* more empty dirs that got lost

* clean up docs

* example_mtc -> prototype_mtc

* Prototype MTC extended

* add all the ignored files

* add test output dirs

* remove superfluous example_test

* prototype_sf

* prototype_arc

* prototype_marin

* move dirs

* psrc

* semcog

* sandag_xborder

* placeholder_sandag

* placeholder_multiple_zone

* no more coveralls

* repair docs

* clean up example names

* black and isort (#578)

* black and isort

* stop emitting output dir

it fails later tests

* trace files in nested directories

because windows

* swap files for xborder model

* repair ARC MP

* print checksum even when not used

* add hashes for sandag_xborder_full

* fix dtype in university hack

* fix persons to match tours

* repair ARC

* Bump version: 1.0.4 → 1.1.0

* Added mwcog small area

* missed one edit

* reformatting

* Update README.MD

* call as module

* github actions tests

* pre commit hooks

* pyproject toml

* limit numpy

* docs for mamba instead of conda

* ignore generated files

* add nbmake to test env

* fix dupe line

* repair test multiple zones for github actions

* publish develop docs

* fix docbuild env

* Update .travis.yml

* fixed testing files

* fixed testing files (again)

* fixed test script again

* publish docs to branch name

* updated testing scripts (note shorter travis script for now)

* fixed slash (windows vs. linux testing issue)

* added output folders

* updated travis script to run all tests, should pass

* docs cleaning

* docs re-style

* rebuild

* dynamic versioning docs

* version switcher

* blacken

* fix switcher url

* fix conf

* switcher update

* master to main

* deployment actions

* actions

* build wheel

* fix for testpypi

* blacken

* manual switcher

* branch docs service [makedocs]

* syntax [makedocs]

* travis depends

* checkout v3, fix versioning in docs

* only build develop docs once

* failsafe version

* documentation repairs

* python-simplified-semver

* front cards

* end testing w travis

* add mwcog test to gh-actions

* add mwcog to docs

* point to data not copy it

* sort dependencies

* req sh 2.2.4

* account for variance across platforms in trip dest est

* copy bike skims for sandag test

* param on rtol

* fix sandag_2 test files

* fix test file names

* added pre-processor option to trip scheduling

* trip scheduling relative mode initial commit

* moved everything to mwcog example

* adding output analysis notebook

* adding additional segmentation

* testing and documentation

* blacken

* not assuming scheduling mode is set

* fixing merge trip scheduling

* still fixing merge

* reverting regression trips

* fixing bad merge and updating test

Co-authored-by: Jeffrey Newman <jeffnewman@camsys.com>
Co-authored-by: Andrew Rohne <andrew@siliconcreek.net>

* Trip Scheduling Bug & Trip Mode Choice Annotate (#55)

* other sp methods price updates

* Trip Scheduling into Resident Debug (#54)

* Differentiate examples between quality and validity of example models (#571)

* change examples that are no longer tied to an agency to fictional place names

* change name of full example also

* add back missing output dir

* restore test output dir also

* more empty dirs that got lost

* clean up docs

* example_mtc -> prototype_mtc

* Prototype MTC extended

* add all the ignored files

* add test output dirs

* remove superfluous example_test

* prototype_sf

* prototype_arc

* prototype_marin

* move dirs

* psrc

* semcog

* sandag_xborder

* placeholder_sandag

* placeholder_multiple_zone

* no more coveralls

* repair docs

* clean up example names

* black and isort (#578)

* black and isort

* stop emitting output dir

it fails later tests

* trace files in nested directories

because windows

* swap files for xborder model

* repair ARC MP

* print checksum even when not used

* add hashes for sandag_xborder_full

* fix dtype in university hack

* fix persons to match tours

* repair ARC

* Bump version: 1.0.4 → 1.1.0

* Added mwcog small area

* missed one edit

* reformatting

* Update README.MD

* call as module

* github actions tests

* pre commit hooks

* pyproject toml

* limit numpy

* docs for mamba instead of conda

* ignore generated files

* add nbmake to test env

* fix dupe line

* repair test multiple zones for github actions

* publish develop docs

* fix docbuild env

* Update .travis.yml

* fixed testing files

* fixed testing files (again)

* fixed test script again

* publish docs to branch name

* updated testing scripts (note shorter travis script for now)

* fixed slash (windows vs. linux testing issue)

* added output folders

* updated travis script to run all tests, should pass

* docs cleaning

* docs re-style

* rebuild

* dynamic versioning docs

* version switcher

* blacken

* fix switcher url

* fix conf

* switcher update

* master to main

* deployment actions

* actions

* build wheel

* fix for testpypi

* blacken

* manual switcher

* branch docs service [makedocs]

* syntax [makedocs]

* travis depends

* checkout v3, fix versioning in docs

* only build develop docs once

* failsafe version

* documentation repairs

* python-simplified-semver

* front cards

* end testing w travis

* add mwcog test to gh-actions

* add mwcog to docs

* point to data not copy it

* sort dependencies

* req sh 2.2.4

* account for variance across platforms in trip dest est

* copy bike skims for sandag test

* param on rtol

* fix sandag_2 test files

* fix test file names

* added pre-processor option to trip scheduling

* trip scheduling relative mode initial commit

* moved everything to mwcog example

* adding output analysis notebook

* adding additional segmentation

* testing and documentation

* blacken

* not assuming scheduling mode is set

* fixing merge trip scheduling

* still fixing merge

* reverting regression trips

* fixing bad merge and updating test

Co-authored-by: Jeffrey Newman <jeffnewman@camsys.com>
Co-authored-by: Andrew Rohne <andrew@siliconcreek.net>

* adding annotate to trip mode choice

* no earliest change if previous trip fail

Co-authored-by: David Hensle <davidh@sandag.org>
Co-authored-by: Jeffrey Newman <jeffnewman@camsys.com>
Co-authored-by: Andrew Rohne <andrew@siliconcreek.net>

* adding locals_dict to annotate

* chooser cols in final trips table

* BayDAG merge with ActivitySim v1.2  (#56)

* fixed logsums merge on households instead of persons

* code cleanup

* more cleanup

* documentation and passing tests

* passing tests

* fixed _accessibility suffix

* fixed conflict with persons_merged

* updated yaml to use simpler join method

* PEP formatting fixes

* refreshed example folder from develop

* added missing line at end of file

* black fixes

* black fixes to disaggregate accessibility changes

* passing tests

* ext cli arg

* memory sidecar

* restart if resume_after checkpoint is missing

* no pandas 1.5 yet

* stop the sidecar

* updated doc on shadow pricing

* 2nd Update model doc on shadow pricing

* more doc update on shadow pricing

* minor repairs

* skip when household_income is None

* fixed missing column pipeline error

* fixing pandas future warning

* blacken

* merged disagg accessibilities into mtc_extended and added doc section

* ran black on disaggregate_accessibility.py

* bug in trying to access shadow price settings when not running shadow pricing

* limiting pandas version

* updated dependencies

* always updating choices

* removing sklearn imports

* blacken

* add if none catch

* fixed None suffix default

* moved order of get_table(persons_merged) to avoid pulling prematurely in xborder

* testing removal of lognormal for hh vot

* putting hh vot back in

* tested and cleaned up rng channels

* setup injectable suffixes to allow add_size_table as model step not just function

* removed accessibility output from test

* re blacken py scripts

* updating to match sharrow test versions

* fixed tracing typo

* added variable index name suffix to pass optionally

* pipeline housekeeping to cleanup any tables, traceables, or channels that were touched during disagg

* added multiprocess testing

* blacken updates

* updated test scripts to include MP, problem with vehicle model in mp (even without disagg acc?)

* added improved origin sampling, resolved issue with merging with sample <100%. Need to address random seed for origins

* added sci-kit learn to test depends

* fixed person merging error causing pytest fail, uses inject method to ensure consistency

* cleanup comments

* fixed pytest to include accessibility table in regress

* setup for mp test, but needs debugging

* 'blacken'

* 'blacken'

* cleanup example folder

* fixed pipeline NoneType bug for disagg accessibility table

* raw person table for buffer instead of injectable

* add mp test for prototype_mtc_extended

* fixed pytest fail on mp, due to except:True on mp_simulate

* 'blacken'

* added weighted k-means method

* created run script for multiple sampling scenarios

* blacken changes

* blacken

* fixed copy script

* blacken sampling script

* fixed n_zone when integer

* fixed typo

* Update random.py

* Updated test environment to reflect changes in pandas

* Removed random seed generation from placeholder_sandag 1-zone test configuration to get example to pass automated test

* update sampling script

* 'blacken'

* fixed replacement sample bug

* adding segmentation, output by iteration, and external worker removal

* updated documentation

* more flexible scenario testing

* blacken

* Added test_random_seed.py

* blacken test_random_seed.py

* Moved random seed generation testing out of prototype_mtc folder and edited core_tests.yaml to call it

* Corrected name of check_outputs()

* Added print statements to debug script

* Blacken test_random_seed.py

* keep alternative attributes in parking location

* move coeff values to coeff csv

* Added pipe character to call that tests the random generator

* Added explicit call for pytest to run test_random_seed.py

* Added print statement at start of script for debugging

* Replaced single quotes with double quotes in print statement

* reformat with Black

* Moved random seed test into activitysim folder

* Edited core_tests.yml to reflect change in location of random seed test

* Renamed tests folder to test to reflect call in core-tests.yml

* formatting & documentation

* Added test_random_seed function for PyTest to read

* Fixed definition of seeds list and removed print statements

* updating github test env

* responding to review comments

* Added line to create output folder as that's not being committed

* create_rng_configs() now copies entire example configs directory instead of just settings.yaml

* Added try except statement when creating the output directory in case it already exists
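A minimal sketch of the pattern this commit describes (the actual paths in the test script differ; `output_dir` here is made up). A try/except around `os.mkdir` tolerates a pre-existing directory; `os.makedirs(..., exist_ok=True)` is the equivalent one-liner idiom:

```python
import os
import tempfile

# Hypothetical path for illustration; the real test writes to its own output folder.
base = tempfile.mkdtemp()
output_dir = os.path.join(base, "output")

try:
    os.mkdir(output_dir)
except FileExistsError:
    pass  # directory already exists, which is fine

# Calling again demonstrates the guard: no error is raised the second time.
try:
    os.mkdir(output_dir)
except FileExistsError:
    pass

print(os.path.isdir(output_dir))  # True
```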

* additional logging, random seed option, landuse sample weight column

* ensuring TAZ is not selected if no available MAZ

* adding logic to skip external location choice models

* Applied suggested changes

* Removed activitysim\test\random_seed\configs directory to reduce confusion

* Moved random seed test directory to a folder that had already existed

* temporarily constrain pandas #608

* work around psutil runtimeerror

* temp constrain pandas

* add mwcog full example

* review responses

* revisions per Sijia's comments, except for the proto-table-template implementation, still working.

* added template pop option

* blacken

* set back to create_tables to pass test against old tables

* updated proto-pop test data

* resolved bad test data issue

* warn don't fail on duplicate skims

* include sharrow in mwcog example

* cleanup

* zarr digital encoding for mwcog

* include reference in psrc mini

* force window ids to int64 because windows

* handle expected warnings

* unpin dependencies

* columns cannot be a set
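For context, a hedged illustration of the pandas behavior behind this fix: recent pandas versions reject a `set` as a column indexer, so a set of column names has to be converted to an ordered list before selection (the frame and column names here are made up):

```python
import pandas as pd

df = pd.DataFrame({"orig": [1, 2], "dest": [3, 4], "mode": ["walk", "bike"]})
wanted = {"orig", "dest"}  # a set of column names

# df[wanted] raises in recent pandas ("Passing a set as an indexer is not
# supported"); pass an ordered, list-like key instead.
subset = df[sorted(wanted)]
print(list(subset.columns))  # ['dest', 'orig']
```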

* unpin pandas

* columns cannot be a set

* numeric_only args in groupby.sum
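A hedged sketch of the pandas change this commit adapts to: `GroupBy.sum` now wants `numeric_only` passed explicitly when non-numeric columns are present (it warned in pandas 1.x and errors in 2.x). The column names below are illustrative, not from the codebase:

```python
import pandas as pd

df = pd.DataFrame(
    {"zone": [1, 1, 2], "trips": [3, 4, 5], "mode": ["walk", "bike", "walk"]}
)

# Without numeric_only=True the string column "mode" cannot be summed,
# which newer pandas treats as an error rather than silently dropping it.
totals = df.groupby("zone").sum(numeric_only=True)
print(totals["trips"].tolist())  # [7, 5]
```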

* docs: correct the unit of `chunk_size`.

* change 'arry' to more readable 'sizearray'

* repair tests

* catch FileNotFound

* add check that land use is zero-based when using sharrow
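A minimal sketch of what such a check might look like (not the project's actual implementation): sharrow-style positional indexing assumes zone IDs occupy positions 0..N-1, so the check verifies the land use index is zero-based and contiguous before skipping the ID-to-position translation:

```python
import numpy as np

def is_zero_based_contiguous(ids: np.ndarray) -> bool:
    """True when ids are exactly 0..N-1 in order, so id == position."""
    return bool(np.array_equal(ids, np.arange(len(ids))))

# Hypothetical zone indexes for illustration.
print(is_zero_based_contiguous(np.array([0, 1, 2, 3])))  # True
print(is_zero_based_contiguous(np.array([1, 2, 3])))     # False (one-based)
```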

* better docs on skims in shared memory

* better docstrings for skimdataset

* remove old code

* remove junk

* add logging for recoding

* optimize nearest zone determination

* consistent multiprocessing results

* blacken

* updating regression trips

* adding sandag 2-zone test

* blacken

* fix merge error

* allow pandas 1.5 again

* sharrow compatibility with #606

* remove outdated code in comments

* note on tracing limitation

* allow repeating key cols

* fix decoding in MP output

* escortee modes match chauf modes

* updating regression tests

* fixing bad var name in inbound expressions

* add note about `condense_parameters`

* change default condense_parameters to False

* Update .gitignore

Co-authored-by: Sijia Wang <wangsijia0628@gmail.com>

* ignore caches

* remove special install instructions

* remove sharrow_skip's

* Revert "remove sharrow_skip's"

This reverts commit 6b52af2.

* add note about why sharrow_skip is activated

* Update school_escorting.yaml

* Update config.py

* arc trip destination spec skim correction

* update regression trips

* require sharrow 2.5

* fix mtc_ext for sharrow

* fix test consistent with escort mode alignments

* allow sharrow skipping

* remove stray indent

* delete expression values after sharrow testing to make debugging easier

* require sharrow 2.5.2

* recode only once

* testing repairs

* note about why sharrow_skip

* fork mtc_extended testing on shadow pricing and sharrow

* note on sharrow_skip

* size term scaling only for full run

* reordering

* multiprocessing hangs!

* never scale size terms for disagg

* tally pending persons

* tests cleanup

* fix error with orca

* let pipeline tables be dropped

* check on available MAZs, comment cleanup

* work zone to -1 if wfh, removed unneeded extensions

* fix tests for sandag 2zone

* ignore deprecation warning in tests

not clear why this started being a problem suddenly in only the xborder-MP model

* add scikit-learn to formal dependencies

* docs update for installer

* fixing black to 22.12.0

* missed merge conflicts

* fixing merge issues in prototype_semcog

* fixing prototype_arc merge

* fixing placeholder_sandag merge issues

* blacken

* fixing mwcog merge issues

* skims not found during estimation test

---------

Co-authored-by: Nick Fournier <nick.fournier@rsginc.com>
Co-authored-by: Jeff Newman <jeffnewman@camsys.com>
Co-authored-by: aletzdy <58451076+aletzdy@users.noreply.github.com>
Co-authored-by: Jeffrey Newman <jeff@newman.me>
Co-authored-by: Nick Fournier <99497883+nick-fournier-rsg@users.noreply.github.com>
Co-authored-by: Joe Flood <joejimflood@gmail.com>
Co-authored-by: Sijia Wang <wangsijia0628@gmail.com>
Co-authored-by: amarin <17020181+asiripanich@users.noreply.github.com>

* do not require all alts in data file

* int8 to int16 lost in v1.2 merge

---------

Co-authored-by: Lisa Zorn <lzorn@bayareametro.gov>
Co-authored-by: Sijia Wang <wangsijia0628@gmail.com>
Co-authored-by: Ashish Kulshrestha <ashish.kulshrestha@wsp.com>
Co-authored-by: Jeffrey Newman <jeffnewman@camsys.com>
Co-authored-by: Andrew Rohne <andrew@siliconcreek.net>
Co-authored-by: Nick Fournier <nick.fournier@rsginc.com>
Co-authored-by: Nick Fournier <99497883+nick-fournier-rsg@users.noreply.github.com>
Co-authored-by: aletzdy <58451076+aletzdy@users.noreply.github.com>
Co-authored-by: David Hensle <davidh@sandag.org>
Co-authored-by: JoeJimFlood <joejimflood@gmail.com>
Co-authored-by: Jeffrey Newman <jeff@newman.me>
Co-authored-by: amarin <17020181+asiripanich@users.noreply.github.com>

* docs: apply a minor correction to user guides (#659)

* Trip scheduling logic (#660)

* trip scheduling logic_version

* in testing ignore the warning we just added

* cannot use logic_version 1 with relative scheduling

* Pin Dependencies (#665)

* pinderella

* pandas in setup.cfg deps

* Updated SEMCOG Example (#603)

* initial example_semcog commit

* documentation

* single process passing but not multiprocess

* testing infrastructure

* blacken

* formatting

* implemented stable sample for university extensions

* adding ignored output folder

* removed squared term and re-estimated

* updating test env

* calibrated configs

* fixing bad merge

* updated configs

* reading in extension folder

* updated shadowpricing yaml file

* updated test regress

* Revert "updated test regress"

This reverts commit 8d5af7d.

* updated regress test table

* updated test regress table

* updated cropped land use and skim

* updated shadow pricing method

* updated network_los to read segmented skims if needed

* updated example_manifest.yaml

* fixing conflicts

* making fixes per review comments

* push mistake!

* Revert "making fixes per review comments"

This reverts commit c05c7f3.

* Revert "Revert "making fixes per review comments""

This reverts commit d83019a.

* added back accidentally untracked output folder + updated regress table

* misc small fixes

---------

Co-authored-by: Ali Etezady <58451076+aletzdy@users.noreply.github.com>

* synchronize (#680)

issue templates into develop

---------

Co-authored-by: Lisa Zorn <lzorn@bayareametro.gov>
Co-authored-by: Sijia Wang <wangsijia0628@gmail.com>
Co-authored-by: Ashish Kulshrestha <ashish.kulshrestha@wsp.com>
Co-authored-by: Jeffrey Newman <jeffnewman@camsys.com>
Co-authored-by: Andrew Rohne <andrew@siliconcreek.net>
Co-authored-by: Nick Fournier <nick.fournier@rsginc.com>
Co-authored-by: Nick Fournier <99497883+nick-fournier-rsg@users.noreply.github.com>
Co-authored-by: aletzdy <58451076+aletzdy@users.noreply.github.com>
Co-authored-by: David Hensle <davidh@sandag.org>
Co-authored-by: JoeJimFlood <joejimflood@gmail.com>
Co-authored-by: Jeffrey Newman <jeff@newman.me>
Co-authored-by: amarin <17020181+asiripanich@users.noreply.github.com>

* added documentation

* no input checker to test develop merge

* testing state object in input checker

* adding pandera to dependencies

* getting settings in sync

* data_model_dir, checker logging, network data

* allowing None for trace and resume_after settings

* documentation

* blacken

* data_model dir added to test

* cleanup

* multiple errors & warnings

* clarified warning msgs, fixed error counts

* added auxiliary logging to input_checker

* Added more descriptive input checker test names

* Added info-level input checker logging

* in-line documentation cleanup

* test_input_checker initial commit (not passing)

* Clean up input checker log message styles

* Add MTC example input checks, cleanup

* reformat code with black

* Fixed pytest fixture scope issue

* change networklinks path to platform-agnostic path

* add input checker to mtc extended test

* Input checker bugfixes, added SEMCOG example

- Updated pytables dependency version to fix deprecation warnings
- Added flexible resolve for external input checker file paths (can be
absolute paths or paths relative to one of the data directories)
- Added data_model directory to prototype_mtc_example files
- Added input_checker config yamls to SEMCOG example
- Added sample network data to SEMCOG example

* Add SEMCOG input check files

* Formatting previous commit

* Add missing file handling to input checker

* Blackening previous commit

* Added input checking relative to calling dir

* cleanup and documentation

* Input checker path bugfix, MTC reversion

* suspend buggy test

Suspend test per ActivitySim issue #781

* fix progressive test

* fix external url

* drop extra file

* remove extra imports

* remove vscode file

* pydantic not implemented comments

---------

Co-authored-by: Ali Etezady <58451076+aletzdy@users.noreply.github.com>
Co-authored-by: Lisa Zorn <lzorn@bayareametro.gov>
Co-authored-by: Sijia Wang <wangsijia0628@gmail.com>
Co-authored-by: Ashish Kulshrestha <ashish.kulshrestha@wsp.com>
Co-authored-by: Jeffrey Newman <jeffnewman@camsys.com>
Co-authored-by: Andrew Rohne <andrew@siliconcreek.net>
Co-authored-by: Nick Fournier <nick.fournier@rsginc.com>
Co-authored-by: Nick Fournier <99497883+nick-fournier-rsg@users.noreply.github.com>
Co-authored-by: David Hensle <davidh@sandag.org>
Co-authored-by: JoeJimFlood <joejimflood@gmail.com>
Co-authored-by: Jeffrey Newman <jeff@newman.me>
Co-authored-by: amarin <17020181+asiripanich@users.noreply.github.com>
Co-authored-by: Will <will.alexander@rsginc.com>
Co-authored-by: Jeff Newman <jeff@driftless.xyz>
jpn-- pushed a commit that referenced this issue Feb 13, 2024
relax progressive checks for categorical dtypes
jpn-- added a commit that referenced this issue Mar 19, 2024
…c_choice

BayDAG Contribution #8: Landuse and Reindex available in location choice