Progress report and discussion topics for 12-19-14 meeting #10

Closed
fscottfoti opened this issue Dec 17, 2014 · 4 comments

Comments

@fscottfoti
Contributor

Progress since the last meeting

Infrastructure

  • Pull request for (YAML-based) general choice modeling in UrbanSim
  • Defaults for some standard data sources and variable definitions that are usable for all regions
  • A notebook for moving data from the directory structure into a single HDF5 file
  • A notebook for browsing the registered tables and running the example models
  • A pull request for integrating OMX Python support
  • A pull request to wrap a 2D matrix with a Skim object
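
As a rough illustration of the last item, a Skim wrapper around a 2D matrix might look like the sketch below. The class name `SkimSketch`, its `get` method, and the zone-ID-to-position mapping are assumptions made for illustration; they are not the API in the pull request.

```python
import numpy as np


class SkimSketch:
    """Hypothetical wrapper around a 2D origin-destination matrix."""

    def __init__(self, matrix, zone_ids):
        self.matrix = np.asarray(matrix)
        # Map external zone IDs (e.g. TAZ numbers) to 0-based row/column positions.
        self.index = {zone: pos for pos, zone in enumerate(zone_ids)}

    def get(self, origins, destinations):
        """Return one matrix value per (origin, destination) pair."""
        rows = [self.index[o] for o in origins]
        cols = [self.index[d] for d in destinations]
        return self.matrix[rows, cols]


# Toy 3-zone distance matrix; the values are made up.
distance = SkimSketch(
    [[0.0, 2.5, 4.0],
     [2.5, 0.0, 1.5],
     [4.0, 1.5, 0.0]],
    zone_ids=[101, 102, 103],
)
lookup = distance.get(origins=[101, 103], destinations=[102, 101])
```

Looking values up pairwise like this matches the way a chooser's origin column and an alternative's destination column would be lined up row by row.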

Specific Models

  • We now have auto ownership and workplace location choice basically working on MTC data - the latter uses the OMX and Skim objects. There are a few design decisions that can be made at this point. Here is a proposal for how the models could work.
    • An example implementation directory (these are client specific)
    • The configuration is still a csv file with the same basic form as the current UPCs.
    • Note that this is different from the current YAML approach in UrbanSim and is discussed in this issue
    • In short, models are specified in the CSV file and configured in Python code like this. Transformations that are more complicated than those allowed by Patsy can be specified in the CSV file: we use DataFrame.eval, or straight Python eval when there is an @ as the first character. This allows more flexibility in the CSV than the current YAML files provide.
    • Additionally, there are simulation settings here
    • An example of using a skim object is here: the OMX file is read here, the specific matrix is selected here, and these are injected and configured here
    • A dictionary of skims is passed to "simple simulate". A specified column (e.g. TAZ) must occur in both the choosers, where it is used as the origin ID, and the alternatives, where it is used as the destination ID
  • A few design decisions to make
    • YAML vs CSV
    • and related: how much in configuration vs in code - how comfortable are folks with Python at this level?
    • also related: dependencies in UrbanSim right now are basically the sim framework and low-level mnl routines. Should these move to a third library, or is everyone OK with that?
    • main concern at this point is performance - we're not slow per se, but not fast and a lot of memory is used. Did I hear that folks actually parallelize households on different processors?
  • Next 2 weeks (actually first 2 weeks of Jan)
    • If we're comfortable with this proposal, we can comment, document, and test the activitysim code
    • Size variables in workplace location choice
    • Next model is Coordinated daily activity pattern unless we want to head to something complicated
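
The expression handling described in the proposal above (DataFrame.eval by default, plain Python eval when the expression starts with @) could be sketched roughly as follows. The function name `evaluate_expression`, the `df` name exposed to @ expressions, and the toy columns are assumptions for illustration, not the activitysim code.

```python
import numpy as np
import pandas as pd


def evaluate_expression(expr, df):
    """Evaluate one spec expression against a table of choosers."""
    if expr.startswith("@"):
        # Plain Python eval: the expression sees the table as `df`,
        # plus numpy and pandas. Only safe for trusted spec files.
        return eval(expr[1:], {"np": np, "pd": pd}, {"df": df})
    # Otherwise let pandas evaluate the expression against the columns.
    return df.eval(expr)


# Toy households table; the values are made up.
households = pd.DataFrame({"income": [20000, 80000], "persons": [1, 4]})

high_income = evaluate_expression("income > 50000", households)
log_income = evaluate_expression("@np.log(df.income)", households)
```

Each row of a CSV spec would then carry one such expression next to its coefficient, which is where the extra flexibility over a formula-only configuration comes from.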

Thoughts?

@fscottfoti
Contributor Author

One other thing to add to this list is naming conventions - PEP8 conventions say variables should be named number_persons_age_5_to_15 while most of the variables currently are named like numPersAge5to15. I've changed most of these in the code to match PEP8 conventions but want to make sure people are OK with this. Column names in tables might need to be changed too which I haven't done yet.
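
If column names are changed as well, one low-risk way to do it is an explicit rename mapping, so every old name is traceable to its new one. This is only a sketch: the `RENAMES` dictionary and the toy table are hypothetical, with the single entry taken from the example above.

```python
import pandas as pd

# Hypothetical mapping from existing camelCase names to PEP8-style names.
RENAMES = {
    "numPersAge5to15": "number_persons_age_5_to_15",
}

# Toy table; the values are made up.
households = pd.DataFrame({"numPersAge5to15": [0, 2, 1]})
households = households.rename(columns=RENAMES)
```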

@danielsclint

@fscottfoti Thank you for pulling together this thorough update on the progress for the last two weeks. It's been nice to follow the progress you and Matt are making via the frequent documentation in GitHub.

A few thoughts on where we are and where we may want to go on the issues outlined above.

Code

  • I would like to walk through some of the classes and configurations setup over the past two weeks.

Configuration Files:

  • I NEVER want to see another binary XLSX configuration file ever again. You can't imagine how often we have been burned by losing track of binary file changes in GitHub.
  • It may be beyond the scope of this work product, but I would love for configuration / estimation information to be stored in a database with a list of definitive default values and isolated overrides for each scenario. This past year we have done over 350 unique model runs where input parameters are largely the same across runs. However, since all inputs are stored in a file-based system, we are copying the same information from scenario folder to scenario folder, wasting space. A better solution would be to store the inputs in a database keyed by scenario, where the model uses default values unless an override is explicitly asserted.
  • How much configuration belongs in code is a balancing act. I would advocate for as agnostic an implementation as possible in code, with much of the configuration outside the Python code base. However, not being as familiar with the Python system, I would be interested in discussing the pros and cons.
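
To make the defaults-plus-overrides idea concrete, here is a minimal sketch using an in-memory SQLite database. The table layout, the `scenario_settings` helper, and the parameter names and values are all assumptions for illustration; nothing like this exists in the code base.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE defaults (name TEXT PRIMARY KEY, value TEXT NOT NULL);
    CREATE TABLE overrides (
        scenario TEXT NOT NULL,
        name TEXT NOT NULL,
        value TEXT NOT NULL,
        PRIMARY KEY (scenario, name)
    );
    """
)
# Parameter names and values are made up.
conn.executemany(
    "INSERT INTO defaults VALUES (?, ?)",
    [("auto_operating_cost", "18.3"), ("sample_rate", "1.0")],
)
conn.execute(
    "INSERT INTO overrides VALUES (?, ?, ?)",
    ("high_fuel_price", "auto_operating_cost", "27.0"),
)


def scenario_settings(conn, scenario):
    """Return the defaults with the scenario's explicit overrides applied."""
    rows = conn.execute(
        """
        SELECT d.name, COALESCE(o.value, d.value)
        FROM defaults AS d
        LEFT JOIN overrides AS o
          ON o.name = d.name AND o.scenario = ?
        ORDER BY d.name
        """,
        (scenario,),
    )
    return dict(rows.fetchall())


base = scenario_settings(conn, "base")
high_fuel = scenario_settings(conn, "high_fuel_price")
```

Only the one overridden value is stored for the `high_fuel_price` scenario; everything else falls through to the defaults, which is the space saving described above.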

Relationship to UrbanSim

  • I would rather pull common libraries where possible into a common-util type of project that is shared between UrbanSim and ActivitySim. We are wading into the UrbanSim world at SANDAG, but I don't think I should have to have UrbanSim "installed" to run ActivitySim.

Performance

  • Let's see how well the model is working from an implementation perspective. Is it estimating well? Is it producing reasonable results? If performance becomes an issue (and I would advocate we determine what slow is first), we can look at that a little later.
  • I hate buying expensive computers / servers. I have an ungodly amount of unused computing capacity sitting on the desks of SANDAG staff. I would love to explore a SETI-like system where I could run ActivitySim in the background of some accountant's computer while they're at lunch.
  • The parallelization in CT-RAMP is not implemented particularly well. It does distribute the load, but it's not terribly effective at keeping track of requests (we occasionally lose work packets), and I also believe (but cannot prove yet) that the parallel processing is the root cause of some of the simulation noise. In effect, I don't think CT-RAMP is doling out the random number seed chain in the same order to each of the nodes from run to run, even with a fixed seed. We have tamped this down quite a bit in our implementation, but it is something to keep in mind with parallelization. I think this problem has been resolved (or at least claims to be) in the latest implementations of DaySim.
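
One way to keep draws independent of how work is distributed is to derive each household's random stream from the fixed seed plus the household ID, rather than from one shared sequence. The sketch below shows that general idea with NumPy; it does not describe how CT-RAMP or DaySim handle seeds, and `BASE_SEED`, `household_draws`, and the household IDs are made up.

```python
import numpy as np

BASE_SEED = 12345  # arbitrary value for illustration


def household_draws(household_id, n_draws=3):
    """Draw random numbers from a stream keyed only by seed and household ID."""
    rng = np.random.default_rng([BASE_SEED, household_id])
    return rng.random(n_draws)


household_ids = [7, 42, 1001]

# Process households in two different orders, as two nodes or two runs might.
first_run = {hh: household_draws(hh) for hh in household_ids}
second_run = {hh: household_draws(hh) for hh in reversed(household_ids)}

same = all(
    np.array_equal(first_run[hh], second_run[hh]) for hh in household_ids
)
```

Because no household's draws depend on which households were processed before it, the assignment of households to nodes cannot change the results.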

Naming Conventions

  • PEP8 Standards +1

Next Models

  • I defer to the modelers.

@e-lo
Collaborator

e-lo commented Dec 19, 2014

I concur with @danielsclint that this update was very fun to review and I appreciate the sending of this report with links to everything. Pretty Sweet.

One thing that may be useful in the future is to discuss specific design questions in their own issues so they are easier to follow. But for now, I will follow Clint's lead and just add my thoughts here.

Config file formats

  • I agree with Clint about "no binary" and prefer YAML slightly to CSV
  • I disagree with Clint about wanting to have a database with specifications. I feel this is overly complex and less traceable. We use git to track different specs at SFCTA. However, I think an in-depth discussion about the relationship between /examples/ and defaults will be important. The way I see it eventually working is that ActivitySim is a code base that I (well, the future person playing my role at SFCTA) would reference to implement CHAMP, i.e. CHAMP is its own git repo full of specifications and a set of routines to run CHAMP, but it is just using ActivitySim building blocks (that will probably look a lot like /example/TravelModelOne).
    I would reallllly like to separate the "codebase" from the "travel model".

Performance

  • @danielsclint's plan to take over the accountants' computers on their lunch hour is intriguing (and something I've done before using Cube Cluster) ...and I think this points to a more disaggregate file system and extremely careful random number seed alignment.
  • I like for parallelization to be flexible. Sometimes I have 30 CPUs and 8 models to run, sometimes 100 CPUs and 2 models to run (or something like that but with better math). Our SF-CHAMP implementation can deal with either flexibly and we just name which computers we want it to use when we start the run...the list could be 1 or dozens.

@guyrousseau

It was interesting to keep up with software development over the past couple of weeks, very transparent process, much appreciated, many thanks. That said, a few comments:

I agree that we should not have to have UrbanSim installed in order to run ActivitySim, though leveraging the UrbanSim architecture is most definitely OK and the way to go.

Distributed processing across servers should be as flexible as possible.

CDAP as the next model component to tackle is one more step heading in the right direction with this project moving forward.


5 participants