Project Meeting 2023.04.13

Michelle Bina edited this page Apr 18, 2023 · 13 revisions

Agenda

  • Discuss short-term solutions for ending the activitysim_resources git monorepo.
  • Discussion Questions:
    • Do we want to continue to develop/maintain 3-zone model code?
    • Is validation testing against TM2 results at scale something we want to do?

Meetings

Admin Items

  • Still looking for volunteers to help build the ActivitySim website. Alex to coordinate.
  • Short-term solutions for ending the activitysim_resources git monorepo (Jeff)

Data Format / Parquet (Jeff)

Discussion questions

  • Do we want to continue to develop/maintain 3-zone model code?
    • If no participating agency plans to use this formulation, we should consider ending support for it as soon as possible so that the time can go toward more useful work.
    • What Jeff built now has consistent and stable results. However, are the results accurate? Should Jeff spend time/resources confirming results by hand?
    • Consortium agreement: The 3-zone model tests are complete, but the question of whether the results are correct is on hold.
    • Going forward, document config lines more clearly. Information about the bug must be released to ActivitySim users outside the consortium (e.g., graduate students at UC Irvine use the 3-zone model system).
  • Is validation testing against TM2 results at scale something we want to do?
    • The BayDAG PR included a number of component level tests that are backed by comparisons against TM2 results at scale. The datasets for these tests are large and it is unclear whether they can be supported in the CI package.
    • Currently, running these tests requires getting the specialized data files (TM2 outputs) from ActivitySim resources and/or SharePoint.
    • Does the Consortium want this kind of testing in our repository?
    • Noted: No decision was reached, and the BayDAG PR component-level tests were not addressed further.

Short-term solutions for ending the activitysim_resources git monorepo:

  • Create or leverage an external example to pull in data for just the estimation example, the only test that runs inside GitHub.
  • We will be able to pull this data file without touching our bandwidth quota.

Data Format for Checkpointing Structures:

  • Developers Checkpointing Guide: https://camsys.github.io/activitysim/generic-whale/dev-guide/checkpointing.html
  • In the new implementation, there are currently two data file formats available for checkpointing:
    • HDF5
    • Parquet (Default)
  • Checkpointing is done using a pluggable class instead of line-by-line changes.
  • Change from the user's perspective with Parquet: in the output file directory, a pipeline.parquetpipeline directory contains individual folders, each with a set of Parquet files named after the component that the checkpoint was written for.
  • Writing to the Parquet file format can fail because Parquet storage does not allow arbitrary objects in the data file (i.e. a mix of datatypes).
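
The points above can be illustrated with a minimal sketch. This is not the ActivitySim implementation: the names `CheckpointStore`, `DirectoryStore`, `check_uniform_types`, and `run_demo` are hypothetical, the one-folder-per-table layout is an assumption, the table and component names are illustrative, and CSV stands in for Parquet so the example runs with only the standard library. It shows a pluggable store class, a directory layout with files named after the component, and a type check that mimics how a typed columnar format rejects a column with mixed datatypes.

```python
from __future__ import annotations

import csv
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class CheckpointStore(ABC):
    """Hypothetical pluggable interface: persist one table at one checkpoint."""

    @abstractmethod
    def write(self, table_name: str, component: str, rows: list[dict]) -> Path:
        ...


def check_uniform_types(rows: list[dict]) -> None:
    """Raise TypeError if any column mixes Python types across rows.

    This mimics (it does not reproduce) the restriction of a typed
    columnar format that cannot store arbitrary objects in one column.
    """
    seen: dict[str, type] = {}
    for row in rows:
        for column, value in row.items():
            expected = seen.setdefault(column, type(value))
            if type(value) is not expected:
                raise TypeError(
                    f"column {column!r} mixes {expected.__name__} "
                    f"and {type(value).__name__}"
                )


class DirectoryStore(CheckpointStore):
    """Stand-in store: one folder per table (assumed), one file per component.

    CSV replaces Parquet here only to avoid third-party dependencies.
    """

    def __init__(self, output_dir: Path, dirname: str = "pipeline.parquetpipeline"):
        self.root = Path(output_dir) / dirname

    def write(self, table_name: str, component: str, rows: list[dict]) -> Path:
        if not rows:
            raise ValueError("nothing to checkpoint")
        check_uniform_types(rows)
        folder = self.root / table_name
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{component}.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        return path


def run_demo() -> tuple[str, str]:
    """Write one valid checkpoint, then attempt one with a mixed-type column."""
    with tempfile.TemporaryDirectory() as tmp:
        store: CheckpointStore = DirectoryStore(Path(tmp))
        good = [
            {"household_id": 1, "income": 52000},
            {"household_id": 2, "income": 71000},
        ]
        written = store.write("households", "initialize_households", good)
        relative = written.relative_to(tmp).as_posix()

        bad = [
            {"household_id": 1, "income": 52000},
            {"household_id": 2, "income": "unknown"},  # str mixed with int
        ]
        try:
            store.write("households", "bad_step", bad)
            error = ""
        except TypeError as exc:
            error = str(exc)
    return relative, error


if __name__ == "__main__":
    print(run_demo())
```

Because callers depend only on the `CheckpointStore` interface, swapping the storage format means supplying a different subclass rather than editing each call site, which is the general idea behind a pluggable class.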