@DavidOry commented Jul 8, 2025

Background
The motivation for the explicit error term (EET) approach is that model owners want changes in choices across scenarios to be a function of the systematic utility component, rather than the random utility component. To operationalize EET, we must therefore identify a choice in the build scenario that is the same as a choice in the base scenario (i.e., we cannot use the same error term across two choices unless we identify which two choices are the same). In practice, analysts would rarely, if ever, compare individual choices. But they may want to aggregate over a small number of choices, so it is important to know and think through the best way to join choices across scenarios. For the balance of this note, I use the word "sameness" to describe how choices across scenarios are paired up.

To illustrate the challenges and opportunities for various sameness definitions, this analysis uses a single ActivitySim component: tour mode choice. Identical issues exist for any ActivitySim model component that is run more than once for a given household or person. (For model components, such as automobile ownership, that are only run once per simulation, the definition of sameness is clear and unambiguous: there is only one automobile ownership choice per household in the base and build scenarios, so pairing them is trivial.) The analysis uses the transit example created as part of the broader EET testing with the SANDAG model.

The definition of sameness used in the EET prototype code is designed for Monte Carlo simulation. Error terms for Monte Carlo simulation have the following two requirements:

  1. Uniqueness across choices that were assumed to be independent in model estimation. For example, if a single person engages in two shopping tours, when the tour mode choice model was estimated, these two shopping tours were treated as independent observations. We should therefore give these two shopping tours independent error terms in the simulation, so that the estimation and application are consistent.
  2. Generation of seeds from artifacts of the subject choice, which facilitates reproducibility. It’s easy to satisfy requirement (1) Uniqueness, as you can just keep drawing random numbers. But to satisfy this requirement, we craft random number seeds from household_id, person_id, model_id, etc. -- artifacts of the subject choice. This facilitates reproducibility for a single scenario. ActivitySim uses counting indices, e.g., the first shopping tour gets an index of one, the second shopping tour gets an index of two, so that requirement (1) Uniqueness continues to be satisfied.
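The two requirements above can be sketched in code. This is a minimal, hypothetical illustration, not ActivitySim's actual channel/offset seeding logic; the function name `monte_carlo_seed` and the hashing scheme are assumptions for exposition.

```python
import hashlib

def monte_carlo_seed(household_id, person_id, model_name, tour_type_num):
    """Illustrative seed builder (not ActivitySim's actual implementation):
    hash stable artifacts of the subject choice into a 32-bit integer, so the
    seed is reproducible from the choice's own identifiers."""
    key = f"{household_id}|{person_id}|{model_name}|{tour_type_num}"
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % (2**32)

# Requirement (1) Uniqueness: two shopping tours by the same person get
# different seeds because their counting index (tour_type_num) differs.
assert monte_carlo_seed("A", "X", "tour_mode_choice", 1) != \
       monte_carlo_seed("A", "X", "tour_mode_choice", 2)

# Requirement (2) Reproducibility: the same inputs always yield the same seed.
assert monte_carlo_seed("A", "X", "tour_mode_choice", 1) == \
       monte_carlo_seed("A", "X", "tour_mode_choice", 1)
```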

With the explicit error term approach, we must satisfy a third requirement when creating random number seeds, which is (3) Sameness. To compare choices across scenarios, we must identify choices that are the same.

In tour mode choice, the Monte Carlo method of creating random seeds creates a unique seed for each tour, specific to each household, person, and tour purpose. If, for example, Person X engages in two shopping tours, the first shopping tour is given an index of 1 and the second shopping tour is given an index of 2. The random seed is then a random integer based on household_id, person_id, purpose, and tour_type_num. The prototype EET code retains this approach. Depending on an analyst's view of sameness, this approach is potentially problematic. Consider the following example:

Base Scenario:
Household ID: A
Person ID: X
Shopping Tour 1: Departs at 9 am for TAZ 1, returns at 10 am
Shopping Tour 2: Departs at 7 pm for TAZ 18, returns at 8 pm

Build Scenario:
Household ID: A
Person ID: X
Shopping Tour 1: Departs at 7 pm for TAZ 18, returns at 8 pm

In this example, a change in the build scenario motivated the elimination of this person's first shopping tour in the base scenario. Using the prototype EET code, the error term for shopping tour 1 in the base scenario is reused for shopping tour 1 in the build scenario. To most analysts, this is an error: shopping tour 2 in the base scenario should be compared with shopping tour 1 in the build scenario. Said another way: if an analyst were asked which of the two shopping tours in the base scenario is the same as (i.e., should be compared directly to) the one shopping tour in the build scenario, most analysts would select base shopping tour 2.

The bad news is that, for any model component that runs more than once in the simulation, there is no perfect definition of sameness: every definition is subjective and the pros and cons of any definition can be debated/discussed. The good news is that it is not hard to create and implement definitions that many would view as superior to the prototype code. For example, one could define sameness for tour mode choice as two tours that have the same household_id, person_id, purpose, depart_time, arrive_time, and primary_destination_taz.

Analysis
The Jupyter notebook tour-mode-choice-sameness-assessment.ipynb included in this repository does the following:

  1. Reads in the base and build tour mode choice output files (final_tours.csv) for the transit scenario tested as part of the EET prototype assessment.
  2. Uses the tour_id column, on which the EET error terms are based, to determine which choices across the base and build scenarios the prototype code identifies as the same.
  3. Explores five distinct definitions of sameness, each described below.
  4. Assesses the performance of the prototype code against each of the five definitions.
  5. Summarizes the results in a large CSV file that is rendered with the sameness-investigations.twb Tableau workbook.
  6. Highlights examples in line with the two-shopping-tour/one-shopping-tour example that motivated this examination, i.e., this is both a theoretical and a practical problem.

The five definitions explored in the notebook are as follows:

  1. "Basics", which defines sameness as tours with the same household_id, person_id, and purpose.
  2. "Basics + tour_type_num", which is the same as (1), but with tour_type_num, which is a purpose-specific unique index for each tour. This is the definition used in the prototype code.
  3. "Basics + destination", which is the same as (1), but with the tour destination_taz added.
  4. "Basics + start_time", which is the same as (1), but with the tour start_time added.
  5. "Basics + start_time + duration + origin + destination", which is the same as (1), but with start_time, duration, origin, and destination added.
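The five definitions amount to five sets of join keys. A minimal sketch follows; the exact column names (e.g., `tour_type`, `start`, `destination`) are assumptions about the final_tours.csv schema, not confirmed field names.

```python
# Illustrative join-key sets for the five sameness definitions.
# Column names are assumed for exposition and may differ from the
# actual final_tours.csv schema.
BASICS = ["household_id", "person_id", "tour_type"]

SAMENESS_DEFINITIONS = {
    "Basics": BASICS,
    "Basics + tour_type_num": BASICS + ["tour_type_num"],
    "Basics + destination": BASICS + ["destination"],
    "Basics + start_time": BASICS + ["start"],
    "Basics + start_time + duration + origin + destination":
        BASICS + ["start", "duration", "origin", "destination"],
}

# Each definition is just a longer (stricter) or shorter (looser) key list.
for name, columns in SAMENESS_DEFINITIONS.items():
    print(f"{name}: {len(columns)} key columns")
```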

Mechanically, the analysis assesses the performance of the prototype code against these definitions as follows:

  1. Create a definition.
  2. Join the tour mode choice outcomes using this definition for the base and build scenarios.
  3. Using the tour_id variable, determine if the joined tours received the same error term in the prototype code.

Once joined, each base scenario tour mode choice decision is given one of four outcomes, as follows:

  1. "Success: Correct match to build" - If, per the proposed sameness definition, the base and build joined tours received the same error term using the prototype code.
  2. "Success: Nothing comparable in build" - If, for example, Person A engaged in one shopping tour in the base scenario and no shopping tours in the build scenario, there is nothing comparable in the build scenario to compare to the subject tour in the base scenario.
  3. "Failure: Incorrect match in build" - If two tours are joined using the subject sameness definition, but these tours have different error terms.
  4. "Failure: Error term not generated in build" - If, for example, Person A engaged in two shopping tours in the base scenario and one shopping tour in the build scenario, and the second shopping tour in the base scenario was deemed the same, per the subject sameness definition, as the first shopping tour in the build scenario. The error term we want in this example does not exist in the build scenario.
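The join-and-classify mechanics above can be sketched in plain Python. This is a hypothetical simplification, assuming each tour is a dict carrying a tour_id (which drives the prototype's error term) plus the attribute columns of the candidate sameness definition; the function name `classify_base_tours` and the column names are illustrative.

```python
def classify_base_tours(base_tours, build_tours, key_cols):
    """Label each base tour with one of the four outcomes, under a candidate
    sameness definition given by key_cols. Sketch only: assumes at most one
    build tour per sameness key."""
    def key(tour):
        return tuple(tour[c] for c in key_cols)

    build_by_key = {key(t): t for t in build_tours}
    build_error_terms = {t["tour_id"] for t in build_tours}

    outcomes = {}
    for tour in base_tours:
        match = build_by_key.get(key(tour))
        if match is None:
            outcomes[tour["tour_id"]] = "Success: Nothing comparable in build"
        elif match["tour_id"] == tour["tour_id"]:
            outcomes[tour["tour_id"]] = "Success: Correct match to build"
        elif tour["tour_id"] in build_error_terms:
            outcomes[tour["tour_id"]] = "Failure: Incorrect match in build"
        else:
            outcomes[tour["tour_id"]] = "Failure: Error term not generated in build"
    return outcomes

# Worked example: Person X's first base shopping tour disappears in the build,
# so the surviving tour is re-indexed and inherits the first tour's tour_id.
base = [
    {"tour_id": 101, "person_id": "X", "start": 9, "destination": 1},
    {"tour_id": 102, "person_id": "X", "start": 19, "destination": 18},
]
build = [{"tour_id": 101, "person_id": "X", "start": 19, "destination": 18}]

outcomes = classify_base_tours(base, build, ["person_id", "start", "destination"])
# → {101: "Success: Nothing comparable in build",
#    102: "Failure: Error term not generated in build"}
```

Note how base tour 102 pairs behaviorally with the surviving build tour, but the error term it wants (the one keyed to tour_id 102) was never generated in the build scenario.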

Key Results
The "Basics + tour_type_num" sameness definition, which is definition number 2, should have performed perfectly, i.e., all the choices should be labeled "Success": it is the definition used in the prototype code. And it nearly does, as shown in the table below: it succeeds 99.9 percent of the time. The failures are due to either (a) a coding error or, more likely, (b) a misunderstanding on my part as to the metadata needed to define sameness. The minor differences are not relevant to the key findings of the analysis.

| Definition of Sameness | Success: Correct match to build | Success: Nothing comparable in build | Failure: Incorrect match to build | Failure: Error term not generated in build | Total |
| --- | --- | --- | --- | --- | --- |
| Basics (hh_id, person_id, purpose) | 99.77% | 0.09% | 0.00% | 0.14% | 100.00% |
| Basics + tour_type_num | 99.73% | 0.16% | 0.05% | 0.07% | 100.00% |
| Basics + destination | 98.13% | 0.19% | 1.65% | 0.04% | 100.00% |
| Basics + start_time | 98.92% | 0.20% | 0.86% | 0.02% | 100.00% |
| Basics + start_time + duration + origin + destination | 97.62% | 0.22% | 2.16% | 0.01% | 100.00% |

The other sameness definitions explored in the analysis are oriented towards behavioral definitions of sameness. My preferred definition for tour mode choice is "Basics + start_time + duration + origin + destination". Here, if an analyst assumes this definition of sameness across a base and build scenario, the prototype code will succeed about 97.8 percent of the time. It will fail more than 2 percent of the time, which is about 97,000 tours in the SANDAG example.

Practical Implications
If an analyst assumes a sameness definition of "Basics + start_time + duration + origin + destination" and the ActivitySim EET implementation fails about 2 percent of the time, does this matter? The short answer is probably not. But because the EET method requires making pairwise comparisons across scenarios, it seems wise to take the short amount of time required to come up with a thoughtful definition of sameness for each model component.

For tour mode choice, I suggest that seeding the error term generator with a value that considers household_id, person_id, purpose, start_time, duration, origin, and destination is a better way forward than using the prototype code's current approach. Said another way: analysts using ActivitySim are more likely to consider tours with the same purpose, start time, duration, origin, and destination to be the same than they would tours that have the same, for each person, sequential index by purpose.
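The suggested alternative can be sketched as follows. This is a hypothetical illustration, not a proposed implementation: the function name `behavioral_seed` and the hashing scheme are assumptions, and a production version would need to handle collisions and integrate with ActivitySim's random number channels.

```python
import hashlib

def behavioral_seed(household_id, person_id, purpose,
                    start, duration, origin, destination):
    """Illustrative sketch: seed the error term generator from behavioral
    attributes of the tour rather than its sequential index, so a tour that
    is behaviorally 'the same' in base and build receives the same draw."""
    key = f"{household_id}|{person_id}|{purpose}|{start}|{duration}|{origin}|{destination}"
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % (2**32)

# The surviving build tour (7 pm, TAZ 18) gets the same seed as base shopping
# tour 2, regardless of how the tours are re-indexed across scenarios.
base_tour_2 = behavioral_seed("A", "X", "shopping", 19, 1, 4, 18)
build_tour_1 = behavioral_seed("A", "X", "shopping", 19, 1, 4, 18)
assert base_tour_2 == build_tour_1
```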

More or less strict: IIA Violations
As noted previously, there is no perfect definition of sameness for any model component run more than once for a household or person in the simulation. All definitions are subjective and the pros/cons of all definitions can be debated. However, less strict definitions are more likely to result in IIA violations. For example, two shopping tours made by the same person in the same day are very likely to be treated as two, independent observations in the tour mode choice model estimation. If the selected sameness method, e.g., the "Basics" shown above, uses the same error term in application for two shopping tours made by the same person, the application will have violated the assumptions used in estimation. To estimate the quantity of potential IIA violations, I summarize the number of tour mode choice outcomes that, when viewed through each sameness definition, appear more than once in the base SANDAG dataset. The results are shown in the table below with the frequency of IIA violations shown in the columns (e.g., because some persons make 9 tours of the same purpose, the IIA count goes up to 9 for the "Basics" definition).

| Definition of Sameness | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basics (hh_id, person_id, purpose) | 71.344% | 18.690% | 4.241% | 3.532% | 1.357% | 0.654% | 0.118% | 0.046% | 0.013% | 0.004% | 100.000% |
| Basics + tour_type_num | 99.986% | 0.014% | | | | | | | | | 100.000% |
| Basics + destination | 86.863% | 12.611% | 0.307% | 0.200% | 0.014% | 0.005% | | | | | 100.000% |
| Basics + start_time | 98.609% | 1.357% | 0.033% | 0.001% | | | | | | | 100.000% |
| Basics + start_time + duration + origin + destination | 99.981% | 0.019% | | | | | | | | | 100.000% |

Using a less strict definition of sameness, e.g., "Basics" or "Basics + start_time", results in more potential IIA violations than the stricter definitions. (As before, the "Basics + tour_type_num" definition should perform perfectly on this measure and fails either due to a coding error or a misunderstanding on my part regarding the available metadata.)
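The duplicate-key counting behind the table above can be sketched in a few lines. This is an illustrative simplification (the function name and dict-based tour records are assumptions): for each sameness key, a key shared by k tours implies k-1 tours would reuse another tour's error term.

```python
from collections import Counter

def iia_extra_count_histogram(tours, key_cols):
    """For each sameness key, count how many tours share it; a key shared by
    k tours means k-1 of them reuse another tour's error term. Returns a
    histogram of those extra counts, mirroring the 0..9 columns above."""
    key_counts = Counter(tuple(t[c] for c in key_cols) for t in tours)
    return dict(Counter(count - 1 for count in key_counts.values()))

tours = [
    {"person_id": "X", "purpose": "shopping"},
    {"person_id": "X", "purpose": "shopping"},
    {"person_id": "Y", "purpose": "work"},
]
# Under a "Basics"-style key, person X's two shopping tours collide (one
# potential IIA violation); person Y's single work tour does not.
hist = iia_extra_count_histogram(tours, ["person_id", "purpose"])
# → {1: 1, 0: 1}
```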

More or less strict: Response variability
Theoretically, IIA violations are unattractive. But practically, many model owners may not be bothered by them, in particular because there is an upside to accepting more IIA violations, which is a reduction in what I'll call model response variability.

To illustrate this point, consider an extreme example: every tour mode choice decision in the simulation gets the same error terms by mode. Meaning, the random utility component for transit is, say, -0.50, for every tour mode choice decision, and the random utility component for drive alone is, say, 0.25, for every tour mode choice decision, and on and on. In this case, if transit level-of-service is improved, the response of the model would be sharp. Meaning, for any two travelers with similar systematic utilities across modes, the elasticities with respect to the transit level-of-service change would be similar. The heterogeneity of the response to the level-of-service change would be diminished.

This extreme example is only presented to demonstrate the point. Using the same error term for every tour mode choice decision is a bad idea, for many reasons, including that it would over- or under-state the model's response to the change, depending on the random number seed.

Practically, however, analysts may prefer less model response variability in exchange for some IIA violations. Assessing the practical pros and cons of this would be an interesting and useful study, but beyond the scope of the present work.

A rough-and-ready way to assess the likely simulation variability is to count the unique error terms in the SANDAG tour mode choice outcomes. This is the other side of the coin from IIA violations. These results are shown in the table below. The stricter the definition of sameness, the more unique error terms, meaning the more variability in the component response. (As before, the "Basics + tour_type_num" definition should be perfect on this measure.)

| Definition of Sameness | Share of Error Terms that are Unique |
| --- | --- |
| Basics (hh_id, person_id, purpose) | 83.336% |
| Basics + destination | 92.970% |
| Basics + start_time | 99.280% |
| Basics + start_time + duration + origin + destination | 99.989% |
| Basics + tour_type_num | 99.993% |
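A sketch of this unique-error-term count, under the same assumptions as the IIA sketch above (dict-based tour records, an illustrative function name): the share of sameness keys that occur exactly once serves as a rough proxy for the share of unique error terms.

```python
from collections import Counter

def unique_error_term_share(tours, key_cols):
    """Share of sameness keys that occur exactly once, i.e., the share of
    error terms that would be used by only one tour under this definition."""
    counts = Counter(tuple(t[c] for c in key_cols) for t in tours)
    return sum(1 for c in counts.values() if c == 1) / len(counts)

tours = [
    {"person_id": "X", "purpose": "shopping"},
    {"person_id": "X", "purpose": "shopping"},
    {"person_id": "Y", "purpose": "work"},
]
# Two keys total; only person Y's is unique, so the share is 0.5.
share = unique_error_term_share(tours, ["person_id", "purpose"])
# → 0.5
```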

Optimal
So what choice is optimal? There is no perfect definition of sameness, and any definition has its pros and cons. In my view, we should select a definition that resonates with most analysts as reasonable and expected. If the promise of the EET approach is that, for any given choice, the change across scenarios is due to the systematic utility change, not the random utility change, we should use a definition of sameness that most travel modelers would expect. For tour mode choice, using a seed based on household_id, person_id, purpose, start_time, duration, origin, and destination strikes me as reasonable. For those concerned about IIA violations, this analysis suggests the IIA concerns are reasonable, but practically very, very small (873 tours out of 4.51 million would be given the same error terms).

fyi: @jpn--, @i-am-sijia, @andkay, @americalexander, @jfdman, John G., @dhensle, @janzill

@dhensle dhensle merged commit 897f250 into main Sep 29, 2025
@dhensle dhensle deleted the sameness-assessment-tour-mc branch September 29, 2025 19:25