Sameness Assessment for Tour Mode Choice #7
Merged
Background
The motivation for the explicit error term (EET) approach is that model owners want changes in choices across scenarios to be a function of the systematic utility component, rather than the random utility component. To operationalize EET, we must therefore identify a choice in the build scenario that is the same as a choice in the base scenario (i.e., we cannot use the same error term across two choices unless we identify which two choices are the same). In practice, analysts would rarely, if ever, compare individual choices. But they may want to aggregate over a small number of choices, so it is important to think through the best way to join choices across scenarios. For the balance of this note, I use the word "sameness" to describe how choices across scenarios are paired up.
To illustrate the challenges and opportunities for various sameness definitions, this analysis uses a single ActivitySim component: tour mode choice. Identical issues exist for any ActivitySim model component that is run more than once for a given household or person. (For model components, such as automobile ownership, that are only run once per simulation, the definition of sameness is clear and unambiguous: there is only one automobile ownership choice per household in the base and build scenarios, so pairing them is trivial.) The analysis uses the transit example created as part of the broader EET testing with the SANDAG model.
The definition of sameness used in the EET prototype code is designed for Monte Carlo simulation. Error terms for Monte Carlo simulation have the following two requirements:
With the explicit error term approach, we must satisfy a third requirement when creating random number seeds, which is (3) Sameness. To compare choices across scenarios, we must identify choices that are the same.
In tour mode choice, the Monte Carlo method of creating random seeds generates a unique seed for each tour, specific to each household, person, and tour purpose. If, for example, Person X engages in two shopping tours, the first shopping tour is given an index of 1 and the second shopping tour is given an index of 2. The random seed is then a random integer based on `household_id`, `person_id`, `purpose`, and `tour_type_num`. For the prototype EET code, this approach was retained. Depending on an analyst's view of sameness, this approach is potentially problematic. Consider the following example:
In this example, a change in the build scenario motivated the elimination of this person's first shopping tour in the base scenario. Using the prototype EET code, the same error term for shopping tour 1 in the base scenario is used for shopping tour 1 in the build scenario. To most analysts, this is an error: shopping tour 2 in the base scenario should be compared with shopping tour 1 in the build scenario. Said another way, if an analyst were asked which of the two shopping tours in the base scenario is the same as (i.e., should be compared directly to) the one shopping tour in the build scenario, most analysts would select base shopping tour 2.
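
The index-shift problem described above can be illustrated with a small sketch. This is not how ActivitySim derives its seeds: `seed_from_key` is a hypothetical helper that hashes the key fields, and the `start_time` values are made-up data added only to tell the two shopping tours apart.

```python
import hashlib

def seed_from_key(*parts) -> int:
    """Hypothetical helper: deterministic 32-bit seed from the key fields."""
    text = "|".join(str(p) for p in parts)
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

# Key fields named in the text for the prototype approach.
PROTOTYPE_KEY = ("household_id", "person_id", "purpose", "tour_type_num")

def tour_seed(tour: dict, key=PROTOTYPE_KEY) -> int:
    return seed_from_key(*(tour[k] for k in key))

# Base scenario: Person X makes two shopping tours (start times are illustrative).
base = [
    {"household_id": 1, "person_id": 10, "purpose": "shopping",
     "tour_type_num": 1, "start_time": 9},
    {"household_id": 1, "person_id": 10, "purpose": "shopping",
     "tour_type_num": 2, "start_time": 17},
]
# Build scenario: the first tour is eliminated, so the remaining
# (evening) tour is now shopping tour 1.
build = [
    {"household_id": 1, "person_id": 10, "purpose": "shopping",
     "tour_type_num": 1, "start_time": 17},
]

base_seeds = [tour_seed(t) for t in base]
build_seed = tour_seed(build[0])

# The build tour shares its seed with base tour 1 (the morning tour),
# not with the evening tour most analysts would consider "the same".
paired_index = base_seeds.index(build_seed)
print(base[paired_index]["start_time"], build[0]["start_time"])
```

Under these assumptions, the evening tour in the build scenario inherits the error term of the morning tour in the base scenario, which is the pairing the text describes as an error.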
The bad news is that, for any model component that runs more than once in the simulation, there is no perfect definition of sameness: every definition is subjective and the pros and cons of any definition can be debated. The good news is that it is not hard to create and implement definitions that many would view as superior to the prototype code. For example, one could define sameness for tour mode choice as two tours that have the same `household_id`, `person_id`, `purpose`, `depart_time`, `arrive_time`, and `primary_destination_taz`.
Analysis
The Jupyter notebook `tour-mode-choice-sameness-assessment.ipynb` included in this repository does the following:
- [...] (`final_tours.csv`) for the transit scenario tested as part of the EET prototype assessment.
- [...] `tour_id` column. We therefore know, across the base and build scenarios, which choices are identified by the prototype code as the same.
- [...] `CSV` file that is rendered with the `sameness-investigations.twb` Tableau workbook.
The five definitions explored in the notebook are as follows:
1. Basics: `household_id`, `person_id`, and `purpose`.
2. Basics with `tour_type_num`, which is a purpose-specific unique index for each tour. This is the definition used in the prototype code.
3. Basics with `destination_taz` added.
4. Basics with `start_time` added.
5. Basics with `start_time`, `duration`, `origin`, and `destination` added.
Mechanically, the analysis assesses the performance of the prototype code against these definitions as follows:
- [...] `tour_id` variable, determine if the joined tours received the same error term in the prototype code.
Once joined, each base scenario tour mode choice decision is given one of four outcomes, as follows:
Key Results
The "Basics + tour_type_num" sameness definition, which is definition number 2, should have performed perfectly, i.e., all the choices should be labeled "Success", because it is the definition used in the prototype code. It nearly does, as shown in the table below: it succeeds 99.9 percent of the time. The failures are due to either (a) a coding error or, more likely, (b) a misunderstanding on my part as to the metadata needed to define sameness. The minor differences are not relevant to the key findings of the analysis.
The other sameness definitions explored in the analysis are oriented towards behavioral definitions of sameness. My preferred definition for tour mode choice is "Basics + start_time + duration + origin + destination". Here, if an analyst assumes this definition of sameness across a base and build scenario, the prototype code will succeed about 97.8 percent of the time. It will fail more than 2 percent of the time, which is about 97,000 tours in the SANDAG example.
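
A minimal sketch of how such a success rate could be computed with pandas, assuming toy data and the column names used in this note. The notebook's actual logic and its four outcome labels are not reproduced here; this only shows the join-then-compare idea, with the rate taken over matched tours.

```python
import pandas as pd

# Sameness definition under test (column names as used in this note).
KEY = ["household_id", "person_id", "purpose",
       "start_time", "duration", "origin", "destination"]

# Toy base scenario: tour_id stands in for the prototype's pairing.
base = pd.DataFrame({
    "tour_id": [101, 102, 103],
    "household_id": [1, 1, 2],
    "person_id": [10, 10, 20],
    "purpose": ["shopping", "shopping", "work"],
    "start_time": [9, 17, 8],
    "duration": [2, 1, 9],
    "origin": [5, 5, 7],
    "destination": [12, 14, 3],
})
# Toy build scenario: the morning shopping tour is gone and the evening
# tour has (by assumption) taken over tour_id 101.
build = pd.DataFrame({
    "tour_id": [101, 103],
    "household_id": [1, 2],
    "person_id": [10, 20],
    "purpose": ["shopping", "work"],
    "start_time": [17, 8],
    "duration": [1, 9],
    "origin": [5, 7],
    "destination": [14, 3],
})

# Join base to build on the sameness definition.
joined = base.merge(build, on=KEY, how="left",
                    suffixes=("_base", "_build"), indicator=True)

# Among tours paired by the definition, did the prototype (via tour_id)
# also treat them as the same, i.e., give them the same error term?
matched = joined[joined["_merge"] == "both"]
same_error_term = matched["tour_id_base"] == matched["tour_id_build"]
success_rate = same_error_term.mean()
print(f"matched: {len(matched)}, success rate: {success_rate:.2f}")
```

With real data, a definition that is not unique per tour would produce many-to-many matches in the merge, so duplicates on `KEY` would need to be handled before computing a rate.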
Practical Implications
If an analyst assumes a sameness definition of "Basics + start_time + duration + origin + destination" and the ActivitySim EET implementation fails about 2 percent of the time, does this matter? The short answer is probably not. But because the EET method requires making pairwise comparisons across scenarios, it seems wise to take the short amount of time required to come up with a thoughtful definition of sameness for each model component.
For tour mode choice, I suggest that seeding the error term generator with a value that considers `household_id`, `person_id`, `purpose`, `start_time`, `duration`, `origin`, and `destination` is a better way forward than the prototype code's current approach. Said another way: analysts using ActivitySim are more likely to consider tours with the same purpose, start time, duration, origin, and destination to be the same than tours that share the same per-person sequential index by purpose.
More or less strict: IIA Violations
As noted previously, there is no perfect definition of sameness for any model component run more than once for a household or person in the simulation. All definitions are subjective and the pros/cons of all definitions can be debated. However, less strict definitions are more likely to result in IIA violations. For example, two shopping tours made by the same person in the same day are very likely to be treated as two, independent observations in the tour mode choice model estimation. If the selected sameness method, e.g., the "Basics" shown above, uses the same error term in application for two shopping tours made by the same person, the application will have violated the assumptions used in estimation. To estimate the quantity of potential IIA violations, I summarize the number of tour mode choice outcomes that, when viewed through each sameness definition, appear more than once in the base SANDAG dataset. The results are shown in the table below with the frequency of IIA violations shown in the columns (e.g., because some persons make 9 tours of the same purpose, the IIA count goes up to 9 for the "Basics" definition).
Using a less strict definition of sameness, e.g., "Basics" or "Basics + start_time", results in more potential IIA violations than the stricter definitions. (As before, the "Basics + tour_type_num" definition should perform perfectly on this measure and fails either due to a coding error or a misunderstanding on my part regarding the available metadata.)
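
The counting described above can be sketched as follows, assuming toy data and a hypothetical helper `shared_key_counts`. For each sameness definition it reports how many key values are shared by 2, 3, ... tours, which mirrors the frequency columns described for the table.

```python
import pandas as pd

# Toy tours: one person makes three shopping tours, two at the same time.
tours = pd.DataFrame({
    "household_id": [1, 1, 1, 2],
    "person_id": [10, 10, 10, 20],
    "purpose": ["shopping", "shopping", "shopping", "work"],
    "start_time": [9, 17, 17, 8],
})

def shared_key_counts(df: pd.DataFrame, key: list) -> pd.Series:
    """Index: number of tours sharing one key value (2, 3, ...).
    Value: how many key values are shared by that many tours."""
    group_sizes = df.groupby(key).size()
    return group_sizes[group_sizes > 1].value_counts().sort_index()

basics = ["household_id", "person_id", "purpose"]

# Less strict: all three shopping tours would share one error term.
print(shared_key_counts(tours, basics))
# Stricter: only the two tours with the same start time collide.
print(shared_key_counts(tours, basics + ["start_time"]))
```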
More or less strict: Response variability
Theoretically, IIA violations are unattractive. But practically, many model owners may not be bothered by them, in particular because there is an upside to accepting more IIA violations, which is a reduction in what I'll call model response variability.
To illustrate this point, consider an extreme example: every tour mode choice decision in the simulation gets the same error terms by mode. Meaning, the random utility component for transit is, say, -0.50, for every tour mode choice decision, and the random utility component for drive alone is, say, 0.25, for every tour mode choice decision, and on and on. In this case, if transit level-of-service is improved, the response of the model would be sharp. Meaning, for any two travelers with similar systematic utilities across modes, the elasticities with respect to the transit level-of-service change would be similar. The heterogeneity of the response to the level-of-service change would be diminished.
This extreme example is only presented to demonstrate the point. Using the same error term for every tour mode choice decision is a bad idea, for many reasons, including that it would over- or under-state the model's response to the change, depending on the random number seed.
Practically, however, analysts may prefer less model response variability in exchange for some IIA violations. Assessing the practical pros and cons of this would be an interesting and useful study, but beyond the scope of the present work.
A rough-and-ready way to assess the likely simulation variability is to count the unique error terms in the SANDAG tour mode choice outcomes. This is the other side of the IIA-violation coin. These results are shown in the table below. The stricter the definition of sameness, the more unique error terms, meaning the more variability in the component response. (As before, the "Basics + tour_type_num" definition should be perfect.)
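
As a rough illustration of this count, and assuming each distinct key value maps to one error term, the number of unique error terms under a definition is the number of distinct key combinations. The data and the helper name `unique_error_terms` are illustrative only.

```python
import pandas as pd

tours = pd.DataFrame({
    "household_id": [1, 1, 1, 2],
    "person_id": [10, 10, 10, 20],
    "purpose": ["shopping", "shopping", "shopping", "work"],
    "start_time": [9, 17, 17, 8],
})

def unique_error_terms(df: pd.DataFrame, key: list) -> int:
    """Number of distinct key values, i.e., distinct seeds under this definition."""
    return len(df.drop_duplicates(subset=key))

basics = ["household_id", "person_id", "purpose"]
n_basics = unique_error_terms(tours, basics)
n_strict = unique_error_terms(tours, basics + ["start_time"])
print(n_basics, n_strict, len(tours))
```

In this toy case the stricter definition yields more unique error terms than "Basics", but still fewer than the number of tours.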
Optimal
So what choice is optimal? There is no perfect definition of sameness, and any definition has its pros and cons. In my view, we should select a definition that resonates with most analysts as reasonable and expected. If the promise of the EET approach is that, for any given choice, the change across scenarios is due to the systematic utility change, not the random utility change, we should use a definition of sameness that most travel modelers would expect. For tour mode choice, using a seed based on `household_id`, `person_id`, `purpose`, `start_time`, `origin`, and `destination` strikes me as reasonable. For those concerned about IIA violations, this analysis suggests the IIA concerns are reasonable, but practically very, very small (873 tours out of 4.51 million would be given the same error terms).
fyi: @jpn--, @i-am-sijia, @andkay, @americalexander, @jfdman, John G., @dhensle, @janzill