Define rules for propagating delays across trips in same block #110

barbeau
Collaborator

@barbeau barbeau commented Oct 18, 2018

The GTFS-realtime spec currently defines how StopTimeUpdates (i.e., delays) should be propagated between stops in the same trip.

However, GTFS-rt has remained silent on whether propagation should occur between two trips within the same block.

This proposal defines rules for propagating delays (late, on time, and early) across trips in the same block. It also gives names to all the propagation examples to make it easier to follow the logic in each example.

The primary goal of this proposal is to avoid hiding delays from riders, which can inadvertently happen if propagation across trips in the same block is not implemented. For example, if Trips A and B are in the same block, and there is a large delay for Trip A, the rider waiting at the first stop of Trip B should see this info in their mobile app before the vehicle completes Trip A.

I've drafted a Google Drawing to help explain this issue:

[Image: propagating delays across trips in same block 3]

The proposed rules in this pull request are modeled on the behavior currently implemented in OneBusAway, which has been deployed across a number of cities including New York, Puget Sound (Seattle), San Diego, Tampa, York Region (CA), and Washington, D.C.

Note that these propagation rules only apply in the absence of explicit StopTimeUpdates for Trip B - the producer can always specify the exact prediction to use for any stop by providing a TripUpdate with a StopTimeUpdate referencing that stop_sequence.
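The precedence described above could be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical (they are not from the spec or from OneBusAway), and delays are assumed to be in seconds, with negative values meaning early.

```java
import java.util.OptionalInt;

/** Hypothetical helper: picks which delay a consumer shows for a Trip B stop. */
final class TripBDelaySource {

  /**
   * @param explicitDelaySeconds   delay from a producer-supplied StopTimeUpdate
   *                               for this Trip B stop, if one exists
   * @param propagatedDelaySeconds delay inferred from Trip A by propagation
   * @return the explicit delay when present, otherwise the propagated one
   */
  static int effectiveDelaySeconds(OptionalInt explicitDelaySeconds,
                                   int propagatedDelaySeconds) {
    // An explicit prediction from the producer always wins over inference.
    return explicitDelaySeconds.orElse(propagatedDelaySeconds);
  }
}
```

With this shape, a producer that publishes a Trip B prediction never has it overridden by a delay inferred from Trip A.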

Please provide feedback! I'd especially like feedback on the currently proposed rule for vehicles running early. The proposed rule is based on OneBusAway's behavior, which is:

  private int propagateScheduleDeviationForwardWithSlack(int scheduleDeviation, int slack) {
    /**
     * If the vehicle is running early and there is slack built into the
     * schedule, we guess that the vehicle will take that opportunity to pause
     * and let the schedule catch back up. If there is no slack, assume we'll
     * continue to run early.
     */
    if (scheduleDeviation < 0) {
      if (slack > 0)
        return 0;
      return scheduleDeviation;
    }
    ...

This handles the situation when an agency arbitrarily splits a trip into two (e.g., to change headsigns) but there isn't a layover in the schedule, and therefore the bus doesn't hold and continues to run early. If there is a planned break in the schedule, the driver is more likely to hold and wait to depart until the scheduled departure time.
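To make the two cases concrete, here is a minimal sketch of the early-vehicle rule. It assumes that "slack" means the scheduled gap between Trip A's last arrival and Trip B's first departure; that definition and all names below are assumptions for illustration, not OneBusAway's actual code. Only the early branch is covered, since the late branch is elided in the snippet above.

```java
/** Hypothetical sketch of the early-vehicle rule; times are in seconds. */
final class EarlyPropagationSketch {

  /** Scheduled gap between the end of Trip A and the start of Trip B (never negative). */
  static int slackSeconds(int tripALastArrival, int tripBFirstDeparture) {
    return Math.max(0, tripBFirstDeparture - tripALastArrival);
  }

  /**
   * Applies the rule for an early vehicle (negative deviation):
   * with slack, assume the driver holds until the scheduled departure;
   * without slack, assume the vehicle keeps running early.
   */
  static int propagateEarly(int scheduleDeviation, int slack) {
    if (scheduleDeviation >= 0) {
      throw new IllegalArgumentException("only early (negative) deviations are handled here");
    }
    return slack > 0 ? 0 : scheduleDeviation;
  }
}
```

For a trip split with no layover, `propagateEarly(-120, 0)` keeps the vehicle 2 minutes early on Trip B, while a 10-minute layover (`slack = 600`) resets it to on time.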

An alternate rule is that early deviations are never propagated across trips, but this design risks dropping real-time info in the middle of a passenger-perceived trip when the agency arbitrarily splits a trip in their GTFS. Riders would also rather know about an early vehicle than miss that information and, as a result, miss their bus. Therefore, IMHO it seems better to err on the side of propagating early vehicle info rather than dropping it.

Related issues:

  • Delay propagation across trips in the same block (specifically on time and late) has been discussed and tentatively agreed upon as an upcoming feature in OpenTripPlanner (see this thread).

Announced on GTFS-realtime Google Group at https://groups.google.com/forum/#!topic/gtfs-realtime/D5taM5vhEJQ.

@gcamp
Contributor

gcamp commented Oct 18, 2018

It's a bit of a surprise to me that others interpreted the current spec as implying real-time propagation across trips in the same block. It seems logical for producers to fulfill that responsibility since they know (or at least have better access to) the logic that drivers use in different agencies.

It seems much simpler not to propagate delays between trips at all, and to state that explicitly in the reference. That simplifies the spec significantly and avoids a lot of ambiguity. The two arguments for the proposal that I can see are 1) a smaller GTFS-rt feed, which seems a small benefit, and 2) a good default for producers that can't put more effort into their feed, but I don't feel the cost of the added complexity is worth it. My impression is that for most producers the default won't be good enough, so they'll end up coding their own logic anyway.

However, if others don't agree with this view (especially producers), the proposed default seems reasonable to me.

@barbeau
Collaborator Author

barbeau commented Oct 19, 2018

@gcamp I wouldn't call this an interpretation of the spec so much as a design driven by end-user requirements in OneBusAway. If a producer only provides real-time info on the current trip, it's a really bad user experience if that information isn't propagated to subsequent trips in the same block. As mentioned above, users see real-time info "pop in" right before the bus reaches their stop, while the system actually contained that information much further in advance.

I agree that in an ideal world producers would provide high-quality predictions for all trips and stops the vehicle is currently assigned to, but from what I've seen we have a way to go before we get there. In the meantime this PR helps define what the expected behavior should be for these systems.

Looking forward, though, perhaps in the spec or in GTFS-realtime Best Practices we should advocate more strongly for including predictions across multiple stops and trips when high-quality information is available?

@prhod

prhod commented Oct 24, 2018

On our side, we have mixed feelings about this proposal. It's interesting, but it will lead to false propagation of real-time info in some cases. For example, if a vehicle breaks down, causing a huge delay for the trip, and it is replaced so that the next part of the block_id starts on time, wouldn't the resulting propagation show an irrelevant delay?

As we use the block_id as a "you can stay in the vehicle" flag, won't this kind of situation lead to either technical difficulties for algorithms or inconsistent journey information being displayed?

@harringtonp

While the concepts and illustration of the problems are good, I'd be wary of implementing this as a GTFS consumer. There can be an arbitrary number of trips in a block, and if there is a 15-minute delay at the last stop of the first trip, you would be propagating this delay to the next 4 vehicle trips if there were 5 in the block. Even though the propagation makes sense for the early part of the following trip (2nd in block), it would look silly to the user to see delays marked for later trips which are a long time away from starting.

Even within a single trip with a delay at an early stop, I'm not a fan of propagating the delay all the way down the trip. As mentioned above, the producer should ideally drive this by providing explicit delays as and when necessary (and the better ones do). To come back to your specific example, however, there is a definite benefit in some propagation within a block, but once you start propagating, where do you stop?

@abyrd

abyrd commented Oct 26, 2018

@gcamp I don't think this is an interpretation of the existing spec, it's more of a desired feature / addition to the spec that some people have already implemented because it's useful in their context.

I wouldn't state it the same way as @barbeau though, that it's "driven by end-user requirements in OneBusAway". The end users' needs can be met in several different ways, and the underlying problem seems to be that real-time prediction systems are still lacking critical features.

Personally I cling to the idea that arrival prediction should be handled by a sufficiently sophisticated module that is capable of outputting totally unambiguous predictions. The situation described here should yield separate trip updates for each affected trip in the block (and ideally with complete predictions, for every stop in every trip).

The concern about the compactness of GTFS-RT may be unnecessary - it's already a compact binary format, and in larger systems differential push delivery is very lightweight even for large volumes of information.

I agree with the statement from @gcamp that "[i]t seems logical for producer to fulfill that responsibility since they know (or at least have better access to) the logic that drivers use in different agencies." I'm wary of baking lots of assumptions and heuristics into the interpretation of real-time data. It's so much better if the producers can explicitly configure their system to perform propagation according to the operational procedures followed by their drivers and the structure of their schedule data.

The more effort we put into guessing what producers' data means, the less pressure is on them to improve the quality of the data. These extra propagation rules seem like extra stopgap features of a real-time data repair/inference pipeline, not part of the specification itself.

In short, it doesn't seem to me that the specification should contain rules on how to infer/repair missing or bad data.

@barbeau
Collaborator Author

barbeau commented Nov 2, 2018

Please note there was an error in my original diagram - on Trip B, if the producer doesn't supply any Trip B trip_update then without propagation the rider would only see schedule information or would not see any information at all (not "on-time", as I originally had listed). I've corrected it above - I'm not sure if that changes anyone's interpretation of what I'm proposing here.

@barbeau
Collaborator Author

barbeau commented Nov 28, 2018

I went back and reviewed the GTFS-realtime spec, and it doesn't seem like it strongly suggests/requires producers to publish predictions for upcoming trips as well as the current trip. So I'll be working on a separate proposal for that language to try and address the root of the problem here.

However, we still have the question of how consumers should be handling existing "legacy" feeds that don't publish upcoming trip predictions before the current trip ends (i.e., what this proposal is addressing).

We analyzed existing feeds (using TransitFeeds.com for discovery), and 20 of the 46 TripUpdate feeds (43%) provide only one trip_update per vehicle. Another 10 feeds are suspect (they don't include a vehicle_id, or fewer than 2% of their vehicles have more than one trip_update), so the number of feeds that aren't publishing upcoming trip predictions could be as high as 30 (65%).
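For anyone who wants to run a similar check on a feed, the per-feed measurement could be approximated as below. This is a hypothetical sketch, not the analysis actually used: it assumes the vehicle id of every trip_update entity has already been extracted into a list, and that entities without a vehicle_id were filtered out beforehand.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical check: how many vehicles in a feed carry more than one trip_update? */
final class FeedCoverageSketch {

  /**
   * @param vehicleIdPerTripUpdate one vehicle id per trip_update entity in the feed
   * @return fraction (0.0 to 1.0) of distinct vehicles with more than one trip_update;
   *         0.0 when the list is empty
   */
  static double shareOfVehiclesWithMultipleTripUpdates(List<String> vehicleIdPerTripUpdate) {
    // Count trip_update entities per vehicle.
    Map<String, Integer> countByVehicle = new HashMap<>();
    for (String vehicleId : vehicleIdPerTripUpdate) {
      countByVehicle.merge(vehicleId, 1, Integer::sum);
    }
    if (countByVehicle.isEmpty()) {
      return 0.0;
    }
    long multi = countByVehicle.values().stream().filter(c -> c > 1).count();
    return (double) multi / countByVehicle.size();
  }
}
```

A result of 0.0 would put a feed in the "one trip_update per vehicle" group, and a result below 0.02 would match the "suspect" threshold mentioned above.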

It seems to me that we should have some guidance for how these legacy feeds should be handled, to have a consistent experience across apps and to give the end user the best experience despite the data gaps.

Do the above statistics have any impact on your feelings towards including this guidance in the spec itself? Or would you rather see this in a GTFS-realtime Best Practices document (similar to GTFS Best Practices)?

@harringtonp

I'd be of the opinion that it certainly should be in a GTFS-realtime Best Practices document. The same document should probably emphasize that, for the highest-quality predictions, a producer should strive to give predictions for all stops in the current trip and all stops in the next trip in the block. Is there ever any value in providing predictions for trips after the upcoming trip in a block? If not, then it could be mentioned that there is no need to predict beyond the upcoming trip.

A benefit of having it in the GTFS spec itself is that producers would be at least aware of how consumers are predicting for upcoming trips when no data is provided. And this may in some cases (e.g. their own drivers take the layover regardless of schedule) be a trigger for the producer to provide upcoming trip predictions.

Just because it is in the spec, consumers don't have to implement it but they at least will be aware of the issue.

@tleboulenge

Hi all,
I'm excited (and a bit sorry too) to write my first comment on this repo. I am Thierry, tech lead of the Google eng team that maintains the software that consumes GTFS-realtime feeds and turns them into updates for Maps users.

On this particular issue:
We all agree that the ideal handling of delays across trips in a block transfer is by providers giving the full picture, including upcoming trips.

However, consumers need to have a default behaviour when this data is unspecified. At the moment, it's left to the consumer's discretion, resulting in an unpublished and inconsistent experience. So IMO it's clearly the responsibility of the spec to define what that default behaviour is once and for all.

We need to make sure it covers all cases in the matrix [vehicle delay (early, on time, late)] x [layover (none, shorter than delay, longer than delay)].
I think the only decision that needs to be taken is whether a layover is a fixed time (e.g. corresponds to a coffee break for the driver, or servicing a vehicle), or a buffer time (in order to re-sync with the schedule in most cases of minor time deviations).

For instance, if there's a 10-minute layover, and the bus is...
[A: fixed time: propagate delay literally]

  1. Running early by 5 min, it will depart early by 5 min
  2. Running late by 5 min, it will depart late by 5 min
  3. Running late by 20 min, it will depart late by 20 min

[B: Buffer: revert to schedule as much as possible]

  1. Running early by 5 min, it will depart on-time
  2. Running late by 5 min, it will depart on-time
  3. Running late by 20 min, it will depart late by 10 min (and not wait at the layover point).

I think scenario [B] makes more sense, but this is more a question for the producers' side of the equation.
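The two interpretations can be written down as a pair of small functions. This is a sketch only: the names are hypothetical, delays are in seconds with negative meaning early, and the handling of an early vehicle with no layover in scenario B is an assumption borrowed from the OneBusAway snippet earlier in this thread (the examples above do not cover that case).

```java
/** Hypothetical sketch of the two layover interpretations from the examples above. */
final class LayoverModels {

  /** Scenario A: the layover is a fixed duration, so the deviation carries over unchanged. */
  static int fixedLayover(int delaySeconds, int layoverSeconds) {
    return delaySeconds;
  }

  /** Scenario B: the layover is a buffer used to get back on schedule where possible. */
  static int bufferLayover(int delaySeconds, int layoverSeconds) {
    if (delaySeconds < 0) {
      // Early: hold at the layover if there is one; otherwise keep running early (assumption).
      return layoverSeconds > 0 ? 0 : delaySeconds;
    }
    // Late or on time: the layover absorbs as much of the delay as it can.
    return Math.max(0, delaySeconds - layoverSeconds);
  }
}
```

With a 10-minute layover (600 seconds), `bufferLayover` returns 0 for a vehicle 5 minutes early, 0 for one 5 minutes late, and 600 for one 20 minutes late, matching the three cases listed under [B].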

And to respond to @harringtonp's comment above (we might have to propagate a delay all the way to the end of the service day): yes, this is technically true (although somewhat mitigated by the auto revert-to-normal feature of scenario B), but consumers might also decide to discard delays that are too far in the future as "irrelevant". It's not much different from what already happens today on a long-distance (think India's multi-day) train or bus: does a 15-minute delay at stop 2 really mean a 15-minute delay at arrival 36 hours later?
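One way a consumer might combine both ideas (delays shrinking at each layover under scenario B, plus a cutoff for trips too far in the future) is sketched below. Everything here is hypothetical: the method name, the parallel-array inputs, and the `horizonSeconds` cutoff are illustrative choices, and only late vehicles are handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: carry a late delay through later trips in a block. */
final class BlockPropagationSketch {

  /**
   * @param delaySeconds          delay at the end of the current trip (late is positive)
   * @param layoverBeforeTrip     scheduled layover before each subsequent trip, in block order
   * @param secondsUntilTripStart time from now until each subsequent trip's scheduled start
   *                              (same length and order as layoverBeforeTrip)
   * @param horizonSeconds        trips starting later than this are not annotated
   * @return propagated delay for each subsequent trip, until absorbed or past the horizon
   */
  static List<Integer> propagateLateDelay(int delaySeconds, int[] layoverBeforeTrip,
                                          int[] secondsUntilTripStart, int horizonSeconds) {
    List<Integer> delays = new ArrayList<>();
    int delay = Math.max(0, delaySeconds);
    for (int i = 0; i < layoverBeforeTrip.length; i++) {
      // Each layover absorbs part of the remaining delay (scenario B).
      delay = Math.max(0, delay - layoverBeforeTrip[i]);
      if (delay == 0 || secondsUntilTripStart[i] > horizonSeconds) {
        break; // back on schedule, or too far in the future to be relevant
      }
      delays.add(delay);
    }
    return delays;
  }
}
```

For a 15-minute delay on the first of five trips with 5-minute layovers, the delay reaches only the next two trips before it is fully absorbed, and a shorter horizon trims the list further.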

barbeau added a commit that referenced this pull request Mar 20, 2020
…#206)

Previous discussions (1) (2) have raised the question of whether more than one TripUpdate can be included in a GTFS-realtime feed for a vehicle in the case where the vehicle is serving more than one trip in the same block.

These discussions have concluded that multiple TripUpdates per vehicle are clearly beneficial in this case to avoid prediction "pop-in" for riders as the vehicle transitions from one trip to another and also give riders advance notice of delays that impact downstream trips (e.g., when the known delay exceeds planned layover times between trips).

However, this issue currently isn't addressed anywhere in the spec.

This proposal adds language to the spec recommending that multiple TripUpdates are included in a GTFS-realtime feed for vehicles running more than one trip in the same block. It also normalizes trip-updates.md and reference.md language related to blocks (future work may combine these files).

(1) GitHub PR proposal - "Define rules for propagating delays across trips in same block" - #110
(2) Google Groups post - "trip is in TripUpdates but not in VehiclePositions feed" - https://groups.google.com/forum/#!topic/gtfs-realtime/lVPOmi9A5vQ
@stale

stale bot commented Aug 21, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the "Status: Stale" label Aug 21, 2021
@stale

stale bot commented Aug 28, 2021

This pull request has been closed due to inactivity. Pull requests can always be reopened after they have been closed. See the Specification Amendment Process.

Labels
GTFS Realtime, Status: Stale
7 participants