ENH: overlapping triplegs #607

Merged
merged 11 commits into from
Mar 11, 2024

Conversation

munterfi
Contributor

This PR introduces a new method overlap_staypoints for generating triplegs.

Although the default method between_staypoints is suitable for most cases, there are instances where the temporal
resolution of positionfixes is very coarse, with irregular intervals between positionfixes, resulting in large gaps
between staypoints and triplegs. These gaps are not actually missing data in the record but a consequence of the coarse
temporal resolution, and they can complicate further analysis.

Our use case involves GPS data of locomotives with positionfixes at irregular intervals, ranging from a few seconds to
several hours when active, and possibly reducing to one positionfix per day when the locomotive is inactive for extended
periods.

Adds

  • Option method="overlap_staypoints" in generate_triplegs().
  • Separate test class TestGenerate_triplegs_overlap.
  • Updates the relevant docstrings.

Approach

The logic follows the "between_staypoints" method, which already sets the "finished_at" time of the staypoint to
the "tracked_at" timestamp of the following positionfix (which is the first positionfix of the next tripleg, and
thereby also defines its "started_at" time). The "finished_at" time of the tripleg is now also extended to
the "tracked_at" timestamp of the first positionfix of the next staypoint:

[Figure: generate_triplegs_methods]
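The timing rule described above can be sketched in plain Python. This is only an illustration, not the code from the PR: the helper `tripleg_spans` is hypothetical, it ignores users and gap thresholds, and it assumes that the default method ends a tripleg at its own last positionfix. The rows are a handful of positionfixes from the example data in this PR, with `None` marking fixes outside any staypoint.

```python
from datetime import datetime

# (id, tracked_at, staypoint_id); None means the fix is not part of a staypoint.
pfs = [
    (591, datetime(2008, 10, 23, 9, 55, 16), None),
    (592, datetime(2008, 10, 23, 9, 55, 21), None),
    (593, datetime(2008, 10, 23, 9, 55, 26), None),
    (594, datetime(2008, 10, 23, 9, 55, 31), 0),
    (611, datetime(2008, 10, 23, 10, 2, 29), 0),
    (612, datetime(2008, 10, 23, 10, 3, 39), None),
    (613, datetime(2008, 10, 23, 10, 3, 44), None),
    (614, datetime(2008, 10, 23, 10, 3, 49), None),
]


def tripleg_spans(pfs, overlap):
    """Return (started_at, finished_at) for each run of non-staypoint fixes.

    Hypothetical helper. With overlap=True, finished_at is the tracked_at of
    the first fix of the following staypoint; otherwise (assumed default
    behaviour) it is the tracked_at of the last fix of the run itself.
    """
    spans = []
    run_start = None
    for i, (_, tracked_at, staypoint_id) in enumerate(pfs):
        if staypoint_id is None:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            # A staypoint begins at index i, so the run ended at index i - 1.
            finished_at = tracked_at if overlap else pfs[i - 1][1]
            spans.append((pfs[run_start][1], finished_at))
            run_start = None
    if run_start is not None:
        # Trailing run without a following staypoint: end at its last fix.
        spans.append((pfs[run_start][1], pfs[-1][1]))
    return spans


for started_at, finished_at in tripleg_spans(pfs, overlap=True):
    print(started_at.time(), "->", finished_at.time())
```

With `overlap=True` the first tripleg ends at the timestamp of fix 594, which is also where the staypoint starts, so no temporal gap remains between the two.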

To ensure a spatial overlap of the geometries, a new column "tripleg_id_geom" had to be introduced for grouping
the positionfixes into a linestring.

| id | elevation | tracked_at | user_id | accuracy | staypoint_id | tripleg_id | tripleg_id_geom | longitude | latitude |
|---|---|---|---|---|---|---|---|---|---|
| 591 | 44.196 | 2008-10-23 09:55:16+00:00 | 0 | | | 2 | 2 | 116.321781 | 40.009273 |
| 592 | 46.9392 | 2008-10-23 09:55:21+00:00 | 0 | | | 2 | 2 | 116.321871 | 40.009313 |
| 593 | 48.4632 | 2008-10-23 09:55:26+00:00 | 0 | | | 2 | 2 | 116.321952 | 40.009344 |
| 594 | 49.9872 | 2008-10-23 09:55:31+00:00 | 0 | | 0 | 2 | 2 | 116.322007 | 40.009356 |
| 595 | 49.6824 | 2008-10-23 09:55:36+00:00 | 0 | | 0 | | | 116.322103 | 40.009386 |
| 596 | 50.292 | 2008-10-23 09:55:41+00:00 | 0 | | 0 | | | 116.322121 | 40.009356 |
| 597 | 50.292 | 2008-10-23 09:55:46+00:00 | 0 | | 0 | | | 116.322135 | 40.009353 |
| 598 | 50.292 | 2008-10-23 09:55:51+00:00 | 0 | | 0 | | | 116.322152 | 40.009319 |
| 599 | 51.2064 | 2008-10-23 09:55:56+00:00 | 0 | | 0 | | | 116.322162 | 40.009394 |
| 600 | 51.2064 | 2008-10-23 09:56:01+00:00 | 0 | | 0 | | | 116.322179 | 40.009399 |
| 601 | 50.9016 | 2008-10-23 09:56:06+00:00 | 0 | | 0 | | | 116.32219 | 40.009344 |
| 602 | 50.5968 | 2008-10-23 09:56:11+00:00 | 0 | | 0 | | | 116.322177 | 40.009342 |
| 603 | 49.9872 | 2008-10-23 09:56:16+00:00 | 0 | | 0 | | | 116.32219 | 40.009318 |
| 604 | 49.3776 | 2008-10-23 09:56:21+00:00 | 0 | | 0 | | | 116.322206 | 40.009287 |
| 605 | 50.292 | 2008-10-23 09:56:26+00:00 | 0 | | 0 | | | 116.32222 | 40.00927 |
| 606 | 48.1584 | 2008-10-23 10:02:04+00:00 | 0 | | 0 | | | 116.321916 | 40.009351 |
| 607 | 47.8536 | 2008-10-23 10:02:09+00:00 | 0 | | 0 | | | 116.321838 | 40.009336 |
| 608 | 45.72 | 2008-10-23 10:02:14+00:00 | 0 | | 0 | | | 116.321811 | 40.009331 |
| 609 | 46.3296 | 2008-10-23 10:02:19+00:00 | 0 | | 0 | | | 116.321823 | 40.009314 |
| 610 | 46.0248 | 2008-10-23 10:02:24+00:00 | 0 | | 0 | | | 116.321833 | 40.009314 |
| 611 | 45.72 | 2008-10-23 10:02:29+00:00 | 0 | | 0 | | 3 | 116.32185 | 40.009316 |
| 612 | 42.672 | 2008-10-23 10:03:39+00:00 | 0 | | | 3 | 3 | 116.320888 | 40.009428 |
| 613 | 52.4256 | 2008-10-23 10:03:44+00:00 | 0 | | | 3 | 3 | 116.321493 | 40.008854 |
| 614 | 54.864 | 2008-10-23 10:03:49+00:00 | 0 | | | 3 | 3 | 116.321349 | 40.008848 |
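To illustrate why the grouping column matters, here is a small sketch that collects coordinates per tripleg under either column. It is not the PR's implementation: the helper `linestring_coords` is hypothetical, the real code builds geometries rather than coordinate lists, and the rows assume that fix 611 carries only a tripleg_id_geom of 3 (no tripleg_id), as the description above implies.

```python
from collections import defaultdict

# Fixes 610-614 from the example; None marks an empty cell.
rows = [
    {"id": 610, "tripleg_id": None, "tripleg_id_geom": None, "longitude": 116.321833, "latitude": 40.009314},
    {"id": 611, "tripleg_id": None, "tripleg_id_geom": 3, "longitude": 116.32185, "latitude": 40.009316},
    {"id": 612, "tripleg_id": 3, "tripleg_id_geom": 3, "longitude": 116.320888, "latitude": 40.009428},
    {"id": 613, "tripleg_id": 3, "tripleg_id_geom": 3, "longitude": 116.321493, "latitude": 40.008854},
    {"id": 614, "tripleg_id": 3, "tripleg_id_geom": 3, "longitude": 116.321349, "latitude": 40.008848},
]


def linestring_coords(rows, key):
    """Group (longitude, latitude) pairs by the given id column, in row order.

    Hypothetical helper; rows without an id in that column are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        if row[key] is not None:
            groups[row[key]].append((row["longitude"], row["latitude"]))
    return dict(groups)


by_time = linestring_coords(rows, "tripleg_id")
by_geom = linestring_coords(rows, "tripleg_id_geom")
print(len(by_time[3]), len(by_geom[3]))
```

Grouping by tripleg_id_geom yields one extra vertex at the start of tripleg 3, namely the last positionfix of the staypoint, which is what makes the linestring touch the staypoint.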

When a staypoint consists of only one positionfix, the previous tripleg will have a spatial overlap with that staypoint.
However, the following tripleg will not spatially overlap with the staypoint. Otherwise, duplicating the positionfix of
the staypoint would be necessary.

| id | elevation | tracked_at | user_id | accuracy | staypoint_id | tripleg_id | tripleg_id_geom | longitude | latitude |
|---|---|---|---|---|---|---|---|---|---|
| 2683 | 66.4464 | 2008-10-24 00:14:47+00:00 | 1 | | | 13 | 13 | 116.324841 | 39.978782 |
| 2684 | 63.0936 | 2008-10-24 00:14:52+00:00 | 1 | | | 13 | 13 | 116.324875 | 39.978813 |
| 2685 | 62.7888 | 2008-10-24 00:14:57+00:00 | 1 | | | 13 | 13 | 116.325005 | 39.978826 |
| 2686 | 62.7888 | 2008-10-24 00:15:00+00:00 | 1 | | 6 | 13 | 13 | 116.325174 | 39.978897 |
| 2687 | 61.2648 | 2008-10-24 00:20:39+00:00 | 1 | | | 14 | 14 | 116.325483 | 39.97896 |
| 2688 | 61.2648 | 2008-10-24 00:20:44+00:00 | 1 | | | 14 | 14 | 116.325549 | 39.979008 |
| 2689 | 61.2648 | 2008-10-24 00:20:49+00:00 | 1 | | | 14 | 14 | 116.325562 | 39.979019 |
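The assignment rule, including the single-positionfix case, might be sketched as follows. The function `assign_geom_ids` is a hypothetical stand-in and not the PR's code; it works on a time-ordered list of staypoint ids for one user, assumes staypoints are always separated by at least one moving fix, and (as an additional assumption not stated in the PR) lets a staypoint fix with no preceding tripleg join the following one.

```python
def assign_geom_ids(staypoint_ids, first_tripleg_id=0):
    """Return one geometry tripleg id (or None) per positionfix.

    staypoint_ids: time-ordered list, None for fixes outside staypoints.
    Hypothetical sketch of the rule described in the PR text.
    """
    n = len(staypoint_ids)
    geom = [None] * n
    moving = [sp is None for sp in staypoint_ids]

    # Pass 1: number each consecutive run of moving fixes.
    tripleg_id = first_tripleg_id - 1
    for i in range(n):
        if moving[i]:
            if i == 0 or not moving[i - 1]:
                tripleg_id += 1
            geom[i] = tripleg_id

    # Pass 2: extend triplegs onto the adjacent staypoint fixes.
    for i in range(n):
        if moving[i]:
            continue
        if i > 0 and moving[i - 1]:
            # First fix of the staypoint joins the previous tripleg. For a
            # single-fix staypoint this branch wins, so the following
            # tripleg gets no overlap and no fix is duplicated.
            geom[i] = geom[i - 1]
        elif i + 1 < n and moving[i + 1]:
            # Last fix of the staypoint joins the following tripleg.
            geom[i] = geom[i + 1]
    return geom


# Fixes 2683-2689: staypoint 6 consists of the single fix 2686.
print(assign_geom_ids([None, None, None, 6, None, None, None], first_tripleg_id=13))
# A staypoint with several fixes overlaps on both sides.
print(assign_geom_ids([None, 0, 0, 0, None], first_tripleg_id=2))
```

Giving each fix at most one id keeps the positionfixes table free of duplicated rows, at the cost of the one-sided overlap described above.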

Note: I have not explored how selecting the "overlap_staypoints" method affects the later processing and
analysis phases in the trackintel framework.

munterfi and others added 7 commits June 7, 2022 15:07
-Tripleg_geometry from start to end staypoint
- tripleg time from start positionfix to start of next staypoint
* TST: introduce separate test class for overlap staypoints method

* DOC: adjust docstring for overlap staypoints method
Member

@NinaWie NinaWie left a comment


Thanks a lot for contributing to Trackintel! I think the new method is a great addition to the existing tripleg generation and will probably be very useful for several applications :)
I had a look at the code and the implementation looks good to me in general, but I'm a bit confused about why the additional column tripleg_id_geom is necessary. It would be best if we can avoid this, since it's confusing for users when there are several tripleg_id columns. I assume the reason can be seen in your drawn table, where the time in the overlap method only starts after the staypoint has ended, but the geometry already includes the pfs in the staypoint? In that case, wouldn't it be more consistent to just make the start at the same time, so that started_at is also set to the tracked_at time of the last pf in the staypoint? I hope that makes sense. To deal with empty geometries, one could use the first or latest positionfix of the staypoint where the geometry is non-empty.

trackintel/preprocessing/positionfixes.py (outdated review thread)

codecov bot commented Mar 4, 2024

Codecov Report

Attention: Patch coverage is 98.93048% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 93.44%. Comparing base (7ee696e) to head (0524ef1).

| Files | Patch % | Lines |
|---|---|---|
| trackintel/preprocessing/positionfixes.py | 98.93% | 0 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #607      +/-   ##
==========================================
+ Coverage   93.40%   93.44%   +0.04%     
==========================================
  Files          33       33              
  Lines        2061     2076      +15     
  Branches      364      367       +3     
==========================================
+ Hits         1925     1940      +15     
  Misses        126      126              
  Partials       10       10              


@munterfi
Contributor Author

munterfi commented Mar 4, 2024

Thanks for the feedback!

The tripleg_id_geom was introduced because the staypoint ends at the first positionfix of the next
tripleg, but the geometry has to include the last positionfix of the staypoint to ensure overlap.

So, depending on whether the temporal or the spatial perspective is considered, the tripleg ids would be assigned differently.
Unfortunately, setting the started_at time of the tripleg to the last positionfix of the staypoint would result in
a negative duration between staypoint end and tripleg start. We could fix this by altering the end times of the already
generated staypoints, but I am not sure whether this complies with the intended process of the framework.
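As a worked example of the negative duration, the snippet below uses timestamps from the example data in this PR: staypoint 0 finishes at 10:03:39, its last positionfix (611) was tracked at 10:02:29, and the first positionfix after it (612) at 10:03:39. The variable names are illustrative only.

```python
from datetime import datetime

staypoint_finished_at = datetime(2008, 10, 23, 10, 3, 39)  # staypoint 0
last_fix_in_staypoint = datetime(2008, 10, 23, 10, 2, 29)  # positionfix 611
first_fix_after = datetime(2008, 10, 23, 10, 3, 39)        # positionfix 612

# Tripleg starting at the last fix of the staypoint: it begins before the
# staypoint has ended, so the gap between the two is negative.
gap_suggested = (last_fix_in_staypoint - staypoint_finished_at).total_seconds()

# Tripleg starting at the first fix after the staypoint: no gap, no overlap in time.
gap_current = (first_fix_after - staypoint_finished_at).total_seconds()

print(gap_suggested, gap_current)
```

The first value is -70 seconds and the second is 0, which is why the start time stays at the first positionfix after the staypoint while only the geometry reaches back.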

As a solution, the temporal tripleg_id is now replaced with the spatial tripleg_id_geom.

  • Positionfixes:

    | id | elevation | tracked_at | user_id | accuracy | staypoint_id | tripleg_id | longitude | latitude |
    |---|---|---|---|---|---|---|---|---|
    | 591 | 44.196 | 2008-10-23 09:55:16+00:00 | 0 | | | 2 | 116.321781 | 40.009273 |
    | 592 | 46.9392 | 2008-10-23 09:55:21+00:00 | 0 | | | 2 | 116.321871 | 40.009313 |
    | 593 | 48.4632 | 2008-10-23 09:55:26+00:00 | 0 | | | 2 | 116.321952 | 40.009344 |
    | 594 | 49.9872 | 2008-10-23 09:55:31+00:00 | 0 | | 0 | 2 | 116.322007 | 40.009356 |
    | 595 | 49.6824 | 2008-10-23 09:55:36+00:00 | 0 | | 0 | | 116.322103 | 40.009386 |
    | 596 | 50.292 | 2008-10-23 09:55:41+00:00 | 0 | | 0 | | 116.322121 | 40.009356 |
    | 597 | 50.292 | 2008-10-23 09:55:46+00:00 | 0 | | 0 | | 116.322135 | 40.009353 |
    | 598 | 50.292 | 2008-10-23 09:55:51+00:00 | 0 | | 0 | | 116.322152 | 40.009319 |
    | 599 | 51.2064 | 2008-10-23 09:55:56+00:00 | 0 | | 0 | | 116.322162 | 40.009394 |
    | 600 | 51.2064 | 2008-10-23 09:56:01+00:00 | 0 | | 0 | | 116.322179 | 40.009399 |
    | 601 | 50.9016 | 2008-10-23 09:56:06+00:00 | 0 | | 0 | | 116.32219 | 40.009344 |
    | 602 | 50.5968 | 2008-10-23 09:56:11+00:00 | 0 | | 0 | | 116.322177 | 40.009342 |
    | 603 | 49.9872 | 2008-10-23 09:56:16+00:00 | 0 | | 0 | | 116.32219 | 40.009318 |
    | 604 | 49.3776 | 2008-10-23 09:56:21+00:00 | 0 | | 0 | | 116.322206 | 40.009287 |
    | 605 | 50.292 | 2008-10-23 09:56:26+00:00 | 0 | | 0 | | 116.32222 | 40.00927 |
    | 606 | 48.1584 | 2008-10-23 10:02:04+00:00 | 0 | | 0 | | 116.321916 | 40.009351 |
    | 607 | 47.8536 | 2008-10-23 10:02:09+00:00 | 0 | | 0 | | 116.321838 | 40.009336 |
    | 608 | 45.72 | 2008-10-23 10:02:14+00:00 | 0 | | 0 | | 116.321811 | 40.009331 |
    | 609 | 46.3296 | 2008-10-23 10:02:19+00:00 | 0 | | 0 | | 116.321823 | 40.009314 |
    | 610 | 46.0248 | 2008-10-23 10:02:24+00:00 | 0 | | 0 | | 116.321833 | 40.009314 |
    | 611 | 45.72 | 2008-10-23 10:02:29+00:00 | 0 | | 0 | 3 | 116.32185 | 40.009316 |
    | 612 | 42.672 | 2008-10-23 10:03:39+00:00 | 0 | | | 3 | 116.320888 | 40.009428 |
    | 613 | 52.4256 | 2008-10-23 10:03:44+00:00 | 0 | | | 3 | 116.321493 | 40.008854 |
    | 614 | 54.864 | 2008-10-23 10:03:49+00:00 | 0 | | | 3 | 116.321349 | 40.008848 |
  • Staypoints:

    | id | user_id | started_at | finished_at | elevation |
    |---|---|---|---|---|
    | 0 | 0 | 2008-10-23T09:55:31.000 | 2008-10-23T10:03:39.000 | 49.9872 |
    | 1 | 0 | 2008-10-23T10:20:56.000 | 2008-10-23T10:26:35.000 | 53.0352 |
    | 2 | 0 | 2008-10-23T10:32:35.000 | 2008-10-23T10:44:31.000 | 25.6032 |
  • Triplegs:

    | id | user_id | started_at | finished_at |
    |---|---|---|---|
    | 0 | 0 | 2008-10-23T02:53:04.000 | 2008-10-23T03:05:15.000 |
    | 1 | 0 | 2008-10-23T04:08:07.000 | 2008-10-23T04:34:52.000 |
    | 2 | 0 | 2008-10-23T09:42:25.000 | 2008-10-23T09:55:31.000 |
    | 3 | 0 | 2008-10-23T10:03:39.000 | 2008-10-23T10:20:56.000 |
    | 4 | 0 | 2008-10-23T10:26:35.000 | 2008-10-23T10:32:35.000 |
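The staypoint and tripleg times listed above can be checked for temporal continuity with a few lines of Python. This is just a sanity check on the posted values, not part of the PR; triplegs 0 and 1 are left out because the staypoints adjacent to them are not shown.

```python
from datetime import datetime


def ts(text):
    """Parse an ISO timestamp such as '2008-10-23T09:55:31'."""
    return datetime.fromisoformat(text)


# (kind, id, started_at, finished_at), copied from the tables above.
segments = [
    ("tripleg", 2, ts("2008-10-23T09:42:25"), ts("2008-10-23T09:55:31")),
    ("staypoint", 0, ts("2008-10-23T09:55:31"), ts("2008-10-23T10:03:39")),
    ("tripleg", 3, ts("2008-10-23T10:03:39"), ts("2008-10-23T10:20:56")),
    ("staypoint", 1, ts("2008-10-23T10:20:56"), ts("2008-10-23T10:26:35")),
    ("tripleg", 4, ts("2008-10-23T10:26:35"), ts("2008-10-23T10:32:35")),
    ("staypoint", 2, ts("2008-10-23T10:32:35"), ts("2008-10-23T10:44:31")),
]

# Order by start time, then measure the gap between consecutive segments.
segments.sort(key=lambda seg: seg[2])
gaps = [
    (nxt[2] - cur[3]).total_seconds()
    for cur, nxt in zip(segments, segments[1:])
]
print(gaps)
```

Every gap is zero for these rows: each tripleg ends exactly where the next staypoint starts, and each staypoint ends where the next tripleg starts.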

@munterfi munterfi requested a review from NinaWie March 4, 2024 15:48
Member

@NinaWie NinaWie left a comment


Thanks for the changes! I see the problem, but I like the new version with the geometry-viewpoint, where the tripleg ID is based on the geometries.
In the end, the method is called overlap, so it makes sense that the tripleg_id refers to the overlapping geometries even if the start and end time of the tripleg is not set this way.
From my side it is therefore fine; I just made one more comment where it would be good for future edits if we have a bit more explanation how the code works. @hongyeehh , do you want to have a look as well? If yes, note that the diff is a bit messed up; the only thing that is really new is the method _generate_triplegs_overlap_staypoints (and the test class). The rest is mainly marked because of minor changes that required different indent.

trackintel/preprocessing/positionfixes.py (outdated review thread)
@hongyeehh hongyeehh self-assigned this Mar 7, 2024
@hongyeehh
Member

Thanks for the PR! I like the idea of overlapping tripleg generation and think the current code is quite well structured.

We need to add more docstring to explain our definition of triplegs' time and geometry, and more test cases for the new method. But we can do this in a separate PR as this one is already quite heavy.

Thanks for contributing to trackintel. I will merge this now!

@hongyeehh hongyeehh merged commit 59e3e1c into mie-lab:master Mar 11, 2024
8 checks passed