Fix several problems with the calculation of working time per client event #7511

Merged (1 commit, Feb 27, 2024)

Conversation

SpecLad
Contributor

@SpecLad SpecLad commented Feb 23, 2024

Motivation and context

The first problem is that in the following case, the algorithm would accrue more working time than was actually spent:

          A
    |-----------|
---------+------------+------> time
         B            C

A, B and C are events. Let's say that Te(A) is the timestamp of the end of A, while T(B) and T(C) are the timestamps of B and C, respectively.

The current code in ClientEventsSerializer.to_internal_value adjusts last_timestamp after processing every event. After A is processed, last_timestamp is set to Te(A). After B is processed, last_timestamp goes backwards to T(B). So when the algorithm calculates the working time for C, it gets T(C) - T(B), when the correct answer is T(C) - Te(A). The span from T(B) to Te(A) gets counted twice.

Fix this by rewriting the algorithm, so that last_timestamp (now renamed previous_end_timestamp) can only go forwards.


The second problem is that the algorithm is unable to calculate the working time for the first event in each batch that the client sends. This is because to calculate working time for an event, you need the timestamp/duration of the previous event, and this information is unavailable for the first event in the batch.

Fix this by resending the most recently sent event along with each batch, and using it to initialize the algorithm.
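Taken together, the two fixes amount to an accrual loop in which the previous end timestamp is monotonic and which can be seeded with the resent last event of the previous batch. Below is a minimal sketch of that idea; the constants, function names, and event scopes (other than change:frame) are illustrative and simplified, not the actual ClientEventsSerializer code:

```python
import datetime

# Illustrative constants; the real values live in ClientEventsSerializer.
TIME_THRESHOLD = datetime.timedelta(seconds=100)
COLLAPSED_EVENT_SCOPES = frozenset(("change:frame",))

def end_timestamp(event):
    # Duration is taken into account only for collapsed event scopes,
    # matching the behavior discussed in this PR.
    if event["scope"] in COLLAPSED_EVENT_SCOPES:
        return event["timestamp"] + datetime.timedelta(
            milliseconds=event.get("duration", 0))
    return event["timestamp"]

def working_time(events, previous_event=None):
    # previous_event is the last event of the previous batch, resent by
    # the client; it seeds previous_end so that the first event of this
    # batch can accrue working time too.
    total = datetime.timedelta()
    previous_end = end_timestamp(previous_event) if previous_event else None
    for event in events:
        end = end_timestamp(event)
        if previous_end is None:
            previous_end = end
            continue
        if event["timestamp"] - previous_end < TIME_THRESHOLD:
            # Accrue only the part of this event past previous_end, so an
            # overlapping span is never counted twice.
            if end > previous_end:
                total += end - previous_end
        elif end > event["timestamp"]:
            # After a long idle gap, count only the event's own span.
            total += end - event["timestamp"]
        # previous_end can only move forward.
        previous_end = max(previous_end, end)
    return total
```

With the events from the diagram above (A spanning past T(B), C after Te(A)), this accrues exactly T(C) - Te(A) for C and nothing extra for B, instead of double-counting the span from T(B) to Te(A).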


In addition, sort the incoming event array by timestamp. I don't think this matters much in practice, since the UI should be accumulating them in chronological order anyway, but if in some obscure case it sends them out of order, this should help.

How has this been tested?

I added unit tests to test the algorithm. I also manually tested the UI changes to make sure that the previous event is actually sent.

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • [ ] I have updated the documentation accordingly
  • I have added tests to cover my changes
  • [ ] I have linked related issues (see GitHub docs)
  • I have increased versions of npm packages if it is necessary
    (cvat-canvas,
    cvat-core,
    cvat-data and
    cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

@SpecLad SpecLad marked this pull request as ready for review February 23, 2024 11:20

codecov bot commented Feb 23, 2024

Codecov Report

Merging #7511 (a7c9479) into develop (7f92660) will increase coverage by 0.00%.
Report is 1 commit behind head on develop.
The diff coverage is 80.95%.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7511   +/-   ##
========================================
  Coverage    83.52%   83.52%           
========================================
  Files          372      372           
  Lines        39666    39661    -5     
  Branches      3718     3724    +6     
========================================
- Hits         33130    33128    -2     
+ Misses        6536     6533    -3     
Components Coverage Δ
cvat-ui 79.41% <80.41%> (+0.01%) ⬆️
cvat-server 87.31% <78.94%> (-0.01%) ⬇️

@bsekachev
Member

bsekachev commented Feb 26, 2024

          A
    |-----------|
---------+------------+------> time
         B            C

In your example, I am actually not sure how event A gets appended to the events collection before event B, because continuing events are added only when the .close() method is called (at timestamp Te(A) in this example)

UPD: With sorting by timestamp added on the server side, it makes sense

    timestamp = serializers.DateTimeField()

    _TIME_THRESHOLD = datetime.timedelta(seconds=100)
    _WORKING_TIME_RESOLUTION = datetime.timedelta(milliseconds=1)
    _COLLAPSED_EVENT_SCOPES = frozenset(("change:frame",))

    @classmethod
    def _end_timestamp(cls, event: dict) -> datetime.datetime:
        if event["scope"] in cls._COLLAPSED_EVENT_SCOPES:
bsekachev (Member):
Why do you consider duration only for events from COLLAPSED_EVENT_SCOPES?
For example, if the last event sent in the previous request was a draw event with a duration of 30 seconds, we probably need the event timestamp + 30 seconds as the previous end timestamp.

SpecLad (Contributor, Author):
I just kept the original logic as it was in this case. I didn't want to change the behavior where it wasn't related to the specific problems I was fixing.

    end_timestamp = self._end_timestamp(event)
    if end_timestamp > previous_end_timestamp:
        working_time += end_timestamp - previous_end_timestamp
bsekachev (Member):
Potential hack: working time can be inflated by sending a huge value in duration.
In the current code this applies to the change:frame event, but considering my comment above, it would apply to all events.

bsekachev (Member):
But as far as I can see, the problem existed in the previous implementation as well.

SpecLad (Contributor, Author):

There's no "hack-proof" way to implement something like this. There's no way to verify the data that the client is sending.

@bsekachev
Member

I feel like the implementation would seem easier if we consider only:

  1. the start timestamp of the oldest event as the default value for previous_end_timestamp
  2. only the end timestamps of all other events
  3. sorting the data not by timestamp, but by timestamp + duration

I may be wrong

@SpecLad
Contributor Author

SpecLad commented Feb 26, 2024

In your example, I am actually not sure how event A gets appended to the events collection before event B, because continuing events are added only when the .close() method is called (at timestamp Te(A) in this example)

I don't know exactly how this happened, but I did observe it happening in a real event trace. Note that A in this example was not really a continuing event, but a compressed sequence of change:frame events.

I didn't really examine how the UI records such events, so there might be another bug there, but I figured I might as well rewrite the code to handle this case robustly.

@bsekachev
Member

bsekachev commented Feb 26, 2024

I don't know exactly how this happened, but I did observe it happening in a real event trace. Note that A in this example was not really a continuing event, but a compressed sequence of change:frame events.

Yep, I looked. For change:frame events, which are additionally handled inside the logger by ignore rules, this is a possible case.

@SpecLad
Contributor Author

SpecLad commented Feb 26, 2024

I feel like the implementation would seem easier if we consider only:

  1. the start timestamp of the oldest event as the default value for previous_end_timestamp
  2. only the end timestamps of all other events
  3. sorting the data not by timestamp, but by timestamp + duration

I may be wrong

This would be easier, but if you ignore start timestamps, some events would no longer accrue working time where they do now:

                     B
       A         |--------|
-------+------------------------

If Ts(B) - T(A) < _TIME_THRESHOLD, but Te(B) - T(A) > _TIME_THRESHOLD, then in the current implementation, B would accrue working time, but in your proposal, it wouldn't. I think the current implementation is preferable in this case.
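To make this concrete with hypothetical numbers, using the _TIME_THRESHOLD of 100 seconds from the serializer: say A is at 0 s and B spans 90 s to 150 s. A quick sketch of the two accrual rules:

```python
import datetime

TIME_THRESHOLD = datetime.timedelta(seconds=100)

t_a = datetime.timedelta(seconds=0)     # T(A)
ts_b = datetime.timedelta(seconds=90)   # Ts(B), start of B
te_b = datetime.timedelta(seconds=150)  # Te(B), end of B

# Current rule: the threshold is checked against B's *start* timestamp,
# so B accrues time even though its end lies past the threshold.
current = te_b - t_a if ts_b - t_a < TIME_THRESHOLD else datetime.timedelta()

# End-timestamps-only proposal: the threshold is checked against B's end,
# so B accrues nothing here.
proposed = te_b - t_a if te_b - t_a < TIME_THRESHOLD else datetime.timedelta()
```

Under the current rule B accrues 150 s of working time; under the proposal it accrues none, which is exactly the behavior change described above.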

@bsekachev
Member

That is true; it's just that _TIME_THRESHOLD is too small for this case. In general it could be something like 5 minutes.
I do not think we have events that take more time to finish.

But it was an idea requiring some experiments, not a proposal for the current patch.

@SpecLad
Contributor Author

SpecLad commented Feb 26, 2024

Hmm, now that I think about it, I probably shouldn't have added sorting at all. It's not necessary to solve either one of the issues I was trying to solve, and it could cause other issues by itself, because it makes it so that the last event sent by the UI is not necessarily the last event processed by the server.

I think I'm going to remove it and just let the UI determine the order that the events are processed in. Any bugs in that can be fixed on the client side.

@SpecLad SpecLad merged commit ae9f474 into cvat-ai:develop Feb 27, 2024
34 checks passed
@SpecLad SpecLad deleted the better-timekeeping branch February 27, 2024 17:48
@cvat-bot cvat-bot bot mentioned this pull request Mar 5, 2024