Fix several problems with the calculation of working time per client event #7511
Codecov Report
Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #7511   +/-   ##
========================================
  Coverage    83.52%   83.52%
========================================
  Files          372      372
  Lines        39666    39661    -5
  Branches      3718     3724    +6
========================================
- Hits         33130    33128    -2
+ Misses        6536     6533    -3
In your example, I am actually not sure how event A gets appended to the events collection before event B, because continuing events are added only when the .close() method is called (at the Te(A) timestamp in this example).
UPD: With sorting by timestamp added on the server side, it makes sense.
    timestamp = serializers.DateTimeField()
    _TIME_THRESHOLD = datetime.timedelta(seconds=100)
    _WORKING_TIME_RESOLUTION = datetime.timedelta(milliseconds=1)
    _COLLAPSED_EVENT_SCOPES = frozenset(("change:frame",))

    @classmethod
    def _end_timestamp(cls, event: dict) -> datetime.datetime:
        if event["scope"] in cls._COLLAPSED_EVENT_SCOPES:
Why do you consider duration only for events from COLLAPSED_EVENT_SCOPES?
For example, if the last event sent in the previous request was a draw event with a duration of 30 seconds, we probably need the event timestamp + 30 seconds as the previous end timestamp.
I just kept the original logic as it was in this case. I didn't want to change the behavior where it wasn't related to the specific problems I was fixing.
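To make the question concrete, here is a minimal sketch of the two behaviours under discussion. It assumes each event is a dict with a `timestamp` (a `datetime`), a `scope`, and an optional `duration` in milliseconds. The field layout, the millisecond unit, the `draw:object` scope name, and the helper names `end_timestamp_collapsed_only` and `end_timestamp_all_scopes` are assumptions made for illustration, not the PR's code.

```python
import datetime

# Assumption: mirrors the _COLLAPSED_EVENT_SCOPES constant shown in the diff.
COLLAPSED_EVENT_SCOPES = frozenset(("change:frame",))


def _duration(event: dict) -> datetime.timedelta:
    # Assumption: "duration" is expressed in milliseconds and defaults to 0.
    return datetime.timedelta(milliseconds=event.get("duration", 0))


def end_timestamp_collapsed_only(event: dict) -> datetime.datetime:
    """Duration is honoured only for collapsed scopes (the logic the author kept)."""
    if event["scope"] in COLLAPSED_EVENT_SCOPES:
        return event["timestamp"] + _duration(event)
    return event["timestamp"]


def end_timestamp_all_scopes(event: dict) -> datetime.datetime:
    """Duration is honoured for every event (the reviewer's suggestion)."""
    return event["timestamp"] + _duration(event)


start = datetime.datetime(2024, 1, 1, 12, 0, 0, tzinfo=datetime.timezone.utc)
# Hypothetical draw event lasting 30 seconds.
draw_event = {"scope": "draw:object", "timestamp": start, "duration": 30_000}

print(end_timestamp_collapsed_only(draw_event) - start)  # 0:00:00
print(end_timestamp_all_scopes(draw_event) - start)      # 0:00:30
```

With the first variant, the 30 seconds of the draw event are not part of the previous end timestamp; with the second, they are.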
    end_timestamp = self._end_timestamp(event)
    if end_timestamp > previous_end_timestamp:
        working_time += end_timestamp - previous_end_timestamp
This is a potential hack for inflating working time by sending a huge value in duration.
In the current code it applies to the change:frame event but, considering my comment above, it would apply to all events.
But as far as I can see, the problem also existed in the previous implementation.
There's no "hack-proof" way to implement something like this. There's no way to verify the data that the client is sending.
I feel like the implementation would be simpler if we considered only:
I may be wrong.
I don't know exactly how this happened, but I did observe it happening in a real event trace. Note that A in this example was not really a continuing event, but a compressed sequence of I didn't really examine how the UI records such events, so there might be another bug there, but I figured I might as well rewrite the code to handle this case robustly.
Yep, I looked. For
This would be easier, but if you ignore start timestamps, that would mean that some events would no longer accrue working time, where they do so now:
If Ts(B) - T(A) <
That is true; _TIME_THRESHOLD is just too small in this case. In general it could be around 5 minutes. But that was an idea requiring some experiments, not a proposal for the current patch.
Hmm, now that I think about it, I probably shouldn't have added sorting at all. It's not necessary to solve either one of the issues I was trying to solve, and it could cause other issues by itself, because it makes it so that the last event sent by the UI is not necessarily the last event processed by the server. I think I'm going to remove it and just let the UI determine the order that the events are processed in. Any bugs in that can be fixed on the client side.
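The ordering concern can be illustrated with a small sketch. The event dicts and the `name` field are hypothetical; the point is only that sorting a batch by `timestamp` can change which event ends up last, so the last event the UI sent need not be the last one the server processes.

```python
import datetime

t0 = datetime.datetime(2024, 1, 1, 12, 0, 0, tzinfo=datetime.timezone.utc)

# Hypothetical batch in the order the UI sent it: A starts before B,
# but is appended after B (for example, because it was recorded only once it ended).
sent = [
    {"name": "B", "timestamp": t0 + datetime.timedelta(seconds=5)},
    {"name": "A", "timestamp": t0},
]

# What a server-side sort by timestamp would process instead.
by_timestamp = sorted(sent, key=lambda event: event["timestamp"])

print(sent[-1]["name"])          # A
print(by_timestamp[-1]["name"])  # B
```

If the client later resends "the most recently sent event" (A here), it would not match the event the server handled last (B) once sorting is applied.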
…event

The first problem is that in the following case, the algorithm would accrue more working time than was actually spent:

          A
    |-----------|
    ---------+------------+------> time
             B            C

A, B and C are events. Let's say that Te(A) is the timestamp of the end of A, while T(B) and T(C) are the timestamps of B and C, respectively.

The current code in `ClientEventsSerializer.to_internal_value` adjusts `last_timestamp` after processing every event. After A is processed, `last_timestamp` is set to Te(A). After B is processed, `last_timestamp` goes _backwards_ to T(B). So when the algorithm calculates the working time for C, it gets T(C) - T(B), when the correct answer is T(C) - Te(A). The span from T(B) to Te(A) gets counted twice.

Fix this by rewriting the algorithm, so that `last_timestamp` (now renamed `previous_end_timestamp`) can only go forwards.

The second problem is that the algorithm is unable to calculate the working time for the first event in each batch that the client sends. This is because to calculate working time for an event, you need the timestamp/duration of the previous event, and this information is unavailable for the first event in the batch.

Fix this by resending the most recently sent event along with each batch, and using it to initialize the algorithm.
Motivation and context
The first problem is that in the following case, the algorithm would accrue more working time than was actually spent:

          A
    |-----------|
    ---------+------------+------> time
             B            C

A, B and C are events. Let's say that Te(A) is the timestamp of the end of A, while T(B) and T(C) are the timestamps of B and C, respectively.

The current code in `ClientEventsSerializer.to_internal_value` adjusts `last_timestamp` after processing every event. After A is processed, `last_timestamp` is set to Te(A). After B is processed, `last_timestamp` goes backwards to T(B). So when the algorithm calculates the working time for C, it gets T(C) - T(B), when the correct answer is T(C) - Te(A). The span from T(B) to Te(A) gets counted twice.

Fix this by rewriting the algorithm, so that `last_timestamp` (now renamed `previous_end_timestamp`) can only go forwards.

The second problem is that the algorithm is unable to calculate the working time for the first event in each batch that the client sends. This is because to calculate working time for an event, you need the timestamp/duration of the previous event, and this information is unavailable for the first event in the batch.

Fix this by resending the most recently sent event along with each batch, and using it to initialize the algorithm.
In addition, sort the incoming event array by timestamp. I don't think this matters much in practice, since the UI should be accumulating them in chronological order anyway, but if in some obscure case it sends them out of order, this should help.
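The two fixes described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: events are plain dicts with a `timestamp` and an optional `duration` in milliseconds, every event is allowed a duration for brevity (the PR restricts this to certain scopes), and the names `accrue_working_time`, `end_of` and `at` are hypothetical.

```python
import datetime
from typing import Optional

# Assumption: same value as the _TIME_THRESHOLD constant shown in the diff.
TIME_THRESHOLD = datetime.timedelta(seconds=100)


def end_of(event: dict) -> datetime.datetime:
    # Assumption: "duration" is in milliseconds and defaults to 0.
    return event["timestamp"] + datetime.timedelta(milliseconds=event.get("duration", 0))


def accrue_working_time(events: list, previous_event: Optional[dict] = None) -> list:
    """Return the working time attributed to each event, in the given order."""
    if not events:
        return []
    # Second fix: seed the algorithm from the resent previous event, if any.
    if previous_event is not None:
        previous_end = end_of(previous_event)
    else:
        previous_end = events[0]["timestamp"]

    result = []
    for event in events:
        working_time = datetime.timedelta()
        # Gap before the event starts; counted only if shorter than the threshold.
        if event["timestamp"] > previous_end:
            gap = event["timestamp"] - previous_end
            if gap < TIME_THRESHOLD:
                working_time += gap
            previous_end = event["timestamp"]
        # First fix: previous_end only ever moves forwards, so a span that is
        # already covered by an earlier event is not counted again.
        end = end_of(event)
        if end > previous_end:
            working_time += end - previous_end
            previous_end = end
        result.append(working_time)
    return result


t0 = datetime.datetime(2024, 1, 1, 12, 0, 0, tzinfo=datetime.timezone.utc)


def at(seconds: int) -> datetime.datetime:
    return t0 + datetime.timedelta(seconds=seconds)


batch = [
    {"timestamp": at(0), "duration": 10_000},  # A, so Te(A) is at 10 s
    {"timestamp": at(4)},                      # B, inside A
    {"timestamp": at(15)},                     # C, 5 s after Te(A)
]
print([w.total_seconds() for w in accrue_working_time(batch)])  # [10.0, 0.0, 5.0]

next_batch = [{"timestamp": at(20)}]
print([w.total_seconds() for w in accrue_working_time(next_batch)])  # [0.0]
print([w.total_seconds() for w in accrue_working_time(next_batch, previous_event=batch[-1])])  # [5.0]
```

In the first batch, C accrues T(C) - Te(A) = 5 s rather than T(C) - T(B) = 11 s. In the second, the first event of the batch accrues time only when the previous event is supplied.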
How has this been tested?
I added unit tests to test the algorithm. I also manually tested the UI changes to make sure that the previous event is actually sent.
Checklist
develop branch
[ ] I have updated the documentation accordingly
[ ] I have linked related issues (see GitHub docs)
(cvat-canvas, cvat-core, cvat-data and cvat-ui)
License
Feel free to contact the maintainers if that's a concern.