Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend sleep if it doesn't scheduled date #95

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

omochi
Copy link

@omochi omochi commented Mar 3, 2021

This patch fixes a bug #94 which I reported.

If eventLoop.scheduleRepeatedTask completes before scheduled time, it reschedules.
So it guaranteed for user that scheduled job doesn't start before scheduled time and prevent unwanted immediate double execution.

To implement this, I changed AnyScheduledJob.Task to class.
Because underlying NIO.RepeatedTask is recreated and I need to take new instance of it.

I implement carefully to avoid race condition about var isCancelled, var innerTask by guarded them with EventLoop.

@jdmcd
Copy link
Member

jdmcd commented Mar 3, 2021

@omochi thanks so much for your contribution here! This is a bug others have faced in the past that I'm sure they will be glad is fixed. The implementation looks pretty good to me but would like to get eyes on it from either @0xTim or @gwynne.

@gwynne
Copy link
Member

gwynne commented Mar 3, 2021

Looking over the original issue, it seems as though there are three underlying issues:

  1. A task can start execution before its scheduled time. This is the result of using the system monotonic clock (referred to by NIO as the uptimeNanoseconds) to determine whether a task's deadline has been met, which is by definition unlikely to be synchronized with "wall time". @Lukasa provides more details here. This is not a bug in NIO, but rather a flaw in how Queues uses Date (and perhaps an inevitable one).

  2. There is no guard against a task unexpectedly being executed more than once at the same time or more often than requested. This is also the responsibility of Queues to handle.

  3. The issue you originally reported in ScheduledJob run twice quickly because actual sleep duration is shorter than expected. #94, that concurrent execution of your task triggered race conditions, is technically not a bug in Queues at all; there is (to the best of my knowledge) no guarantee in the API that a task may not be run more than once concurrently if circumstances otherwise permit. Your task needs to either be reentrant or explicitly avoid executing reentrantly.

Unfortunately, the solution you've provided in this PR has two significant issues:

  • It does not address the underlying scheduling problem. In the worst-case scenario, it permits a pathological case where the event loop spins at extremely high speed (consuming CPU in the process) calling the retry logic again and again and causing runaway memory allocation (via promise creation and task object allocation). This is admittedly not likely, but it can happen.

  • You have created at least one retain cycle that I can see at a glance, possibly others, by capturing allocated task objects in an indirectly self-referential closure of indeterminate lifetime.

I very much agree that the issue you're trying to solve needs to be addressed, but I don't believe this is the best solution. Even without addressing the fundamental design flaw with how the scheduler functions, it should be possible to work around the immediate problem within the scheduler logic without having to place additional "buffering" logic around the jobs themselves.

(P.S.: If this post reads as disparaging or in any other way discouraging, please understand it isn't at all intended to be that way. Yours and everyone's contributions are always welcome, even when there may be issues with them!)

@omochi
Copy link
Author

omochi commented Mar 3, 2021

Thank you for detailed response and review.

I agree what you say.
I response for each topic below.

  1. ... This is not a bug in NIO, but rather a flaw in how Queues uses Date.

May be I understand it.
Underlying NIO doesn't have machanism to handle wall time.

  1. ... This is also the responsibility of Queues to handle.

Yes.
I think that Queue library should work as people naturally expected.

  1. ... Your task needs to either be reentrant or explicitly avoid executing reentrantly.

Yes.
I am also working to fix our job logic as reentrant safe.
I want to tell here is this issue is really unexpected and hard to get what happens.
I beleave to solve this problem helps many people
and important to spread Server Side Swift with Vapor for business.

In the worst-case scenario, ...

Ah.. I understand it.
I think that it won't happen in most situations.
But theoretically it't correct and we should avoid this hole if we can.

You have created at least one retain cycle

Oh sorry.
I need to calm down more.

By the way, I get a new idea that is way to more smart from your comment.
I retry to write patch.

Base automatically changed from master to main March 12, 2021 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants