
Fixed acquire_jobs method in MongoDB DataStore. #948

Closed
wants to merge 3 commits

Conversation

HK-Mattew
Contributor

Changes

Fixes #.

Checklist

If this is a user-facing code change, like a bugfix or a new feature, please ensure that
you've fulfilled the following conditions (where applicable):

  • You've added tests (in tests/) which would fail without your patch
  • You've updated the documentation (in docs/, in case of behavior changes or new
    features)
  • You've added a new changelog entry (in docs/versionhistory.rst).

If this is a trivial change, like a typo fix or a code reformatting, then you can ignore
these instructions.

Updating the changelog

If there are no entries after the last release, use **UNRELEASED** as the version.
If, say, your patch fixes issue #999, the entry should look like this:

* Fix big bad boo-boo in the async scheduler (`#999 <https://github.com/agronholm/apscheduler/issues/999>`_; PR by @yourgithubaccount)

If there's no issue linked, just link to your pull request instead by updating the
changelog after you've created the PR.

@agronholm (Owner) left a comment

Left some comments, and you still need to add a test that would fail without the fix, and a changelog note.

Comment on lines 689 to 693
task_slots_left = task_job_slots_left.get(job.task_id, float("inf"))
if (
    not task_slots_left
    or running_job_count_increments[job.task_id] == task_slots_left
):
Owner

I think this would be the correct fix?

Suggested change
task_slots_left = task_job_slots_left.get(job.task_id, float("inf"))
if (
    not task_slots_left
    or running_job_count_increments[job.task_id] == task_slots_left
):
if task_job_slots_left.get(job.task_id, float("inf")):

Contributor Author

Hello,

I did this because more jobs were being executed than they should be. It was running without limits.

Contributor Author

Let me analyze your change suggestion better. I haven't looked at it yet.

Owner

The problem with my current approach with MongoDB is the lack of atomicity. Other schedulers might change the task's running jobs count while this scheduler is running acquire_jobs(), as there is no locking involved. I think a better way to do this is to run a conditional update that only updates the running job count if it's lower than the maximum job count. My initial attempt at this failed because find_and_update(), for whatever bizarre reason, doesn't support referencing existing fields in the filter. But the next best way would be to use the previously fetched maximum running job counts, as those rarely change.
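
For illustration, the conditional update described here could look roughly like this with pymongo. This is only a sketch: the collection handle and task_id are hypothetical, the field names max_running_jobs and running_jobs are taken from the discussion, and whether such a filter may reference other document fields is exactly what the rest of the thread works out.

    from pymongo import MongoClient

    # Hypothetical handles, for illustration only
    tasks = MongoClient()["apscheduler"]["tasks"]
    task_id = "example-task"

    # Increment the running job count only while it is below the task's maximum;
    # matched_count == 0 means the task already has the maximum number of jobs running.
    # (A task with no limit, i.e. max_running_jobs == None, would need an extra $or
    # branch, as in the final fix further down in this thread.)
    result = tasks.update_one(
        {
            "_id": task_id,
            "$expr": {"$lt": ["$running_jobs", "$max_running_jobs"]},
        },
        {"$inc": {"running_jobs": 1}},
    )
    if result.matched_count == 0:
        print("no free slot, skipping the job")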

Contributor Author

From what I understand, your change to

if task_job_slots_left.get(job.task_id, float("inf")):

Would cause all jobs whose task still has a positive number of slots left (such as 1, 2, 3, etc.) to be skipped.

Am I right or not?

Owner

From what I understand, your change to

if task_job_slots_left.get(job.task_id, float("inf")):

Would cause all jobs whose task still has a positive number of slots left (such as 1, 2, 3, etc.) to be skipped.

Am I right or not?

You're right – I was admittedly tired when writing that reply. But I think that part of my original logic was correct (if not task_job_slots_left.get(job.task_id, float("inf")):). What was wrong was my calculation of free slots (task_job_slots_left[doc["_id"]] = doc["max_running_jobs"]), as I should've deducted the number of slots in use from the max slots.
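
As a sketch of the corrected bookkeeping described above, with the free slots computed as the maximum minus the jobs already running (variable names follow the snippet under review; the sample data and the Job stand-in class are hypothetical):

    from collections import defaultdict

    # Hypothetical sample data, for illustration only
    task_docs = [{"_id": "task-a", "max_running_jobs": 2, "running_jobs": 1}]

    class Job:  # minimal stand-in for the real job objects
        def __init__(self, job_id: str, task_id: str) -> None:
            self.id = job_id
            self.task_id = task_id

    jobs = [Job("j1", "task-a"), Job("j2", "task-a")]

    task_job_slots_left: dict[str, float] = {}
    running_job_count_increments: defaultdict = defaultdict(int)

    for doc in task_docs:
        if doc["max_running_jobs"] is not None:
            # Deduct the slots already in use instead of storing the raw maximum
            task_job_slots_left[doc["_id"]] = doc["max_running_jobs"] - doc["running_jobs"]

    acquired = []
    for job in jobs:
        slots_left = task_job_slots_left.get(job.task_id, float("inf"))
        if slots_left - running_job_count_increments[job.task_id] <= 0:
            continue  # the task is already at its limit, so skip this job
        running_job_count_increments[job.task_id] += 1
        acquired.append(job.id)

    print(acquired)  # ['j1']: only one free slot, so the second job is skipped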

Contributor Author

Right. Your initial/original version was correct.

@agronholm
Owner

Actually, I think I know a better way to do this with MongoDB.

@HK-Mattew
Contributor Author

Actually, I think I know a better way to do this with MongoDB.

Using aggregation with filters?

@HK-Mattew
Contributor Author

@agronholm

You mentioned the problem of not having atomicity when incrementing running_jobs in the task. So I tried some possibilities to increment running_jobs atomically in mongodb. And I got a result that seems good to me.

See my gist, if you are interested: https://gist.github.com/HK-Mattew/098f714191b348ba6f424ea7add83a1a

This way it is possible to do the update operation atomically.

@agronholm
Owner

@agronholm

You mentioned the problem of not having atomicity when incrementing running_jobs in the task. So I tried some possibilities to increment running_jobs atomically in mongodb. And I got a result that seems good to me.

See my gist, if you are interested: https://gist.github.com/HK-Mattew/098f714191b348ba6f424ea7add83a1a

This way it is possible to do the update operation atomically.

I tried something like this, only to find out that you can't use $expr in an update filter. Did you actually try that query?

Another criticism is that if only one slot is available and the initial query picks two jobs for that task, both would be dropped here.

@HK-Mattew
Contributor Author

I tried something like this, only to find out that you can't use $expr in an update filter. Did you actually try that query?

I haven't tested the code itself. But I tested the mongodb query and it worked perfectly.

@HK-Mattew
Contributor Author

Another criticism is that if only one slot is available and the initial query picks two jobs for that task, both would be dropped here.

Indeed. However, I see that it is the best option at the moment. At least to avoid increasing running_jobs incorrectly and adding more jobs.

@agronholm
Owner

Another criticism is that if only one slot is available and the initial query picks two jobs for that task, both would be dropped here.

Indeed. However, I see that it is the best option at the moment. At least to avoid increasing running_jobs incorrectly and adding more jobs.

What I tried myself was doing the increments one by one, and then if the update count is 0, then we can conclude that the task didn't have any more room, yes?

@HK-Mattew
Contributor Author

What I tried myself was doing the increments one by one, and then if the update count is 0, then we can conclude that the task didn't have any more room, yes?

It works too.

It would just be a few more operations in the database, but maybe that wouldn't be a problem. At least not for me, but I don't know what other people's use cases might be.

@agronholm
Owner

What I tried myself was doing the increments one by one, and then if the update count is 0, then we can conclude that the task didn't have any more room, yes?

It works too.

It would just be a few more operations in the database, but maybe that wouldn't be a problem. At least not for me, but I don't know what other people's use cases might be.

When you say "it works too", what other approach works? As I pointed out, your approach would work incorrectly if two jobs for the same task are acquired and the task only has one open slot. Ditto with 2 open slots, 5 jobs acquired etc.

@HK-Mattew
Contributor Author

What I tried myself was doing the increments one by one, and then if the update count is 0, then we can conclude that the task didn't have any more room, yes?

It works too.
It would just be a few more operations in the database, but maybe that wouldn't be a problem. At least not for me, but I don't know what other people's use cases might be.

When you say "it works too", what other approach works? As I pointed out, your approach would work incorrectly if two jobs for the same task are acquired and the task only has one open slot. Ditto with 2 open slots, 5 jobs acquired etc.

Yes, I understand what you said.

Your implementation will certainly be the best option and will work 100%. My idea would work, but not 100%.

@agronholm
Owner

Alright, sounds like we're on the same page now. I'll experiment with your query tomorrow, or you can do it if you like in this PR.

@HK-Mattew
Contributor Author

Alright, sounds like we're on the same page now. I'll experiment with your query tomorrow, or you can do it if you like in this PR.

I could do that. However, I don't really know how to work with pull requests very well. This was the first pull request I created 😶

@agronholm
Owner

This is what I ended up with (passes all the existing tests):

                    # Try to increment the task's running jobs count
                    update_task_result = await to_thread.run_sync(
                        lambda: self._tasks.update_one(
                            {
                                "_id": job.task_id,
                                "$or": [
                                    {"max_running_jobs": None},
                                    {
                                        "$expr": {
                                            "$gt": [
                                                "$max_running_jobs",
                                                "$running_jobs",
                                            ]
                                        }
                                    },
                                ],
                            },
                            {"$inc": {"running_jobs": 1}},
                            session=session,
                        )
                    )
                    if not update_task_result.matched_count:
                        self._logger.debug(
                            "Skipping job %s because task %r has the maximum number of "
                            "jobs already running",
                            job.id,
                        )
                        skipped_job_ids.append(job.id)
                        continue

@HK-Mattew
Contributor Author

This is what I ended up with (passes all the existing tests):

Well, I would do it the same way.

It's just that the debug log is missing an argument.

@agronholm
Owner

Right, the job.task_id argument was missing. I've added that now. I'll also add a regression test that fails with the unmodified code.
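
For reference, the corrected logging call would presumably just pass the task id as the second argument, along the lines of this sketch (not the exact committed code):

    self._logger.debug(
        "Skipping job %s because task %r has the maximum number of "
        "jobs already running",
        job.id,
        job.task_id,
    )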

@agronholm closed this in c9a0017 on Aug 2, 2024
@agronholm
Owner

Alright, it's fixed now. Thanks!
