Fix randomly failing test test_enqueue_once_after_enqueue #17950

Closed

jeremystretch opened this issue Nov 7, 2024 · 2 comments · Fixed by #18062
Labels: netbox, status: accepted, type: housekeeping

Comments

@jeremystretch
Member

Proposed Changes

The test netbox.tests.test_jobs.EnqueueTest.test_enqueue_once_after_enqueue occasionally fails for an unknown reason (see this example). This needs to be investigated and resolved.

Justification

CI tests should always pass reliably.

@jeremystretch added the type: housekeeping and status: needs owner labels on Nov 7, 2024
@bctiemann
Contributor

The exception is being raised from here, in rq/job.py:

        if refresh:
            status = self.connection.hget(self.key, 'status')
            if not status:
                raise InvalidJobOperation(f"Failed to retrieve status for job: {self.id}")
            self._status = JobStatus(as_text(status))

self.connection at that point when running locally is a redis.client.Redis instance. What is the setup in CI? Does it have Redis available? Or does this connection need to be mocked?
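
As a self-contained illustration of where that status lives (this assumes a local Redis server and plain rq, not whatever backend CI uses):

    from redis import Redis
    from rq import Queue

    conn = Redis()                        # assumes a local Redis server, as in my environment
    q = Queue('default', connection=conn)
    job = q.enqueue(print, 'hello')

    print(job.key)                        # the Redis hash key for the job, e.g. b'rq:job:<uuid>'
    print(conn.hget(job.key, 'status'))   # e.g. b'queued'; a None here is what triggers the exception above
    print(job.get_status(refresh=True))   # same field, read through the rq code quoted above

In other words, the status is just a field in a Redis hash keyed by job.key; if that hash is missing or has no status field, get_status(refresh=True) raises InvalidJobOperation.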

It seems like whatever caching backend is present in CI is intermittently failing to return a result for the key of the job being enqueue_once'd:

        job1 = TestJobRunner.enqueue(instance, schedule_at=self.get_schedule_at())
        job2 = TestJobRunner.enqueue_once(instance, schedule_at=self.get_schedule_at(2))
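        # ... inside enqueue_once(), reached by the job2 call above: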
            # If the job parameters haven't changed, don't schedule a new job and keep the current schedule. Otherwise,
            # delete the existing job and schedule a new job instead.
            if (schedule_at and job.scheduled == schedule_at) and (job.interval == interval):
                return job
            job.delete()

We are trying to delete the job (because enqueue_once was called with new parameters), but its key is not in the cache when the above rq code is called, so it raises the exception. Maybe this is because the first instance of the job has already completed by the time the second one is enqueued? But I would think the key would still be present with a status of finished, rather than not being there at all.
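
As a rough sketch of the kind of guard that would rule that out (illustrative only, not the actual test code; the helper name and timings are made up), the test could wait until job1's hash is actually visible in Redis before calling enqueue_once:

    import time

    def wait_for_job_in_redis(job, timeout=5.0, interval=0.1):
        """Poll Redis until the job's hash has a 'status' field, or give up."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if job.connection.hget(job.key, 'status') is not None:
                return True
            time.sleep(interval)
        return False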

Since I have Redis in my local environment, this always works properly, but I suspect the setup in CI is different.

@jsenecal
Contributor

HA! It was driving me mad yesterday, thanks for flagging this @jeremystretch

@jeremystretch added the netbox label on Nov 19, 2024 (via Linear)
@jeremystretch added the status: accepted label and removed the status: needs owner label on Nov 21, 2024
jeremystretch pushed a commit that referenced this issue on Nov 21, 2024: …st (#18062)

* Wait until job1 exists in Redis before enqueueing job2

* Job can exist but not have status

* Catch InvalidJobOperation and use as trigger for retry

* Catch InvalidJobOperation when deleting/canceling job

* Remove testing code
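
For anyone reading along, a minimal sketch of the retry idea described in those commit messages (the helper name and retry parameters are illustrative, not the code actually merged in #18062):

    import time
    from rq.exceptions import InvalidJobOperation

    def delete_job_with_retry(job, attempts=3, delay=0.1):
        """Treat a missing status hash as transient: retry the delete a few times before giving up."""
        for attempt in range(attempts):
            try:
                job.delete()
                return
            except InvalidJobOperation:
                if attempt == attempts - 1:
                    raise
                time.sleep(delay)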