-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix infinite lock for chunk preparing #8769
Conversation
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes primarily involve the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant MediaCache
participant CacheSystem
User->>MediaCache: Request to create chunk job
MediaCache->>CacheSystem: Enqueue create chunk job
CacheSystem->>MediaCache: Lock acquisition attempt
alt Lock acquired
MediaCache->>CacheSystem: Create cache item
else Lock not acquired
MediaCache->>User: Return timeout error
end
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
086e9c5
to
7252020
Compare
7252020
to
d6c2ef2
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #8769 +/- ##
===========================================
- Coverage 73.97% 73.94% -0.04%
===========================================
Files 409 409
Lines 43931 43950 +19
Branches 3985 3985
===========================================
Hits 32498 32498
- Misses 11433 11452 +19
|
cvat/apps/engine/default_settings.py
Outdated
@@ -24,3 +24,8 @@ | |||
""" | |||
Sets the frequency of checking the readiness of the chunk | |||
""" | |||
|
|||
CVAT_CHUNK_LOCK_TIMEOUT = 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lock timeout is lock ttl, but here it is a timeout for lock acquisition.
I also think that you need to change the default value of blocking_timeout
inside get_rq_lock_for_job
because IMHO it is a bad practice to allow endless waiting for a lock at least by default.
b36b1be
to
337cfa3
Compare
337cfa3
to
49b6736
Compare
cvat/apps/engine/utils.py
Outdated
@@ -210,7 +210,7 @@ def get_rq_lock_by_user(queue: DjangoRQ, user_id: int, *, timeout: Optional[int] | |||
) | |||
return nullcontext() | |||
|
|||
def get_rq_lock_for_job(queue: DjangoRQ, rq_id: str, *, timeout: Optional[int] = 60, blocking_timeout: Optional[int] = None) -> Lock: | |||
def get_rq_lock_for_job(queue: DjangoRQ, rq_id: str, *, timeout: Optional[int] = 60, blocking_timeout: Optional[int] = 60) -> Lock: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't blocking_timeout
be less than the request timeout? (to minimize 504 responses)
cvat/apps/engine/default_settings.py
Outdated
@@ -24,3 +24,8 @@ | |||
""" | |||
Sets the frequency of checking the readiness of the chunk | |||
""" | |||
|
|||
CVAT_CHUNK_LOCK_ACQUISITION_TIMEOUT = 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please clarify the benefit of introducing this constant? It looks like this timeout could be different depending on where the code is executed (but now the same value is used). For instance, in the case of request handling, this value should be less than request timeout (otherwise, we get 504), but in the case of running inside a worker process (I mean e.g. export process), I guess this could be more than lock TTL.
So, if we are not planning now to have the different values depending on where a code is running, I guess it's better just to use default values from get_rq_lock_for_job
.
cvat/apps/engine/utils.py
Outdated
@@ -210,7 +210,7 @@ def get_rq_lock_by_user(queue: DjangoRQ, user_id: int, *, timeout: Optional[int] | |||
) | |||
return nullcontext() | |||
|
|||
def get_rq_lock_for_job(queue: DjangoRQ, rq_id: str, *, timeout: Optional[int] = 60, blocking_timeout: Optional[int] = None) -> Lock: | |||
def get_rq_lock_for_job(queue: DjangoRQ, rq_id: str, *, timeout: Optional[int] = 60, blocking_timeout: Optional[int] = 50) -> Lock: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_rq_lock_for_job(queue: DjangoRQ, rq_id: str, *, timeout: Optional[int] = 60, blocking_timeout: Optional[int] = 50) -> Lock: | |
def get_rq_lock_for_job(queue: DjangoRQ, rq_id: str, *, timeout: int = 60, blocking_timeout: int = 50) -> Lock: |
+ assert timeout and blocking_timeout are not None
Quality Gate passedIssues Measures |
Motivation and context
How has this been tested?
Checklist
develop
branch(cvat-canvas,
cvat-core,
cvat-data and
cvat-ui)
License
Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit
New Features
CVAT_CHUNK_LOCK_TIMEOUT
to manage lock acquisition duration during chunk cache operations.Improvements
Bug Fixes