-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-2026. Overlapping chunk region cannot be read concurrently #1349
Conversation
/label ozone |
💔 -1 overall
This message was automatically generated. |
/retest |
@anuengineer @bshashikant @elek please review |
LGTM, I will test it to make sure it works as expected. I will also wait for others to comment and then commit. |
💔 -1 overall
This message was automatically generated. |
Thanks @adoroszlai for working on this. The locking logic seems good to me. @anuengineer @elek @mukul1987 what do you think? There is also a race condition which exists in the system where a chunk might get deleted by BlockDeleting service in Datanode while readChunk is happening which is where i think we need to have synchronization on a container level (not precisely at the chunk level) for closed containers but this problem is beyond the scope of this jira. |
bq. But in Ozone world, a chunk once written is immutable , and hence we might not need the lock at all while reading a chunk. |
@adoroszlai I have committed this to the trunk. Thanks for the contribution. @bshashikant Thanks for the comments. |
Thanks @anuengineer and @bshashikant for reviews, and @anuengineer for committing it. |
What changes were proposed in this pull request?
Only allow a single read/write operation for the same path in
ChunkUtils
, to avoidOverlappingFileLockException
due to concurrent reads. This allows concurrent reads/writes of separate files (as opposed to simply synchronizing the methods). It might be improved later by storing and reusing the file lock.Use plain
FileChannel
instead ofAsynchronousFileChannel
for reading, too, since it was used in synchronous fashion (by calling.get()
) anyway.https://issues.apache.org/jira/browse/HDDS-2026
How was this patch tested?
Added unit test.
Used improved Freon tool from #1341 to perform read of same key from multiple threads (which revealed the bug in the first place).