[Core] Fix deadlock in accelerated DAG channel read (#46288)
Signed-off-by: Jack Humphries <1645405+jackhumphries@users.noreply.github.com>
jackhumphries authored Jun 27, 2024
1 parent 9e1d4fe commit d539c3e
Showing 1 changed file with 6 additions and 5 deletions.
11 changes: 6 additions & 5 deletions src/ray/core_worker/experimental_mutable_object_manager.cc
@@ -285,14 +285,15 @@ Status MutableObjectManager::ReadAcquire(const ObjectID &object_id,
         "Channel has not been registered (cannot get semaphores)");
   }
 
-  std::unique_ptr<plasma::MutableObject> &object = channel->mutable_object;
   // Check whether the channel has an error set before checking that we are the only
   // reader. If the channel is already closed, then it's OK to ReadAcquire and
   // ReadRelease in any order.
+  std::unique_ptr<plasma::MutableObject> &object = channel->mutable_object;
-  RAY_RETURN_NOT_OK(object->header->CheckHasError());
-  // The channel is still open. This lock ensures that there is only one reader
-  // at a time. The lock is released in `ReadRelease()`.
-  channel->lock->lock();
+  do {
+    RAY_RETURN_NOT_OK(object->header->CheckHasError());
+    // The channel is still open. This lock ensures that there is only one reader
+    // at a time. The lock is released in `ReadRelease()`.
+  } while (!channel->lock->try_lock());
   channel->reading = true;
 
   int64_t version_read = 0;
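
The change above replaces a single error check followed by a blocking `lock()` with a `try_lock()` loop that re-checks the channel's error flag on every attempt. The following is a minimal, self-contained sketch of that pattern, not Ray's code: `Channel`, `has_error`, `AcquireOrFail`, and `RunDemo` are hypothetical stand-ins built on `std::mutex` and `std::atomic<bool>`. The commit does not spell out the exact deadlock scenario, so the "holder never releases the lock" case shown here is an assumption.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the channel state; not Ray's actual types.
struct Channel {
  std::mutex lock;
  std::atomic<bool> has_error{false};
};

enum class AcquireResult { kAcquired, kChannelError };

// Re-checks the error flag before every lock attempt, so a waiter returns
// once the channel is marked as failed even if the holder never unlocks.
AcquireResult AcquireOrFail(Channel &ch) {
  do {
    if (ch.has_error.load(std::memory_order_acquire)) {
      return AcquireResult::kChannelError;
    }
  } while (!ch.lock.try_lock());
  return AcquireResult::kAcquired;
}

// Demo: the lock is held for the whole wait, yet the waiter still returns.
AcquireResult RunDemo() {
  Channel ch;
  ch.lock.lock();  // Simulates a reader that never reaches its release call.
  AcquireResult result = AcquireResult::kAcquired;
  std::thread waiter([&] { result = AcquireOrFail(ch); });
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
  ch.has_error.store(true, std::memory_order_release);  // Close the channel.
  waiter.join();  // With a plain blocking lock() this join would never return.
  ch.lock.unlock();  // Unlock before the mutex is destroyed.
  return result;
}
```

In this sketch the waiter busy-spins, which trades CPU time for the ability to observe the error flag without a separate wake-up mechanism. Whether that trade-off matches the reasoning behind the commit is not stated in the source.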
