
Remove recursive read locking of program cache #1616

Merged (1 commit) Jun 5, 2024
Conversation

@pgarg66 pgarg66 commented Jun 5, 2024

Problem

The transaction processor read-locks the program cache recursively. Since Rust 1.62, `RwLock` prefers waiting writers, so a writer that queues up between the two read acquisitions deadlocks the thread.

Summary of Changes

Use the existing read-locked program cache instance instead of locking it again.

Fixes #


@Lichtso Lichtso left a comment


Approved to fix the immediate bug; however, `TransactionBatchProcessor::get_environments_for_epoch()` containing a hidden read lock remains a footgun.


pgarg66 commented Jun 5, 2024

> Approved to fix the immediate bug; however, `TransactionBatchProcessor::get_environments_for_epoch()` containing a hidden read lock remains a footgun.

The idea was to contain the exposure of the program cache within SVM bounds, so that users of SVM are agnostic of program cache operations. We can refactor it differently to address the footgun.


steviez commented Jun 5, 2024

> Approved to fix the immediate bug; however, `TransactionBatchProcessor::get_environments_for_epoch()` containing a hidden read lock remains a footgun.

Good point. We could change the signature of this function to force the caller to obtain the lock:

    /// Returns the current environments depending on the given epoch
    pub fn get_environments_for_epoch(program_cache: &ProgramCache, epoch: Epoch) -> ProgramRuntimeEnvironments {
        program_cache
            .get_environments_for_epoch(epoch)
    }

> The idea was to contain the exposure of program cache within SVM bounds.

My proposal would seemingly be at odds with this goal, though; I would lean towards safety over convenience.


pgarg66 commented Jun 5, 2024

> Approved to fix the immediate bug; however, `TransactionBatchProcessor::get_environments_for_epoch()` containing a hidden read lock remains a footgun.

> Good point. We could change the signature of this function to force the caller to obtain the lock:

    /// Returns the current environments depending on the given epoch
    pub fn get_environments_for_epoch(program_cache: &ProgramCache, epoch: Epoch) -> ProgramRuntimeEnvironments {
        program_cache
            .get_environments_for_epoch(epoch)
    }

> The idea was to contain the exposure of program cache within SVM bounds.

> My proposal would seemingly be at odds with this goal, though; I would lean towards safety over convenience.

We can change it to a try-lock, and return an `Option` to the caller.


steviez commented Jun 5, 2024

> We can change it to a try-lock, and return an `Option` to the caller.

That seems reasonable to me

@pgarg66 pgarg66 marked this pull request as ready for review June 5, 2024 18:51

@steviez steviez left a comment


Looks good to me! I tested the patch on the node that was reliably deadlocking, and this patch avoids the deadlock.

Also, a little more detail from our debugging thread about why #1582 introduced the deadlock:


As of Rust v1.62.0, writers are preferred in reader/writer contention scenarios to avoid starvation. See rust-lang/rust#95801

> but modified to prefer writers and spin before sleeping
> ...
> It always prefers writers, as we decided in rust-lang/rust#93740 (comment).

@pgarg66 pgarg66 merged commit 538b693 into anza-xyz:master Jun 5, 2024
42 checks passed
@pgarg66 pgarg66 deleted the svm branch June 5, 2024 22:26