Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix - program loading with effective slot at epoch boundary #35283

Conversation

Lichtso
Copy link
Contributor

@Lichtso Lichtso commented Feb 21, 2024

Problem

The testnet outage 2024/02/21 was caused by a feature activation which affected RBPF and thus changed the program runtime environment. A bug it the program loading caused programs of the new environment after the epoch boundary to be discarded as they had the same effective slot as programs of the old environment before the epoch boundary.

Summary of Changes

  • Limit effective slot to the begin of the current epoch when there is a relevant feature activation.
  • Adds test_feature_activation_loaded_programs_epoch_transition().

@Lichtso Lichtso force-pushed the fix/program_load_effective_slot_at_epoch_boundary branch 4 times, most recently from 8e4e6e8 to 3d4dcb7 Compare February 22, 2024 00:23
@Lichtso Lichtso added the v1.18 PRs that should be backported to v1.18 label Feb 22, 2024
Copy link
Contributor

mergify bot commented Feb 22, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

@t-nelson
Copy link
Contributor

t-nelson commented Feb 22, 2024

much more readable! 😍

thoughts on taking this to 1.17? we don't want our hands tied should we need to emergency patch something that affects jit output

Copy link

codecov bot commented Feb 22, 2024

Codecov Report

Attention: Patch coverage is 97.36842% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 81.6%. Comparing base (537c3d8) to head (78f7d6a).
Report is 9 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #35283   +/-   ##
=======================================
  Coverage    81.6%    81.6%           
=======================================
  Files         834      834           
  Lines      224798   224835   +37     
=======================================
+ Hits       183485   183550   +65     
+ Misses      41313    41285   -28     

@pgarg66
Copy link
Contributor

pgarg66 commented Feb 23, 2024

Since we are planning to backport this to v1.17, is there a smaller patch that fixes the problem? The cleanup can be done in a follow on PR, that we don't backport. Also, the change that's going into to v1.17 should be audited before making a release out of it.

@alessandrod
Copy link
Contributor

Rewrites test_feature_activation_loaded_programs_recompilation_phase().

Saying what you do is great, but please always say why (and when a change is complex, say how).

I don't think the test change belongs to this PR? Does the (unmodified) test pass with the effective slot change? Changing existing tests when fixing something is not ideal. Why can't we add a new regression test for the specific issue we found, which clearly fails without the fix? Then we can refactor other tests separately.

@alessandrod
Copy link
Contributor

(The actual code change looks fine to me, but I'd like to see a regression test :P)

@Lichtso Lichtso force-pushed the fix/program_load_effective_slot_at_epoch_boundary branch from b15170f to 78f7d6a Compare February 23, 2024 10:37
@Lichtso Lichtso added the v1.17 PRs that should be backported to v1.17 label Feb 23, 2024
Copy link
Contributor

mergify bot commented Feb 23, 2024

Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule.

@Lichtso Lichtso merged commit 2891ce8 into solana-labs:master Feb 23, 2024
36 checks passed
@Lichtso Lichtso deleted the fix/program_load_effective_slot_at_epoch_boundary branch February 23, 2024 14:15
mergify bot pushed a commit that referenced this pull request Feb 23, 2024
* Always limit effective slot to the begin of the current epoch.

* Adds comments.

* Optimizes to avoid having two entries if there is no relevant feature activation.

* Adds test_feature_activation_loaded_programs_epoch_transition().

(cherry picked from commit 2891ce8)

# Conflicts:
#	svm/src/transaction_processor.rs
mergify bot pushed a commit that referenced this pull request Feb 23, 2024
* Always limit effective slot to the begin of the current epoch.

* Adds comments.

* Optimizes to avoid having two entries if there is no relevant feature activation.

* Adds test_feature_activation_loaded_programs_epoch_transition().

(cherry picked from commit 2891ce8)

# Conflicts:
#	svm/src/transaction_processor.rs
Lichtso added a commit that referenced this pull request Feb 23, 2024
* Always limit effective slot to the begin of the current epoch.

* Adds comments.

* Optimizes to avoid having two entries if there is no relevant feature activation.

* Adds test_feature_activation_loaded_programs_epoch_transition().

(cherry picked from commit 2891ce8)
Lichtso added a commit that referenced this pull request Feb 23, 2024
* Always limit effective slot to the begin of the current epoch.

* Adds comments.

* Optimizes to avoid having two entries if there is no relevant feature activation.

* Adds test_feature_activation_loaded_programs_epoch_transition().

(cherry picked from commit 2891ce8)
Lichtso added a commit that referenced this pull request Feb 23, 2024
* Always limit effective slot to the begin of the current epoch.

* Adds comments.

* Optimizes to avoid having two entries if there is no relevant feature activation.

* Adds test_feature_activation_loaded_programs_epoch_transition().

(cherry picked from commit 2891ce8)
Lichtso added a commit that referenced this pull request Feb 23, 2024
* Always limit effective slot to the begin of the current epoch.

* Adds comments.

* Optimizes to avoid having two entries if there is no relevant feature activation.

* Adds test_feature_activation_loaded_programs_epoch_transition().

(cherry picked from commit 2891ce8)
Lichtso added a commit that referenced this pull request Feb 26, 2024
…ackport of #35283) (#35302)

Fix - program loading with effective slot at epoch boundary (#35283)

* Always limit effective slot to the begin of the current epoch.

* Adds comments.

* Optimizes to avoid having two entries if there is no relevant feature activation.

* Adds test_feature_activation_loaded_programs_epoch_transition().

(cherry picked from commit 2891ce8)

Co-authored-by: Alexander Meißner <AlexanderMeissner@gmx.net>
jeffwashington pushed a commit to jeffwashington/solana that referenced this pull request Feb 27, 2024
…abs#35283)

* Always limit effective slot to the begin of the current epoch.

* Adds comments.

* Optimizes to avoid having two entries if there is no relevant feature activation.

* Adds test_feature_activation_loaded_programs_epoch_transition().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v1.17 PRs that should be backported to v1.17 v1.18 PRs that should be backported to v1.18
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants