- 
                Notifications
    You must be signed in to change notification settings 
- Fork 27
Freeze DB process test #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Codecov Report
 @@            Coverage Diff             @@
##             main      #39      +/-   ##
==========================================
+ Coverage   60.32%   62.45%   +2.13%     
==========================================
  Files           6        6              
  Lines         809      815       +6     
  Branches      119      122       +3     
==========================================
+ Hits          488      509      +21     
+ Misses        293      276      -17     
- Partials       28       30       +2     
 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow so clean and very well implemented 🤩! Great job Marcelo! Minor comments only
        
          
                src/cluster.py
              
                Outdated
          
        
      | if member["name"] == member_name: | ||
| ip = member["host"] | ||
| break | ||
| return ip | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if member["name"] == member_name: | |
| ip = member["host"] | |
| break | |
| return ip | |
| if member["name"] == member_name: | |
| return member["host"] | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated on 29eae5b.
        
          
                src/cluster.py
              
                Outdated
          
        
      | break | ||
| return primary | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| break | |
| return primary | |
| return primary | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated on 29eae5b.
        
          
                src/cluster.py
              
                Outdated
          
        
      | return primary | ||
|  | ||
| def _get_alternative_server_url(self, attempt: AttemptManager) -> str: | ||
| """Get an alternative URL from another member each time.""" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is an alternative URL and why is it needed? Sorry I am a PG noob 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry Mia. Now I added an improved description. The alternative server URL method is just a way to call the REST API from another cluster member when, for example, the current member DB process is frozen (not able to receive requests). In that case the API is still working in the other members, so the URL from one of them is used. Updated on 2108c4a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fascinating! Great descriptions thanks Marcelo! :)
| await send_signal_to_process(ops_test, primary_name, process, "SIGCONT") | ||
|  | ||
| # Verify that the database service got restarted and is ready in the old primary. | ||
| assert await postgresql_ready(ops_test, primary_name) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add checks that verify:
- that the old primary is now the secondary
- all units are in the same replica set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion. Added both on 6a1e39b. I also added those checks on the kill DB process test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really awesome work here Marcelo!
| else: | ||
| raise MemberNotUpdatedOnClusterError() | ||
| except RetryError: | ||
| return False | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great function 🤩
* Add alternative servers for primary and members retrieval * Test working * Test working * Cleanup the code * More cleanup * Small adjustments * Add unit tests * Improve comments * Use down unit * Improve alternative URL description * Add additional checks * Improve returns
Issue
Solution
Context
src/cluster.pyto enable the charm to find the primary (that is used to update the relation application databag with the primary and the replicas list) when the unit that had the process frozen is the leader.Testing
tests/integration/ha_tests/test_self_healing.py. It tests the charm in a very similar way the Charmed MongoDB Operator is tested.Release Notes