Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(networking): add bootstrap cache for peer discovery #2459

Closed
wants to merge 4 commits into from

Conversation

dirvine
Copy link
Member

@dirvine dirvine commented Nov 21, 2024

Bootstrap Cache Implementation

Summary

This PR introduces a bootstrap cache mechanism to improve network bootstrapping efficiency. The cache persists information about previously known peers, enabling faster network rejoining and reducing cold-start latency.

Motivation

When nodes join the network, they need to discover and connect to existing peers. Without a persistent cache, nodes must rediscover the network from scratch each time they restart, which is inefficient and time-consuming. This implementation addresses that limitation.

Technical Details

  • Implements a persistent cache storing known peer information
  • Cache is maintained in the bootstrap_cache directory
  • Entries include peer IDs and their last known addresses
  • Cache is automatically updated as peers are discovered
  • Stale entries are pruned periodically

Impact

  • Faster network rejoining for nodes
  • Reduced network bootstrap time
  • More resilient peer discovery process
  • Lower network overhead during node startup

Testing

The implementation includes:

  • Unit tests for cache operations
  • Integration tests for peer discovery
  • Persistence verification tests

Future Considerations

  • Cache entry expiration policy could be tuned based on network metrics
  • Additional peer metadata could be stored for better peer selection

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

devskim found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@rreive
Copy link

rreive commented Nov 21, 2024

How is security addressed in this model?

@dirvine dirvine marked this pull request as draft November 22, 2024 08:56
@dirvine
Copy link
Member Author

dirvine commented Nov 22, 2024

How is security addressed in this model?

This is not a security related issue per se' It's a mechanism to reconnect to the network easily. It is cross any client/node on the machine.

The security aspect would be who you connect to. If somebody broke into your machine and replaced the file with all bad nodes you could end up on the wrong network, but in that case I imagine you would quickly know.

In terms of connecting to a wrong network clients would not find their data and does woudl likely not be paid, but it's unlikely to cause you to reveal any secret information unless you were using a badly written app, but that app woudl still be bad on a good nework.

Add persistent bootstrap cache to maintain a list of previously known peers,
improving network bootstrapping efficiency and reducing cold-start times.

Enhance the bootstrap cache implementation with robust corruption detection
and recovery mechanisms. This change ensures system resilience when the
cache file becomes corrupted or invalid.

Key changes:
* Add explicit cache corruption detection and error reporting
* Implement cache rebuilding from in-memory peers or endpoints
* Use atomic file operations to prevent corruption during writes
* Improve error handling with specific error variants
* Add comprehensive test suite for corruption scenarios

The system now handles corruption by:
1. Detecting invalid/corrupted JSON data during cache reads
2. Attempting recovery using in-memory peers if available
3. Falling back to endpoint discovery if needed
4. Using atomic operations for safe cache updates

Testing:
* Add tests for various corruption scenarios
* Add concurrent access tests
* Add file operation tests
* Verify endpoint fallback behavior

- Add smarter JSON format detection by checking content structure
- Improve error handling with specific InvalidResponse variant
- Reduce unnecessary warnings by only logging invalid multiaddrs
- Simplify parsing logic to handle both JSON and plain text formats
- Add better error context for failed parsing attempts

All tests passing, including JSON endpoint and plain text format tests.

feat(bootstrap_cache): implement circuit breaker with exponential backoff

- Add CircuitBreakerConfig with configurable parameters for failures and timeouts
- Implement circuit breaker states (closed, open, half-open) with state transitions
- Add exponential backoff for failed request retries
- Update InitialPeerDiscovery to support custom circuit breaker configuration
- Add comprehensive test suite with shorter timeouts for faster testing

This change improves system resilience by preventing cascading failures and
reducing load on failing endpoints through intelligent retry mechanisms.
…ork isolation

* Refactor CacheStore::from_args to handle peer sources more consistently
* Ensure test network mode is properly isolated from cache system
* Fix default behavior to use URL endpoint when no peers provided
* Add proper handling for local and first node modes
* Prevent cache operations when in test network mode

This change ensures that:
- Test network peers are isolated from cache operations
- Default behavior (no args) correctly uses URL endpoints
- Local and first node modes return empty stores
- Explicit peers take precedence over default behavior
- Cache operations only occur in non-test network mode

The changes make the peer source handling more predictable and maintain
proper isolation between different network modes (test, local, default).
* Fix test_safe_peers_env to verify env var peer inclusion
  - Assert presence of env var peer in total peer set
  - Remove incorrect assertion of exact peer count

* Fix test_network_contacts_fallback isolation
  - Enable test_network mode to prevent interference from cache/endpoints
  - Verify exact peer count from mock server

* Improve from_args implementation
  - Add environment variable peer handling before other sources
  - Use empty cache path in test network mode
  - Prevent cache file operations in test network mode

These changes ensure proper test isolation and correct handling of peers
from different sources (env vars, args, cache, endpoints) across different
modes (normal, test network, local).
@loziniak
Copy link
Contributor

Looks like worth an RFC...?

@loziniak
Copy link
Contributor

Could this cache be used also by clients?

@RolandSherwin
Copy link
Member

Yes clients can use this @loziniak , refer to #2487 for the work on top of this!

@RolandSherwin
Copy link
Member

Superseded by #2487

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants