feat(p2p): add Kubernetes deployment manifests for P2P distribution #480

Open
slin1237 wants to merge 8 commits into main from feature/p2p-model-distribution-n/8

@slin1237
Collaborator

Add deployment configuration for P2P model distribution:

  • Headless Service for peer discovery via DNS
  • DaemonSet with P2P-enabled model-agent
  • Documentation with architecture overview and usage instructions

Add BitTorrent library dependency (anacrolix/torrent) and define
constants for P2P model distribution:
- Lease coordination constants (prefix, labels, durations)
- Default configuration values (ports, rates, timeouts)
- Environment variable keys for P2P configuration

Introduce the P2P distributor package with:
- Config struct with validation and defaults
- ConfigFromEnv for environment-based configuration
- Comprehensive Prometheus metrics for P2P operations
  - Download metrics (total, duration, failures)
  - Peer discovery and connection metrics
  - Lease and seeding metrics
  - Metainfo server metrics
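
For readers who want a feel for the config commit, here is a minimal sketch of a config struct with defaults, validation, and environment loading. Only `LeaseDurationSeconds` is a name taken from this PR; the other fields, the `P2P_*` environment keys, and the default port are assumptions made for illustration, and the real `Config` and `ConfigFromEnv` may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config is a trimmed-down, hypothetical stand-in for the P2P config struct.
// Only LeaseDurationSeconds is named in the PR text.
type Config struct {
	PeersService         string // headless Service DNS name (assumed field)
	TorrentPort          int    // BitTorrent listen port (assumed field)
	LeaseDurationSeconds int
}

// WithDefaults returns a copy with unset optional fields filled in.
// The default port is illustrative, not the PR's value.
func (c Config) WithDefaults() Config {
	if c.TorrentPort == 0 {
		c.TorrentPort = 6881
	}
	return c
}

// Validate rejects missing required fields and out-of-range ports.
func (c Config) Validate() error {
	if c.PeersService == "" {
		return errors.New("peers service is required")
	}
	if c.TorrentPort < 1 || c.TorrentPort > 65535 {
		return fmt.Errorf("invalid torrent port %d", c.TorrentPort)
	}
	if c.LeaseDurationSeconds <= 0 {
		return errors.New("LeaseDurationSeconds must be positive")
	}
	return nil
}

// configFromEnv reads settings through a lookup function so it can be
// exercised without touching the process environment.
// The key names are assumptions.
func configFromEnv(getenv func(string) string) (Config, error) {
	c := Config{PeersService: getenv("P2P_PEERS_SERVICE")}
	for key, dst := range map[string]*int{
		"P2P_TORRENT_PORT":           &c.TorrentPort,
		"P2P_LEASE_DURATION_SECONDS": &c.LeaseDurationSeconds,
	} {
		v := getenv(key)
		if v == "" {
			continue
		}
		n, err := strconv.Atoi(v)
		if err != nil {
			return Config{}, fmt.Errorf("%s: %w", key, err)
		}
		*dst = n
	}
	c = c.WithDefaults()
	return c, c.Validate()
}

func main() {
	cfg, err := configFromEnv(os.Getenv)
	fmt.Printf("%+v, err=%v\n", cfg, err)
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the validation tests hermetic.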

Implement ModelDistributor for P2P model distribution:
- BitTorrent client management with rate limiting
- Peer discovery via Kubernetes headless service DNS
- Metainfo fetching from peers for torrent coordination
- Model seeding and download operations
- Active torrent tracking with proper cleanup
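
Peer discovery through a headless Service comes down to a DNS lookup that returns one address per ready pod. The sketch below shows that idea with the standard resolver; the Service name, the port, and the function names are hypothetical, and the actual ModelDistributor code may structure this differently.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strconv"
	"time"
)

// peerAddrs turns resolved pod IPs into host:port peer addresses,
// skipping this node's own IP so it does not dial itself.
func peerAddrs(ips []string, selfIP string, port int) []string {
	peers := make([]string, 0, len(ips))
	for _, ip := range ips {
		if ip == selfIP {
			continue
		}
		peers = append(peers, net.JoinHostPort(ip, strconv.Itoa(port)))
	}
	return peers
}

// discoverPeers resolves the headless Service name. For a headless Service
// the cluster DNS returns the pod IPs rather than a single virtual IP.
func discoverPeers(ctx context.Context, service, selfIP string, port int) ([]string, error) {
	ips, err := net.DefaultResolver.LookupHost(ctx, service)
	if err != nil {
		return nil, fmt.Errorf("resolve %s: %w", service, err)
	}
	return peerAddrs(ips, selfIP, port), nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	// Hypothetical Service name and port; outside a cluster this prints an error.
	peers, err := discoverPeers(ctx, "p2p-peers.example.svc.cluster.local", "10.0.0.1", 6881)
	fmt.Println(peers, err)
}
```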

Fix API compatibility with anacrolix/torrent v1.57.1:
- Use bencode.Marshal instead of info.MarshalBencode
- Use t.Complete().Bool() instead of t.Complete.Bool
- Handle PeerRemoteAddr type assertion for peer addresses

@github-actions (bot) added the labels documentation, model-agent, tests, config, and dependencies on Dec 31, 2025
@slin1237 force-pushed the feature/p2p-model-distribution-n/8 branch 2 times, most recently from dd6d550 to 86baf50, on December 31, 2025 05:01

Implement MetainfoServer to enable peers to discover available models:
- GET /metainfo/{modelHash} - serve torrent metainfo
- GET /health - health check endpoint
- GET /stats - P2P distribution statistics
- GET /models - list available models with seeding status
- Graceful shutdown support
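
As a rough illustration of the endpoint list above, the following sketch serves `/health` and `/metainfo/{modelHash}` from an in-memory map using only `net/http`. The map-backed store, the handler name, and the content type are assumptions; `/stats` and `/models` are left out for brevity, and the PR's MetainfoServer may be organized differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newMetainfoHandler serves stored metainfo bytes keyed by model hash.
// The map stands in for whatever the real server tracks.
func newMetainfoHandler(metainfo map[string][]byte) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/metainfo/", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		hash := strings.TrimPrefix(r.URL.Path, "/metainfo/")
		data, ok := metainfo[hash]
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/x-bittorrent")
		w.Write(data)
	})
	return mux
}

// get issues an in-memory GET against h and returns the status and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	// The value is a tiny bencoded dictionary used as placeholder metainfo.
	h := newMetainfoHandler(map[string][]byte{"abc123": []byte("d4:infod4:name5:modelee")})
	fmt.Println(get(h, "/metainfo/abc123"))
	fmt.Println(get(h, "/metainfo/missing"))
}
```

For graceful shutdown, a handler like this can be wrapped in an `http.Server` and stopped with its `Shutdown(ctx)` method, which stops accepting new connections and waits for in-flight requests until the context expires.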

Fix API compatibility with anacrolix/torrent v1.57.1:
- Add exists() helper function
- Use bencode.Marshal instead of info.MarshalBencode

Add comprehensive tests for the distributor package:
- Config validation tests (valid, missing fields, invalid ports)
- ConfigWithDefaults tests
- Metrics recording tests
- Stats struct tests
- Test helper functions for integration tests

Fix test config to include required LeaseDurationSeconds field.

Implement P2PLeaseManager for coordinating model downloads:
- Lease acquisition with expired lease takeover
- Lease renewal for long-running downloads
- Complete/release lifecycle management
- Ensures only one node downloads from HuggingFace

Tests cover:
- Lease acquisition (new, existing, expired)
- Lease expiration detection
- Lease name generation with hash truncation
- Renewal and holder verification
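
The lease rules listed above (acquire when free, take over when expired, renew when already the holder) can be sketched with an in-memory table. This is not the P2PLeaseManager itself: the real implementation presumably stores leases in the cluster, and the `p2p-` prefix, the truncation length, and all type and function names here are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const leaseTTL = time.Minute // assumed lease duration

var epoch = time.Unix(0, 0) // fixed clock origin for the demo

// lease records who holds a model download and when it was last renewed.
type lease struct {
	holder   string
	renewed  time.Time
	duration time.Duration
}

func (l lease) expired(now time.Time) bool {
	return now.After(l.renewed.Add(l.duration))
}

// leaseTable is an in-memory stand-in for the cluster-wide lease store.
type leaseTable struct {
	mu     sync.Mutex
	leases map[string]lease
}

func newLeaseTable() *leaseTable {
	return &leaseTable{leases: map[string]lease{}}
}

// tryAcquire grants the lease if it is free, expired, or already held by
// node (in which case the call acts as a renewal).
func (t *leaseTable) tryAcquire(name, node string, d time.Duration, now time.Time) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	cur, ok := t.leases[name]
	if ok && cur.holder != node && !cur.expired(now) {
		return false
	}
	t.leases[name] = lease{holder: node, renewed: now, duration: d}
	return true
}

// leaseName builds a lease name from a prefix and a truncated model hash,
// keeping the name within object-name length limits.
func leaseName(prefix, modelHash string, maxHash int) string {
	if len(modelHash) > maxHash {
		modelHash = modelHash[:maxHash]
	}
	return prefix + modelHash
}

func main() {
	tbl := newLeaseTable()
	name := leaseName("p2p-", "0123456789abcdef0123456789abcdef", 16)
	fmt.Println(name, tbl.tryAcquire(name, "node-a", leaseTTL, epoch))
	// A second node is refused while the lease is live...
	fmt.Println(tbl.tryAcquire(name, "node-b", leaseTTL, epoch.Add(leaseTTL/2)))
	// ...and takes over once it has expired.
	fmt.Println(tbl.tryAcquire(name, "node-b", leaseTTL, epoch.Add(2*leaseTTL)))
}
```

Passing the clock in as an argument keeps the expiry cases deterministic, which is one way the acquisition and expiration tests listed above could be written.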

Integrate P2P model distribution into the Gopher download workflow:
- Add P2P fields to Gopher struct (distributor, lease manager, timeout)
- EnableP2P() and SetP2PTimeout() configuration methods
- computeModelHash() for consistent model identification
- downloadWithP2P() orchestrates P2P-first download strategy
- downloadWithLeaseHeld() handles HF download with lease coordination
- waitForP2PAvailability() with exponential backoff for waiting nodes
- startSeeding() begins seeding after successful download

Flow: Check P2P peers → Try P2P download → Acquire lease → HF download → Seed
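
One possible reading of this flow, including the capped exponential backoff used by waiting nodes, is sketched below. The steps are plain function values so the ordering can be shown without a cluster; every name, the delay constants, and the behavior when waiting fails are assumptions, not the Gopher code from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	baseDelay = time.Second      // assumed initial wait
	maxDelay  = 30 * time.Second // assumed cap
)

var errUnavailable = errors.New("model not available from peers")

// backoffDelay returns baseDelay doubled once per attempt, capped at maxDelay.
func backoffDelay(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// steps bundles the operations of the flow. All fields are hypothetical.
type steps struct {
	peersHaveModel func() bool
	p2pDownload    func() error
	acquireLease   func() bool
	hfDownload     func() error
	waitForPeers   func() error // would sleep backoffDelay(n) between checks
	seed           func()
}

// download follows: check peers, try P2P, acquire lease, HF download, seed.
// It returns the source the model came from.
func download(s steps) (string, error) {
	if s.peersHaveModel() {
		if err := s.p2pDownload(); err == nil {
			s.seed()
			return "p2p", nil
		}
	}
	if s.acquireLease() {
		if err := s.hfDownload(); err != nil {
			return "", err
		}
		s.seed()
		return "hf", nil
	}
	// Another node holds the lease: wait until it starts seeding.
	if err := s.waitForPeers(); err != nil {
		return "", err
	}
	if err := s.p2pDownload(); err != nil {
		return "", err
	}
	s.seed()
	return "p2p", nil
}

func main() {
	for attempt := 0; attempt < 7; attempt++ {
		fmt.Println(attempt, backoffDelay(attempt))
	}
	src, err := download(steps{
		peersHaveModel: func() bool { return false },
		p2pDownload:    func() error { return errUnavailable },
		acquireLease:   func() bool { return true },
		hfDownload:     func() error { return nil },
		waitForPeers:   func() error { return nil },
		seed:           func() {},
	})
	fmt.Println(src, err)
}
```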

Add deployment configuration for P2P model distribution:
- Headless Service for peer discovery via DNS
- DaemonSet with P2P-enabled model-agent
- Documentation with architecture overview and usage instructions

@slin1237 force-pushed the feature/p2p-model-distribution-n/8 branch from 86baf50 to 565ed32 on December 31, 2025 05:48