Improvements on Ersilia-Pack #51

Open
GemmaTuron opened this issue Feb 18, 2025 · 7 comments

@GemmaTuron
Member

Hi @Abellegese, can you summarise the refactorings mentioned in #49, #48 and #46 (and any others you have in the pipeline) in a table with three columns:

"Feature" - Name of the feature
"Short Description" - One sentence about the feature
"Improves" - Performance / Error handling / Frontend

@Abellegese

Abellegese commented Feb 18, 2025

Hi @miquelduranfrigola @GemmaTuron @DhanshreeA, here is the summary. I tried my best to put things together.

| Feature | Detail | Outcome | Priority | Status |
| --- | --- | --- | --- | --- |
| Shell command execution using async event loop | Execute shell commands asynchronously via the event loop, avoiding blocking the main thread | Faster execution and real-time error handling | 1 | Not Done [simple to implement] |
| Defining run.py router instead of writing it from template | More manageable endpoint design and adherence to the Dependency Inversion Principle using a template file | Flexibility, Manageability | 1 | Not Done [simple to implement] |
| Request Context Middleware | Middleware that attaches request-specific context (e.g., headers, user data) to FastAPI requests for global access | Enables effortless access to request context across endpoints | 3 | Done |
| General error handler | Centralized error handling that catches exceptions and formats error responses uniformly | Consistent and clear error reporting | 1 | Done |
| Request Adaptive Circuit Breaking | Middleware that monitors request failures and temporarily halts requests when thresholds are exceeded to avoid overload | Improves reliability by preventing cascading failures | 3 | Done |
| Putting endpoints, i.e., metadata into a router | Consolidate endpoint metadata within a router for better organization and modularity | Simplifies endpoint management and enhances API documentation | 1 | Done 75% |
| Nice UI for API doc and redoc using Material UI and custom CSS | Customize API documentation interfaces using Material UI components and tailored CSS styles | Improved aesthetics and usability of API docs | 3 | Done |
| Adaptive batching mechanism | Dynamically group incoming inference requests into batches for efficient processing | Maximizes hardware utilization and throughput | 5 | Not Done |
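
To make the first row concrete, here is a minimal sketch of running shell commands through the asyncio event loop. It uses only the Python standard library; `run_command` and its `timeout` parameter are hypothetical names for illustration and are not taken from the Ersilia-Pack codebase.

```python
import asyncio
import sys


async def run_command(*args: str, timeout: float = 30.0) -> tuple[int, str, str]:
    """Run a command without blocking the event loop.

    Returns (exit code, stdout, stderr). `run_command` is a hypothetical
    helper name, not an existing Ersilia-Pack function.
    """
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # communicate() reads both pipes until the process exits.
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Do not leave a runaway child process behind.
        proc.kill()
        await proc.wait()
        raise
    return proc.returncode, stdout.decode(), stderr.decode()


async def main() -> list:
    # Both commands run concurrently; a failing one is reported via its exit code.
    return await asyncio.gather(
        run_command(sys.executable, "-c", "print('ok')"),
        run_command(sys.executable, "-c", "import sys; sys.exit(3)"),
    )


results = asyncio.run(main())
```

Because the exit code and stderr come back as soon as each process finishes, an endpoint could surface a failure immediately instead of waiting on a blocking call.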

@Abellegese

The priorities reflect my own assessment, so feel free to edit them as you see fit.

@miquelduranfrigola
Member

Thanks @Abellegese - as discussed, it all seems relevant now with the exception of the Inference graph. Thanks for writing this up.

@Abellegese

Abellegese commented Feb 19, 2025

Cloud-specific system design [very high level -> advantage focused]

| Feature | Cloud Deployment Benefits |
| --- | --- |
| 1. Rate Limiting | Prevents DDoS attacks via traffic throttling; Controls costs by limiting auto-scaling triggers; Ensures fair resource distribution in multi-tenant environments; Maintains API uptime during traffic spikes. |
| 2. API Gateway | Centralizes authentication with cloud-native services; Enables A/B testing through smart routing; Integrates auto-scaling policies; Optimizes latency via edge deployment. |
| 3. Redis Caching | Reduces latency using memory-optimized cloud instances; Lowers database costs in pay-as-you-go models; Enables stateless horizontal scaling; Handles traffic surges with auto-scaling. |
| 4. Health Check | Integrates with cloud orchestration for auto-healing; Provides native compatibility with monitoring tools; Enables intelligent load balancing; Simplifies debugging via cloud logging. |
| 5. Metrics | Optimizes cloud instance sizing; Enables alert integration with incident management systems; Drives auto-scaling based on performance data; Identifies resource bottlenecks. |
| 6. Adaptive Batching | Maximizes GPU/CPU utilization on cloud instances; Balances latency with pay-per-use pricing; Integrates with cloud message queues; Optimizes vertical scaling efficiency. |
| 7. Worker Pool | Combines VMs with serverless (ideally, but depending on the Ersilia EC2 plan) workers for hybrid scaling; Prevents VM overload via controlled parallelism; Supports geo-distributed processing; Enhances security through isolation. |
| 8. CORS Policy | Strengthens security with cloud WAF integration; Facilitates CDN deployments; Enforces zero-trust access principles; Secures modern web apps on cloud storage. |
| 9. Concurrent Futures Pool | Uses spot instances for cost-effective batches; Combines cloud VMs with serverless task queues; Maximizes GPU/CPU utilization; Enables dynamic worker allocation via resource metrics. |
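
As an illustration of item 1, traffic throttling is often done with a token bucket. The sketch below is one possible approach, not a description of what Ersilia-Pack does: the class name `TokenBucket` and the `rate`, `capacity`, and `clock` parameters are assumptions made for this example.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage: at most a burst of 5 requests, then 2 requests per second on average.
bucket = TokenBucket(rate=2.0, capacity=5)
accepted = [bucket.allow() for _ in range(5)]
```

In a web service, one bucket per client (for example, keyed by API key or IP address) would typically be consulted in middleware, with rejected requests answered with HTTP 429.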

@Abellegese

Hi @miquelduranfrigola, I have provided a very high-level system design above. The implementation will be kept as simple as possible.
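
As a rough idea of how small such an implementation could be, the batching item can be sketched as size-or-timeout micro-batching, which is the simplest form of the idea. A truly adaptive version would tune the two limits from observed latency. Everything here (`MicroBatcher`, `predict_batch`, `max_batch_size`, `max_wait`) is a hypothetical name chosen for this example.

```python
import asyncio
import contextlib


class MicroBatcher:
    """Group concurrent requests into batches bounded by size and waiting time."""

    def __init__(self, predict_batch, max_batch_size: int = 8, max_wait: float = 0.01):
        # predict_batch: list of inputs -> list of outputs (synchronous here;
        # a real service would likely run it in an executor).
        self.predict_batch = predict_batch
        self.max_batch_size = max_batch_size
        self.max_wait = max_wait
        self.queue = asyncio.Queue()

    async def submit(self, item):
        """Enqueue one input and wait for its individual result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until there is work
            deadline = loop.time() + self.max_wait
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                outputs = self.predict_batch([item for item, _ in batch])
            except Exception as exc:
                # Propagate the failure to every caller in this batch.
                for _, fut in batch:
                    fut.set_exception(exc)
                continue
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


async def demo():
    batch_sizes = []

    def double_all(items):  # stand-in for a model's batch prediction
        batch_sizes.append(len(items))
        return [x * 2 for x in items]

    batcher = MicroBatcher(double_all, max_batch_size=3, max_wait=0.1)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(5)))
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return results, batch_sizes


results, batch_sizes = asyncio.run(demo())
```

Each caller still awaits a single result, so endpoints would not need to know that batching happens behind them.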

Abellegese moved this from On Hold to In Progress in Ersilia Model Hub on Feb 19, 2025

@miquelduranfrigola
Member

Thank you, @Abellegese - this is very helpful. Let's go step by step. Do you need any specific feedback?

@Abellegese

Yes, I will definitely need that. Let me implement a few of those features first, and then you can give me feedback.
