Issues: pentium3/sys_reading
Issues list
Accelerating Retrieval-Augmented Language Model Serving with Speculation (#373, opened Aug 16, 2024 by pentium3)
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (#372, opened Jun 10, 2024 by pentium3)
Data-Juicer: A One-Stop Data Processing System for Large Language Models [llm, sigmod24] (#371, opened Jun 2, 2024 by pentium3)
LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices (#370, opened Apr 4, 2024 by pentium3)
UnFaaSener: Latency and Cost Aware Offloading of Functions from Serverless Platforms [atc23] (#369, opened Mar 27, 2024 by pentium3)
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving (#366, opened Mar 21, 2024 by pentium3)
Cocktail: A Multidimensional Optimization for Model Serving in Cloud [nsdi22] (#365, opened Mar 16, 2024 by pentium3)
Model Selection for Latency-Critical Inference Serving [eurosys24] (#364, opened Mar 14, 2024 by pentium3)
Pronghorn: Effective Checkpoint Orchestration for Serverless Hot-Starts [eurosys24] (#363, opened Mar 14, 2024 by pentium3)
Erlang: Application-Level Autoscaling for Cloud Microservices [eurosys24] (#362, opened Mar 14, 2024 by pentium3)
GMorph: Accelerating Multi-DNN Inference via Model Fusion [eurosys24] (#361, opened Mar 14, 2024 by pentium3)
SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models [mlsys24] (#360, opened Mar 14, 2024 by pentium3)
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices [mlsys24] (#358, opened Mar 14, 2024 by pentium3)
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines [mlsys24] (#357, opened Mar 14, 2024 by pentium3)
Subgraph Stationary Hardware-Software Inference Co-design [mlsys23] (#355, opened Mar 13, 2024 by pentium3)
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale (#349, opened Mar 8, 2024 by pentium3)
AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness (#347, opened Feb 29, 2024 by pentium3)