Issues: pentium3/sys_reading
Issues list
Accelerating Retrieval-Augmented Language Model Serving with Speculation (#373, opened Aug 16, 2024 by pentium3)
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (#372, opened Jun 10, 2024 by pentium3)
Data-Juicer: A One-Stop Data Processing System for Large Language Models [llm, sigmod24] (#371, opened Jun 2, 2024 by pentium3)
LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices (#370, opened Apr 4, 2024 by pentium3)
UnFaaSener: Latency and Cost Aware Offloading of Functions from Serverless Platforms [atc23] (#369, opened Mar 27, 2024 by pentium3)
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving (#366, opened Mar 21, 2024 by pentium3)
Cocktail: A Multidimensional Optimization for Model Serving in Cloud [nsdi22] (#365, opened Mar 16, 2024 by pentium3)
Model Selection for Latency-Critical Inference Serving [eurosys24] (#364, opened Mar 14, 2024 by pentium3)
Pronghorn: Effective Checkpoint Orchestration for Serverless Hot-Starts [eurosys24] (#363, opened Mar 14, 2024 by pentium3)
Erlang: Application-Level Autoscaling for Cloud Microservices [eurosys24] (#362, opened Mar 14, 2024 by pentium3)
GMorph: Accelerating Multi-DNN Inference via Model Fusion [eurosys24] (#361, opened Mar 14, 2024 by pentium3)
SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models [mlsys24] (#360, opened Mar 14, 2024 by pentium3)
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices [mlsys24] (#358, opened Mar 14, 2024 by pentium3)
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines [mlsys24] (#357, opened Mar 14, 2024 by pentium3)
Subgraph Stationary Hardware-Software Inference Co-design [mlsys23] (#355, opened Mar 13, 2024 by pentium3)
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale (#349, opened Mar 8, 2024 by pentium3)
AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness (#347, opened Feb 29, 2024 by pentium3)