Pinned

  1. FineInfer (Public)

    Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)

    Python · 11 stars · 1 fork

Repositories

Showing 2 of 2 repositories
  • understanding-gpu-architecture-implications-on-llm-serving-workloads (Public)

    Understanding GPU Architecture Implications on LLM Serving Workloads (Master Thesis, ETH Zürich, 2024)

    Python · 0 stars · 0 forks · 0 open issues · 0 pull requests · Updated Oct 24, 2024
  • FineInfer (Public)

    Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)

    Python · MIT License · 11 stars · 1 fork · 0 open issues · 0 pull requests · Updated May 28, 2024
