    Repositories list

    • triteia

      Public
      Useful Kernels for ML in Triton
      Cuda
      Apache License 2.0
      0020 Updated Dec 18, 2024
    • A native PyTorch Library for large model training
      Python
      BSD 3-Clause "New" or "Revised" License
      224000 Updated Dec 18, 2024
    • deltazip

      Public
      Compression for Foundation Models
      Jupyter Notebook
      Apache License 2.0
      31901 Updated Dec 12, 2024
    • vidur

      Public
      A large-scale simulation framework for LLM inference
      Python
      MIT License
      49000 Updated Dec 11, 2024
    • Nuts and bolts for evaluation of models trained in context of mixtera
      Python
      0000 Updated Dec 11, 2024
    • modyn

      Public
      Modyn is a research platform for training ML models on growing datasets.
      Python
      MIT License
      535923 Updated Dec 11, 2024
    • dirigent

      Public
      Dirigent: Lightweight Serverless Orchestration
      Go
      MIT License
      32200 Updated Dec 8, 2024
    • Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      Apache License 2.0
      66000 Updated Nov 28, 2024
    • pccheck

      Public
      Python
      MIT License
      0000 Updated Nov 8, 2024
    • Contains instructions and scripts for the ATC'24 Pecan artifact evaluation.
      Python
      Apache License 2.0
      1000 Updated Oct 14, 2024
    • fmengine

      Public
      Utilities for Training Very Large Models
      Python
      105640 Updated Sep 25, 2024
    • cachew

      Public
      ML Input Data Processing as a Service. This repository contains the source code for Cachew (built on top of TensorFlow).
      C++
      Apache License 2.0
      74k3602 Updated Sep 10, 2024
    • nanotron

      Public
      Minimalistic large language model 3D-parallelism training
      Python
      Apache License 2.0
      132000 Updated Jul 19, 2024
    • mlibc

      Public
      Portable C standard library
      C
      MIT License
      135000 Updated Jul 15, 2024
    • Hosts CGLM metadata
      0000 Updated Jun 6, 2024
    • orion

      Public
      An interference-aware scheduler for fine-grained GPU sharing
      Python
      MIT License
      1711491 Updated May 12, 2024
    • serving

      Public
      Kubernetes-based, scale-to-zero, request-driven compute
      Go
      Apache License 2.0
      1.2k000 Updated May 7, 2024
    • Starter code for semester project in Cloud Computing Architecture course at ETH Zurich
      Python
      8400 Updated Mar 13, 2024
    • Copy node connection information easily
      JavaScript
      0100 Updated Mar 13, 2024
    • rWasm

      Public
      A cross-platform high-performance provably-safe sandboxing Wasm-to-native compiler
      Rust
      6000 Updated Jan 14, 2024
    • ML Input Data Processing as a Service
      Python
      Apache License 2.0
      2801 Updated Oct 20, 2023
    • Python
      Other
      0000 Updated Aug 16, 2023
    • airflow

      Public
      Python
      Apache License 2.0
      1001 Updated Feb 17, 2023
    • varuna

      Public
      Python
      29000 Updated Jul 25, 2022
    • adaptdl

      Public
      Resource-adaptive cluster scheduler for deep learning training.
      Python
      Apache License 2.0
      79000 Updated May 24, 2022
    • elastic

      Public
      PyTorch elastic training
      Python
      BSD 3-Clause "New" or "Revised" License
      98000 Updated Apr 14, 2022
    • CheckFreq

      Public
      Python
      MIT License
      23000 Updated Dec 20, 2021
    • ray

      Public
      An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
      Python
      Apache License 2.0
      5.9k000 Updated Sep 21, 2021
    • Load generator for memcached (multi threaded, multi machine)
      C++
      BSD 3-Clause "New" or "Revised" License
      69000 Updated Apr 14, 2021