    Repositories list

    • datatrove

      Public
      Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
      Python
      Apache License 2.0
      159000 Updated Dec 17, 2024
    • Container-based software environments used in the TrustLLM EU project.
      Shell
      Apache License 2.0
      2101 Updated Dec 2, 2024
    • A native PyTorch Library for large model training
      Python
      BSD 3-Clause "New" or "Revised" License
      235000 Updated Nov 27, 2024
    • streaming

      Public
      A Data Streaming Library for Efficient Neural Network Training
      Python
      Apache License 2.0
      149000 Updated Nov 4, 2024
    • Python
      Apache License 2.0
      176000 Updated Oct 12, 2024
    • composer

      Public
      Supercharge Your Model Training
      Python
      Apache License 2.0
      430000 Updated Oct 12, 2024
    • LLM training code for Databricks foundation models
      Python
      Apache License 2.0
      536000 Updated Oct 12, 2024
    • Ongoing research training transformer models at scale
      Python
      Other
      2.5k000 Updated Aug 22, 2024
    • NeMo

      Public
      A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
      Python
      Apache License 2.0
      2.6k000 Updated Aug 21, 2024