Releases: modular/modular
Modular 25.5
Modular Platform 25.5 is here, introducing Large Scale Batch Inference: a highly asynchronous, at-scale batch API built on open standards and powered by Mammoth.
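Because the API is built on open standards, a client-side sketch can mirror the OpenAI-compatible Batch workflow. The endpoint URL, API key, and input file below are placeholders for illustration, not a confirmed Modular interface:

```python
# Hedged sketch: submit an asynchronous batch job through an
# OpenAI-compatible Batch API. The base_url, key, and file name are
# placeholders, not a confirmed Modular endpoint.
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint.example.com/v1", api_key="YOUR_KEY")

# requests.jsonl holds one JSON request per line (e.g. chat completions).
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

# Submit the batch; results are produced asynchronously and fetched later.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll later with client.batches.retrieve(batch.id)
```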
This release also features the open source launch of the MAX Graph API and expanded support for writing custom PyTorch operators directly in MAX. In addition, we've made Modular Platform development and deployment easier with optimized Docker containers and new standalone Mojo Conda packages. Check out all of 25.5's updates in the full MAX and Mojo changelogs.
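With the MAX Graph API now open source, a minimal build-and-run sketch looks roughly like this. It assumes the Python max.graph quickstart shape from the MAX docs; exact signatures vary between releases:

```python
# Hedged sketch of the open-source Python MAX Graph API: build a tiny
# graph that adds two tensors, then compile and run it. Names follow the
# MAX docs quickstart; exact signatures vary between releases (for
# example, older releases omit the TensorType device argument).
import numpy as np
from max import engine
from max.dtype import DType
from max.graph import DeviceRef, Graph, TensorType, ops

input_type = TensorType(DType.float32, (2,), device=DeviceRef.CPU())
with Graph("simple_add", input_types=(input_type, input_type)) as graph:
    lhs, rhs = graph.inputs
    graph.output(ops.add(lhs, rhs))

session = engine.InferenceSession()
model = session.load(graph)
print(model.execute(np.ones(2, np.float32), np.ones(2, np.float32)))
```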
Modular 25.4
We're excited to announce Modular Platform 25.4, a major release that brings the full power of AMD GPUs to our entire platform. This marks a significant leap toward democratizing access to high-performance AI by enabling seamless portability to AMD GPUs. Developers can now build and deploy models optimized for peak performance with zero reliance on any single hardware vendor, unlocking greater flexibility, lower costs, and broader access to compute.
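In practice, this kind of portability means device selection changes while model code does not. A minimal sketch, assuming the max.driver names (accelerator_count, Accelerator, CPU) used in MAX example code:

```python
# Hedged sketch: pick whatever accelerator is present (NVIDIA or AMD) and
# fall back to CPU. Names are taken from MAX example code and are
# assumptions about this release's exact API.
from max.driver import CPU, Accelerator, accelerator_count

device = Accelerator() if accelerator_count() > 0 else CPU()
print(f"Running on: {device}")
# Model-loading code then targets `device`, regardless of GPU vendor.
```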
For more details, see the 25.4 changelog and the release blog post.
Modular 25.3
Modular Platform's 25.3 release introduces a unified pip install modular package, granting access to both Mojo and MAX. This release open-sources the MAX Kernels and Serving APIs, totaling over 500,000 lines of code. Google Colab support is now available, enabling execution of MAX models, and a simplified community license for MAX and Mojo lowers the barrier to entry. This update reflects a commitment to building in the open and putting the community first.
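A quick, standard-library-only way to confirm the unified package landed (only the distribution name modular comes from this release note):

```python
# Verify the unified package after running: pip install modular
from importlib.metadata import version

print(version("modular"))  # prints the installed Modular Platform version
```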
For additional details, check out the changelog.
MAX 25.2
Announcing MAX 25.2, featuring significant enhancements for large-scale AI deployment and GPU optimization. This release adds comprehensive NVIDIA Hopper support with high-performance kernels, multi-GPU tensor parallelism for large models like Llama-3.3-70B, and expanded model support (Phi3, Olmo, Granite). Key additions include GPTQ quantization for memory efficiency, advanced long-context optimizations (in-flight batching, chunked prefill, copy-on-write), and improved kernel caching that cuts compilation times by up to 28%. New Mojo GPU APIs offer developers greater control and performance.
For additional details, check out the changelog.
Mojo 25.1
We're excited to announce the release of MAX 25.1, marking a significant evolution in our approach to delivering cutting-edge AI development tools to our community. This release substantially improves the developer experience for agentic and LLM workflows, introduces a new nightly release model that includes a new GPU programming interface, and launches MAX Builds: your one-stop destination for GenAI development.
For additional details, check out the changelog.
Mojo 24.6
We are excited to announce the release of MAX 24.6, featuring a preview of MAX GPU! At the heart of this release is MAX GPU: the first vertically integrated Generative AI serving stack that eliminates the dependency on vendor-specific computation libraries like NVIDIA's CUDA.
MAX GPU is built on two groundbreaking technologies. The first is MAX Engine, a high-performance AI model compiler and runtime built with innovative Mojo GPU kernels for NVIDIA GPUs, free from CUDA or ROCm dependencies. The second is MAX Serve, a sophisticated Python-native serving layer specifically engineered for LLM applications. MAX Serve expertly handles complex request batching and scheduling, delivering consistent and reliable performance, even under heavy workloads.
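A natural smoke test for a serving layer like this is an OpenAI-style client request. The sketch below assumes an OpenAI-compatible endpoint on localhost:8000 and uses a placeholder model name:

```python
# Hedged sketch: query a locally running MAX Serve instance through an
# OpenAI-compatible client. URL, port, and model id are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="modularai/llama-3.1",  # placeholder model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```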
For additional details, check out the changelog and the release announcement.
Mojo 24.5
We are excited to announce the release of MAX 24.5! This release includes support for installing MAX as a conda package with magic, a powerful new package and virtual environment manager. We're also introducing two new Python APIs for MAX Graph and MAX Driver, which will ultimately provide the same low-level programming interface as the Mojo Graph API. MAX Engine performance has improved for Llama3, with 24.5 generating tokens on average 15% to 48% faster. Lastly, this release adds support for Python 3.12 and drops support for Python 3.8 and Ubuntu 20.04.
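Of the two new Python APIs, MAX Driver is the lower-level device-and-tensor interface. A minimal sketch, assuming the Tensor.from_numpy and CPU names seen in MAX example code (signatures may differ by release):

```python
# Hedged sketch of the MAX Driver Python API: allocate a tensor from NumPy
# on a device. Names mirror MAX example code; exact signatures are
# assumptions and may have changed between releases.
import numpy as np
from max.driver import CPU, Tensor

device = CPU()
t = Tensor.from_numpy(np.ones((2, 2), dtype=np.float32)).to(device)
print(t.shape)
```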
For additional details, check out the changelog and the release announcement.