Neuron SDK Release - April 10, 2024
Neuron 2.18.1 introduces continuous batching (beta) and Neuron vLLM integration (beta) support in the Transformers NeuronX library, both of which improve LLM inference throughput. This release also fixes hang issues related to Triton Inference Server and updates the Neuron DLAMIs and DLCs to 2.18.1. For details, see the Transformers Neuron (transformers-neuronx) release notes and the Neuron Compiler (neuronx-cc) release notes.