This repository provides an end-to-end example of applying LLMOps practices to large language models (LLMs) on Amazon SageMaker. It demonstrates a sample LLMOps pipeline for training, optimizing, deploying, monitoring, and managing LLMs on SageMaker using infrastructure-as-code principles.
Currently implemented:
End-to-End:
Inference:
- Deploy Llama3 on Amazon SageMaker
- Deploy Mixtral 8x7B on Amazon SageMaker
- Scale LLM Inference on Amazon SageMaker with Multi-Replica Endpoints
- Optimizing LLMs with Quantization (coming soon)
- Monitoring and managing LLMs with CloudWatch (coming soon)
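Once one of the models above is deployed, it can be queried through the SageMaker runtime API. The sketch below shows the request format the Hugging Face LLM (TGI) container expects: an `inputs` string plus a `parameters` object. The endpoint name `llama3-8b-instruct` and the generation parameters are illustrative placeholders, not values fixed by this repository.

```python
import json

# Illustrative endpoint name -- replace with the name of your deployed endpoint.
ENDPOINT_NAME = "llama3-8b-instruct"


def build_request(prompt: str, max_new_tokens: int = 256, temperature: float = 0.6) -> bytes:
    """Build a JSON request body in the shape the Hugging Face LLM
    container on SageMaker expects: an `inputs` string plus `parameters`."""
    body = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "do_sample": True,
        },
    }
    return json.dumps(body).encode("utf-8")


def invoke(prompt: str) -> dict:
    """Send the prompt to the endpoint. Requires AWS credentials and a
    deployed endpoint, so it is not exercised here."""
    import boto3  # imported lazily so build_request() works without AWS

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=build_request(prompt),
    )
    return json.loads(response["Body"].read())
```

The same payload format works for the Mixtral 8x7B and multi-replica endpoints, since all of them sit behind the standard `invoke_endpoint` API.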
Training:
The repository currently contains:
scripts/
: Scripts for training and deploying LLMs on SageMaker
notebooks/
: Examples and tutorials for using the pipeline
Before we can start, make sure you have met the following requirements:
- AWS account with sufficient service quota for the SageMaker instance types you plan to use
- AWS CLI installed
- AWS IAM user configured in the CLI with permission to create and manage EC2 instances
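With the IAM user created, the CLI setup can be verified as follows (a minimal sketch using standard AWS CLI commands):

```shell
# Store the IAM user's access keys and default region
# (prompts interactively for each value)
aws configure

# Confirm the CLI is authenticated as the expected account and user
aws sts get-caller-identity
```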
Contributions are welcome! Please open issues and pull requests.
This repository is licensed under the MIT License.