This repository contains a Jupyter notebook (`FastAPI-Llama-HuggingfaceHub-Collab.ipynb`) that demonstrates how to set up and run a FastAPI server with Llama 2 model integration using Google Colab's free T4 GPU.
Key features:
- Sets up a FastAPI server with Llama 2 model integration
- Uses Google Colab's free GPU for model inference
- Creates a public URL for the API using ngrok
- Provides an example of how to make API requests to the server
The notebook includes the following main sections (illustrative sketches of these steps follow the list):
- Installation of dependencies
- Setting up ngrok for creating a public URL
- Creating the FastAPI application
- Starting the FastAPI server
- Using ngrok to create a public URL for the server
- Testing the API with example requests
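For orientation, the sketches below show roughly what these cells look like. The package names, model IDs, ports, and endpoint paths are assumptions for illustration, not code copied from the notebook. Installation is typically a single pip cell:

```python
# Colab cell: install the server, tunneling, and model libraries.
# This exact package list is an assumption; check the notebook's first cell.
!pip install fastapi uvicorn pyngrok transformers accelerate
```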
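The core of the application is a FastAPI app that loads a Llama 2 checkpoint from the Hugging Face Hub and serves it behind an endpoint. A minimal sketch follows; the checkpoint, endpoint path, and request schema are hypothetical, and note that the Llama 2 weights on the Hub are gated, so you need approved access and a Hugging Face login first.

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

# Assumes you are logged in (e.g. via huggingface_hub.login) with access
# to the gated Llama 2 weights; the 7B chat checkpoint is a hypothetical choice.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,  # half precision so the model fits on a T4
    device_map="auto",          # place the model on the Colab GPU
)

app = FastAPI()

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 128

@app.post("/generate")
def generate(prompt: Prompt):
    # Run inference; return_full_text=False drops the echoed prompt
    # so the response contains only the generated continuation.
    result = generator(
        prompt.text,
        max_new_tokens=prompt.max_new_tokens,
        return_full_text=False,
    )
    return {"response": result[0]["generated_text"]}
```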
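Because Colab already runs an asyncio event loop, a common pattern is to apply nest_asyncio before starting uvicorn and to open the tunnel with pyngrok. Again, a sketch under assumptions (port 8000, pyngrok rather than the ngrok CLI):

```python
import nest_asyncio
import uvicorn
from pyngrok import ngrok

nest_asyncio.apply()  # let uvicorn run inside Colab's existing event loop

ngrok.set_auth_token("YOUR_NGROK_AUTHTOKEN")  # token from your ngrok dashboard
tunnel = ngrok.connect(8000)                  # public URL forwarding to localhost:8000
print("Public URL:", tunnel.public_url)

uvicorn.run(app, host="0.0.0.0", port=8000)   # blocks until interrupted
```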
- Open the `FastAPI-Llama-HuggingfaceHub-Collab.ipynb` notebook in Google Colab
- Follow the instructions in the notebook to set up and run the server
- Use the provided ngrok URL to make API requests to the Llama 2 model (see the example request below)
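As an illustration of such a request, assuming the hypothetical `/generate` endpoint sketched above (substitute the public URL that was printed when the tunnel opened):

```python
import requests

# Replace with the public URL reported by ngrok.
url = "https://your-subdomain.ngrok-free.app/generate"
payload = {"text": "Explain FastAPI in one sentence.", "max_new_tokens": 64}

response = requests.post(url, json=payload, timeout=120)  # generation can be slow on a T4
response.raise_for_status()
print(response.json()["response"])
```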
Requirements:
- Google Colab account (for free GPU access)
- ngrok account (free tier is sufficient)
When you're finished, shut down the server and the ngrok process to free up resources. For more detailed instructions and code explanations, refer to the comments within the notebook.
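If the tunnel was opened with pyngrok as sketched above, teardown is two calls (`tunnel` is the object returned by `ngrok.connect`):

```python
from pyngrok import ngrok

ngrok.disconnect(tunnel.public_url)  # close this specific tunnel
ngrok.kill()                         # stop the ngrok process entirely
```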