FastAPI Llama2 Huggingface Hub API

This repository contains a Jupyter notebook (FastAPI-Llama-HuggingfaceHub-Collab.ipynb) that demonstrates how to set up and run a FastAPI server with Llama 2 model integration, using Google Colab's free T4 GPU for inference.

Features

  • Sets up a FastAPI server with Llama 2 model integration
  • Uses Google Colab's free GPU for model inference
  • Creates a public URL for the API using ngrok
  • Provides an example of how to make API requests to the server

Contents

The notebook includes the following main sections:

  1. Installation of dependencies
  2. Setting up ngrok for creating a public URL
  3. Creating the FastAPI application
  4. Starting the FastAPI server
  5. Using ngrok to create a public URL for the server
  6. Testing the API with example requests

Usage

  1. Open the FastAPI-Llama-HuggingfaceHub-Collab.ipynb notebook in Google Colab
  2. Follow the instructions in the notebook to set up and run the server
  3. Use the provided ngrok URL to make API requests to the Llama 2 model
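A request against the running server might look like the sketch below. The `/generate` path, JSON field names, and `query_llama` helper are assumptions for illustration; substitute whatever endpoint and schema the notebook actually defines, and use the public URL that ngrok prints when the tunnel starts.

```python
import requests

def query_llama(base_url: str, prompt: str, max_new_tokens: int = 128) -> dict:
    # POST a prompt to the server's /generate endpoint and return the JSON reply.
    resp = requests.post(
        f"{base_url}/generate",
        json={"text": prompt, "max_new_tokens": max_new_tokens},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# Replace with the URL ngrok prints for your tunnel, e.g.:
# print(query_llama("https://<your-subdomain>.ngrok-free.app", "Hello, Llama!"))
```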

Requirements

  • Google Colab account (for free GPU access)
  • ngrok account (free tier is sufficient)

Note

Make sure to shut down the server and ngrok processes when you're done using the notebook to free up resources. For more detailed instructions and code explanations, please refer to the comments within the notebook.

License

Copyright © 2024 tooniez. This project is MIT licensed.