FastAPI Llama2 Huggingface Hub API

This repository contains a Jupyter notebook (FastAPI-Llama-HuggingfaceHub-Collab.ipynb) that demonstrates how to set up and run a FastAPI server with Llama 2 model integration, using Google Colab's free T4 GPU for inference.

Features

  • Sets up a FastAPI server with Llama 2 model integration
  • Uses Google Colab's free GPU for model inference
  • Creates a public URL for the API using ngrok
  • Provides an example of how to make API requests to the server

Contents

The notebook includes the following main sections:

  1. Installation of dependencies
  2. Setting up ngrok for creating a public URL
  3. Creating the FastAPI application
  4. Starting the FastAPI server
  5. Using ngrok to create a public URL for the server
  6. Testing the API with example requests

Usage

  1. Open the FastAPI-Llama-HuggingfaceHub-Collab.ipynb notebook in Google Colab
  2. Follow the instructions in the notebook to set up and run the server
  3. Use the provided ngrok URL to make API requests to the Llama 2 model
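A request against the running server might look like the sketch below. The `/generate` path, JSON field names, and `query_llama` helper are assumptions for illustration; substitute whatever endpoint and schema the notebook actually defines, and use the public URL that ngrok prints when the tunnel starts.

```python
import requests

def query_llama(base_url: str, prompt: str, max_new_tokens: int = 128) -> dict:
    # POST a prompt to the server's /generate endpoint and return the JSON reply.
    resp = requests.post(
        f"{base_url}/generate",
        json={"text": prompt, "max_new_tokens": max_new_tokens},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# Replace with the URL ngrok prints for your tunnel, e.g.:
# print(query_llama("https://<your-subdomain>.ngrok-free.app", "Hello, Llama!"))
```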

Requirements

  • Google Colab account (for free GPU access)
  • ngrok account (free tier is sufficient)

Note

Make sure to shut down the server and ngrok processes when you're done using the notebook to free up resources. For more detailed instructions and code explanations, please refer to the comments within the notebook.

License

Copyright © 2024 tooniez. This project is MIT licensed.