FastAPI Llama2 Huggingface Hub API

This repository contains a Jupyter notebook (FastAPI-Llama-HuggingfaceHub-Collab.ipynb) that demonstrates how to set up and run a FastAPI server with Llama 2 model integration on Google Colab's free T4 GPU.

Features

  • Sets up a FastAPI server with Llama 2 model integration
  • Uses Google Colab's free GPU for model inference
  • Creates a public URL for the API using ngrok
  • Provides an example of how to make API requests to the server

Contents

The notebook includes the following main sections:

  1. Installation of dependencies
  2. Setting up ngrok for creating a public URL
  3. Creating the FastAPI application
  4. Starting the FastAPI server
  5. Using ngrok to create a public URL for the server
  6. Testing the API with example requests

Usage

  1. Open the FastAPI-Llama-HuggingfaceHub-Collab.ipynb notebook in Google Colab
  2. Follow the instructions in the notebook to set up and run the server
  3. Use the provided ngrok URL to make API requests to the Llama 2 model
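Once the tunnel is up, a request can be made with plain Python. The URL below is a placeholder for the ngrok URL the notebook prints, and the endpoint path and payload shape are assumptions:

```python
# Example client request against the ngrok tunnel (URL and payload
# fields are illustrative; substitute the values from your notebook run).
import json
import urllib.request

NGROK_URL = "https://example-1234.ngrok-free.app"  # replace with your tunnel URL

payload = {"prompt": "Explain what a T4 GPU is.", "max_new_tokens": 64}
req = urllib.request.Request(
    f"{NGROK_URL}/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Uncomment to send the request once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```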

Requirements

  • Google Colab account (for free GPU access)
  • ngrok account (free tier is sufficient)

Note

Make sure to shut down the server and ngrok processes when you're done using the notebook to free up resources. For more detailed instructions and code explanations, please refer to the comments within the notebook.
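In a Colab cell, cleanup can be as simple as killing the background processes (illustrative commands, assuming the server was started with uvicorn; prefix each line with `!` in a notebook cell):

```shell
# Stop any background FastAPI/uvicorn server and ngrok tunnel.
# "|| true" keeps the cell from erroring if nothing is running.
pkill -f uvicorn || true
pkill -f ngrok || true
```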

License

Copyright © 2024 tooniez. This project is MIT licensed.
