
business-science/awesome-generative-ai-data-scientist

Awesome Generative AI Data Scientist

A curated list of 50+ resources to help you become a Generative AI Data Scientist. This repository includes resources on building GenAI applications with Large Language Models (LLMs), and deploying LLMs and GenAI with Cloud-based solutions.

NOTE: This is a work in progress. Changes and additions are welcome; please use Pull Requests to suggest modifications and improvements.

Contents:

Awesome Real-World AI Use Cases

Python Libraries

AI LLM Frameworks

  • LangChain: A framework for developing applications powered by large language models (LLMs). Documentation Github Cookbook
  • LangGraph: A library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. Documentation Tutorials
  • LlamaIndex: A framework for building context-augmented generative AI applications with LLMs. Documentation Github
  • LlamaIndex Workflows: A mechanism for orchestrating actions in increasingly complex AI applications.
  • CrewAI: Streamline workflows across industries with powerful AI agents. Documentation Github
  • AutoGen: A programming framework for agentic AI by Microsoft.
  • LangFlow: A low-code tool that makes it easier to build powerful AI agents and workflows that can use any API, model, or database. Documentation Github
  • Pydantic AI: A Python agent framework designed to make it less painful to build production-grade applications with Generative AI. Github
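
The stateful, multi-actor pattern that LangGraph and similar frameworks implement can be sketched in plain Python. This is a conceptual sketch, not any framework's actual API; the node names and state keys are made up for illustration:

```python
# Conceptual sketch of a stateful agent graph: named nodes each transform
# a shared state dict and return the name of the next node to run.
# (Plain Python illustration; not the LangGraph API.)

def draft(state):
    state["text"] = f"Draft about {state['topic']}"
    return "review"  # edge: hand off to the review node

def review(state):
    state["approved"] = "Draft" in state["text"]
    return "end"     # edge: terminate the graph

NODES = {"draft": draft, "review": review}

def run_graph(state, start="draft"):
    node = start
    while node != "end":
        node = NODES[node](state)  # each node mutates state, picks the next edge
    return state

result = run_graph({"topic": "LLM agents"})
```

Real frameworks add persistence, streaming, and conditional edges on top of this loop, but the shared-state-plus-routing core is the same.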

LLM Models and Providers

  • OpenAI: The official Python library for the OpenAI API.
  • Hugging Face Models: Open LLM models by Meta, Mistral, and hundreds of other providers.
  • Anthropic Claude: The official Python library for the Anthropic API.
  • Meta Llama Models: Open-source AI models you can fine-tune, distill, and deploy anywhere.
  • Google Gemini: The official Python library for the Google Gemini API.
  • Ollama: Get up and running with large language models locally.
  • Groq: The official Python library for the Groq API.
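
Most of the SDKs above converge on an OpenAI-style chat-messages payload, so it is worth seeing its shape once. A minimal sketch that builds the payload in plain Python; the network call is left commented out because it requires an API key, and the model name is illustrative:

```python
# Build an OpenAI-style chat request payload (a plain dict).
# The model name below is an illustrative placeholder.

def build_chat_request(system, user, model="gpt-4o-mini"):
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},  # sets assistant behavior
            {"role": "user", "content": user},      # the actual question
        ],
    }

req = build_chat_request(
    "You are a helpful data scientist.",
    "Summarize RAG in one sentence.",
)

# With the official SDK (requires OPENAI_API_KEY in the environment):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(**req)
# print(resp.choices[0].message.content)
```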

Hugging Face Platform

  • Hugging Face: An open-source platform for machine learning (ML) and artificial intelligence (AI) tools and models. Documentation

Vector Databases (RAG)

  • ChromaDB: The fastest way to build Python or JavaScript LLM apps with memory!
  • FAISS: A library for efficient similarity search and clustering of dense vectors.
  • Qdrant: High-Performance Vector Search at Scale
  • Pinecone: The official Pinecone Python SDK.
  • Milvus: An open-source vector database built to power embedding similarity search and AI applications.
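
At their core, all of the stores above do the same thing: index embedding vectors and return the nearest ones to a query. A brute-force sketch in plain Python using cosine similarity (real engines add approximate indexes such as HNSW to make this scale; the toy 3-dimensional vectors are made up):

```python
import math

# Brute-force nearest-neighbor search over a tiny in-memory "vector store".

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, store, k=2):
    # Rank every stored vector by similarity to the query; keep the best k.
    ranked = sorted(store.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
hits = top_k([1.0, 0.05, 0.0], store)  # → ["doc_a", "doc_b"]
```

In a RAG pipeline, the returned document IDs are resolved to text chunks and stuffed into the LLM prompt as context.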

Pretraining

  • PyTorch: An open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing.
  • TensorFlow: An open-source machine learning library developed by Google.
  • JAX: Google's library for high-performance computing and automatic differentiation.
  • tinygrad: A minimalistic deep learning library with a focus on simplicity and educational use, created by George Hotz.
  • micrograd: A simple, lightweight autograd engine for educational purposes, created by Andrej Karpathy.
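
The educational entries above are a good way to see what PyTorch, TensorFlow, and JAX automate at scale: reverse-mode automatic differentiation. A minimal scalar version in the spirit of micrograd (this is a from-scratch sketch, not micrograd's actual code):

```python
# Minimal scalar autograd: each operation records how to push gradients
# back to its inputs; backward() applies the chain rule in reverse
# topological order over the recorded graph.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(a+b)/da = 1
            other.grad += out.grad          # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        order, seen = [], set()
        def visit(v):                        # topological sort of the graph
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0                      # d(output)/d(output) = 1
        for v in reversed(order):
            v._backward()

x = Value(3.0)
y = Value(4.0)
z = x * y + x      # z = 15; dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
```

The tensor libraries do exactly this, but over n-dimensional arrays with fused, GPU-accelerated kernels.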

Fine-tuning

  • Transformers: Hugging Face Transformers is a popular library for Natural Language Processing (NLP) tasks, including fine-tuning large language models.
  • Unsloth: Finetune Llama 3.2, Mistral, Phi-3.5, and Gemma 2-5x faster with 80% less memory.
  • LitGPT: 20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale.
  • AutoTrain: No-code fine-tuning of LLMs and other machine learning tasks.
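
One data-preparation detail these tools handle for you: in causal-LM instruction tuning, the loss is usually computed only on the response tokens, with prompt positions set to the ignore index -100 (the convention used by Hugging Face Transformers). A sketch with made-up token IDs:

```python
# Build (input_ids, labels) for supervised fine-tuning of a causal LM.
# Positions labeled -100 are skipped by the cross-entropy loss, so the
# model is trained to produce the response, not to repeat the prompt.

IGNORE_INDEX = -100

def build_labels(prompt_ids, response_ids):
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

prompt = [101, 2054, 2003]   # illustrative prompt token IDs
response = [3075, 102]       # illustrative response token IDs
input_ids, labels = build_labels(prompt, response)
```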

Testing and Monitoring

  • Opik: An open-source platform for evaluating, testing, and monitoring LLM applications.

Document Parsing

LLM Memory

  • Mem0: Mem0 is a self-improving memory layer for LLM applications, enabling personalized AI experiences that save costs and delight users. Documentation Github
  • Memary: An open-source memory layer for autonomous agents.
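
What a memory layer adds, reduced to its essentials: store facts per user across turns, then recall the relevant ones to inject into the next prompt. A pure-Python sketch in which keyword overlap stands in for the embedding-based retrieval tools like Mem0 actually use; the class and method names are illustrative, not either library's API:

```python
# Toy per-user memory store: add facts, recall the ones that share
# words with the current query. Real memory layers use embeddings
# and relevance scoring instead of keyword overlap.

class MemoryStore:
    def __init__(self):
        self._facts = {}  # user_id -> list of remembered facts

    def add(self, user_id, fact):
        self._facts.setdefault(user_id, []).append(fact)

    def recall(self, user_id, query):
        words = set(query.lower().split())
        return [f for f in self._facts.get(user_id, [])
                if words & set(f.lower().split())]

mem = MemoryStore()
mem.add("u1", "prefers Python over R")
mem.add("u1", "works in healthcare analytics")
context = mem.recall("u1", "write some Python code")  # → ["prefers Python over R"]
```

The recalled facts would then be prepended to the system prompt so the model can personalize its answer.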

Miscellaneous

  • AdalFlow: A library to build and auto-optimize LLM applications, from chatbots and RAG to agents, by SylphAI.
  • DSPy: The framework for programming, rather than prompting, foundation models.
  • AutoPrompt: A framework for prompt tuning using Intent-based Prompt Calibration.
  • Promptify: A library for prompt engineering that simplifies NLP tasks (e.g., NER, classification) using LLMs like GPT.
  • LiteLLM: A Python SDK and proxy server (LLM gateway) to call 100+ LLM APIs in the OpenAI format.
  • LLMOps: Best practices designed to support your LLMOps initiatives.
  • Jupyter Agent: Let an LLM agent write and execute code inside a notebook.
  • Jupyter AI: A generative AI extension for JupyterLab. Documentation
  • PySpur: A graph-based editor for LLM workflows.
  • Browser-Use: Make websites accessible for AI agents.
  • Agenta: An open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability in one place. Documentation
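
The gateway pattern behind tools like LiteLLM can be sketched in a few lines: one call signature, several providers tried in order with fallback. The providers here are stand-in functions, not real SDK calls; a real gateway would wrap each vendor's client behind the same interface:

```python
# Provider-fallback sketch: try each backend in order, return the first
# success, and surface every error only if all of them fail.

def flaky_provider(prompt):
    raise RuntimeError("rate limited")     # simulates a failing backend

def backup_provider(prompt):
    return f"echo: {prompt}"               # simulates a working backend

def completion(prompt, providers):
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as e:             # fall through to the next provider
            errors.append(str(e))
    raise RuntimeError("all providers failed: " + "; ".join(errors))

out = completion("hello", [flaky_provider, backup_provider])  # → "echo: hello"
```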

LLM Deployment (Cloud Services)

  • AWS Bedrock: Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon.
  • Microsoft Azure AI Services: Azure AI services help developers and organizations rapidly create intelligent, cutting-edge, market-ready, and responsible applications with out-of-the-box, prebuilt, and customizable APIs and models.
  • Google Vertex AI: Vertex AI is a fully managed, unified AI development platform for building and using generative AI.
  • NVIDIA NIM: NVIDIA NIM™, part of NVIDIA AI Enterprise, provides containers to self-host GPU-accelerated inference microservices for pretrained and customized AI models across clouds, data centers, and workstations.

Examples and Cookbooks

Building AI

Deploying AI

Amazon Web Services (AWS)

Microsoft Azure

Google Cloud Platform (GCP)

  • Google Vertex AI Examples: Notebooks, code samples, sample apps, and other resources that demonstrate how to use, develop, and manage machine learning and generative AI workflows with Google Cloud Vertex AI.
  • Google Generative AI Examples: Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI.

NVIDIA

  • NVIDIA NIM Anywhere: An entry point for developing with NIMs that natively scales out to full-sized labs and up to production environments.
  • NVIDIA NIM Deploy: Reference implementations, example documents, and architecture guides that can be used as a starting point to deploy multiple NIMs and other NVIDIA microservices into Kubernetes and other production deployment environments.

Newsletters

  • Python AI/ML Tips: A free newsletter on Generative AI and Data Science.
  • unwind ai: The latest AI news, tools, and tutorials for AI developers.

Courses and Training

Free Training

Paid Courses

  • 8-Week AI Bootcamp by Business Science: Focused on helping you become a Generative AI Data Scientist. Learn how to build and deploy AI-powered data science solutions using LangChain, LangGraph, Pandas, Scikit-Learn, Streamlit, AWS, Bedrock, and EC2.
