A curated list of 50+ resources to help you become a Generative AI Data Scientist. This repository includes resources on building GenAI applications with Large Language Models (LLMs) and deploying LLMs and GenAI with cloud-based solutions.
NOTE - This is a work in progress. Changes and additions are welcome. Please use Pull Requests to suggest modifications and improvements.
- Awesome Real World AI Use Cases
- Python Libraries
- Examples and Cookbooks
- Newsletters
- Courses and Training
- Awesome LLM Apps: LLM RAG AI Apps with Step-By-Step Tutorials
- AI Data Science Team: An AI-powered data science team of copilots that uses agents to help you perform common data science tasks 10X faster.
- AI Hedge Fund: Proof of concept for an AI-powered hedge fund
- AI Financial Agent: A financial agent for investment research
- LangChain: A framework for developing applications powered by large language models (LLMs); see the example sketch after this group of frameworks. Documentation Github Cookbook
- LangGraph: A library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. Documentation Tutorials
- LlamaIndex: LlamaIndex is a framework for building context-augmented generative AI applications with LLMs. Documentation Github
- LlamaIndex Workflows: A mechanism for orchestrating actions in the increasingly complex AI applications that users are building with LlamaIndex.
- CrewAI: Streamline workflows across industries with powerful AI agents. Documentation Github
- AutoGen - A programming framework for agentic AI by Microsoft.
- LangFlow: A low-code tool that makes it easier to build powerful AI agents and workflows that can use any API, model, or database. Documentation Github
- Pydantic AI: A Python agent framework designed to make it less painful to build production-grade applications with Generative AI. Github
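To give a feel for how these frameworks are used, here is a minimal LangChain sketch. It is a sketch under assumptions, not a definitive recipe: it assumes the `langchain-openai` package is installed, an `OPENAI_API_KEY` environment variable is set, and the model name is illustrative.

```python
# Minimal LangChain sketch: a prompt template piped into a chat model (LCEL).
# Assumes langchain-openai is installed and OPENAI_API_KEY is set; the model
# name below is illustrative.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful data science assistant."),
    ("human", "{question}"),
])

# Compose the prompt and model into a runnable chain, then invoke it.
chain = prompt | llm
response = chain.invoke({"question": "What is retrieval-augmented generation?"})
print(response.content)
```

LangGraph, CrewAI, and the other frameworks above layer agentic and multi-agent workflows on top of the same chat-model primitives.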
- OpenAI: The official Python library for the OpenAI API; see the example sketch after this group of providers.
- Hugging Face Models: Open LLM models by Meta, Mistral, and hundreds of other providers
- Anthropic Claude: The official Python library for the Anthropic API
- Meta Llama Models: The open source AI model you can fine-tune, distill and deploy anywhere.
- Google Gemini: The official Python library for the Google Gemini API
- Ollama: Get up and running with large language models locally.
- Groq: The official Python library for the Groq API
- Hugging Face: An open-source platform for machine learning (ML) and artificial intelligence (AI) tools and models. Documentation
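As a reference point for these provider SDKs, here is a minimal sketch with the official `openai` Python library (v1+ client). It assumes `OPENAI_API_KEY` is set and the model name is illustrative; the Anthropic, Gemini, and Groq SDKs follow a similar client-and-messages pattern.

```python
# Minimal sketch with the official openai package (v1+ client interface).
# Assumes OPENAI_API_KEY is set; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "In one sentence, what does an LLM provider SDK do?"},
    ],
)
print(response.choices[0].message.content)
```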
- ChromaDB: The fastest way to build Python or JavaScript LLM apps with memory! See the example sketch after this group of vector databases.
- FAISS: A library for efficient similarity search and clustering of dense vectors.
- Qdrant: High-Performance Vector Search at Scale
- Pinecone: The official Pinecone Python SDK.
- Milvus: Milvus is an open-source vector database built to power embedding similarity search and AI applications.
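For a sense of the vector-database workflow, here is a minimal ChromaDB sketch using its in-memory client; the collection name and documents are illustrative, and Chroma applies its default embedding function unless you supply your own.

```python
# Minimal ChromaDB sketch: add documents, then run a similarity query.
# Uses the in-memory client; the collection name and documents are illustrative.
import chromadb

client = chromadb.Client()
collection = client.create_collection(name="docs")

collection.add(
    documents=["LangChain builds LLM applications.",
               "FAISS performs similarity search over dense vectors."],
    ids=["doc1", "doc2"],
)

results = collection.query(query_texts=["vector similarity search"], n_results=1)
print(results["documents"])
```

FAISS, Qdrant, Pinecone, and Milvus expose a similar add-then-query pattern through their own clients.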
- PyTorch - An open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing; see the example sketch after this group.
- TensorFlow - TensorFlow is an open source machine learning library developed by Google.
- JAX - Google’s library for high-performance computing and automatic differentiation.
- tinygrad - A minimalistic deep learning library with a focus on simplicity and educational use, created by George Hotz.
- micrograd - A simple, lightweight autograd engine for educational purposes, created by Andrej Karpathy.
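These frameworks all center on automatic differentiation; here is a minimal PyTorch sketch of that idea.

```python
# Minimal autograd sketch in PyTorch: build a scalar loss from tensors that
# track gradients, then backpropagate to populate .grad.
import torch

w = torch.randn(3, requires_grad=True)   # parameters to optimize
x = torch.tensor([1.0, 2.0, 3.0])        # fixed input
y_true = torch.tensor(10.0)              # target value

y_pred = (w * x).sum()                   # simple linear prediction
loss = (y_pred - y_true) ** 2            # squared error
loss.backward()                          # compute dloss/dw

print(w.grad)
```

micrograd and tinygrad implement essentially this backward pass in a very small codebase, which is what makes them useful for learning.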
- Transformers - Hugging Face Transformers is a popular library for Natural Language Processing (NLP) tasks, including fine-tuning large language models; see the example sketch after this group.
- Unsloth - Finetune Llama 3.2, Mistral, Phi-3.5 & Gemma 2-5x faster with 80% less memory!
- LitGPT - 20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale.
- AutoTrain - No code fine-tuning of LLMs and other machine learning tasks.
- Opik - An open-source platform for evaluating, testing, and monitoring LLM applications.
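Before fine-tuning, it helps to see how Transformers loads and runs a pretrained model; here is a minimal sketch with the `pipeline` API. The checkpoint name is illustrative and is downloaded on first use.

```python
# Minimal Hugging Face Transformers sketch: run inference with a pretrained
# checkpoint via the pipeline API. The model name is illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
output = generator("Generative AI data scientists build", max_new_tokens=20)
print(output[0]["generated_text"])
```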
- Embedchain: Create an AI app on your own data in a minute. Documentation Github Repo
- Docling by IBM: Parse documents and export them to the desired format with ease and speed; see the example sketch after this group. Github
- Markitdown by Microsoft: Python tool for converting files and office documents to Markdown.
- Gitingest: Turn any Git repository into a simple text ingest of its codebase. This is useful for feeding a codebase into any LLM. Github
- Mem0: Mem0 is a self-improving memory layer for LLM applications, enabling personalized AI experiences that save costs and delight users. Documentation Github
- Memary: Open Source Memory Layer For Autonomous Agents
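Document-to-text tools like Docling follow a convert-then-export pattern; here is a minimal sketch, assuming the `docling` package is installed and using an illustrative file path.

```python
# Minimal Docling sketch: parse a document and export it as Markdown.
# Assumes the docling package is installed; "report.pdf" is an illustrative path.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")           # local paths and URLs are accepted
print(result.document.export_to_markdown())        # export the parsed content as Markdown
```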
- AdalFlow - The library to build & auto-optimize LLM applications, from Chatbot, RAG, to Agent by SylphAI.
- DSPy: The framework for programming, not prompting, foundation models.
- AutoPrompt: A framework for prompt tuning using Intent-based Prompt Calibration.
- Promptify: A library for prompt engineering that simplifies NLP tasks (e.g., NER, classification) using LLMs like GPT.
- LiteLLM: Python SDK and Proxy Server (LLM Gateway) to call 100+ LLM APIs in the OpenAI format; see the example sketch after this group.
- LLMOps: Best practices designed to support your LLMOps initiatives
- Jupyter Agent: Let a LLM agent write and execute code inside a notebook
- Jupyter AI: A generative AI extension for JupyterLab Documentation
- Pyspur: Graph-Based Editor for LLM Workflows
- Browser-Use: Make websites accessible for AI agents
- Agenta: Open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM Observability all in one place. Documentation
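Of the tools above, LiteLLM is the most API-centric; here is a minimal sketch of its unified `completion()` call. The model names are illustrative, and provider API keys are read from environment variables.

```python
# Minimal LiteLLM sketch: one call signature across providers, with responses
# in the OpenAI format. Model names are illustrative; API keys come from
# environment variables (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY).
from litellm import completion

messages = [{"role": "user", "content": "One sentence on prompt engineering."}]

gpt_response = completion(model="gpt-4o-mini", messages=messages)
claude_response = completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

print(gpt_response.choices[0].message.content)
print(claude_response.choices[0].message.content)
```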
- AWS Bedrock: Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon; see the example sketch after this group.
- Microsoft Azure AI Services: Azure AI services help developers and organizations rapidly create intelligent, cutting-edge, market-ready, and responsible applications with out-of-the-box, prebuilt, and customizable APIs and models.
- Google Vertex AI: Vertex AI is a fully-managed, unified AI development platform for building and using generative AI.
- NVIDIA NIM: NVIDIA NIM™, part of NVIDIA AI Enterprise, provides containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations.
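For the cloud side, here is a minimal sketch that calls Amazon Bedrock through boto3's Converse API. It assumes AWS credentials with Bedrock access; the region and model ID are illustrative and must be enabled in your account.

```python
# Minimal Amazon Bedrock sketch using boto3's Converse API. Assumes AWS
# credentials with Bedrock access; the region and model ID are illustrative.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "What is Amazon Bedrock?"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```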
- LangChain Cookbook: Example code for building applications with LangChain, with an emphasis on more applied and end-to-end examples.
- LangGraph Examples: Example code for building applications with LangGraph
- Llama Index Examples: Example code for building applications with Llama Index
- Streamlit LLM Examples: Streamlit LLM app examples for getting started
- Azure Generative AI Examples: Prompt Flow and RAG Examples for use with the Microsoft Azure Cloud platform
- Amazon Bedrock Workshop: Introduces how to leverage foundation models (FMs) through Amazon Bedrock
- Microsoft Generative AI for Beginners: 21 lessons teaching everything you need to know to start building Generative AI applications. Github
- Microsoft Intro to Generative AI Course
- Google Vertex AI Examples: Notebooks, code samples, sample apps, and other resources that demonstrate how to use, develop and manage machine learning and generative AI workflows using Google Cloud Vertex AI
- Google Generative AI Examples: Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI
- NVIDIA NIM Anywhere: An entry point for developing with NIMs that natively scales out to full-sized labs and up to production environments.
- NVIDIA NIM Deploy: Reference implementations, example documents, and architecture guides that can be used as a starting point to deploy multiple NIMs and other NVIDIA microservices into Kubernetes and other production deployment environments.
- Python AI/ML Tips - Free newsletter on Generative AI and Data Science.
- unwind ai - Latest AI news, tools, and tutorials for AI Developers
- Generative AI Data Scientist Workshops: Get free training on how to build and deploy Generative AI/ML solutions. Register for the next free workshop here.
- 8-Week AI Bootcamp by Business Science: Focused on helping you become a Generative AI Data Scientist. Learn how to build and deploy AI-powered data science solutions using LangChain, LangGraph, Pandas, Scikit-Learn, Streamlit, AWS, Bedrock, and EC2.