Documentation Pass for Models (#2617)
* links in chinese_clip

* links for clip model

* add mod docs for flux and llava

* module doc for MMDIT and MIMI

* add docs for a few more models

* mod docs for bert, based and beit

* add module docs for convmixer colpali codegeex and chatglm

* add another series of mod docs

* add fastvit-llama2_c

* module docs mamba -> mobileone

* module docs from moondream-phi3

* mod docs for quantized and qwen

* update to yi

* fix long names

* Update llama2_c.rs

* Update llama2_c_weights.rs

* Fix the link for mimi + tweaks

---------

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
zachcp and LaurentMazare authored Nov 15, 2024
1 parent 0ed24b9 commit f689ce5
Showing 94 changed files with 1,001 additions and 51 deletions.
7 changes: 3 additions & 4 deletions candle-transformers/src/models/based.rs
@@ -1,10 +1,9 @@
//! Based, a model from the Stanford Hazy Research group.
//!
//! See "Simple linear attention language models balance the recall-throughput tradeoff", Arora et al. 2024
//! <https://arxiv.org/abs/2402.18668>

//! Original code:
//! https://github.com/HazyResearch/based
//! - [Arxiv](https://arxiv.org/abs/2402.18668)
//! - [Github](https://github.com/HazyResearch/based)
//!

use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
use candle_nn::{
7 changes: 7 additions & 0 deletions candle-transformers/src/models/beit.rs
@@ -1,3 +1,10 @@
//! Based on the BEiT vision transformer model.
//!
//! See "BEIT: BERT Pre-Training of Image Transformers", Bao et al. 2021
//! - [Arxiv](https://arxiv.org/abs/2106.08254)
//! - [Github](https://github.com/microsoft/unilm/tree/master/beit)
//!

use candle::{DType, Device, IndexOp, Result, Tensor, D};
use candle_nn::{layer_norm, LayerNorm, Linear, Module, VarBuilder};

6 changes: 6 additions & 0 deletions candle-transformers/src/models/bert.rs
@@ -1,3 +1,9 @@
//! BERT (Bidirectional Encoder Representations from Transformers)
//!
//! See "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", Devlin et al. 2018
//! - [Arxiv](https://arxiv.org/abs/1810.04805)
//! - [Github](https://github.com/google-research/bert)
//!
use super::with_tracing::{layer_norm, linear, LayerNorm, Linear};
use candle::{DType, Device, Result, Tensor};
use candle_nn::{embedding, Embedding, Module, VarBuilder};
7 changes: 7 additions & 0 deletions candle-transformers/src/models/bigcode.rs
@@ -1,3 +1,10 @@
//! BigCode implementation in Rust based on the GPT-BigCode model.
//!
//! See "StarCoder: A State-of-the-Art LLM for Code", Mukherjee et al. 2023
//! - [Arxiv](https://arxiv.org/abs/2305.06161)
//! - [Github](https://github.com/bigcode-project/starcoder)
//!

use candle::{DType, Device, IndexOp, Result, Tensor, D};
use candle_nn::{embedding, linear_b as linear, Embedding, LayerNorm, Linear, Module, VarBuilder};

7 changes: 7 additions & 0 deletions candle-transformers/src/models/blip.rs
@@ -1,3 +1,10 @@
//! Based on the BLIP paper from Salesforce Research.
//!
//! See "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation"
//! - [Arxiv](https://arxiv.org/abs/2201.12086)
//! - [Github](https://github.com/salesforce/BLIP)
//!

use super::blip_text;
use super::with_tracing::{conv2d, linear, Conv2d, Linear};
use candle::{Module, Result, Tensor, D};
6 changes: 6 additions & 0 deletions candle-transformers/src/models/blip_text.rs
@@ -1,3 +1,9 @@
//! Implementation of BLIP text encoder/decoder.
//!
//! See "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation"
//! https://arxiv.org/abs/2201.12086
//!

use super::with_tracing::{linear, Embedding, Linear};
use candle::{Module, Result, Tensor, D};
use candle_nn::{layer_norm, LayerNorm, VarBuilder};
7 changes: 7 additions & 0 deletions candle-transformers/src/models/chatglm.rs
@@ -1,3 +1,10 @@
//! Implementation of the ChatGLM2/3 models from THUDM.
//!
//! See:
//! - ChatGLM3: ["ChatGLM3: Advancing Multilingual Conversational Language Models with High-Quality Data"](https://github.com/THUDM/ChatGLM3)
//! - ChatGLM2: ["ChatGLM2: An Open Bilingual Chat LLM"](https://github.com/THUDM/ChatGLM2-6B)
//!

use crate::models::with_tracing::{linear_b as linear, Linear};
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
use candle_nn::VarBuilder;
5 changes: 3 additions & 2 deletions candle-transformers/src/models/chinese_clip/mod.rs
@@ -3,8 +3,9 @@
//! Chinese contrastive Language-Image Pre-Training (CLIP) is an architecture trained on
//! pairs of images with related texts.
//!
//! https://github.com/OFA-Sys/Chinese-CLIP
//! https://github.com/huggingface/transformers/blob/5af7d41e49bbfc8319f462eb45253dcb3863dfb7/src/transformers/models/chinese_clip/modeling_chinese_clip.py
//! - [GH Link](https://github.com/OFA-Sys/Chinese-CLIP)
//! - Transformers Python [reference implementation](https://github.com/huggingface/transformers/blob/5af7d41e49bbfc8319f462eb45253dcb3863dfb7/src/transformers/models/chinese_clip/modeling_chinese_clip.py)
//!

use candle::{Module, Result, Tensor, D};
use candle_nn as nn;
5 changes: 3 additions & 2 deletions candle-transformers/src/models/clip/mod.rs
@@ -3,8 +3,9 @@
//! Contrastive Language-Image Pre-Training (CLIP) is an architecture trained on
//! pairs of images with related texts.
//!
//! https://github.com/openai/CLIP
//! https://github.com/huggingface/transformers/tree/f6fa0f0bf0796ac66f201f23bdb8585de1609add/src/transformers/models/clip
//! - [GH Link](https://github.com/openai/CLIP)
//! - Transformers Python [reference implementation](https://github.com/huggingface/transformers/tree/f6fa0f0bf0796ac66f201f23bdb8585de1609add/src/transformers/models/clip)

use self::{
text_model::{Activation, ClipTextTransformer},
vision_model::ClipVisionTransformer,
7 changes: 7 additions & 0 deletions candle-transformers/src/models/codegeex4_9b.rs
@@ -1,3 +1,10 @@
//! CodeGeeX4 - A multi-language code generation model
//!
//! See "CodeGeeX: A Pre-Trained Model For Code Generation with Multilingual Evaluations on HumanEval-X", Qian et al. 2023
//! - [Arxiv](https://arxiv.org/abs/2303.17568)
//! - [Github](https://github.com/THUDM/CodeGeeX)
//!

use crate::models::with_tracing::{linear_b as linear, Linear};
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
use candle_nn::VarBuilder;
5 changes: 5 additions & 0 deletions candle-transformers/src/models/colpali.rs
@@ -1,3 +1,8 @@
//! Colpali Model for text/image similarity scoring.
//!
//! Colpali combines a vision encoder with an efficient LM for retrieving content.
//!

use candle::{Module, Result, Tensor};
use candle_nn::VarBuilder;

7 changes: 7 additions & 0 deletions candle-transformers/src/models/convmixer.rs
@@ -1,3 +1,10 @@
//! ConvMixer implementation.
//!
//! See "Patches Are All You Need?" by Trockman et al. 2022
//! - [Arxiv](https://arxiv.org/abs/2201.09792)
//! - [Github](https://github.com/locuslab/convmixer)
//!

use candle::Result;
use candle_nn::{batch_norm, Conv2dConfig, Module, VarBuilder};

14 changes: 6 additions & 8 deletions candle-transformers/src/models/convnext.rs
@@ -1,15 +1,13 @@
//! ConvNeXt implementation.
//!
//! See "A ConvNet for the 2020s" Liu et al. 2022
//! <https://arxiv.org/abs/2201.03545>
//! See ["A ConvNet for the 2020s" Liu et al. 2022](https://arxiv.org/abs/2201.03545)
//! and
//! "ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders" Woo et al. 2023
//! <https://arxiv.org/abs/2301.00808>

//! ["ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders" Woo et al. 2023](https://arxiv.org/abs/2301.00808)
//!
//! Original code:
//! https://github.com/facebookresearch/ConvNeXt/
//! https://github.com/facebookresearch/ConvNeXt-V2/
//! timm: https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/convnext.py
//! - [ConvNeXt](https://github.com/facebookresearch/ConvNeXt/)
//! - [ConvNeXt-V2](https://github.com/facebookresearch/ConvNeXt-V2/)
//! - [timm](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/convnext.py)

use candle::shape::ShapeWithOneHole;
use candle::{Result, D};
7 changes: 6 additions & 1 deletion candle-transformers/src/models/dac.rs
@@ -1,4 +1,9 @@
/// Adapted from https://github.com/descriptinc/descript-audio-codec
//! Implementation of the Descript Audio Codec (DAC) model
//!
//! See: [Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec)
//!
/// An efficient neural codec for compressing/decompressing audio
///
use crate::models::encodec;
use candle::{IndexOp, Result, Tensor, D};
use candle_nn::{Conv1d, Conv1dConfig, ConvTranspose1d, ConvTranspose1dConfig, VarBuilder};
6 changes: 6 additions & 0 deletions candle-transformers/src/models/depth_anything_v2.rs
@@ -1,3 +1,9 @@
//! Implementation of the Depth Anything model.
//!
//! See:
//! - ["Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data"](https://github.com/LiheYoung/Depth-Anything)
//!

use candle::D::Minus1;
use candle::{Module, Result, Tensor};
use candle_nn::ops::Identity;
5 changes: 5 additions & 0 deletions candle-transformers/src/models/dinov2.rs
@@ -1,3 +1,8 @@
//! Implementation of the DINOv2 models from Meta Research.
//!
//! See:
//! - DINOv2: ["DINOv2: Learning Robust Visual Features without Supervision"](https://github.com/facebookresearch/dinov2)
//!
use candle::{IndexOp, Result, Tensor, D};
use candle_nn::{layer_norm, LayerNorm, Linear, Module, VarBuilder};

7 changes: 7 additions & 0 deletions candle-transformers/src/models/dinov2reg4.rs
@@ -1,3 +1,10 @@
//! Implementation of the DINOv2 revision with 4 register tokens (reg4)
//!
//! See:
//! - DINOv2: ["DINOv2: Learning Robust Visual Features without Supervision"](https://github.com/facebookresearch/dinov2)
//!
//! This code implements the variant with 4 additional register (regularization) tokens.
//!
use candle::{IndexOp, Result, Tensor, D};
use candle_nn::{layer_norm, LayerNorm, Linear, Module, VarBuilder};

5 changes: 5 additions & 0 deletions candle-transformers/src/models/distilbert.rs
@@ -1,3 +1,8 @@
//! Implementation of DistilBert, a distilled version of BERT.
//!
//! See:
//! - ["DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter"](https://arxiv.org/abs/1910.01108)
//!
use super::with_tracing::{layer_norm, linear, LayerNorm, Linear};
use candle::{DType, Device, Result, Tensor};
use candle_nn::{Embedding, Module, VarBuilder};
5 changes: 5 additions & 0 deletions candle-transformers/src/models/efficientnet.rs
@@ -1,3 +1,8 @@
//! Implementation of EfficientNet, a family of efficient convolutional networks for computer vision.
//!
//! See:
//! - ["EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks"](https://arxiv.org/abs/1905.11946)
//!
use candle::{Result, Tensor, D};
use candle_nn as nn;
use nn::{Module, VarBuilder};
7 changes: 3 additions & 4 deletions candle-transformers/src/models/efficientvit.rs
@@ -1,9 +1,8 @@
//! EfficientViT (MSRA) inference implementation based on timm.
//!
//! See "EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention"
//! https://arxiv.org/abs/2305.07027

//! https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/efficientvit_msra.py
//! See ["EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention"](https://arxiv.org/abs/2305.07027)
//!
//! Based on implementation from [pytorch-image-models](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/efficientvit_msra.py)

use candle::{Result, Tensor, D};
use candle_nn::{
6 changes: 6 additions & 0 deletions candle-transformers/src/models/encodec.rs
@@ -1,3 +1,9 @@
//! EnCodec neural audio codec implementation.
//!
//! See ["High Fidelity Neural Audio Compression"](https://arxiv.org/abs/2210.13438)
//!
//! Based on implementation from [huggingface/transformers](https://github.com/huggingface/transformers/blob/main/src/transformers/models/encodec/modeling_encodec.py)

#![allow(unused)]
use candle::{DType, IndexOp, Layout, Module, Result, Shape, Tensor, D};
use candle_nn::{conv1d, Conv1d, Conv1dConfig, ConvTranspose1d, VarBuilder};
6 changes: 6 additions & 0 deletions candle-transformers/src/models/eva2.rs
@@ -1,3 +1,9 @@
//! EVA-2 inference implementation.
//!
//! See ["EVA-02: A Visual Representation for Neon Genesis"](https://arxiv.org/abs/2303.11331)
//!
//! Based on implementation from [pytorch-image-models](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/eva2.py)

use candle::{IndexOp, Result, Tensor, D};
use candle_nn::{layer_norm, LayerNorm, Linear, Module, VarBuilder};

6 changes: 6 additions & 0 deletions candle-transformers/src/models/falcon.rs
@@ -1,3 +1,9 @@
//! Falcon language model inference implementation
//!
//! See ["Falcon: a new approach to large language models"](https://huggingface.co/blog/falcon)
//!
//! Based on implementation from [Huggingface Transformers](https://github.com/huggingface/transformers/blob/main/src/transformers/models/falcon)

use candle::{DType, Device, Result, Tensor, D};
use candle_nn::{embedding, linear_b as linear, Embedding, LayerNorm, Linear, Module, VarBuilder};
use serde::Deserialize;
8 changes: 4 additions & 4 deletions candle-transformers/src/models/fastvit.rs
@@ -1,9 +1,9 @@
//! FastViT inference implementation based on timm
//! # FastViT inference implementation based on timm
//!
//! See "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization"
//! https://arxiv.org/pdf/2303.14189
//! ## Description
//! See ["FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization"](https://arxiv.org/pdf/2303.14189)
//!
//! https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/fastvit.py
//! Implementation based on [timm model](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/fastvit.py)

use candle::{DType, Result, Tensor, D};
use candle_nn::{
7 changes: 7 additions & 0 deletions candle-transformers/src/models/flux/mod.rs
@@ -1,3 +1,10 @@
//! Flux Model
//!
//! Flux is a series of text-to-image generation models based on diffusion transformers.
//!
//! - [GH Link](https://github.com/black-forest-labs/flux)
//!
use candle::{Result, Tensor};

pub trait WithForward {
6 changes: 6 additions & 0 deletions candle-transformers/src/models/gemma.rs
@@ -1,3 +1,9 @@
//! Gemma inference implementation.
//!
//! See ["Gemma: Open Models Based on Gemini Technology"](https://blog.google/technology/developers/gemma-open-ai-model/)
//!
//! Based on implementation from Google and PyTorch

use std::sync::Arc;

use candle::{DType, Device, Module, Result, Tensor, D};
6 changes: 6 additions & 0 deletions candle-transformers/src/models/gemma2.rs
@@ -1,3 +1,9 @@
//! Gemma 2 LLM architecture (Google) inference implementation.
//!
//! See ["Gemma: Open Models Based on Gemini Technology"](https://blog.google/technology/developers/gemma-open-models/)
//!
//! Based on implementations from Google and OpenLLM

use std::sync::Arc;

use candle::{DType, Device, Module, Result, Tensor, D};
6 changes: 6 additions & 0 deletions candle-transformers/src/models/glm4.rs
@@ -1,3 +1,9 @@
//! GLM-4 inference implementation.
//!
//! An open bilingual (Chinese-English) language model from THUDM.
//!
//! Based on implementation from [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B)

use crate::models::with_tracing::{linear_b as linear, Linear};
use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
use candle_nn::VarBuilder;
7 changes: 7 additions & 0 deletions candle-transformers/src/models/granite.rs
@@ -1,3 +1,10 @@
//! Granite is a long-context transformer language model.
//!
//! A high-performance transformer model optimized for efficient processing
//! of very long context sequences.
//!
//! Based on implementation from [Nod.ai](https://github.com/nod-ai/granite)

use super::with_tracing::{linear_no_bias as linear, Linear, RmsNorm};
use candle::{DType, Device, IndexOp, Result, Tensor, D};
use candle_nn::{embedding, Embedding, Module, VarBuilder};
8 changes: 4 additions & 4 deletions candle-transformers/src/models/hiera.rs
@@ -1,9 +1,9 @@
//! Hiera inference implementation based on timm.
//! [Hiera] inference implementation based on timm.
//!
//! See "Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles"
//! https://arxiv.org/abs/2306.00989
//! See "[Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles]"
//! [Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles]: https://arxiv.org/abs/2306.00989
//!
//! https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/hiera.py
//! [Hiera]: https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/hiera.py

use candle::{Result, D};
use candle_nn::{conv2d, layer_norm, linear, ops::softmax, Conv2dConfig, Func, VarBuilder};
6 changes: 6 additions & 0 deletions candle-transformers/src/models/jina_bert.rs
@@ -1,3 +1,9 @@
//! # JinaBERT inference implementation
//!
//! Based on the Hugging Face implementation of Jina BERT and its variants
//!
//! See: [Jina Embeddings on HuggingFace](https://huggingface.co/jinaai/jina-embeddings-v2-base-en)

use super::with_tracing::{linear, linear_no_bias, Embedding, Linear};
use candle::{DType, Device, IndexOp, Result, Tensor, D};
use candle_nn::{layer_norm, LayerNorm, Module, VarBuilder};
6 changes: 6 additions & 0 deletions candle-transformers/src/models/llama.rs
@@ -1,3 +1,9 @@
//! Llama inference implementation.
//!
//! See ["LLaMA: Open and Efficient Foundation Language Models"](https://arxiv.org/abs/2302.13971)
//!
//! Implementation based on Hugging Face's [transformers](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py)

use super::with_tracing::{linear_no_bias as linear, Linear, RmsNorm};
use candle::{DType, Device, IndexOp, Result, Tensor, D};
use candle_nn::{embedding, Embedding, Module, VarBuilder};
6 changes: 6 additions & 0 deletions candle-transformers/src/models/llama2_c.rs
@@ -1,3 +1,9 @@
//! Llama2 inference implementation.
//!
//! See ["LLaMA 2: Open Foundation and Fine-Tuned Chat Models"](https://arxiv.org/abs/2307.09288)
//!
//! Based on the [llama2.c](https://github.com/karpathy/llama2.c) implementation

use candle::{DType, Device, IndexOp, Result, Tensor, D};
use candle_nn::linear_no_bias as linear;
use candle_nn::{embedding, rms_norm, Embedding, Linear, Module, RmsNorm, VarBuilder};
6 changes: 6 additions & 0 deletions candle-transformers/src/models/llama2_c_weights.rs
@@ -1,3 +1,9 @@
//! Llama2 inference implementation.
//!
//! See ["LLaMA 2: Open Foundation and Fine-Tuned Chat Models"](https://arxiv.org/abs/2307.09288)
//!
//! Based on the [llama2.c](https://github.com/karpathy/llama2.c) implementation

use byteorder::{LittleEndian, ReadBytesExt};
use candle::{DType, Device, IndexOp, Result, Shape, Tensor};
use candle_nn::VarBuilder;
10 changes: 10 additions & 0 deletions candle-transformers/src/models/llava/mod.rs
@@ -1,3 +1,13 @@
//! The LLaVA (Large Language and Vision Assistant) model.
//!
//! This provides the main model implementation, combining a vision tower (CLIP) with a
//! language model (Llama) for multimodal capabilities.
//!
//! The architecture implements the training-free projection technique from the paper:
//! [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485).
//!
//! - [GH Link](https://github.com/haotian-liu/LLaVA/tree/main)
//!
pub mod config;
pub mod utils;
