docs: update finetuner docs (#843)
* docs: update finetuner docs

* docs: update finetuner docs

* docs: update finetuner docs

* docs: fix dependency and default cpu

* docs: typo and remove docarray version

* docs: change finetuner version required

* docs: list finetuner models supported

* docs: change finetuner models and version

* docs: change finetuner models and version
jemmyshin authored Oct 21, 2022
1 parent 6cdc3e2 commit baf94b5
Showing 1 changed file with 50 additions and 5 deletions.
docs/user-guides/finetuner.md
@@ -7,6 +7,8 @@ This guide will show you how to use [Finetuner](https://finetuner.jina.ai) to fi
For installation and basic usage of Finetuner, please refer to [Finetuner documentation](https://finetuner.jina.ai).
You can also [learn more details about fine-tuning CLIP](https://finetuner.jina.ai/tasks/text-to-image/).

This tutorial requires `finetuner>=0.6.4` and `clip_server>=0.6.0`.

## Prepare Training Data

Finetuner accepts training data and evaluation data in the form of {class}`~docarray.array.document.DocumentArray`.
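
For the text-to-image task, each training `Document` typically pairs one text chunk with one image chunk. Below is a minimal sketch of building and pushing such data; the captions, URIs, and chunk layout are illustrative assumptions based on Finetuner's text-to-image task, not part of this commit:

```python
from docarray import Document, DocumentArray

# each Document holds one text chunk and one image chunk describing the same item
train_data = DocumentArray(
    [
        Document(
            chunks=[
                Document(content='a black dress with white polka dots', modality='text'),
                Document(uri='https://example.com/images/dress-001.jpg', modality='image'),
            ]
        ),
        # ... more text-image pairs
    ]
)

# push to Jina AI Cloud under the name later passed to finetuner.fit()
train_data.push('clip-fashion-train-data')
```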
@@ -84,14 +86,14 @@ import finetuner

finetuner.login()
run = finetuner.fit(
-model='openai/clip-vit-base-patch32',
+model='ViT-B-32::openai',
run_name='clip-fashion',
train_data='clip-fashion-train-data',
eval_data='clip-fashion-eval-data', # optional
epochs=5,
learning_rate=1e-5,
loss='CLIPLoss',
-cpu=False,
+to_onnx=True,
)
```
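
The collapsed part of the diff presumably covers monitoring the run and downloading the tuned model. As a rough sketch of that step, assuming Finetuner's standard run-management API and choosing the local directory name to match the `model_path` used in the Flow config below:

```python
import finetuner

finetuner.login()

# reconnect to the run submitted above and check its state
run = finetuner.get_run('clip-fashion')
print(run.status())  # e.g. {'status': 'FINISHED', ...}

# once finished, download the fine-tuned (ONNX) artifact into a local directory;
# this directory is what `model_path` points to in the Flow YAML below
run.save_artifact('clip-fashion-cas')
```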

@@ -169,15 +171,58 @@ executors:
py_modules:
- clip_server.executors.clip_onnx
with:
-name: ViT-B/32
+name: ViT-B-32::openai
model_path: 'clip-fashion-cas' # path to clip-fashion-cas
replicas: 1
```
-```{warning}
-Note that Finetuner only supports the ViT-B/32 CLIP model currently. The model name should match the fine-tuned model, or you will get incorrect output.
+You can use `finetuner.describe_models()` to check which models Finetuner supports; you should see:
```bash
Finetuner backbones
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ name ┃ task ┃ output_dim ┃ architecture ┃ description ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ bert-base-cased │ text-to-text │ 768 │ transformer │ BERT model pre-trained on BookCorpus and English Wikipedia │
│ openai/clip-vit-base-patch16 │ text-to-image │ 512 │ transformer │ CLIP base model with patch size 16 │
│ openai/clip-vit-base-patch32 │ text-to-image │ 512 │ transformer │ CLIP base model │
│ openai/clip-vit-large-patch14-336 │ text-to-image │ 768 │ transformer │ CLIP large model for 336x336 images │
│ openai/clip-vit-large-patch14 │ text-to-image │ 1024 │ transformer │ CLIP large model with patch size 14 │
│ efficientnet_b0 │ image-to-image │ 1280 │ cnn │ EfficientNet B0 pre-trained on ImageNet │
│ efficientnet_b4 │ image-to-image │ 1792 │ cnn │ EfficientNet B4 pre-trained on ImageNet │
│ RN101::openai │ text-to-image │ 512 │ transformer │ Open CLIP "RN101::openai" model │
│ RN101-quickgelu::openai │ text-to-image │ 512 │ transformer │ Open CLIP "RN101-quickgelu::openai" model │
│ RN101-quickgelu::yfcc15m │ text-to-image │ 512 │ transformer │ Open CLIP "RN101-quickgelu::yfcc15m" model │
│ RN101::yfcc15m │ text-to-image │ 512 │ transformer │ Open CLIP "RN101::yfcc15m" model │
│ RN50::cc12m │ text-to-image │ 1024 │ transformer │ Open CLIP "RN50::cc12m" model │
│ RN50::openai │ text-to-image │ 1024 │ transformer │ Open CLIP "RN50::openai" model │
│ RN50-quickgelu::cc12m │ text-to-image │ 1024 │ transformer │ Open CLIP "RN50-quickgelu::cc12m" model │
│ RN50-quickgelu::openai │ text-to-image │ 1024 │ transformer │ Open CLIP "RN50-quickgelu::openai" model │
│ RN50-quickgelu::yfcc15m │ text-to-image │ 1024 │ transformer │ Open CLIP "RN50-quickgelu::yfcc15m" model │
│ RN50x16::openai │ text-to-image │ 768 │ transformer │ Open CLIP "RN50x16::openai" model │
│ RN50x4::openai │ text-to-image │ 640 │ transformer │ Open CLIP "RN50x4::openai" model │
│ RN50x64::openai │ text-to-image │ 1024 │ transformer │ Open CLIP "RN50x64::openai" model │
│ RN50::yfcc15m │ text-to-image │ 1024 │ transformer │ Open CLIP "RN50::yfcc15m" model │
│ ViT-B-16::laion400m_e31 │ text-to-image │ 512 │ transformer │ Open CLIP "ViT-B-16::laion400m_e31" model │
│ ViT-B-16::laion400m_e32 │ text-to-image │ 512 │ transformer │ Open CLIP "ViT-B-16::laion400m_e32" model │
│ ViT-B-16::openai │ text-to-image │ 512 │ transformer │ Open CLIP "ViT-B-16::openai" model │
│ ViT-B-16-plus-240::laion400m_e31 │ text-to-image │ 640 │ transformer │ Open CLIP "ViT-B-16-plus-240::laion400m_e31" model │
│ ViT-B-16-plus-240::laion400m_e32 │ text-to-image │ 640 │ transformer │ Open CLIP "ViT-B-16-plus-240::laion400m_e32" model │
│ ViT-B-32::laion2b_e16 │ text-to-image │ 512 │ transformer │ Open CLIP "ViT-B-32::laion2b_e16" model │
│ ViT-B-32::laion400m_e31 │ text-to-image │ 512 │ transformer │ Open CLIP "ViT-B-32::laion400m_e31" model │
│ ViT-B-32::laion400m_e32 │ text-to-image │ 512 │ transformer │ Open CLIP "ViT-B-32::laion400m_e32" model │
│ ViT-B-32::openai │ text-to-image │ 512 │ transformer │ Open CLIP "ViT-B-32::openai" model │
│ ViT-B-32-quickgelu::laion400m_e31 │ text-to-image │ 512 │ transformer │ Open CLIP "ViT-B-32-quickgelu::laion400m_e31" model │
│ ViT-B-32-quickgelu::laion400m_e32 │ text-to-image │ 512 │ transformer │ Open CLIP "ViT-B-32-quickgelu::laion400m_e32" model │
│ ViT-B-32-quickgelu::openai │ text-to-image │ 512 │ transformer │ Open CLIP "ViT-B-32-quickgelu::openai" model │
│ ViT-L-14-336::openai │ text-to-image │ 768 │ transformer │ Open CLIP "ViT-L-14-336::openai" model │
│ ViT-L-14::openai │ text-to-image │ 768 │ transformer │ Open CLIP "ViT-L-14::openai" model │
│ resnet152 │ image-to-image │ 2048 │ cnn │ ResNet152 pre-trained on ImageNet │
│ resnet50 │ image-to-image │ 2048 │ cnn │ ResNet50 pre-trained on ImageNet │
│ sentence-transformers/msmarco-distilbert-base-v3 │ text-to-text │ 768 │ transformer │ Pretrained BERT, fine-tuned on MS Marco │
└──────────────────────────────────────────────────┴────────────────┴────────────┴──────────────┴───────────────────────────────────────────────────────────
```


You can now start `clip_server` with the fine-tuned model to get a performance boost:

```bash
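# (assumption) the actual command is collapsed in this diff view; the standard way to
# start clip_server with a custom Flow config, here the YAML above saved as a
# hypothetical flow.yml, would be:
python -m clip_server flow.yml
```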
