diff --git a/.github/README-exec/onnx.readme.md b/.github/README-exec/onnx.readme.md
index 958e034b1..b0f7809e4 100644
--- a/.github/README-exec/onnx.readme.md
+++ b/.github/README-exec/onnx.readme.md
@@ -1,7 +1,7 @@
 # CLIPOnnxEncoder
 
-**CLIPOnnxEncoder** is the executor implemented in [clip-as-service](https://github.com/jina-ai/clip-as-service).
-It serves OpenAI released [CLIP](https://github.com/openai/CLIP) models with ONNX runtime (🚀 **3x** speed up).
+**CLIPOnnxEncoder** is the executor implemented in [CLIP-as-service](https://github.com/jina-ai/clip-as-service).
+The various `CLIP` models implemented by [OpenAI](https://github.com/openai/CLIP) and [OpenCLIP](https://github.com/mlfoundations/open_clip) are supported with the ONNX runtime (🚀 **3x** speed up).
 The introduction of the CLIP model [can be found here](https://openai.com/blog/clip/).
 
 - 🔀 **Automatic**: Auto-detect image and text documents depending on their content.
@@ -11,19 +11,28 @@ The introduction of the CLIP model [can be found here](https://openai.com/blog/c
 
 ## Model support
 
-Open AI has released 9 models so far. `ViT-B/32` is used as default model. Please also note that different model give **different size of output dimensions**.
+`ViT-B-32::openai` is used as the default model. To use specific pretrained models provided by `open_clip`, use `::` to separate the model name from the pretrained weight name, e.g. `ViT-B-32::laion2b_e16`. Please also note that **different models give different sizes of output dimensions**.
 
-| Model          | ONNX | Output dimension |
-|----------------|-----| --- |
-| RN50           | ✅ | 1024 |
-| RN101          | ✅ | 512 |
-| RN50x4         | ✅ | 640 |
-| RN50x16        | ✅ | 768 |
-| RN50x64        | ✅ | 1024 |
-| ViT-B/32       | ✅ | 512 |
-| ViT-B/16       | ✅ | 512 |
-| ViT-L/14       | ✅ | 768 |
-| ViT-L/14@336px | ✅ | 768 |
+| Model                                 | ONNX | Output dimension |
+|---------------------------------------|------|------------------|
+| RN50                                  | ✅   | 1024             |
+| RN101                                 | ✅   | 512              |
+| RN50x4                                | ✅   | 640              |
+| RN50x16                               | ✅   | 768              |
+| RN50x64                               | ✅   | 1024             |
+| ViT-B-32                              | ✅   | 512              |
+| ViT-B-16                              | ✅   | 512              |
+| ViT-B-16-plus-240                     | ✅   | 640              |
+| ViT-L-14                              | ✅   | 768              |
+| ViT-L-14@336px                        | ✅   | 768              |
+
+✅ = First-class support
+
+The full list of `open_clip` models and weights can be found [here](https://github.com/mlfoundations/open_clip#pretrained-model-interface).
+
+```{note}
+For model definitions with the `-quickgelu` postfix, please use the non-`-quickgelu` model name.
+```
 
 ## Usage
 
@@ -116,7 +125,7 @@ From the output, you will see all the text and image docs have `embedding` attac
 ╰─────────────────────────────────────────────────────────────────╯
 ```
 
-👉 Access the embedding playground in **clip-as-service** [doc](https://clip-as-service.jina.ai/playground/embedding), type sentence or image URL and see **live embedding**!
+👉 Access the embedding playground in the **CLIP-as-service** [doc](https://clip-as-service.jina.ai/playground/embedding), type a sentence or an image URL, and see the **live embedding**!
 
 ### Ranking
 
@@ -174,4 +183,4 @@ d = Document(
 )
 ```
 
-👉 Access the ranking playground in **clip-as-service** [doc](https://clip-as-service.jina.ai/playground/reasoning/). Just input the reasoning texts as prompts, the server will rank the prompts and return sorted prompts with scores.
\ No newline at end of file
+👉 Access the ranking playground in the **CLIP-as-service** [doc](https://clip-as-service.jina.ai/playground/reasoning/). Just input the reasoning texts as prompts; the server will rank them and return the sorted prompts with scores.
\ No newline at end of file
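To make the `::`-based model selection and the embedding step described above concrete, here is a minimal, hypothetical `clip-client` sketch (not part of this diff). It assumes a `clip_server` instance running **CLIPOnnxEncoder** is already listening at `grpc://0.0.0.0:51000`; the address and the image URL are placeholders.

```python
# Minimal sketch, assuming a CLIP-as-service server is reachable at the address
# below (placeholder) and was started with the model you want, e.g. ViT-B-32::laion2b_e16.
from clip_client import Client

c = Client('grpc://0.0.0.0:51000')

# Text and image inputs are auto-detected; the image URL is only an example and
# can be any reachable URL or local file path.
r = c.encode(
    [
        'First do it',
        'then do it right',
        'https://clip-as-service.jina.ai/_static/favicon.png',
    ]
)

print(r.shape)  # e.g. (3, 512) for a ViT-B-32 model; the width follows the table above
```
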
diff --git a/.github/README-exec/torch.readme.md b/.github/README-exec/torch.readme.md
index ae6e99a6f..c5f3130a5 100644
--- a/.github/README-exec/torch.readme.md
+++ b/.github/README-exec/torch.readme.md
@@ -1,7 +1,7 @@
 # CLIPTorchEncoder
 
-**CLIPTorchEncoder** is the executor implemented in [clip-as-service](https://github.com/jina-ai/clip-as-service).
-It serves OpenAI released [CLIP](https://github.com/openai/CLIP) models with PyTorch runtime.
+**CLIPTorchEncoder** is the executor implemented in [CLIP-as-service](https://github.com/jina-ai/clip-as-service).
+The various `CLIP` models implemented by [OpenAI](https://github.com/openai/CLIP), [OpenCLIP](https://github.com/mlfoundations/open_clip), and [MultilingualCLIP](https://github.com/FreddeFrallan/Multilingual-CLIP) are supported with the PyTorch runtime.
 The introduction of the CLIP model [can be found here](https://openai.com/blog/clip/).
 
 - 🔀 **Automatic**: Auto-detect image and text documents depending on their content.
@@ -12,19 +12,34 @@ With advances of ONNX runtime, you can use `CLIPOnnxEncoder` (see [link](https:/
 
 ## Model support
 
-Open AI has released **9 models** so far. `ViT-B/32` is used as default model. Please also note that different models give **the different sizes of output dimensions**.
+`ViT-B-32::openai` is used as the default model. To use specific pretrained models provided by `open_clip`, use `::` to separate the model name from the pretrained weight name, e.g. `ViT-B-32::laion2b_e16`. Please also note that **different models give different sizes of output dimensions**.
+
+| Model                                 | PyTorch | Output dimension |
+|---------------------------------------|---------|------------------|
+| RN50                                  | ✅      | 1024             |
+| RN101                                 | ✅      | 512              |
+| RN50x4                                | ✅      | 640              |
+| RN50x16                               | ✅      | 768              |
+| RN50x64                               | ✅      | 1024             |
+| ViT-B-32                              | ✅      | 512              |
+| ViT-B-16                              | ✅      | 512              |
+| ViT-B-16-plus-240                     | ✅      | 640              |
+| ViT-L-14                              | ✅      | 768              |
+| ViT-L-14@336px                        | ✅      | 768              |
+| M-CLIP/XLM_Roberta-Large-Vit-B-32     | ✅      | 512              |
+| M-CLIP/XLM-Roberta-Large-Vit-L-14     | ✅      | 768              |
+| M-CLIP/XLM-Roberta-Large-Vit-B-16Plus | ✅      | 640              |
+| M-CLIP/LABSE-Vit-L-14                 | ✅      | 768              |
+
+✅ = First-class support
+
+
+The full list of `open_clip` models and weights can be found [here](https://github.com/mlfoundations/open_clip#pretrained-model-interface).
+
+```{note}
+For model definitions with the `-quickgelu` postfix, please use the non-`-quickgelu` model name.
+```
 
-| Model          | PyTorch | Output dimension |
-|----------------|---------|------------------|
-| RN50           | ✅      | 1024             |
-| RN101          | ✅      | 512              |
-| RN50x4         | ✅      | 640              |
-| RN50x16        | ✅      | 768              |
-| RN50x64        | ✅      | 1024             |
-| ViT-B/32       | ✅      | 512              |
-| ViT-B/16       | ✅      | 512              |
-| ViT-L/14       | ✅      | 768              |
-| ViT-L/14@336px | ✅      | 768              |
 
 
 ## Usage
@@ -118,7 +133,7 @@ From the output, you will see all the text and image docs have `embedding` attac
 ╰─────────────────────────────────────────────────────────────────╯
 ```
 
-👉 Access the embedding playground in **clip-as-service** [doc](https://clip-as-service.jina.ai/playground/embedding), type sentence or image URL and see **live embedding**!
+👉 Access the embedding playground in the **CLIP-as-service** [doc](https://clip-as-service.jina.ai/playground/embedding), type a sentence or an image URL, and see the **live embedding**!
 
 ### Ranking
 
@@ -176,4 +191,4 @@ d = Document(
 )
 ```
 
-👉 Access the ranking playground in **clip-as-service** [doc](https://clip-as-service.jina.ai/playground/reasoning/). Just input the reasoning texts as prompts, the server will rank the prompts and return sorted prompts with scores.
\ No newline at end of file
+👉 Access the ranking playground in the **CLIP-as-service** [doc](https://clip-as-service.jina.ai/playground/reasoning/). Just input the reasoning texts as prompts; the server will rank them and return the sorted prompts with scores.
\ No newline at end of file
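For the MultilingualCLIP rows added above, a serving-side sketch might look like the following. This is hypothetical and not part of this diff: the Hub reference `jinahub://CLIPTorchEncoder` and the `name` parameter are assumptions about the executor interface, so verify them against the executor docs before relying on them.

```python
# Hypothetical sketch: serve one of the newly listed multilingual models with a
# Jina Flow. `jinahub://CLIPTorchEncoder` and the `name` parameter are assumptions.
from jina import Flow

f = Flow().add(
    uses='jinahub://CLIPTorchEncoder',
    # model name taken from the table above; 512-dim output
    uses_with={'name': 'M-CLIP/XLM_Roberta-Large-Vit-B-32'},
)

if __name__ == '__main__':
    with f:
        f.block()  # keep serving until interrupted
```
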
diff --git a/client/setup.py b/client/setup.py
index 61082d21c..febd443cd 100644
--- a/client/setup.py
+++ b/client/setup.py
@@ -5,7 +5,7 @@
 from setuptools import setup
 
 if sys.version_info < (3, 7, 0):
-    raise OSError(f'clip-as-service requires Python >=3.7, but yours is {sys.version}')
+    raise OSError(f'CLIP-as-service requires Python >=3.7, but yours is {sys.version}')
 
 try:
     pkg_name = 'clip-client'
diff --git a/scripts/benchmark.py b/scripts/benchmark.py
index 24c7b598b..2ecd702bd 100644
--- a/scripts/benchmark.py
+++ b/scripts/benchmark.py
@@ -30,7 +30,7 @@ def __init__(
         **kwargs,
     ):
         """
-        @param server: the clip-as-service server URI
+        @param server: the CLIP-as-service server URI
         @param batch_size: number of batch sample
         @param num_iter: number of repeat run per experiment
         @param image_sample: uri of the test image
diff --git a/server/clip_server/model/clip_onnx.py b/server/clip_server/model/clip_onnx.py
index b02034629..980e71b52 100644
--- a/server/clip_server/model/clip_onnx.py
+++ b/server/clip_server/model/clip_onnx.py
@@ -146,6 +146,11 @@
         ('ViT-L-14@336px/textual.onnx', '78fab479f136403eed0db46f3e9e7ed2'),
         ('ViT-L-14@336px/visual.onnx', 'f3b1f5d55ca08d43d749e11f7e4ba27e'),
     ),
+    # MultilingualCLIP models
+    # 'M-CLIP/LABSE-Vit-L-14': (
+    #     ('M-CLIP-LABSE-Vit-L-14/textual.onnx', 'b5b649f9e064457c764874e982bca296'),
+    #     ('M-CLIP-LABSE-Vit-L-14/visual.onnx', '471951562303c9afbb804b865eedf149'),
+    # ),
 }
 
diff --git a/server/setup.py b/server/setup.py
index da3c042c3..4f21c70b1 100644
--- a/server/setup.py
+++ b/server/setup.py
@@ -5,7 +5,7 @@
 from setuptools import setup
 
 if sys.version_info < (3, 7, 0):
-    raise OSError(f'clip-as-service requires Python >=3.7, but yours is {sys.version}')
+    raise OSError(f'CLIP-as-service requires Python >=3.7, but yours is {sys.version}')
 
 try:
     pkg_name = 'clip-server'
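The commented-out `clip_onnx.py` entries above pair each ONNX file with an MD5 checksum. Purely as an illustration of what such a `(filename, md5)` pair is for, and not as `clip_server`'s actual download or validation code, a standalone integrity check could look like this; the local cache path is a placeholder.

```python
# Illustrative only: verify a downloaded ONNX file against the MD5 recorded in a
# (filename, md5) registry entry like the ones added above. Not clip_server's
# own logic; the local path below is a placeholder.
import hashlib
from pathlib import Path


def md5_matches(path: Path, expected_md5: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file in chunks and compare its MD5 hex digest."""
    digest = hashlib.md5()
    with path.open('rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5


if __name__ == '__main__':
    ok = md5_matches(
        Path('~/.cache/clip/M-CLIP-LABSE-Vit-L-14/textual.onnx').expanduser(),
        'b5b649f9e064457c764874e982bca296',
    )
    print('checksum ok' if ok else 'checksum mismatch, re-download the model')
```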