[Features] Generic clip #286

Merged
merged 54 commits into from
Feb 5, 2023
Conversation

@wanliAlex (Collaborator) commented Jan 25, 2023

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    feature

  • What is the current behavior? (You can also link to an open issue here)
    We do not support generic CLIP models.

  • What is the new behavior (if this is a feature change)?
    We now support generic CLIP model input, e.g., custom openai_clip/open_clip weights.
    We also add a minor feature: fp16 versions of the openai_clip models.

  • Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
    No.

  • Have unit tests been run against this PR? (Has there also been any additional testing?)
    Yes, unit tests have been run.

  • Related Python client changes (link commit/PR here)
    None.

  • Related documentation changes (link commit/PR here)
    In progress.

  • Other information:
    This PR resolves issue #256.

  • Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes/features)
  • Docs have been added / updated (for bug fixes / features)

@wanliAlex wanliAlex marked this pull request as draft January 25, 2023 01:07
@@ -156,15 +185,195 @@ def __init__(self, model_type: str = "ViT-B/32", device: str = 'cpu', embedding
self.processor = None
self.embedding_dimension = embedding_dim
self.truncate = truncate
self.model_properties = kwargs["model_properties"]
Contributor:
is there any validation on "model_properties"? It might be better to have a default:
self.model_properties = kwargs.get("model_properties", dict()) (check the syntax, but something like that)

Collaborator Author:
This validation step is handled in marqo.s2_inference.s2_inference:

def _validate_model_properties(model_name: str, model_properties: dict) -> dict:

Contributor:
I think this still needs to be implemented?

Collaborator Author:
resolved!

self.tokenizer = clip.tokenize
self.model.eval()

try:
Contributor:
can we use the model_registry entries to check here instead of a try/except?

Collaborator Author:
finished.

I propose we use the key "localpath" or "url" in model_properties to detect whether we use a generic loading.
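A minimal sketch of that detection (the helper name and placement are assumptions, not the merged code):

    def _is_generic_clip_load(model_properties: dict) -> bool:
        # A custom checkpoint is signalled by a local file path or a
        # downloadable url in the user-supplied model_properties.
        return ("localpath" in model_properties) or ("url" in model_properties)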

self.std = self.model_properties.get("std", None)


try:
Contributor:
is there another way to determine if the model belongs to CLIP or open_clip instead of the try/except?

Collaborator Author:
I can't find an explicit way. I merged two loading functions into one that can load both openai_clip and open_clip models.
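A rough sketch of what such a merged loader could look like (function name and structure are illustrative assumptions; the checkpoint is assumed to be already downloaded to a local path):

    import torch
    import clip        # openai/CLIP
    import open_clip   # mlfoundations/open_clip

    def load_custom_clip(name: str, checkpoint_path: str, clip_type: str, device: str = "cpu"):
        # Dispatch on the user-declared "type" field rather than try/except.
        if clip_type == "open_clip":
            # open_clip accepts a local checkpoint path via `pretrained`.
            model, _, preprocess = open_clip.create_model_and_transforms(
                name, pretrained=checkpoint_path, device=device)
        else:
            # openai CLIP checkpoints are TorchScript archives; rebuild a
            # state-dict model from one, as clip.load() does internally.
            jit_archive = torch.jit.load(checkpoint_path, map_location="cpu")
            model = clip.model.build_model(jit_archive.state_dict()).to(device)
            preprocess = None  # the PR derives its transform via _get_transform(...)
        model.eval()
        return model, preprocess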

device_node = [n for n in device_holder.graph.findAllNodes("prim::Constant") if "Device" in repr(n)][-1]

def patch_device(module):
    try:
Contributor:
what causes the runtime error here?

Collaborator Author:
this is not used anymore.

if "value" in node.attributeNames() and str(node["value"]).startswith("cuda"):
node.copyAttributes(device_node)

model.apply(patch_device)
Contributor:
is it possible to use the original load functions instead? https://github.com/openai/CLIP/blob/main/clip/clip.py#L94

Collaborator Author:
this problem is solved by merging into one loading function.

Collaborator Author:
The function in the link can only load openai clip.

return model, _get_transform(model.visual.input_resolution, self.mean, self.std)


def open_clip_load(self):
Contributor:
is it possible to re-use the code between clip/open_clip? A quick look suggests there are a few similar code snippets.

Collaborator Author:
yes, merged into one loading function.



def whitespace_clean(text):
Contributor:
this function appears to be duplicated with one in custom_clip_utils.py

Collaborator Author:
I should delete it.
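For reference, the duplicated helper is the standard CLIP tokenizer utility (as it appears in both openai/CLIP and open_clip):

    import re

    def whitespace_clean(text):
        # Collapse runs of whitespace into single spaces and trim the ends.
        text = re.sub(r"\s+", " ", text)
        text = text.strip()
        return text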

@jn2clark (Contributor):
One more thing, can you provide some examples of how a user would invoke this?

@wanliAlex (Collaborator Author) commented Jan 25, 2023

Examples:

Example 1

Loading an open_clip custom model from a url:

    model_name = "test-model"
    model_properties = {
        "name": "ViT-B-32-quickgelu",
        "dimensions": 512,
        "url": "https://github.com/mlfoundations/open_clip/releases/download/v0.2-weights/vit_b_32-quickgelu-laion400m_e31-d867053b.pt",
        "type": "open_clip",
        "jit": False,
    }

    vectorise(model_name=model_name, content="coco.jpg", model_properties=model_properties)

Example 2

Loading an openai CLIP custom model from a url:

    model_name = "test-model"
    model_properties = {
        "name": "ViT-B/32",
        "dimensions": 512,
        "url": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt",
        "type": "clip",
    }

    vectorise(model_name=model_name, content="coco.jpg", model_properties=model_properties)
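Example 3 (hypothetical)

A third variant, loading from a local file via the "localpath" key proposed above. This is a sketch of the intended usage; the path is illustrative:

    model_name = "test-model"
    model_properties = {
        "name": "ViT-B/32",
        "dimensions": 512,
        "localpath": "/path/to/ViT-B-32.pt",
        "type": "clip",
    }

    vectorise(model_name=model_name, content="coco.jpg", model_properties=model_properties)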

required_keys = ["name", "dimensions"]
for key in required_keys:
    if key not in model_properties:
        raise InvalidModelPropertiesError(f"model_properties has missing key '{key}'. ")
Collaborator:
Can you make the error message more helpful? Including a link would also be helpful.

Collaborator Author:
revised
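A sketch of what the revised check might look like (the exact wording and any documentation link in the merged code may differ):

    required_keys = ["name", "dimensions"]
    for key in required_keys:
        if key not in model_properties:
            raise InvalidModelPropertiesError(
                f"model_properties is missing the required key '{key}'. "
                f"Generic CLIP models require the keys {required_keys}. "
                "Please refer to Marqo's documentation on generic models for examples.")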

'''
buffer_size = 8192
if not cache_dir:
    cache_dir = os.path.expanduser("~/.cache/clip")
Collaborator:
Could you make the cache consistent with the rest of our caches ('./cache/...').

See here:

https://github.com/marqo-ai/marqo/blob/e80e53b405ce524ccdb1f77c227535771c73fe43/src/marqo/s2_inference/configs.py

Collaborator Author:
Done. Also updated this for the other CLIP models.
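A sketch of the requested change, assuming a shared constant following the './cache/...' convention in configs.py (the constant name is an assumption):

    import os

    # Hypothetical project-wide cache location, mirroring configs.py's
    # './cache/...' convention rather than the user's home directory.
    CLIP_CACHE_PATH = os.path.join("./cache", "clip")

    if not cache_dir:
        cache_dir = CLIP_CACHE_PATH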

@wanliAlex wanliAlex left a comment (Collaborator Author):
test

@wanliAlex wanliAlex temporarily deployed to marqo-test-suite February 3, 2023 — with GitHub Actions