[Features] Generic clip #286
Conversation
src/marqo/s2_inference/clip_utils.py
Outdated
@@ -156,15 +185,195 @@ def __init__(self, model_type: str = "ViT-B/32", device: str = 'cpu', embedding
        self.processor = None
        self.embedding_dimension = embedding_dim
        self.truncate = truncate
        self.model_properties = kwargs["model_properties"]
is there any validation on "model_properties"? it might be better to have a default
self.model_properties = kwargs.get("model_properties", dict())
(check the syntax, but something like that)
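For what it's worth, the suggested syntax does work; a minimal, runnable sketch (class name hypothetical, standing in for the wrapper under review):

```python
class GenericCLIP:
    """Hypothetical stand-in for the CLIP wrapper under review."""

    def __init__(self, **kwargs):
        # .get() returns the fallback when the key is absent, so a
        # missing "model_properties" yields an empty dict instead of
        # the KeyError that kwargs["model_properties"] would raise.
        self.model_properties = kwargs.get("model_properties", dict())
```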
This step is handled in marqo.s2_inference.s2_inference:
def _validate_model_properties(model_name: str, model_properties: dict) -> dict:
I think this still needs to be implemented?
resolved!
src/marqo/s2_inference/clip_utils.py
Outdated
        self.tokenizer = clip.tokenize
        self.model.eval()

        try:
can we use the model_registry entries to check here instead of a try/except?
Finished. I propose we use the key "localpath" or "url" in model_properties to detect whether to use generic loading.
src/marqo/s2_inference/clip_utils.py
Outdated
        self.std = self.model_properties.get("std", None)

        try:
is there another way to determine if the model belongs to CLIP or open_clip instead of the try/except?
I can't find an explicit way. I merged two loading functions into one that can load both openai_clip and open_clip models.
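One explicit alternative, sketched here with a hypothetical dispatcher, is to branch on the "type" key that model_properties already carries; the labels returned below are for illustration only, where the real code would call the corresponding loading routine:

```python
def select_loader(model_properties: dict) -> str:
    # Branch on the declared type instead of try/except probing.
    model_type = model_properties.get("type", "clip")
    if model_type == "open_clip":
        return "open_clip"
    if model_type == "clip":
        return "openai_clip"
    raise ValueError(f"Unsupported CLIP type: {model_type}")
```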
src/marqo/s2_inference/clip_utils.py
Outdated
        device_node = [n for n in device_holder.graph.findAllNodes("prim::Constant") if "Device" in repr(n)][-1]

        def patch_device(module):
            try:
what causes the runtime error here?
this is not used anymore.
src/marqo/s2_inference/clip_utils.py
Outdated
            if "value" in node.attributeNames() and str(node["value"]).startswith("cuda"):
                node.copyAttributes(device_node)

        model.apply(patch_device)
is it possible to use the original load functions instead? https://github.com/openai/CLIP/blob/main/clip/clip.py#L94
this problem is solved by merging into one loading function.
The function in the link can only load openai clip.
src/marqo/s2_inference/clip_utils.py
Outdated
        return model, _get_transform(model.visual.input_resolution, self.mean, self.std)

    def open_clip_load(self):
is it possible to re-use the code here between clip/open_clip? a quick look suggests there are a few similar code snippets
yes, merged into one loading function.
src/marqo/s2_inference/hf_utils.py
Outdated
def whitespace_clean(text):
this function appears to be duplicated with one in custom_clip_utils.py
I should delete it.
One more thing, can you provide some examples of how a user would invoke this?
Examples:

Example 1: Loading an open_clip model

model_name = "test-model"
model_properties = {
    "name": "ViT-B-32-quickgelu",
    "dimensions": 512,
    "url": "https://github.com/mlfoundations/open_clip/releases/download/v0.2-weights/vit_b_32-quickgelu-laion400m_e31-d867053b.pt",
    "type": "open_clip",
    "jit": False
}
vectorise(model_name=model_name, content="coco.jpg", model_properties=model_properties)

Example 2: Loading an openai clip model

model_name = "test-model"
model_properties = {
    "name": "ViT-B/32",
    "dimensions": 512,
    "url": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt",
    "type": "clip",
}
vectorise(model_name=model_name, content="coco.jpg", model_properties=model_properties)
    required_keys = ["name", "dimensions"]
    for key in required_keys:
        if key not in model_properties:
            raise InvalidModelPropertiesError(f"model_properties has missing key '{key}'. ")
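A sketch of a more descriptive message along those lines (the error class is re-declared here so the snippet is self-contained, and the documentation pointer is illustrative rather than a real link):

```python
class InvalidModelPropertiesError(Exception):
    """Stand-in for marqo's error type, redefined for this sketch."""


def validate_model_properties(model_properties: dict) -> dict:
    required_keys = ["name", "dimensions"]
    for key in required_keys:
        if key not in model_properties:
            # Say which key is missing, what the full requirement is,
            # and where to read more.
            raise InvalidModelPropertiesError(
                f"model_properties is missing the required key '{key}'. "
                f"Generic CLIP models must define {required_keys}. "
                "See Marqo's documentation on bringing your own model."
            )
    return model_properties
```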
Can you make the error message more helpful? Including a link would also be helpful.
revised
    '''
    buffer_size = 8192
    if not cache_dir:
        cache_dir = os.path.expanduser("~/.cache/clip")
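A sketch of the suggested consistent default (function name hypothetical):

```python
import os


def resolve_cache_dir(cache_dir: str = None) -> str:
    # Default to the project-wide './cache/...' convention from review,
    # rather than the CLIP-specific '~/.cache/clip'.
    if not cache_dir:
        cache_dir = os.path.join("./cache", "clip")
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir
```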
Could you make the cache consistent with the rest of our caches ('./cache/...')? See here:
Done, and also updated this for other clip models.
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Feature

What is the current behavior? (You can also link to an open issue here)
We do not support generic CLIP models.

What is the new behavior (if this is a feature change)?
We now support generic CLIP model input, e.g., custom openai_clip/open_clip weights. We also add a minor feature here: the fp16 version of openai_clip models.

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
No

Have unit tests been run against this PR? (Has there also been any additional testing?)
In progress

Related Python client changes (link commit/PR here)
No

Related documentation changes (link commit/PR here)
In progress

Other information:
This PR resolves issue #256.

Please check if the PR fulfills these requirements