
Model Cache Management Features #239

Merged · 72 commits · Jan 6, 2023
da1fbbe
add APIs in marqo
wanliAlex Dec 16, 2022
03ca4dd
test new model cache key
wanliAlex Dec 16, 2022
e7e7b09
cleaning
wanliAlex Dec 20, 2022
8ad198b
Add todo
wanliAlex Dec 20, 2022
d8a4fdd
add multi-gpu support
wanliAlex Dec 21, 2022
db30bcb
add multi-gpu support
wanliAlex Dec 21, 2022
f0a801b
space adding
wanliAlex Dec 21, 2022
e0f14a2
adding cpu usage, RAM usage api
wanliAlex Dec 21, 2022
186180a
adding cpu usage, RAM usage api
wanliAlex Dec 21, 2022
2c77fa7
adding cpu usage, RAM usage api
wanliAlex Dec 21, 2022
d88b590
adding cpu usage, RAM usage api
wanliAlex Dec 21, 2022
a95d030
revert back model cache key
wanliAlex Dec 21, 2022
ba97a44
revert back model cache key
wanliAlex Dec 21, 2022
2457920
add test_eject_model test
wanliAlex Dec 21, 2022
856b5f9
add test_eject_model test
wanliAlex Dec 21, 2022
3df97f0
add test_eject_model test
wanliAlex Dec 21, 2022
e3d3bec
add test_eject_model test
wanliAlex Dec 21, 2022
b6122aa
add test_eject_model test
wanliAlex Dec 21, 2022
1132cc5
add test_eject_model test
wanliAlex Dec 21, 2022
fd69a42
add test_eject_model test
wanliAlex Dec 21, 2022
fe2f5fa
add test_eject_model test
wanliAlex Dec 21, 2022
4b515c2
add test_eject_model test
wanliAlex Dec 21, 2022
7aa1a06
add test_eject_model test
wanliAlex Dec 21, 2022
53a6a13
add test_eject_model test
wanliAlex Dec 21, 2022
0d5140b
add test_eject_model test
wanliAlex Dec 21, 2022
05719e7
add test_eject_model test
wanliAlex Dec 21, 2022
1b058af
add test_eject_model test
wanliAlex Dec 21, 2022
74f9238
add test_eject_model test
wanliAlex Dec 21, 2022
e624322
add unit test
wanliAlex Dec 21, 2022
847653a
add unit test
wanliAlex Dec 21, 2022
f144d3e
add unit test
wanliAlex Dec 21, 2022
d07d56c
add unit test
wanliAlex Dec 21, 2022
046761e
add unit test
wanliAlex Dec 21, 2022
4a8e1e4
add unit test
wanliAlex Dec 21, 2022
18e83e4
add unit test
wanliAlex Dec 21, 2022
c5c014a
test cuda only when cuda is available
wanliAlex Dec 21, 2022
2c04048
format update
wanliAlex Dec 21, 2022
fe969e3
format update
wanliAlex Dec 21, 2022
d37c1fe
format update
wanliAlex Dec 21, 2022
2f8f087
test_edge_case_cpu fix (remove cuda memory test)
wanliAlex Dec 21, 2022
aa3f5cb
add separators for readable information in model cache key
wanliAlex Dec 23, 2022
a514876
add separators for readable information in model cache key
wanliAlex Dec 23, 2022
a7d3995
add separators for readable information in model cache key
wanliAlex Dec 23, 2022
61ad7b5
add separators for readable information in model cache key
wanliAlex Dec 23, 2022
6d15f98
add separators for readable information in model cache key
wanliAlex Dec 23, 2022
1cc4240
adding test
wanliAlex Dec 23, 2022
43d438a
adding test
wanliAlex Dec 23, 2022
a6b164a
adding test
wanliAlex Dec 23, 2022
8072324
adding test
wanliAlex Dec 23, 2022
315d5fc
adding test
wanliAlex Dec 23, 2022
e132d70
adding test
wanliAlex Dec 23, 2022
7cb6c42
update id to _device_id
wanliAlex Dec 29, 2022
5e00196
update id to _device_id
wanliAlex Dec 29, 2022
b11d30c
[Onnx clip]Adding the clip_onnx to our avaible models for faster infe…
wanliAlex Dec 29, 2022
e2dab10
mainline merge
wanliAlex Dec 29, 2022
9dea569
mainline merge
wanliAlex Dec 29, 2022
1cad359
mainline merge
wanliAlex Dec 29, 2022
aa00d18
mainline merge
wanliAlex Dec 29, 2022
7afbb88
reduce a model for testing stability
wanliAlex Dec 29, 2022
9aa15a9
reduce a model for testing stability
wanliAlex Dec 29, 2022
bb530f2
update
wanliAlex Dec 29, 2022
eab2a75
update
wanliAlex Dec 29, 2022
a4f0a42
update
wanliAlex Dec 29, 2022
670fcee
add test for generic model
wanliAlex Jan 5, 2023
2f8c6f0
add test for generic model
wanliAlex Jan 5, 2023
4026435
add test for generic model
wanliAlex Jan 5, 2023
45c5891
add test for generic model
wanliAlex Jan 5, 2023
a782e52
add test for generic model
wanliAlex Jan 5, 2023
621df43
add test for generic model
wanliAlex Jan 5, 2023
0b10d26
add test for generic model
wanliAlex Jan 5, 2023
0cf173a
revision
wanliAlex Jan 5, 2023
ba6fb8d
revision
wanliAlex Jan 5, 2023
4 changes: 4 additions & 0 deletions src/marqo/errors.py
@@ -176,6 +176,10 @@ class IndexMaxFieldsError(__InvalidRequestError):
    code = "index_max_fields_error"
    status_code = HTTPStatus.BAD_REQUEST

class ModelNotLoadedError(__InvalidRequestError):
    code = "model_not_loaded"
    status_code = HTTPStatus.NOT_FOUND

# ---MARQO INTERNAL ERROR---


3 changes: 3 additions & 0 deletions src/marqo/s2_inference/errors.py
@@ -42,3 +42,6 @@ class RerankerImageError(S2InferenceError):

class RerankerNameError(S2InferenceError):
    pass

class ModelNotLoadedError(S2InferenceError):
    pass
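The two `ModelNotLoadedError` classes above live at different layers. A minimal, self-contained sketch of how the inference-layer error is assumed to be translated into the API-layer error that carries HTTP 404 (class names here are stand-ins, not the real marqo wiring):

```python
# Hedged sketch: the s2_inference-layer error is caught at the API layer and
# re-raised as the request error carrying a 404 status code.
from http import HTTPStatus

class S2InferenceModelNotLoadedError(Exception):
    """Stand-in for marqo.s2_inference.errors.ModelNotLoadedError."""

class ApiModelNotLoadedError(Exception):
    """Stand-in for marqo.errors.ModelNotLoadedError."""
    code = "model_not_loaded"
    status_code = HTTPStatus.NOT_FOUND

def eject_model_stub(loaded: bool) -> dict:
    try:
        if not loaded:
            raise S2InferenceModelNotLoadedError("model is not loaded")
        return {"message": "eject SUCCESS"}
    except S2InferenceModelNotLoadedError as e:
        # Re-raise at the API layer so FastAPI's error handling sees a 404.
        raise ApiModelNotLoadedError(str(e))

try:
    eject_model_stub(loaded=False)
except ApiModelNotLoadedError as e:
    print(int(e.status_code))  # 404
```

Keeping the inference layer ignorant of HTTP concerns and translating at the boundary is the usual reason for the duplicated class names.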
16 changes: 15 additions & 1 deletion src/marqo/s2_inference/s2_inference.py
@@ -2,12 +2,13 @@
The functions defined here would have endpoints, later on.
"""
import numpy as np
from marqo.s2_inference.errors import VectoriseError, InvalidModelPropertiesError, ModelLoadError, UnknownModelError
from marqo.s2_inference.errors import VectoriseError, InvalidModelPropertiesError, ModelLoadError, UnknownModelError, ModelNotLoadedError
from PIL import UnidentifiedImageError
from marqo.s2_inference.model_registry import load_model_properties
from marqo.s2_inference.configs import get_default_device, get_default_normalization, get_default_seq_length
from marqo.s2_inference.types import *
from marqo.s2_inference.logger import get_logger
import torch

logger = get_logger(__name__)

@@ -291,6 +292,19 @@ def _load_model(model_name: str, model_properties: dict, device: str = get_defau

return model

def get_available_models():
    return available_models

def eject_model(model_name: str, device: str):
    model_cache_key = _create_model_cache_key(model_name, device)
    if model_cache_key in available_models:
        del available_models[model_cache_key]
        if device.startswith("cuda"):
            torch.cuda.empty_cache()
        return {"message": f"eject SUCCESS, ejected model_name={model_name} from device={device}"}
    else:
        raise ModelNotLoadedError(f"The model_name={model_name} device={device} is not loaded")

# def normalize(inputs):

# is_valid = False
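The eject flow can be exercised with a minimal, self-contained sketch. The `"||"` separator in the cache key is an assumption (the PR only says separators were added for readability); the real `_create_model_cache_key` and the CUDA cache clearing are not reproduced here:

```python
# Minimal sketch of the model-cache eject flow, assuming a "model_name||device"
# cache-key format (an assumption, not the verified marqo format).
available_models = {}

def _cache_key(model_name: str, device: str) -> str:
    return f"{model_name}||{device}"

def load_stub(model_name: str, device: str) -> None:
    # In marqo this entry would hold the actual loaded model object.
    available_models[_cache_key(model_name, device)] = object()

def eject_model(model_name: str, device: str) -> dict:
    key = _cache_key(model_name, device)
    if key not in available_models:
        raise KeyError(f"model_name={model_name} device={device} is not loaded")
    del available_models[key]
    return {"message": f"eject SUCCESS, ejected model_name={model_name} from device={device}"}

load_stub("ViT-L/14", "cpu")
result = eject_model("ViT-L/14", "cpu")
print(result["message"])
```

Ejecting a key that is absent raises, which mirrors why the real implementation raises `ModelNotLoadedError` rather than silently succeeding.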
26 changes: 26 additions & 0 deletions src/marqo/tensor_search/api.py
@@ -212,6 +212,16 @@ def check_health(marqo_config: config.Config = Depends(generate_config)):
def get_indexes(marqo_config: config.Config = Depends(generate_config)):
    return tensor_search.get_indexes(config=marqo_config)

@app.get("/models")
def get_loaded_models():
    return tensor_search.get_loaded_models()

@app.delete("/models")
def eject_model(model_name: str, model_device: str):
    return tensor_search.eject_model(model_name=model_name, device=model_device)

@app.get("/device/cuda")
def get_cuda_info():
    return tensor_search.get_cuda_info()

# try these curl commands:

# ADD DOCS:
@@ -282,3 +292,19 @@ def get_indexes(marqo_config: config.Config = Depends(generate_config)):
curl -XDELETE http://localhost:8882/indexes/my-first-ix
"""

# check cuda info
"""
curl -XGET http://localhost:8882/device/cuda
"""

# check the loaded models
"""
curl -XGET http://localhost:8882/models
"""

# eject a model
"""
curl -X DELETE 'http://localhost:8882/models?model_name=ViT-L/14&model_device=cuda'
curl -X DELETE 'http://localhost:8882/models?model_name=hf/all_datasets_v4_MiniLM-L6&model_device=cuda'
curl -X DELETE 'http://localhost:8882/models?model_name=hf/all_datasets_v4_MiniLM-L6&model_device=cpu'
"""
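The curl examples above wrap the whole URL in quotes because model names such as `ViT-L/14` contain characters that are unsafe in a raw query string. When building the eject URL programmatically, it is safer to let `urlencode` do the escaping (a sketch; the host and port match the curl examples):

```python
# Build the DELETE /models URL with properly percent-encoded query parameters,
# so the "/" in "ViT-L/14" cannot be mistaken for a path separator.
from urllib.parse import urlencode

base = "http://localhost:8882/models"
params = urlencode({"model_name": "ViT-L/14", "model_device": "cuda"})
url = f"{base}?{params}"
print(url)  # http://localhost:8882/models?model_name=ViT-L%2F14&model_device=cuda
```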
20 changes: 20 additions & 0 deletions src/marqo/tensor_search/tensor_search.py
@@ -54,6 +54,7 @@
from marqo.s2_inference.clip_utils import _is_image
from marqo.s2_inference.reranking import rerank
from marqo.s2_inference import s2_inference
import torch.cuda

# We depend on _httprequests.py for now, but this may be replaced in the future, as
# _httprequests.py is designed for the client
@@ -1238,3 +1239,22 @@ def _get_model_properties(index_info):
f"Please provide model_properties if the model is a custom model and is not supported by default")

return model_properties

def get_loaded_models() -> dict:
    available_models = s2_inference.get_available_models()
    message = {
        'models': [
            {"model_name": ix} for ix in available_models
        ]
    }
    return message

def eject_model(model_name: str, device: str) -> dict:
    try:
        result = s2_inference.eject_model(model_name, device)
    except s2_inference_errors.ModelNotLoadedError as e:
        raise errors.ModelNotLoadedError(message=str(e))
    return result

def get_cuda_info() -> dict:
    return {"device": "cuda",
            "memory_usage": f"{round(torch.cuda.memory_allocated() / 1024**3, 1)} GiB",
            "total_device_memory": f"{round(torch.cuda.get_device_properties(0).total_memory / 1024**3, 1)} GiB"}