Add multimodal model support #931

Merged
merged 165 commits into from
Sep 9, 2024
165 commits
1217a0a
initial commit
RaynorChavez Aug 12, 2024
8067291
import Modalities from s2_inference
RaynorChavez Aug 12, 2024
fce6bfa
Merge branch 'mainline' into raynor/multimodal_models
RaynorChavez Aug 12, 2024
3750b86
import numpy
RaynorChavez Aug 12, 2024
81f5bc2
add abstraction for choosing encoder, fix bugs, infer modality
RaynorChavez Aug 12, 2024
27f5f18
add support for existing models, fix modalities always being treated…
RaynorChavez Aug 12, 2024
3121db9
fix call to _encode_without_cache
RaynorChavez Aug 13, 2024
893d2c7
change references to image_repo to content_repo, draft extract frames…
RaynorChavez Aug 13, 2024
28b3568
add languagebind to model registry, change image_repo test to content…
RaynorChavez Aug 13, 2024
b9a5017
languagebind helpers and classes
RaynorChavez Aug 13, 2024
2caa666
Merge branch 'mainline' into raynor/multimodal_models
RaynorChavez Aug 15, 2024
f97fdbc
preliminary load languagebind and encoders
RaynorChavez Aug 15, 2024
8514b1f
update requirements
RaynorChavez Aug 15, 2024
c22f0b0
update requirements
RaynorChavez Aug 15, 2024
c2b89ca
update requirements
RaynorChavez Aug 15, 2024
3a6c4dc
update requirements
RaynorChavez Aug 15, 2024
b732626
fix registry
RaynorChavez Aug 15, 2024
e6192ee
remove MultimodalPointer
RaynorChavez Aug 15, 2024
f8eb0fd
change requirements
RaynorChavez Aug 15, 2024
b74e72f
change requirements
RaynorChavez Aug 15, 2024
78ec4ba
Merge branch 'mainline' into raynor/multimodal_models
RaynorChavez Aug 18, 2024
1cb410f
add video and audio support in tensor search - initial commit
RaynorChavez Aug 18, 2024
a7eacf5
bump typing-extensions from 4.5.0 to 4.8.0
RaynorChavez Aug 18, 2024
1206deb
relax pytorchvideo requirement
RaynorChavez Aug 18, 2024
a669b0a
revert to torch==1.13.1
RaynorChavez Aug 18, 2024
50f4003
fix
RaynorChavez Aug 18, 2024
e098e20
revert to torch==1.12.1
RaynorChavez Aug 18, 2024
6dea0dd
relax peft requirement
RaynorChavez Aug 18, 2024
f9ac58f
upgrade to torch==1.13.1
RaynorChavez Aug 18, 2024
df569a7
add torchaudio to requirements
RaynorChavez Aug 18, 2024
18f8f05
some bug fixes
RaynorChavez Aug 19, 2024
060eb65
add initial implementation for unstructured support
RaynorChavez Aug 19, 2024
25ad9ec
remove thermal and depth helper files, add media filesize lookahead l…
RaynorChavez Aug 19, 2024
6579e91
readd helper functions for depth and thermal
RaynorChavez Aug 19, 2024
8648d56
some bug fixes
RaynorChavez Aug 19, 2024
5275265
print debugs
RaynorChavez Aug 20, 2024
a92589a
working languagebind image inference
RaynorChavez Aug 20, 2024
c497349
fix text encoder logic for languagebind
RaynorChavez Aug 21, 2024
480aa61
last attempt at streaming and chunking for now
RaynorChavez Aug 21, 2024
91dba28
working video and audio add docs structured. non streaming
RaynorChavez Aug 21, 2024
7b320f1
streaming logic
RaynorChavez Aug 21, 2024
3cb3565
remove redundant multimodal loader calls
RaynorChavez Aug 21, 2024
6f8d49d
add video_chunk_length into registry and as user configurable during …
RaynorChavez Aug 21, 2024
4b7813b
add working support for unstructured add documents
RaynorChavez Aug 21, 2024
3b4b083
more efficient fetch_file_metadata
RaynorChavez Aug 22, 2024
f39639a
correct streaming logic (better duration estimation, use ffmpeg to fe…
RaynorChavez Aug 22, 2024
d74a2fa
modify tensor search add unstructured to include better chunk descrip…
RaynorChavez Aug 22, 2024
4c2aaf9
remove redundant last chunk of size overlap, add failsafe temp file d…
RaynorChavez Aug 22, 2024
5374135
cleanup and add 'treatUrlsAndPointersAsMedia' and audio and video p…
RaynorChavez Aug 22, 2024
03d4e0c
add infer modality in multimodal fields
RaynorChavez Aug 22, 2024
65ee7d9
attempt at fixing multimodal fields
RaynorChavez Aug 22, 2024
9dc9591
initial implementation of working embed
RaynorChavez Aug 22, 2024
cd46898
debug embed implementation for video and audio
RaynorChavez Aug 22, 2024
7baaac7
fix bug in multimodal fields for audio and video
RaynorChavez Aug 22, 2024
9408377
wrap languagebind inputs with to_device
RaynorChavez Aug 22, 2024
f4f4e32
fix backward compatibility errors
RaynorChavez Aug 23, 2024
0c467ce
Merge branch 'mainline' into raynor/multimodal_models
RaynorChavez Aug 23, 2024
cb3e4e2
add packages to requirements.dev
RaynorChavez Aug 23, 2024
b27061b
updated requirements
RaynorChavez Aug 23, 2024
f8d5e15
update requirements
RaynorChavez Aug 23, 2024
b4ca956
loosen torchaudio
RaynorChavez Aug 23, 2024
62772b7
upgrade protobuf to 5.27.2
RaynorChavez Aug 23, 2024
e111b84
change requirements
RaynorChavez Aug 23, 2024
f1923ef
set protobuf==3.20.0
RaynorChavez Aug 23, 2024
8a8d2be
remove protobuf
RaynorChavez Aug 23, 2024
55d6adb
copying dev requirements over to nondev for now
RaynorChavez Aug 23, 2024
8b7bbf8
add device debug statements
RaynorChavez Aug 23, 2024
c03a0bb
restore
RaynorChavez Aug 23, 2024
be52eb1
remove import ffmpeg
RaynorChavez Aug 23, 2024
2df2bd5
add default value for video and audio preprocessing in marqo_index
RaynorChavez Aug 23, 2024
a31f7a1
bug
RaynorChavez Aug 23, 2024
f987bdb
print debug
RaynorChavez Aug 23, 2024
490246b
add_docs and embed tests and fix embed
RaynorChavez Aug 23, 2024
d668363
change download logic to use existing download_image_from_url
RaynorChavez Aug 24, 2024
4a36b4e
add certificate in add_docs
RaynorChavez Aug 24, 2024
bd8f4e1
Error out if model is not multimodal and attempts to add video or audio
RaynorChavez Aug 24, 2024
320799a
add ffmpeg in requirements and in environment
RaynorChavez Aug 24, 2024
6ce4442
more efficient fetch_file_metadata
RaynorChavez Aug 24, 2024
6d04340
new tests, refactor, and move languagebind cache_dir to standard loca…
RaynorChavez Aug 24, 2024
18fd2d4
remove cache_dir from MultimodalModelProperties
RaynorChavez Aug 24, 2024
3d43ad5
fix get_tokenizer cache_dir variable
RaynorChavez Aug 24, 2024
70ecc3b
fix some requirements
RaynorChavez Aug 25, 2024
192d5b6
restore peft specific git repo
RaynorChavez Aug 25, 2024
a27dd27
refactor
RaynorChavez Aug 25, 2024
cdc6ece
change largemodel unit tests machine to g6.2xlarge
RaynorChavez Aug 25, 2024
1f0d4ed
Raynor/multimodal models reduce reqs (#946)
RaynorChavez Aug 26, 2024
b9b1bf0
restore optimum==1.20.0
RaynorChavez Aug 26, 2024
f389a4c
cleanup main requirements.txt file
RaynorChavez Aug 26, 2024
fbf1aee
skip languagebind tests if on GPU
RaynorChavez Aug 26, 2024
2e9f53c
add selective loading of languagebind models permutations
RaynorChavez Aug 26, 2024
335a65b
fix text tokenizer model path and video model name
RaynorChavez Aug 26, 2024
0e81b03
copy .dev.txt into .txt
RaynorChavez Aug 27, 2024
450fea3
fix nonetype model image
RaynorChavez Aug 27, 2024
b43b495
cleanup abstraction a bit
RaynorChavez Aug 27, 2024
593efcc
cleanup
RaynorChavez Aug 27, 2024
d3f7c09
Fix Audio and Video text Tokenizer return
RaynorChavez Aug 27, 2024
b5596d4
add MARQO_MEDIA_DOWNLOAD_THREAD_COUNT env var
RaynorChavez Aug 27, 2024
72acab5
possible memory optimizations
RaynorChavez Aug 27, 2024
54c8906
remove deletion of content repo
RaynorChavez Aug 27, 2024
b2ee5c8
comment out print statements
RaynorChavez Aug 27, 2024
67a2c1a
limit download threads for languagebind
RaynorChavez Aug 28, 2024
e3f36a5
edit log to reflect lower media download thread count
RaynorChavez Aug 28, 2024
a1f2f1e
marqo-image to 28, minibatches for processing video/audio, replace an…
RaynorChavez Aug 29, 2024
70aceeb
add torchaudio to requirements
RaynorChavez Aug 29, 2024
6e1698f
restore languagebind requirements
RaynorChavez Aug 29, 2024
331586f
restore base image to 20
RaynorChavez Aug 29, 2024
31f5b24
requirements
RaynorChavez Aug 29, 2024
0d93786
revert to docker 20
RaynorChavez Aug 29, 2024
44ccc72
print statement
RaynorChavez Aug 29, 2024
180398e
Merge branch 'mainline' into raynor/multimodal_models
RaynorChavez Aug 29, 2024
1e67105
disable minibatches for video/audio
RaynorChavez Aug 29, 2024
6bc9413
cleanup, change video only model name, add error if add image in non-…
RaynorChavez Aug 29, 2024
1ef8bc6
fix check
RaynorChavez Aug 29, 2024
71a2dbf
move tensor to cuda before vectorise call
RaynorChavez Aug 29, 2024
df6f020
Update base image to 29, remove video download limit of 5
RaynorChavez Aug 29, 2024
4b1b4ff
change default value and equalize last chunk length
RaynorChavez Sep 2, 2024
7671728
add test for chunking start-end time logic
RaynorChavez Sep 2, 2024
7620975
implementing vector averaging in multimodal docs
RaynorChavez Sep 3, 2024
d42aeba
initial implementation of embedding normalization
RaynorChavez Sep 3, 2024
f88fcbb
Merge branch 'mainline' into raynor/multimodal_models
RaynorChavez Sep 3, 2024
5b0c2ce
fix invalid media url error not showing
RaynorChavez Sep 3, 2024
debe5c2
various bug fixes
RaynorChavez Sep 3, 2024
9b70efe
improve image modality detection and fix one test
RaynorChavez Sep 4, 2024
9449dbd
fallback to python-magic
RaynorChavez Sep 4, 2024
70ef116
install libmagic in github runners and cleanup
RaynorChavez Sep 4, 2024
813f78c
change libmagic installation check command
RaynorChavez Sep 4, 2024
41f443b
new tests for infer_modality
RaynorChavez Sep 4, 2024
0fc622b
add test for ensuring image url is embedded as image and not as text
RaynorChavez Sep 4, 2024
803738e
fix test_image_url_is_embedded_as_image_not_text
RaynorChavez Sep 4, 2024
da1d608
unskip one test
RaynorChavez Sep 4, 2024
e0efca9
add eva-decord to main requirements file
RaynorChavez Sep 5, 2024
638f400
add decord to main requirements and remove eva-decord
RaynorChavez Sep 5, 2024
d8af865
cleanup
RaynorChavez Sep 5, 2024
91d2801
move decord to under if statement
RaynorChavez Sep 5, 2024
a2a1574
remove decord from requirements
RaynorChavez Sep 5, 2024
09522ac
cleanup and addressing some comments
RaynorChavez Sep 5, 2024
215af4b
fix get_content_vector
RaynorChavez Sep 5, 2024
a15759f
fix one test
RaynorChavez Sep 5, 2024
d3cd3d7
new unit tests for infer_modality
RaynorChavez Sep 5, 2024
8767fb2
new tests for multimodal field embed and partial update to ensure ima…
RaynorChavez Sep 6, 2024
e300897
fix test
RaynorChavez Sep 6, 2024
a043aae
fix tensor_search reversion
RaynorChavez Sep 6, 2024
7fe1ff4
Merge branch 'mainline' into raynor/multimodal_models
RaynorChavez Sep 6, 2024
736e89a
rename files not util
RaynorChavez Sep 6, 2024
52ab8d9
add default threadcount for languagebind, new env var, error invalid …
RaynorChavez Sep 6, 2024
1a5d8c2
cleanup and addressing comments
RaynorChavez Sep 8, 2024
cca032a
reformat code
RaynorChavez Sep 8, 2024
8aa7ee2
fix error message
RaynorChavez Sep 8, 2024
cd3b568
patch threadcounts
RaynorChavez Sep 8, 2024
039b578
patch
RaynorChavez Sep 8, 2024
bbb54e0
patch
RaynorChavez Sep 9, 2024
6826205
patch media threads
RaynorChavez Sep 9, 2024
67f13ad
change chunk start end
RaynorChavez Sep 9, 2024
5933cc5
fix
RaynorChavez Sep 9, 2024
ec3b8fc
fix unstructured use_existing_tensors
RaynorChavez Sep 9, 2024
eaa33eb
modify media download threads logic
RaynorChavez Sep 9, 2024
f092d83
include webp and validate_url
RaynorChavez Sep 9, 2024
8344ce8
switch to validate_url in infer_modality
RaynorChavez Sep 9, 2024
80b4c10
Change exception capture to ffmpeg and change download condition
RaynorChavez Sep 9, 2024
dffcb3c
add test for threaded download media logic
RaynorChavez Sep 9, 2024
84f2f1a
remove main from new test
RaynorChavez Sep 9, 2024
e9d6f94
Merge branch 'mainline' into raynor/multimodal_models
RaynorChavez Sep 9, 2024
4b0d6d4
uncomment error
RaynorChavez Sep 9, 2024
03d9664
pass exception and add noindex condition
RaynorChavez Sep 9, 2024
973ae02
change condition order
RaynorChavez Sep 9, 2024
9 changes: 8 additions & 1 deletion .github/workflows/largemodel_unit_test_CI.yml
@@ -34,7 +34,7 @@ jobs:
mode: start
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
ec2-image-id: ${{ secrets.CUDA_EC2_IMAGE_ID }}
-ec2-instance-type: g4dn.xlarge
+ec2-instance-type: g4dn.2xlarge
subnet-id: ${{ secrets.CUDA_SUBNET_ID }}
security-group-id: ${{ secrets.CUDA_SECURITY_GROUP_ID }}

@@ -62,6 +62,13 @@ jobs:
repository: marqo-ai/marqo-base
path: marqo-base

- name: Install FFmpeg and libmagic
run: |
sudo apt-get update
sudo apt-get install -y ffmpeg libmagic1
ffmpeg -version # Verify installation
file --version # Verify libmagic installation and version

- name: Install dependencies
run: |
pip install -r marqo-base/requirements.txt
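The workflow step above installs FFmpeg and libmagic as system packages before the Python dependencies. A minimal sketch of an equivalent pre-flight check from Python, assuming the `ffmpeg` and `file` executables are what the media code relies on at runtime. `find_missing_tools` is a hypothetical helper for illustration and is not part of this PR:

```python
import shutil


def find_missing_tools(tools=("ffmpeg", "file")):
    """Return the names of executables that are not on PATH.

    The default tool names mirror the two commands the CI step runs
    to verify its installation; treating them as the full runtime
    requirement is an assumption.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All system tools found")
```

Running such a check at start-up would surface a missing system package early, instead of as a failure in the middle of a media download.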
6 changes: 6 additions & 0 deletions .github/workflows/unit_test_200gb_CI.yml
@@ -65,6 +65,12 @@ jobs:
repository: marqo-ai/marqo-base
path: marqo-base

- name: Install FFmpeg and libmagic
run: |
sudo apt-get update
sudo apt-get install -y ffmpeg libmagic1
ffmpeg -version # Verify installation
file --version # Verify libmagic installation and version
- name: Install dependencies
run: |
pip install -r marqo-base/requirements.txt
4 changes: 1 addition & 3 deletions .gitignore
@@ -149,6 +149,4 @@ dump.rdb
.DS_Store

# Tester app for unit tests
-scripts/vespa_local/vespa_tester_app.zip
-
-src/marqo/tensor_search/cache_dir/*
+scripts/vespa_local/vespa_tester_app.zip
10 changes: 10 additions & 0 deletions requirements.dev.txt
@@ -61,3 +61,13 @@ httpx==0.25.0
# test requirements
pyvespa==0.37.1
pytest==7.4.3


# LanguageBind
eva-decord==0.6.1
einops==0.6.1
pytorchvideo==0.1.5
torchaudio==0.12.1
SoundFile==0.12.1
python-magic==0.4.27
ffmpeg-python==0.2.0
16 changes: 1 addition & 15 deletions requirements.txt
@@ -1,14 +1,3 @@
# This is a reduced set of requirements used for Marqo's tensor search module
# If you are running this on your local machine you will need to install additional
# requirements specified here: https://github.com/marqo-ai/marqo-base/blob/main/requirements.txt
requests==2.28.1
anyio==3.7.1
fastapi==0.86.0
uvicorn[standard]
fastapi-utils==0.2.1
jsonschema==4.17.1
typing-extensions==4.5.0
urllib3==1.26.0
pydantic==1.10.11
httpx==0.25.0
semver==3.0.2
@@ -18,7 +7,4 @@ cachetools==5.3.1
pynvml==11.5.0 # For cuda utilization
readerwriterlock==1.0.9
kazoo==2.10.0
-pycurl==7.45.3
-certifi==2019.11.28
-transformers==4.41.2
-optimum==1.20.0
+pycurl==7.45.3
2 changes: 2 additions & 0 deletions src/marqo/api/configs.py
@@ -29,6 +29,8 @@ def default_env_vars() -> dict:
EnvVars.MARQO_THREAD_EXPIRY_TIME: 1800, # 30 minutes
EnvVars.MARQO_ENABLE_THROTTLING: "TRUE",
EnvVars.MARQO_LOG_LEVEL: "info",
EnvVars.MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST: 5,
EnvVars.MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST: 20,
# This env variable is set to "info" by default in run_marqo.sh, which overrides this value
EnvVars.MARQO_MAX_CPU_MODEL_MEMORY: 4,
EnvVars.MARQO_MAX_CUDA_MODEL_MEMORY: 4, # For multi-GPU, this is the max memory for each GPU.
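The hunk above adds two per-request download thread counts with defaults of 5 (media) and 20 (images). A minimal sketch of how such a value might be resolved, assuming the environment variable names equal the `EnvVars` member names and that an unset variable falls back to the default. `read_thread_count` and its validation are hypothetical, not Marqo's actual lookup:

```python
import os

# Defaults copied from the diff above; the string keys are assumed to
# match the environment variable names.
DEFAULT_THREAD_COUNTS = {
    "MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST": 5,
    "MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST": 20,
}


def read_thread_count(name, environ=None):
    """Return the configured thread count, or the default if unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return DEFAULT_THREAD_COUNTS[name]
    value = int(raw)  # raises ValueError for non-numeric input
    if value < 1:
        raise ValueError(f"{name} must be at least 1, got {value}")
    return value
```

For example, `read_thread_count("MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST", {})` falls back to the default of 5, while passing `{"MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST": "8"}` yields 8.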
12 changes: 12 additions & 0 deletions src/marqo/core/models/marqo_index.py
@@ -38,6 +38,8 @@ class FieldType(str, Enum):
ArrayFloat = 'array<float>'
ArrayDouble = 'array<double>'
ImagePointer = 'image_pointer'
VideoPointer = 'video_pointer'
AudioPointer = 'audio_pointer'
MultimodalCombination = 'multimodal_combination'
CustomVector = "custom_vector"
MapInt = 'map<text, int>'
@@ -117,6 +119,13 @@ class TextPreProcessing(ImmutableStrictBaseModel):
split_overlap: int = pydantic.Field(ge=0, alias='splitOverlap')
split_method: TextSplitMethod = pydantic.Field(alias='splitMethod')

class VideoPreProcessing(ImmutableStrictBaseModel):
split_length: int = pydantic.Field(gt=0, alias='splitLength')
split_overlap: int = pydantic.Field(ge=0, alias='splitOverlap')

class AudioPreProcessing(ImmutableStrictBaseModel):
split_length: int = pydantic.Field(gt=0, alias='splitLength')
split_overlap: int = pydantic.Field(ge=0, alias='splitOverlap')
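`VideoPreProcessing` and `AudioPreProcessing` each carry a `splitLength` and a `splitOverlap`. A rough sketch of how those two values could translate into chunk start and end times, assuming both are in seconds, that consecutive chunks advance by `split_length - split_overlap`, and that the final chunk is aligned to the end of the media so it keeps the full length (the commit list mentions equalizing the last chunk length, but the exact rule is not shown in this diff). `chunk_spans` is a hypothetical name:

```python
def chunk_spans(duration, split_length, split_overlap):
    """Return (start, end) times in seconds for each chunk.

    Assumption: overlap must be smaller than the chunk length, which is
    stricter than the pydantic constraints shown above (gt=0 and ge=0).
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")
    if duration <= split_length:
        # Media shorter than one chunk: a single span covers all of it.
        return [(0.0, float(duration))]

    stride = split_length - split_overlap
    spans = []
    start = 0.0
    while start + split_length < duration:
        spans.append((start, start + split_length))
        start += stride
    # Align the final chunk to the end so it is as long as the others.
    spans.append((float(duration - split_length), float(duration)))
    return spans
```

With a 50 second file, `split_length=20` and `split_overlap=5`, this produces the spans 0 to 20, 15 to 35 and 30 to 50.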

class ImagePreProcessing(ImmutableStrictBaseModel):
patch_method: Optional[PatchMethod] = pydantic.Field(alias='patchMethod')
@@ -250,6 +259,8 @@ class MarqoIndex(ImmutableBaseModel, ABC):
normalize_embeddings: bool
text_preprocessing: TextPreProcessing
image_preprocessing: ImagePreProcessing
video_preprocessing: Optional[VideoPreProcessing] = None
audio_preprocessing: Optional[AudioPreProcessing] = None
distance_metric: DistanceMetric
vector_numeric_type: VectorNumericType
hnsw_config: HnswConfig
@@ -324,6 +335,7 @@ def _cache_or_get(self, key: str, func):
class UnstructuredMarqoIndex(MarqoIndex):
type = IndexType.Unstructured
treat_urls_and_pointers_as_images: bool
treat_urls_and_pointers_as_media: Optional[bool] = None
filter_string_max_length: int

def __init__(self, **data):
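The new index fields (`video_preprocessing`, `audio_preprocessing`, `treat_urls_and_pointers_as_media`) are declared `Optional` with a default of `None`. One plausible reason, stated here as an assumption, is that index definitions stored before this change lack these keys and still need to load. A small sketch of that effect using a standard-library dataclass as a stand-in for the pydantic model; `UnstructuredIndexSketch` and `media_enabled` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UnstructuredIndexSketch:
    """Stand-in for UnstructuredMarqoIndex, reduced to three fields."""
    treat_urls_and_pointers_as_images: bool
    filter_string_max_length: int
    # New field: optional so that older stored settings remain loadable.
    treat_urls_and_pointers_as_media: Optional[bool] = None


def media_enabled(index):
    """Treat a missing (None) flag the same as False."""
    return bool(index.treat_urls_and_pointers_as_media)


# Settings as they might have been stored before the new field existed.
old_settings = {
    "treat_urls_and_pointers_as_images": True,
    "filter_string_max_length": 20,
}
old_index = UnstructuredIndexSketch(**old_settings)
new_index = UnstructuredIndexSketch(**old_settings, treat_urls_and_pointers_as_media=True)
```

Note that the request model in `marqo_index_request.py` declares the same fields as required, so only the stored index model tolerates their absence.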
4 changes: 3 additions & 1 deletion src/marqo/core/models/marqo_index_request.py
@@ -24,6 +24,8 @@ class MarqoIndexRequest(ImmutableStrictBaseModel, ABC):
normalize_embeddings: bool
text_preprocessing: marqo_index.TextPreProcessing
image_preprocessing: marqo_index.ImagePreProcessing
video_preprocessing: marqo_index.VideoPreProcessing
audio_preprocessing: marqo_index.AudioPreProcessing
distance_metric: marqo_index.DistanceMetric
vector_numeric_type: marqo_index.VectorNumericType
hnsw_config: marqo_index.HnswConfig
@@ -39,9 +41,9 @@ def validate_name(cls, name):

class UnstructuredMarqoIndexRequest(MarqoIndexRequest):
treat_urls_and_pointers_as_images: bool
treat_urls_and_pointers_as_media: bool
filter_string_max_length: int


class FieldRequest(StrictBaseModel):
name: str
type: marqo_index.FieldType
2 changes: 1 addition & 1 deletion src/marqo/core/search/recommender.py
@@ -43,7 +43,7 @@ def recommend(self,
Args:
index_name: Name of the index to search
documents: A list of document IDs or a dictionary where the keys are document IDs and the values are weights
-tensor_fields: List of tensor fields to use for recommendation
+tensor_fields: List of tensor fields to use for recommendation (can include text, image, audio, and video fields)
interpolation_method: Interpolation method to use for combining vectors
exclude_input_documents: Whether to exclude the input documents from the search results
result_count: Number of results to return
@@ -30,6 +30,8 @@ class StructuredVespaIndex(VespaIndex):
FieldType.ArrayLong: (list, int),
FieldType.ArrayDouble: (list, (float, int)),
FieldType.ImagePointer: str,
FieldType.VideoPointer: str,
FieldType.AudioPointer: str,
FieldType.MultimodalCombination: dict,
FieldType.CustomVector: str,
FieldType.MapInt: (dict, int),
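In the mapping above, `VideoPointer` and `AudioPointer` join `ImagePointer` as plain `str` values, while array types map to a `(container, element)` pair. A minimal sketch of validating a document value against such a mapping, assuming the tuple form means "container type, then allowed element types". The `FIELD_TYPES` subset and the `matches_field_type` helper are hypothetical and do not reproduce Marqo's actual validation:

```python
# Subset of the mapping, keyed by the FieldType string values from
# marqo_index.py. Map types are omitted to keep the sketch short.
FIELD_TYPES = {
    "image_pointer": str,
    "video_pointer": str,
    "audio_pointer": str,
    "array<double>": (list, (float, int)),
}


def matches_field_type(field_type, value):
    """Check a value against the expected Python type for a field."""
    expected = FIELD_TYPES[field_type]
    if isinstance(expected, tuple):
        container, element = expected
        return isinstance(value, container) and all(
            isinstance(item, element) for item in value
        )
    return isinstance(value, expected)
```

Under this reading, a video field accepts any string (typically a URL), and whether that string points at valid media is only discovered later, at download time.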
@@ -22,6 +22,8 @@ class StructuredVespaSchema(VespaSchema):
FieldType.ArrayFloat: 'array<float>',
FieldType.ArrayDouble: 'array<double>',
FieldType.ImagePointer: 'string',
FieldType.VideoPointer: 'string',
FieldType.AudioPointer: 'string',
FieldType.MultimodalCombination: 'map<string, float>',
FieldType.CustomVector: 'string', # Custom Vector "content" is stored as string in backend.
FieldType.MapInt: 'map<string, int>',
@@ -165,6 +167,8 @@ def _generate_document_section(self, schema_name: str) -> (List[str], Structured
normalize_embeddings=self._index_request.normalize_embeddings,
text_preprocessing=self._index_request.text_preprocessing,
image_preprocessing=self._index_request.image_preprocessing,
video_preprocessing=self._index_request.video_preprocessing,
audio_preprocessing=self._index_request.audio_preprocessing,
distance_metric=self._index_request.distance_metric,
vector_numeric_type=self._index_request.vector_numeric_type,
hnsw_config=self._index_request.hnsw_config,
@@ -54,13 +54,16 @@ def _generate_unstructured_marqo_index(self, schema_name: str) -> UnstructuredMa
normalize_embeddings=self._index_request.normalize_embeddings,
text_preprocessing=self._index_request.text_preprocessing,
image_preprocessing=self._index_request.image_preprocessing,
video_preprocessing=self._index_request.video_preprocessing,
audio_preprocessing=self._index_request.audio_preprocessing,
distance_metric=self._index_request.distance_metric,
vector_numeric_type=self._index_request.vector_numeric_type,
hnsw_config=self._index_request.hnsw_config,
marqo_version=self._index_request.marqo_version,
created_at=self._index_request.created_at,
updated_at=self._index_request.updated_at,
treat_urls_and_pointers_as_images=self._index_request.treat_urls_and_pointers_as_images,
treat_urls_and_pointers_as_media=self._index_request.treat_urls_and_pointers_as_media,
filter_string_max_length=self._index_request.filter_string_max_length,
)

2 changes: 1 addition & 1 deletion src/marqo/s2_inference/clip_utils.py
@@ -896,4 +896,4 @@ def get_multilingual_clip_properties() -> Dict:
"type": "multilingual_clip",
}
}
-return MULTILINGUAL_CLIP_PROPERTIES
+return MULTILINGUAL_CLIP_PROPERTIES
5 changes: 4 additions & 1 deletion src/marqo/s2_inference/configs.py
@@ -14,13 +14,16 @@ class ModelCache:
# The hf_cache_path is managed by the hf_hub_download function
hf_cache_path = os.getenv('HF_SAVE_PATH', f'{utils.get_marqo_root_from_env()}/cache/hf/')

languagebind_cache_path = os.getenv('LANGUAGEBIND_CACHE_PATH', f'{utils.get_marqo_root_from_env()}/cache/languagebind/')

@classmethod
def get_all_cache_paths(cls):
return [
cls.onnx_cache_path,
cls.torch_cache_path,
cls.clip_cache_path,
-cls.hf_cache_path
+cls.hf_cache_path,
+cls.languagebind_cache_path
]

class BaseTransformerModels:
4 changes: 4 additions & 0 deletions src/marqo/s2_inference/errors.py
@@ -62,4 +62,8 @@ class BatchInferenceSizeNotMatchError(S2InferenceError):


class ImageDownloadError(S2InferenceError):
pass


class UnsupportedModalityError(S2InferenceError):
pass
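`UnsupportedModalityError` gives the inference layer a dedicated error for content a model cannot embed; the commit list mentions erroring out when a non-multimodal model is asked to add video or audio. A minimal sketch of such a guard, assuming a per-model set of supported modalities. The model names, the `SUPPORTED_MODALITIES` table and `ensure_modality_supported` are all hypothetical:

```python
class S2InferenceError(Exception):
    """Stand-in for the base error class defined in errors.py."""


class UnsupportedModalityError(S2InferenceError):
    pass


# Hypothetical capability table; real model names and lookups differ.
SUPPORTED_MODALITIES = {
    "example-text-image-model": {"language", "image"},
    "example-multimodal-model": {"language", "image", "video", "audio"},
}


def ensure_modality_supported(model_name, modality):
    """Raise UnsupportedModalityError if the model cannot embed the modality."""
    supported = SUPPORTED_MODALITIES[model_name]
    if modality not in supported:
        raise UnsupportedModalityError(
            f"Model '{model_name}' does not support modality '{modality}'; "
            f"supported modalities: {sorted(supported)}"
        )
```

Because the new class derives from `S2InferenceError`, existing handlers that catch the base class would also catch it.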
69 changes: 69 additions & 0 deletions src/marqo/s2_inference/languagebind/__init__.py
@@ -0,0 +1,69 @@
from torch import nn

from .audio.configuration_audio import LanguageBindAudioConfig
from .audio.modeling_audio import LanguageBindAudio
from .audio.processing_audio import LanguageBindAudioProcessor
from .audio.tokenization_audio import LanguageBindAudioTokenizer
from .image.configuration_image import LanguageBindImageConfig
from .image.modeling_image import LanguageBindImage
from .image.processing_image import LanguageBindImageProcessor
from .image.tokenization_image import LanguageBindImageTokenizer
from .video.configuration_video import LanguageBindVideoConfig
from .video.modeling_video import LanguageBindVideo
from .video.processing_video import LanguageBindVideoProcessor
from .video.tokenization_video import LanguageBindVideoTokenizer

config_dict = {
    'image': LanguageBindImageConfig,
    'video': LanguageBindVideoConfig,
    'audio': LanguageBindAudioConfig
}
model_dict = {
    'image': LanguageBindImage,
    'video': LanguageBindVideo,
    'audio': LanguageBindAudio
}
transform_dict = {
    'video': LanguageBindVideoProcessor,
    'audio': LanguageBindAudioProcessor,
    'image': LanguageBindImageProcessor,
}


class LanguageBind(nn.Module):
    def __init__(self, clip_type, use_temp=True, cache_dir='./cache_dir'):
        super(LanguageBind, self).__init__()
        self.use_temp = use_temp
        self.modality_encoder = {}
        self.modality_proj = {}
        self.modality_scale = {}
        self.modality_config = {}
        for k, v in clip_type.items():
            pretrained_ckpt = f'LanguageBind/{v}'
            model = model_dict[k].from_pretrained(pretrained_ckpt, cache_dir=cache_dir)
            self.modality_encoder[k] = model.vision_model
            self.modality_proj[k] = model.visual_projection
            self.modality_scale[k] = model.logit_scale
            self.modality_config[k] = model.config
        self.modality_encoder['language'] = model.text_model
        self.modality_proj['language'] = model.text_projection

        self.modality_encoder = nn.ModuleDict(self.modality_encoder)
        self.modality_proj = nn.ModuleDict(self.modality_proj)

    def forward(self, inputs):
        outputs = {}
        for key, value in inputs.items():
            value = self.modality_encoder[key](value)[1]
            value = self.modality_proj[key](value)
            value = value / value.norm(p=2, dim=-1, keepdim=True)
            if self.use_temp:
                if key != 'language':
                    value = value * self.modality_scale[key].exp()
            outputs[key] = value
        return outputs


def to_device(x, device):
    out_dict = {k: v.to(device) for k, v in x.items()}
    return out_dict
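The `forward` method above L2-normalizes each projected embedding and, when `use_temp` is set, multiplies every non-language embedding by the exponential of that modality's learned logit scale. A dependency-free sketch of the same arithmetic on a plain list of floats, for illustration only. It stands in for the tensor operations, and the helper names `l2_normalize` and `scale_embedding` are hypothetical:

```python
import math


def l2_normalize(vector):
    """Divide a vector by its Euclidean norm (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def scale_embedding(vector, logit_scale, modality, use_temp=True):
    """Mirror the per-modality post-processing in LanguageBind.forward.

    Language embeddings are only normalized; other modalities are also
    multiplied by exp(logit_scale) when use_temp is True.
    """
    unit = l2_normalize(vector)
    if use_temp and modality != "language":
        factor = math.exp(logit_scale)
        return [x * factor for x in unit]
    return unit
```

One consequence worth noting: with `use_temp=True`, image, video and audio outputs are no longer unit length, so a caller that needs unit vectors would have to normalize again or construct the model with `use_temp=False`.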