Conversation

@jennifurhe (Contributor) commented Jul 7, 2025

Purpose

FIX (partial) #15697

Also fixes a checkpoint loading error by adding requires_grad=False to position_ids, which avoids the following exception when loading weights:

ValueError: Following weights were not initialized from checkpoint: {'bert.embeddings.position_ids'}

Test Plan

BertEmbeddingModel:

python3 -m vllm.entrypoints.cli.main serve sentence-transformers/all-MiniLM-L6-v2 --served-model-name bert-embeddings --trust-remote-code

BertForSequenceClassification:

python3 -m vllm.entrypoints.cli.main serve textattack/bert-base-uncased-SST-2 --served-model-name bert-sst2 --trust-remote-code

RobertaForSequenceClassification:

python3 -m vllm.entrypoints.cli.main serve cardiffnlp/twitter-roberta-base-sentiment-latest --served-model-name roberta-sentiment --trust-remote-code

python3 -m vllm.entrypoints.cli.main serve jinaai/jina-embeddings-v3 --served-model-name jina-v3 --trust-remote-code

RobertaEmbeddingModel:

python3 -m vllm.entrypoints.cli.main serve FacebookAI/roberta-base --served-model-name roberta-base --trust-remote-code

python3 -m vllm.entrypoints.cli.main serve sentence-transformers/stsb-roberta-base-v2 --served-model-name stsb-roberta --trust-remote-code

Test Result

Models load successfully
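
For reference, a minimal sanity check against one of the served models could look like the sketch below. It assumes the corresponding serve command above is running on the default localhost:8000, uses the OpenAI-compatible /v1/embeddings endpoint, and passes the name given via --served-model-name:

```python
# Sketch only: hit the OpenAI-compatible embeddings endpoint exposed by the
# serve command above to confirm the model loaded and responds.
import requests

resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={"model": "bert-embeddings", "input": "vLLM BERT loading smoke test"},
    timeout=60,
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]
print(f"embedding dimension: {len(embedding)}")
```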

Signed-off-by: Jennifer He <islandhe@gmail.com>
@github-actions bot commented Jul 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist (Contributor, bot) left a comment

Summary of Changes

Hello @panyuhe, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the BERT model implementation by adding support for AutoWeightsLoader to streamline the weight loading process. It also addresses a critical bug that prevented successful checkpoint loading due to an incorrect parameter registration, ensuring the model can be loaded and used correctly.

Highlights

  • Model Weights Loading: Adds AutoWeightsLoader support for the BERT model, enabling more flexible and automated weight loading.
  • Bug Fix: Fixes a checkpoint loading error by registering position_ids as a buffer instead of a parameter to avoid ValueError.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
---------------------|---------------------|------------------------------------------------------------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Contributor, bot) left a comment

Code Review

This pull request introduces a correctness fix by changing position_ids to a buffer, resolving a model loading error. It also refactors the weight loading logic for BertForSequenceClassification to use AutoWeightsLoader, enhancing code maintainability.

Comment on lines 47 to 49
        self.register_buffer(
            "position_ids",
            torch.arange(config.max_position_embeddings).expand((1, -1)))
gemini-code-assist bot (severity: critical)

Changing position_ids from nn.Parameter to register_buffer is a critical correctness fix. position_ids are non-trainable and shouldn't be in the checkpoint. This prevents a ValueError during weight loading.
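
For context, a minimal sketch (standalone PyTorch, not the vLLM code) of why the buffer registration avoids the error: a registered buffer is carried with the module but is not reported by named_parameters(), so a weight loader that expects every named parameter to come from the checkpoint no longer looks for position_ids.

```python
# Standalone sketch: nn.Parameter vs. register_buffer for position_ids.
import torch
import torch.nn as nn

class EmbeddingsSketch(nn.Module):
    def __init__(self, max_position_embeddings: int = 512):
        super().__init__()
        # As an nn.Parameter, this tensor would appear in named_parameters()
        # and the checkpoint loader would expect a matching entry for it.
        self.register_buffer(
            "position_ids",
            torch.arange(max_position_embeddings).expand((1, -1)))

m = EmbeddingsSketch()
print(list(dict(m.named_parameters())))          # [] -> nothing the loader must initialize
print("position_ids" in dict(m.named_buffers())) # True -> still available at runtime
```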

Comment on lines 473 to 495
     def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]):
-        self_weights = []
-
-        def weight_filter():
-            for name, weight in weights:
-                if name.startswith("bert."):
-                    yield (name[len("bert."):], weight)
-                else:
-                    self_weights.append((name, weight))
-
-        self.bert.load_weights(weight_filter())
+        bert_weights = []
+        classifier_weights = []
+
+        for name, weight in weights:
+            if name.startswith("bert."):
+                bert_weights.append((name, weight))
+            else:
+                classifier_weights.append((name, weight))
+
+        loader = AutoWeightsLoader(self)
+        loaded_params = loader.load_weights(bert_weights)

         params_dict = dict(self.named_parameters())

-        for name, loaded_weight in self_weights:
-            if name.startswith("classifier"):
+        for name, loaded_weight in classifier_weights:
+            if name in params_dict:
                 param = params_dict[name]
                 weight_loader = getattr(param, "weight_loader",
                                         default_weight_loader)
                 weight_loader(param, loaded_weight)
+                loaded_params.add(name)
+
+        return loaded_params
gemini-code-assist bot (severity: medium)

Refactoring load_weights to use AutoWeightsLoader for the bert module improves maintainability. The logic for handling classifier_weights is also more robust by checking if name in params_dict.

@DarkLight1337 (Member)

cc @maxdebayser @noooop

        self._pooler = ClassifierPooler(vllm_config.model_config,
                                        self.classifier, self.bert.pooler)

    def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]):
Review comment (Collaborator):

You need to consider the Roberta model in roberta.py as well.

@jennifurhe (Contributor, Author) replied:

Thanks for the pointer! I've refactored the RoBERTa models, and ran the following tests to verify that the models load successfully (also updated the PR description):

RobertaForSequenceClassification:

  • cardiffnlp/twitter-roberta-base-sentiment-latest
  • jinaai/jina-embeddings-v3

RobertaEmbeddingModel:

  • FacebookAI/roberta-base
  • sentence-transformers/stsb-roberta-base-v2

@noooop (Collaborator) commented Jul 9, 2025

You should verify all the tests under /tests/models/language/pooling.

I am doing testing locally.

This code involves too many models.

@maxdebayser (Contributor)

Thanks for taking this on. I was wondering whether we could also simplify the logic in BertModel. There are several special cases, such as:

                if name.endswith(".bias") and name not in params_dict:
                    continue

I think they can be refactored using AutoWeightsLoader but it could be tricky to get all models right.

@jennifurhe (Contributor, Author) commented Jul 9, 2025

> Thanks for taking this on. I was wondering whether we could also simplify the logic in BertModel. There are several special cases, such as:
>
>                 if name.endswith(".bias") and name not in params_dict:
>                     continue
>
> I think they can be refactored using AutoWeightsLoader but it could be tricky to get all models right.

Thanks for the comments!

Looking at the bias checks like:

```python
if name.endswith(".bias") and name not in params_dict:
    continue
```

I believe these checks become redundant with AutoWeightsLoader, since it already filters out weights that have no corresponding parameters in the model. I was able to refactor the other similar checks with AutoWeightsLoader as well.

The main challenge for fully migrating BertModel.load_weights() to AutoWeightsLoader is the QKV fusion logic, which requires the 3-parameter signature weight_loader(param, weight, shard_id) that AutoWeightsLoader doesn't currently support. I'm therefore planning to keep the manual loading logic there (see the sketch below).
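
For illustration, the manual path being kept looks roughly like the sketch below. It is a simplified version of the common vLLM stacked-parameter pattern, not the exact BertModel code; weights, params_dict, and default_weight_loader are assumed to be defined as in the surrounding method:

```python
# Sketch of the QKV-fusion loading that AutoWeightsLoader cannot express:
# the fused qkv_proj parameter's weight_loader takes an extra shard_id argument.
stacked_params_mapping = [
    # (fused param name, checkpoint shard name, shard_id)
    ("qkv_proj", "query", "q"),
    ("qkv_proj", "key", "k"),
    ("qkv_proj", "value", "v"),
]

for name, loaded_weight in weights:
    for param_name, shard_name, shard_id in stacked_params_mapping:
        if shard_name not in name:
            continue
        name = name.replace(shard_name, param_name)
        param = params_dict[name]
        # 3-parameter signature: (param, weight, shard_id)
        param.weight_loader(param, loaded_weight, shard_id)
        break
    else:
        # Regular parameters fall back to the 2-parameter loader.
        param = params_dict[name]
        weight_loader = getattr(param, "weight_loader", default_weight_loader)
        weight_loader(param, loaded_weight)
```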

RobertaForSequenceClassification:
- python3 -m vllm.entrypoints.cli.main serve cardiffnlp/twitter-roberta-base-sentiment-latest --served-model-name roberta-sentiment --trust-remote-code
- python3 -m vllm.entrypoints.cli.main serve jinaai/jina-embeddings-v3 --served-model-name jina-v3 --trust-remote-code

RobertaEmbeddingModel:
- python3 -m vllm.entrypoints.cli.main serve FacebookAI/roberta-base --served-model-name roberta-base --trust-remote-code
- python3 -m vllm.entrypoints.cli.main serve sentence-transformers/stsb-roberta-base-v2 --served-model-name stsb-roberta --trust-remote-code

BertEmbeddingModel:
- python3 -m vllm.entrypoints.cli.main serve sentence-transformers/all-MiniLM-L6-v2 --served-model-name bert-embeddings --trust-remote-code

Signed-off-by:  <islandhe@gmail.com>
Signed-off-by: Jen H <islandhe@gmail.com>
@jennifurhe force-pushed the fix-bert-autoweightsloader branch from a7769db to ed9c1ae on July 9, 2025 05:10
@jennifurhe jennifurhe changed the title [Model] Add AutoWeightsLoader support for BERT [Model] Add AutoWeightsLoader support for BERT, RoBERTa Jul 9, 2025
@noooop (Collaborator) left a comment

LGTM

Thank you for simplifying this code.

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 9, 2025
@noooop (Collaborator) commented Jul 9, 2025

@DarkLight1337

Please help enable the Language Models Test (Extended Pooling). I'm not sure whether all models load correctly.

@noooop (Collaborator) commented Jul 9, 2025

@DarkLight1337

Can we pause CI testing for now and wait for my local testing? I only meant that the code looks OK, not that the PR is ready.

@DarkLight1337 (Member)

It's too complicated to cancel all the tests except the extended pooling test, so I'll just keep them running.

@noooop (Collaborator) commented Jul 9, 2025

@panyuhe

Please click Convert to draft on the right so that each push does not trigger the full CI, which runs for more than 3 hours. Wait until we have resolved most of the problems in local testing before marking it ready again.

@jennifurhe jennifurhe marked this pull request as draft July 9, 2025 05:52
@noooop (Collaborator) commented Jul 9, 2025

I'm testing locally and will send you the failing tests.

@noooop (Collaborator) commented Jul 9, 2025

Overall, most models load successfully; only a few models fail with:

ValueError: Following weights were not initialized from checkpoint: {'model.embeddings.position_ids'}

FAILED tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_1[BAAI/bge-reranker-v2-m3]
FAILED tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_N[BAAI/bge-reranker-v2-m3]
FAILED tests/models/language/pooling/test_scoring.py::test_cross_encoder_N_to_N[BAAI/bge-reranker-v2-m3]
FAILED tests/models/language/pooling/test_baai.py::test_embed_models_mteb[model_info13]
FAILED tests/models/language/pooling/test_baai.py::test_embed_models_correctness[model_info13]
FAILED tests/models/language/pooling/test_jina.py::test_rerank_models_mteb[model_info0]

ERROR tests/entrypoints/openai/test_embedding_dimensions.py::test_matryoshka[model_info1-bfloat16]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model0-True]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model0-False]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model0-True]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model0-False]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model0-True]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model0-False]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_score_max_model_len[model0-True]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_score_max_model_len[model0-False]

FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info0]
FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info5]
FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info6]
FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info0]
FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info5]
FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info6]

roberta_task_weights_filter in roberta.py may no longer be used

@noooop (Collaborator) commented Jul 9, 2025

@panyuhe
If you have any difficulty running the tests, please leave a message and we will help you.

It took me a long time to learn how to run these tests. (╯‵□′)╯︵┻━┻

vLLM is really too complicated

@jennifurhe (Contributor, Author)

> @panyuhe If you have any difficulty running the tests, please leave a message and we will help you.
>
> It took me a long time to learn how to run these tests. (╯‵□′)╯︵┻━┻
>
> vLLM is really too complicated

Thank you for helping to run the tests. I'm able to reproduce the same error messages using vllm.entrypoints.cli.main on the failing models. I'm still working on setting up the environment to run the actual pooling tests locally and will reach out if I run into difficulties. Thanks again!

@noooop (Collaborator) commented Jul 10, 2025

You may need to install mteb[bm25s]>=1.38.11, <2 and pytest-asyncio to run the tests

For all dependencies, please refer to

But installing everything will be very slow

Then you can launch the failing tests using pytest, e.g.:

pytest -s -vvv tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info0]

@jennifurhe (Contributor, Author)

@noooop All of the originally failing tests are passing now and I pasted the results of running the full pooling test suite below. Thank you for your help!

To amend my previous statement: AutoWeightsLoader does not actually check whether a weight exists in the model's named parameters, so I added that check back (sketch below).
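
Concretely, the restored guard is just the membership check before dispatching to the per-parameter weight loader, roughly as in the diff shown earlier (sketch; classifier_weights, params_dict, default_weight_loader, and loaded_params as defined in that method):

```python
for name, loaded_weight in classifier_weights:
    if name in params_dict:  # skip checkpoint weights with no matching parameter
        param = params_dict[name]
        weight_loader = getattr(param, "weight_loader", default_weight_loader)
        weight_loader(param, loaded_weight)
        loaded_params.add(name)
```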

Full pooling test results (some failures due to local hardware limits)
# test_output_test_baai.log
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_mteb[model_info13] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_mteb[model_info14] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_correctness[model_info13] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_correctness[model_info14] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_rerank_models_mteb[model_info0] - PASSED

# test_output_test_classification.log
✅ vllm/tests/models/language/pooling/test_classification.py::test_models[float-jason9693/Qwen2.5-1.5B-apeach] - PASSED

# test_output_test_cross_encoder.log
✅ vllm/tests/models/language/pooling/test_cross_encoder.py::test_rerank_models_mteb[model_info0] - PASSED
🟡 vllm/tests/models/language/pooling/test_cross_encoder.py::test_rerank_models_mteb[model_info1] - FAILED_OOM

# 🟡 1 CUDA OOM failure(s) detected in test_output_test_cross_encoder.log
#   🟡 vllm/tests/models/language/pooling/test_cross_encoder.py::test_rerank_models_mteb[model_info1]
#      Attempted: 146.0 MiB
#      Available: 1.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.37 GiB

# test_output_test_embedding_dimensions.log
# No passed/failed test results found (skipped tests excluded)

# test_output_test_embedding.log
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-BAAI/bge-multilingual-gemma2] - FAILED_OOM
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-intfloat/e5-mistral-7b-instruct] - FAILED_OOM
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-ssmits/Qwen2-7B-Instruct-embed-base] - FAILED_OOM
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-BAAI/bge-multilingual-gemma2] - FAILED_OOM
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-intfloat/e5-mistral-7b-instruct] - FAILED_OOM
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-BAAI/bge-base-en-v1.5] - FAILED_OOM
✅ vllm/tests/models/language/pooling/test_embedding.py::test_models[False-sentence-transformers/all-MiniLM-L12-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_embedding.py::test_models[False-intfloat/multilingual-e5-small] - PASSED
✅ vllm/tests/models/language/pooling/test_embedding.py::test_models[False-Alibaba-NLP/gte-Qwen2-1.5B-instruct] - PASSED
✅ vllm/tests/models/language/pooling/test_embedding.py::test_models[False-sentence-transformers/stsb-roberta-base-v2] - PASSED

# 🟡 6 CUDA OOM failure(s) detected in test_output_test_embedding.log
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-BAAI/bge-multilingual-gemma2]
#      Attempted: 56.0 MiB
#      Available: 49.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.32 GiB
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-intfloat/e5-mistral-7b-instruct]
#      Attempted: 250.0 MiB
#      Available: 49.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.32 GiB
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-ssmits/Qwen2-7B-Instruct-embed-base]
#      Attempted: 130.0 MiB
#      Available: 129.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.24 GiB
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-BAAI/bge-multilingual-gemma2]
#      Attempted: 3.42 GiB
#      Available: 129.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.24 GiB
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-intfloat/e5-mistral-7b-instruct]
#      Attempted: 112.0 MiB
#      Available: 29.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.34 GiB
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-BAAI/bge-base-en-v1.5]
#      Attempted: 46.0 MiB
#      Available: 29.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.34 GiB

# test_output_test_gritlm.log
✅ vllm/tests/models/language/pooling/test_gritlm.py::test_find_array - PASSED
🟡 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_offline_embedding - FAILED_OOM
🔴 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_api_server_embedding - FAILED_SERVER_CRASH
🟡 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_offline_generate - FAILED_OOM
🔴 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_api_server_generate - FAILED_SERVER_CRASH

# 🟡 2 CUDA OOM failure(s) detected in test_output_test_gritlm.log
#   🟡 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_offline_embedding
#      Attempted: 224.0 MiB
#      Available: 15.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.35 GiB
#   🟡 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_offline_generate
#      Attempted: 224.0 MiB
#      Available: 15.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.35 GiB

# 🔴 2 Server crash failure(s) detected in test_output_test_gritlm.log
#   (Likely caused by previous OOM failures crashing the server process)
#   vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_api_server_embedding
#   vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_api_server_generate

# test_output_test_gte.log
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info6] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info7] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info8] - PASSED
🟡 vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info9] - FAILED_OOM
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info10] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info11] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info6] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info7] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info8] - PASSED
🟡 vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info9] - FAILED_OOM
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info10] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info11] - PASSED

# 🟡 2 CUDA OOM failure(s) detected in test_output_test_gte.log
#   🟡 vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info9]
#      Attempted: 64.0 MiB
#      Available: 1.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.37 GiB
#   🟡 vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info9]
#      Attempted: 54.0 MiB
#      Available: 59.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.31 GiB

# test_output_test_intfloat.log
✅ vllm/tests/models/language/pooling/test_intfloat.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_intfloat.py::test_embed_models_mteb[model_info4] - PASSED
✅ vllm/tests/models/language/pooling/test_intfloat.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_intfloat.py::test_embed_models_correctness[model_info4] - PASSED

# test_output_test_jina.log
✅ vllm/tests/models/language/pooling/test_jina.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_jina.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_jina.py::test_rerank_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_jina.py::test_matryoshka[16-half-model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_jina.py::test_matryoshka[32-half-model_info0] - PASSED

# test_output_test_mxbai_rerank.log
✅ vllm/tests/models/language/pooling/test_mxbai_rerank.py::test_rerank_models_mteb[model_info0] - PASSED

# test_output_test_nomic.log
✅ vllm/tests/models/language/pooling/test_nomic.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic.py::test_embed_models_mteb[model_info3] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic.py::test_embed_models_correctness[model_info3] - PASSED

# test_output_test_nomic_max_model_len.log
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_default[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_default[model_info1] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_legal[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_legal[model_info1] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_illegal[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_illegal[model_info1] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_use_rope_scaling_legal[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_use_rope_scaling_legal[model_info1] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_use_rope_scaling_illegal[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_use_rope_scaling_illegal[model_info1] - PASSED

# test_output_test_qwen3_reranker.log
✅ vllm/tests/models/language/pooling/test_qwen3_reranker.py::test_rerank_models_mteb[model_info0] - PASSED

# test_output_test_reward.log
❌ vllm/tests/models/language/pooling/test_reward.py::test_prm_models[True-half-Qwen/Qwen2.5-Math-PRM-7B] - FAILED (COMPUTE_CAPABILITY)
🟡 vllm/tests/models/language/pooling/test_reward.py::test_prm_models[False-half-Qwen/Qwen2.5-Math-PRM-7B] - FAILED_OOM

# 🟡 1 CUDA OOM failure(s) detected in test_output_test_reward.log
#   🟡 vllm/tests/models/language/pooling/test_reward.py::test_prm_models[False-half-Qwen/Qwen2.5-Math-PRM-7B]
#      Attempted: 260.0 MiB
#      Available: 115.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.26 GiB

# test_output_test_score.log
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model0-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model0-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model0-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model0-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model0-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model0-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_score_max_model_len[model0-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_score_max_model_len[model0-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model1-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model1-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model1-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model1-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model1-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model1-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_score_max_model_len[model1-True] - PASSED

# test_output_test_scoring.log
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_1[cross-encoder/ms-marco-MiniLM-L-6-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_N[cross-encoder/ms-marco-MiniLM-L-6-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_N_to_N[cross-encoder/ms-marco-MiniLM-L-6-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_1[BAAI/bge-reranker-v2-m3] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_N[BAAI/bge-reranker-v2-m3] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_N_to_N[BAAI/bge-reranker-v2-m3] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_embedding_1_to_1[sentence-transformers/all-MiniLM-L12-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_embedding_1_to_N[sentence-transformers/all-MiniLM-L12-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_embedding_N_to_N[sentence-transformers/all-MiniLM-L12-v2] - PASSED

# test_output_test_snowflake_arctic_embed.log
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info3] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info5] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info6] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info7] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info3] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info5] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info6] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info7] - PASSED

# test_output_test_truncation_control.log
✅ vllm/tests/models/language/pooling/test_truncation_control.py::test_smaller_truncation_size - PASSED (HTTP Error 429 while requesting HEAD https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2/resolve/main/config.json)
✅ vllm/tests/models/language/pooling/test_truncation_control.py::test_max_truncation_size - PASSED
✅ vllm/tests/models/language/pooling/test_truncation_control.py::test_bigger_truncation_size - PASSED

@jennifurhe jennifurhe marked this pull request as ready for review July 13, 2025 19:51
Also, switch position_ids to be initialized as a buffer and clean up
unused code.

Signed-off-by: Jen H <islandhe@gmail.com>

Signed-off-by:  <islandhe@gmail.com>
@noooop (Collaborator) commented Jul 14, 2025

The buildkite/ci/pr/async-engine-inputs-utils-worker-test failure is unrelated to this PR.

All tests passed, and my local tests also passed.

@maxdebayser Is there anything else to add?

@Isotr0py (Member)

Merging from the main branch should fix the failing async engine test.

@jennifurhe (Contributor, Author)

Done!

@Isotr0py Isotr0py merged commit 85bd659 into vllm-project:main Jul 15, 2025
69 checks passed
@jennifurhe jennifurhe deleted the fix-bert-autoweightsloader branch July 15, 2025 06:45
This pull request was later referenced by commits cherry-picked into other repositories:

  • x22x22 pushed a commit to x22x22/vllm (Aug 5, 2025)
  • Pradyun92 pushed a commit to Pradyun92/vllm (Aug 6, 2025)
  • npanpaliya pushed a commit to odh-on-pz/vllm-upstream (Aug 6, 2025)
  • jinzhen-lin pushed a commit to jinzhen-lin/vllm (Aug 9, 2025)
  • paulpak58 pushed a commit to paulpak58/vllm (Aug 13, 2025)
  • diegocastanibm pushed a commit to diegocastanibm/vllm (Aug 15, 2025)
  • epwalsh pushed a commit to epwalsh/vllm (Aug 27, 2025)
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)

5 participants