Conversation

@jennifurhe (Contributor) commented Jul 7, 2025

Purpose

FIX (partial) #15697

Also fixes a checkpoint loading error by adding requires_grad=False to position_ids, which avoids the following exception when loading weights:

ValueError: Following weights were not initialized from checkpoint: {'bert.embeddings.position_ids'}

Test Plan

BertEmbeddingModel:

python3 -m vllm.entrypoints.cli.main serve sentence-transformers/all-MiniLM-L6-v2 --served-model-name bert-embeddings --trust-remote-code

BertForSequenceClassification:

python3 -m vllm.entrypoints.cli.main serve textattack/bert-base-uncased-SST-2 --served-model-name bert-sst2 --trust-remote-code

RobertaForSequenceClassification:

python3 -m vllm.entrypoints.cli.main serve cardiffnlp/twitter-roberta-base-sentiment-latest --served-model-name roberta-sentiment --trust-remote-code

python3 -m vllm.entrypoints.cli.main serve jinaai/jina-embeddings-v3 --served-model-name jina-v3 --trust-remote-code

RobertaEmbeddingModel:

python3 -m vllm.entrypoints.cli.main serve FacebookAI/roberta-base --served-model-name roberta-base --trust-remote-code

python3 -m vllm.entrypoints.cli.main serve sentence-transformers/stsb-roberta-base-v2 --served-model-name stsb-roberta --trust-remote-code

Test Result

Models load successfully
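
For reference, a minimal sanity check against one of the served models could look like the sketch below. It assumes the corresponding serve command above is running on the default localhost:8000, uses the OpenAI-compatible /v1/embeddings endpoint, and passes the name given via --served-model-name:

```python
# Sketch only: hit the OpenAI-compatible embeddings endpoint exposed by the
# serve command above to confirm the model loaded and responds.
import requests

resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={"model": "bert-embeddings", "input": "vLLM BERT loading smoke test"},
    timeout=60,
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]
print(f"embedding dimension: {len(embedding)}")
```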

Signed-off-by: Jennifer He <islandhe@gmail.com>
@github-actions bot commented Jul 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist (Contributor, bot) left a comment

Summary of Changes

Hello @panyuhe, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the BERT model implementation by adding support for AutoWeightsLoader to streamline the weight loading process. It also addresses a critical bug that prevented successful checkpoint loading due to an incorrect parameter registration, ensuring the model can be loaded and used correctly.

Highlights

  • Model Weights Loading: Adds AutoWeightsLoader support for the BERT model, enabling more flexible and automated weight loading.
  • Bug Fix: Fixes a checkpoint loading error by registering position_ids as a buffer instead of a parameter to avoid ValueError.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
---------------------|---------------------|------------------------------------------------------------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Contributor, bot) left a comment

Code Review

This pull request introduces a correctness fix by changing position_ids to a buffer, resolving a model loading error. It also refactors the weight loading logic for BertForSequenceClassification to use AutoWeightsLoader, enhancing code maintainability.

Comment on lines 47 to 49
        self.register_buffer(
            "position_ids",
            torch.arange(config.max_position_embeddings).expand((1, -1)))
gemini-code-assist bot (severity: critical)

Changing position_ids from nn.Parameter to register_buffer is a critical correctness fix. position_ids are non-trainable and shouldn't be in the checkpoint. This prevents a ValueError during weight loading.
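
For context, a minimal sketch (standalone PyTorch, not the vLLM code) of why the buffer registration avoids the error: a registered buffer is carried with the module but is not reported by named_parameters(), so a weight loader that expects every named parameter to come from the checkpoint no longer looks for position_ids.

```python
# Standalone sketch: nn.Parameter vs. register_buffer for position_ids.
import torch
import torch.nn as nn

class EmbeddingsSketch(nn.Module):
    def __init__(self, max_position_embeddings: int = 512):
        super().__init__()
        # As an nn.Parameter, this tensor would appear in named_parameters()
        # and the checkpoint loader would expect a matching entry for it.
        self.register_buffer(
            "position_ids",
            torch.arange(max_position_embeddings).expand((1, -1)))

m = EmbeddingsSketch()
print(list(dict(m.named_parameters())))          # [] -> nothing the loader must initialize
print("position_ids" in dict(m.named_buffers())) # True -> still available at runtime
```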

Comment on lines 473 to 495
     def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]):
-        self_weights = []
-
-        def weight_filter():
-            for name, weight in weights:
-                if name.startswith("bert."):
-                    yield (name[len("bert."):], weight)
-                else:
-                    self_weights.append((name, weight))
-
-        self.bert.load_weights(weight_filter())
+        bert_weights = []
+        classifier_weights = []
+
+        for name, weight in weights:
+            if name.startswith("bert."):
+                bert_weights.append((name, weight))
+            else:
+                classifier_weights.append((name, weight))
+
+        loader = AutoWeightsLoader(self)
+        loaded_params = loader.load_weights(bert_weights)

         params_dict = dict(self.named_parameters())

-        for name, loaded_weight in self_weights:
-            if name.startswith("classifier"):
+        for name, loaded_weight in classifier_weights:
+            if name in params_dict:
                 param = params_dict[name]
                 weight_loader = getattr(param, "weight_loader",
                                         default_weight_loader)
                 weight_loader(param, loaded_weight)
+                loaded_params.add(name)
+
+        return loaded_params
gemini-code-assist bot (severity: medium)

Refactoring load_weights to use AutoWeightsLoader for the bert module improves maintainability. The logic for handling classifier_weights is also more robust by checking if name in params_dict.

@DarkLight1337 (Member)

cc @maxdebayser @noooop

        self._pooler = ClassifierPooler(vllm_config.model_config,
                                        self.classifier, self.bert.pooler)

    def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]):
Review comment (Collaborator):

You need to consider the Roberta model in roberta.py as well.

@jennifurhe (Contributor, Author) replied:

Thanks for the pointer! I've refactored the RoBERTa models, and ran the following tests to verify that the models load successfully (also updated the PR description):

RobertaForSequenceClassification:

  • cardiffnlp/twitter-roberta-base-sentiment-latest
  • jinaai/jina-embeddings-v3

RobertaEmbeddingModel:

  • FacebookAI/roberta-base
  • sentence-transformers/stsb-roberta-base-v2

@noooop (Collaborator) commented Jul 9, 2025

You should verify all the tests under /tests/models/language/pooling.

I am doing testing locally.

This code involves too many models.

@maxdebayser (Contributor)

Thanks for taking this on. I was wondering whether we could also simplify the logic in BertModel. There are several special cases, such as:

                if name.endswith(".bias") and name not in params_dict:
                    continue

I think they can be refactored using AutoWeightsLoader but it could be tricky to get all models right.

@jennifurhe (Contributor, Author) commented Jul 9, 2025

> Thanks for taking this on. I was wondering whether we could also simplify the logic in BertModel. There are several special cases, such as:
>
>                 if name.endswith(".bias") and name not in params_dict:
>                     continue
>
> I think they can be refactored using AutoWeightsLoader but it could be tricky to get all models right.

Thanks for the comments!

Looking at the bias checks like:

```python
if name.endswith(".bias") and name not in params_dict:
    continue
```

I believe these checks become redundant with AutoWeightsLoader, since it already filters out weights that have no corresponding parameters in the model. I was able to refactor the other similar checks with AutoWeightsLoader as well.

The main challenge for fully migrating BertModel.load_weights() to AutoWeightsLoader is the QKV fusion logic, which requires the 3-parameter signature weight_loader(param, weight, shard_id) that AutoWeightsLoader doesn't currently support. I'm therefore planning to keep the manual loading logic there (see the sketch below).
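
For illustration, the manual path being kept looks roughly like the sketch below. It is a simplified version of the common vLLM stacked-parameter pattern, not the exact BertModel code; weights, params_dict, and default_weight_loader are assumed to be defined as in the surrounding method:

```python
# Sketch of the QKV-fusion loading that AutoWeightsLoader cannot express:
# the fused qkv_proj parameter's weight_loader takes an extra shard_id argument.
stacked_params_mapping = [
    # (fused param name, checkpoint shard name, shard_id)
    ("qkv_proj", "query", "q"),
    ("qkv_proj", "key", "k"),
    ("qkv_proj", "value", "v"),
]

for name, loaded_weight in weights:
    for param_name, shard_name, shard_id in stacked_params_mapping:
        if shard_name not in name:
            continue
        name = name.replace(shard_name, param_name)
        param = params_dict[name]
        # 3-parameter signature: (param, weight, shard_id)
        param.weight_loader(param, loaded_weight, shard_id)
        break
    else:
        # Regular parameters fall back to the 2-parameter loader.
        param = params_dict[name]
        weight_loader = getattr(param, "weight_loader", default_weight_loader)
        weight_loader(param, loaded_weight)
```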

RobertaForSequenceClassification:
- python3 -m vllm.entrypoints.cli.main serve cardiffnlp/twitter-roberta-base-sentiment-latest --served-model-name roberta-sentiment --trust-remote-code
- python3 -m vllm.entrypoints.cli.main serve jinaai/jina-embeddings-v3 --served-model-name jina-v3 --trust-remote-code

RobertaEmbeddingModel:
- python3 -m vllm.entrypoints.cli.main serve FacebookAI/roberta-base --served-model-name roberta-base --trust-remote-code
- python3 -m vllm.entrypoints.cli.main serve sentence-transformers/stsb-roberta-base-v2 --served-model-name stsb-roberta --trust-remote-code

BertEmbeddingModel:
- python3 -m vllm.entrypoints.cli.main serve sentence-transformers/all-MiniLM-L6-v2 --served-model-name bert-embeddings --trust-remote-code

Signed-off-by:  <islandhe@gmail.com>
Signed-off-by: Jen H <islandhe@gmail.com>
@jennifurhe force-pushed the fix-bert-autoweightsloader branch from a7769db to ed9c1ae on July 9, 2025 05:10
@jennifurhe jennifurhe changed the title [Model] Add AutoWeightsLoader support for BERT [Model] Add AutoWeightsLoader support for BERT, RoBERTa Jul 9, 2025
@noooop (Collaborator) left a comment

LGTM

Thank you for simplifying this code.

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 9, 2025
@noooop (Collaborator) commented Jul 9, 2025

@DarkLight1337

Please help enable the Language Models Test (Extended Pooling). I'm not sure whether all models load correctly.

@noooop (Collaborator) commented Jul 9, 2025

@DarkLight1337

Can we pause CI testing for now and wait for my local testing? I only meant that the code looks OK, not that the PR is ready.

@DarkLight1337 (Member)

It's too complicated to cancel all the tests except the extended pooling test, so I'll just keep them running.

@noooop (Collaborator) commented Jul 9, 2025

@panyuhe

Please click Convert to draft on the right so that each push does not trigger the full CI, which runs for more than 3 hours. Wait until we have resolved most of the problems in local testing before marking it ready again.

@jennifurhe jennifurhe marked this pull request as draft July 9, 2025 05:52
@noooop (Collaborator) commented Jul 9, 2025

I'm testing locally and will send you the failing tests.

@noooop (Collaborator) commented Jul 9, 2025

Overall, most models load successfully; only a few models fail with:

ValueError: Following weights were not initialized from checkpoint: {'model.embeddings.position_ids'}

FAILED tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_1[BAAI/bge-reranker-v2-m3]
FAILED tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_N[BAAI/bge-reranker-v2-m3]
FAILED tests/models/language/pooling/test_scoring.py::test_cross_encoder_N_to_N[BAAI/bge-reranker-v2-m3]
FAILED tests/models/language/pooling/test_baai.py::test_embed_models_mteb[model_info13]
FAILED tests/models/language/pooling/test_baai.py::test_embed_models_correctness[model_info13]
FAILED tests/models/language/pooling/test_jina.py::test_rerank_models_mteb[model_info0]

ERROR tests/entrypoints/openai/test_embedding_dimensions.py::test_matryoshka[model_info1-bfloat16]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model0-True]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model0-False]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model0-True]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model0-False]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model0-True]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model0-False]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_score_max_model_len[model0-True]
ERROR tests/entrypoints/openai/test_score.py::TestModel::test_score_max_model_len[model0-False]

FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info0]
FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info5]
FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info6]
FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info0]
FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info5]
FAILED tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info6]

roberta_task_weights_filter in roberta.py may no longer be used

@noooop (Collaborator) commented Jul 9, 2025

@panyuhe
If you have any difficulty running the tests, please leave a message and we will help you.

It took me a long time to learn how to run these tests. (╯‵□′)╯︵┻━┻

vLLM is really too complicated

@jennifurhe (Contributor, Author)

> @panyuhe If you have any difficulty running the tests, please leave a message and we will help you.
>
> It took me a long time to learn how to run these tests. (╯‵□′)╯︵┻━┻
>
> vLLM is really too complicated

Thank you for helping to run the tests. I'm able to reproduce the same error messages using vllm.entrypoints.cli.main on the failing models. I'm still working on setting up the environment to run the actual pooling tests locally and will reach out if I run into difficulties. Thanks again!

@noooop (Collaborator) commented Jul 10, 2025

You may need to install mteb[bm25s]>=1.38.11, <2 and pytest-asyncio to run the tests

For all dependencies, please refer to

But installing everything will be very slow

Then you can launch the failing tests using pytest, e.g.:

pytest -s -vvv tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info0]

@jennifurhe (Contributor, Author)

@noooop All of the originally failing tests are passing now and I pasted the results of running the full pooling test suite below. Thank you for your help!

To amend my previous statement: AutoWeightsLoader does not actually check whether a weight exists in the model's named parameters, so I added that check back (sketch below).
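
Concretely, the restored guard is just the membership check before dispatching to the per-parameter weight loader, roughly as in the diff shown earlier (sketch; classifier_weights, params_dict, default_weight_loader, and loaded_params as defined in that method):

```python
for name, loaded_weight in classifier_weights:
    if name in params_dict:  # skip checkpoint weights with no matching parameter
        param = params_dict[name]
        weight_loader = getattr(param, "weight_loader", default_weight_loader)
        weight_loader(param, loaded_weight)
        loaded_params.add(name)
```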

Full pooling test results (some failures due to local hardware limits)
# test_output_test_baai.log
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_mteb[model_info13] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_mteb[model_info14] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_correctness[model_info13] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_embed_models_correctness[model_info14] - PASSED
✅ vllm/tests/models/language/pooling/test_baai.py::test_rerank_models_mteb[model_info0] - PASSED

# test_output_test_classification.log
✅ vllm/tests/models/language/pooling/test_classification.py::test_models[float-jason9693/Qwen2.5-1.5B-apeach] - PASSED

# test_output_test_cross_encoder.log
✅ vllm/tests/models/language/pooling/test_cross_encoder.py::test_rerank_models_mteb[model_info0] - PASSED
🟡 vllm/tests/models/language/pooling/test_cross_encoder.py::test_rerank_models_mteb[model_info1] - FAILED_OOM

# 🟡 1 CUDA OOM failure(s) detected in test_output_test_cross_encoder.log
#   🟡 vllm/tests/models/language/pooling/test_cross_encoder.py::test_rerank_models_mteb[model_info1]
#      Attempted: 146.0 MiB
#      Available: 1.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.37 GiB

# test_output_test_embedding_dimensions.log
# No passed/failed test results found (skipped tests excluded)

# test_output_test_embedding.log
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-BAAI/bge-multilingual-gemma2] - FAILED_OOM
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-intfloat/e5-mistral-7b-instruct] - FAILED_OOM
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-ssmits/Qwen2-7B-Instruct-embed-base] - FAILED_OOM
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-BAAI/bge-multilingual-gemma2] - FAILED_OOM
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-intfloat/e5-mistral-7b-instruct] - FAILED_OOM
🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-BAAI/bge-base-en-v1.5] - FAILED_OOM
✅ vllm/tests/models/language/pooling/test_embedding.py::test_models[False-sentence-transformers/all-MiniLM-L12-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_embedding.py::test_models[False-intfloat/multilingual-e5-small] - PASSED
✅ vllm/tests/models/language/pooling/test_embedding.py::test_models[False-Alibaba-NLP/gte-Qwen2-1.5B-instruct] - PASSED
✅ vllm/tests/models/language/pooling/test_embedding.py::test_models[False-sentence-transformers/stsb-roberta-base-v2] - PASSED

# 🟡 6 CUDA OOM failure(s) detected in test_output_test_embedding.log
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-BAAI/bge-multilingual-gemma2]
#      Attempted: 56.0 MiB
#      Available: 49.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.32 GiB
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-intfloat/e5-mistral-7b-instruct]
#      Attempted: 250.0 MiB
#      Available: 49.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.32 GiB
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[True-ssmits/Qwen2-7B-Instruct-embed-base]
#      Attempted: 130.0 MiB
#      Available: 129.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.24 GiB
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-BAAI/bge-multilingual-gemma2]
#      Attempted: 3.42 GiB
#      Available: 129.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.24 GiB
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-intfloat/e5-mistral-7b-instruct]
#      Attempted: 112.0 MiB
#      Available: 29.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.34 GiB
#   🟡 vllm/tests/models/language/pooling/test_embedding.py::test_models[False-BAAI/bge-base-en-v1.5]
#      Attempted: 46.0 MiB
#      Available: 29.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.34 GiB

# test_output_test_gritlm.log
✅ vllm/tests/models/language/pooling/test_gritlm.py::test_find_array - PASSED
🟡 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_offline_embedding - FAILED_OOM
🔴 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_api_server_embedding - FAILED_SERVER_CRASH
🟡 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_offline_generate - FAILED_OOM
🔴 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_api_server_generate - FAILED_SERVER_CRASH

# 🟡 2 CUDA OOM failure(s) detected in test_output_test_gritlm.log
#   🟡 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_offline_embedding
#      Attempted: 224.0 MiB
#      Available: 15.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.35 GiB
#   🟡 vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_offline_generate
#      Attempted: 224.0 MiB
#      Available: 15.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.35 GiB

# 🔴 2 Server crash failure(s) detected in test_output_test_gritlm.log
#   (Likely caused by previous OOM failures crashing the server process)
#   vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_api_server_embedding
#   vllm/tests/models/language/pooling/test_gritlm.py::test_gritlm_api_server_generate

# test_output_test_gte.log
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info6] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info7] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info8] - PASSED
🟡 vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info9] - FAILED_OOM
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info10] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info11] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info6] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info7] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info8] - PASSED
🟡 vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info9] - FAILED_OOM
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info10] - PASSED
✅ vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info11] - PASSED

# 🟡 2 CUDA OOM failure(s) detected in test_output_test_gte.log
#   🟡 vllm/tests/models/language/pooling/test_gte.py::test_embed_models_mteb[model_info9]
#      Attempted: 64.0 MiB
#      Available: 1.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.37 GiB
#   🟡 vllm/tests/models/language/pooling/test_gte.py::test_embed_models_correctness[model_info9]
#      Attempted: 54.0 MiB
#      Available: 59.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.31 GiB

# test_output_test_intfloat.log
✅ vllm/tests/models/language/pooling/test_intfloat.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_intfloat.py::test_embed_models_mteb[model_info4] - PASSED
✅ vllm/tests/models/language/pooling/test_intfloat.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_intfloat.py::test_embed_models_correctness[model_info4] - PASSED

# test_output_test_jina.log
✅ vllm/tests/models/language/pooling/test_jina.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_jina.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_jina.py::test_rerank_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_jina.py::test_matryoshka[16-half-model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_jina.py::test_matryoshka[32-half-model_info0] - PASSED

# test_output_test_mxbai_rerank.log
✅ vllm/tests/models/language/pooling/test_mxbai_rerank.py::test_rerank_models_mteb[model_info0] - PASSED

# test_output_test_nomic.log
✅ vllm/tests/models/language/pooling/test_nomic.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic.py::test_embed_models_mteb[model_info3] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic.py::test_embed_models_correctness[model_info3] - PASSED

# test_output_test_nomic_max_model_len.log
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_default[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_default[model_info1] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_legal[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_legal[model_info1] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_illegal[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_set_max_model_len_illegal[model_info1] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_use_rope_scaling_legal[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_use_rope_scaling_legal[model_info1] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_use_rope_scaling_illegal[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_nomic_max_model_len.py::test_use_rope_scaling_illegal[model_info1] - PASSED

# test_output_test_qwen3_reranker.log
✅ vllm/tests/models/language/pooling/test_qwen3_reranker.py::test_rerank_models_mteb[model_info0] - PASSED

# test_output_test_reward.log
❌ vllm/tests/models/language/pooling/test_reward.py::test_prm_models[True-half-Qwen/Qwen2.5-Math-PRM-7B] - FAILED (COMPUTE_CAPABILITY)
🟡 vllm/tests/models/language/pooling/test_reward.py::test_prm_models[False-half-Qwen/Qwen2.5-Math-PRM-7B] - FAILED_OOM

# 🟡 1 CUDA OOM failure(s) detected in test_output_test_reward.log
#   🟡 vllm/tests/models/language/pooling/test_reward.py::test_prm_models[False-half-Qwen/Qwen2.5-Math-PRM-7B]
#      Attempted: 260.0 MiB
#      Available: 115.38 MiB
#      Total GPU: 7.57 GiB
#      In Use: 7.26 GiB

# test_output_test_score.log
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model0-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model0-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model0-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model0-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model0-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model0-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_score_max_model_len[model0-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_score_max_model_len[model0-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model1-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_list[model1-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model1-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_list_text_2_list[model1-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model1-True] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_text_1_str_text_2_str[model1-False] - PASSED
✅ vllm/tests/entrypoints/openai/test_score.py::TestModel::test_score_max_model_len[model1-True] - PASSED

# test_output_test_scoring.log
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_1[cross-encoder/ms-marco-MiniLM-L-6-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_N[cross-encoder/ms-marco-MiniLM-L-6-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_N_to_N[cross-encoder/ms-marco-MiniLM-L-6-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_1[BAAI/bge-reranker-v2-m3] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_1_to_N[BAAI/bge-reranker-v2-m3] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_cross_encoder_N_to_N[BAAI/bge-reranker-v2-m3] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_embedding_1_to_1[sentence-transformers/all-MiniLM-L12-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_embedding_1_to_N[sentence-transformers/all-MiniLM-L12-v2] - PASSED
✅ vllm/tests/models/language/pooling/test_scoring.py::test_embedding_N_to_N[sentence-transformers/all-MiniLM-L12-v2] - PASSED

# test_output_test_snowflake_arctic_embed.log
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info3] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info5] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info6] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_mteb[model_info7] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info0] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info3] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info5] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info6] - PASSED
✅ vllm/tests/models/language/pooling/test_snowflake_arctic_embed.py::test_embed_models_correctness[model_info7] - PASSED

# test_output_test_truncation_control.log
✅ vllm/tests/models/language/pooling/test_truncation_control.py::test_smaller_truncation_size - PASSED (HTTP Error 429 while requesting HEAD https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2/resolve/main/config.json)
✅ vllm/tests/models/language/pooling/test_truncation_control.py::test_max_truncation_size - PASSED
✅ vllm/tests/models/language/pooling/test_truncation_control.py::test_bigger_truncation_size - PASSED

@jennifurhe jennifurhe marked this pull request as ready for review July 13, 2025 19:51
Also, switch position_ids to be initialized as a buffer and clean up
unused code.

Signed-off-by: Jen H <islandhe@gmail.com>

Signed-off-by:  <islandhe@gmail.com>
@noooop (Collaborator) commented Jul 14, 2025

The buildkite/ci/pr/async-engine-inputs-utils-worker-test failure is unrelated to this PR.

All tests passed, and my local tests also passed.

@maxdebayser Is there anything else to add?

@Isotr0py (Member)

Merging from the main branch should fix the failing async engine test.

@jennifurhe (Contributor, Author)

Done!

@Isotr0py Isotr0py merged commit 85bd659 into vllm-project:main Jul 15, 2025
69 checks passed
@jennifurhe jennifurhe deleted the fix-bert-autoweightsloader branch July 15, 2025 06:45
This pull request was later referenced by commits cherry-picked into other repositories:

  • x22x22 pushed a commit to x22x22/vllm (Aug 5, 2025)
  • Pradyun92 pushed a commit to Pradyun92/vllm (Aug 6, 2025)
  • npanpaliya pushed a commit to odh-on-pz/vllm-upstream (Aug 6, 2025)
  • jinzhen-lin pushed a commit to jinzhen-lin/vllm (Aug 9, 2025)
  • paulpak58 pushed a commit to paulpak58/vllm (Aug 13, 2025)
  • diegocastanibm pushed a commit to diegocastanibm/vllm (Aug 15, 2025)
  • epwalsh pushed a commit to epwalsh/vllm (Aug 27, 2025)
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)

5 participants