
Conversation

@antrec (Contributor) commented Oct 7, 2025

Purpose

Add support for ModernBertForTokenClassification.
This takes inspiration from #24872, adapted to the ModernBert architecture. I did not touch the flex attention backend or anything else, because I expected the changes already made in that previous PR to make BERT work for NER would also work for ModernBert.
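
For illustration, a rough usage sketch (untested; llm.encode and outputs[0].outputs.data follow vLLM's pooling interface and may vary across versions, and the prompt is made up):

from vllm import LLM

# Rough sketch: load the NER model used in the test and pull
# per-token logits through the pooling API.
llm = LLM(model="disham993/electrical-ner-ModernBERT-base")
outputs = llm.encode(["The transformer operates at 33 kV."])
token_logits = outputs[0].outputs.data  # shape: (num_tokens, num_labels)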

Test Plan

Added a single test case in tests/models/language/pooling/test_token_classification.py that checks that the results of the huggingface and vllm runners for disham993/electrical-ner-ModernBERT-base are almost equal. I picked a more or less arbitrary model, as small as possible (there are not many ModernBert NER models available on huggingface).
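
For reference, the new case can be run with a command along these lines (exact flags are illustrative):

pytest tests/models/language/pooling/test_token_classification.py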

Test Result

The test passes.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions (bot) commented Oct 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify (bot) added the new-model (Requests to new models) label on Oct 7, 2025
@gemini-code-assist (bot) left a comment

Code Review

This pull request adds support for ModernBertForTokenClassification. The changes include the model implementation, registration, and a new test case. The implementation of ModernBertForTokenClassification is missing the application of the dropout layer, which is a correctness issue. Additionally, the new test case for this model has a bug in its result-checking loop that will cause it to fail. I've provided suggestions to fix both of these critical issues.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 66 to 69
# check logits difference
for hf_output, vllm_output in enumerate(zip(hf_outputs, vllm_outputs)):
hf_output = torch.tensor(hf_output).cpu().float()
vllm_output = torch.tensor(vllm_output).cpu().float()


P1: Token classification test compares index tensor to model output

The ModernBERT token classification test iterates with enumerate(zip(hf_outputs, vllm_outputs)), so hf_output is the loop index and vllm_output is a tuple of tensors. The subsequent conversion to tensors (torch.tensor(hf_output) / torch.tensor(vllm_output)) therefore attempts to turn an integer and a tuple of tensors into tensors, raising a TypeError before the assertion can run. This means the new test fails regardless of model correctness. The loop should iterate over zip(hf_outputs, vllm_outputs) directly so that the actual logits are compared.
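
A minimal sketch of the corrected loop, per the suggestion above:

# iterate over the paired outputs directly instead of enumerate(...)
for hf_output, vllm_output in zip(hf_outputs, vllm_outputs):
    hf_output = torch.tensor(hf_output).cpu().float()
    vllm_output = torch.tensor(vllm_output).cpu().float()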


@DarkLight1337 (Member) commented:
@noooop would be great if you could help review this!

inputs_embeds=inputs_embeds,
intermediate_tensors=intermediate_tensors,
)
hidden_states = self.drop(self.head(hidden_states))
A Collaborator commented on the diff hunk above:

Thanks for your contribution.

Clear and clean, with only a few details needing modification.

For the vLLM inference framework, there is no need to implement dropout.
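
For context, a minimal check showing why: torch.nn.Dropout is the identity in eval mode, so inference-only code can omit it.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.1)
drop.eval()  # inference mode: dropout neither zeroes nor rescales
x = torch.randn(4, 8)
assert torch.equal(drop(x), x)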

@antrec (Contributor, Author) replied:

Thanks! I have made a commit that removes the dropout layer from the model and skips it during weight loading in load_weights.

@antrec force-pushed the add-modernbert-token-classification branch from 1699421 to 303edff on October 7, 2025 11:10
@noooop (Collaborator) commented Oct 7, 2025

LGTM

cc @DarkLight1337

@DarkLight1337 (Member) left a comment:

Thanks both of you!

@DarkLight1337 (Member) commented:
Actually, can you also update the Supported Models page docs/models/supported_models.md?

santrecare and others added 5 commits October 7, 2025 14:34
Signed-off-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: antrec <antoine.recanati@gmail.com>
Added dropout layer in forward; it should have no effect at inference, but I kept it anyway to match the huggingface architecture so that weight loading is simplified.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: antrec <antoine.recanati@gmail.com>
Signed-off-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr>
Signed-off-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr>
@antrec force-pushed the add-modernbert-token-classification branch from 1c21935 to 5a839fd on October 7, 2025 12:36
@mergify (bot) added the documentation (Improvements or additions to documentation) label on Oct 7, 2025
@DarkLight1337 enabled auto-merge (squash) on October 7, 2025 12:39
@github-actions (bot) added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Oct 7, 2025
@DarkLight1337 merged commit 6f59bea into vllm-project:main on Oct 7, 2025
55 checks passed
mrasquinha-g pushed a commit to mrasquinha-g/vllm that referenced this pull request Oct 9, 2025
@DarkLight1337 (Member) commented Oct 9, 2025

The test fails in https://buildkite.com/vllm/ci/builds/33941/steps/canvas?sid=0199c1fa-d5a7-40a2-8f02-b45545380a3b, can you fix it?

@noooop (Collaborator) commented Oct 9, 2025

> The test fails in https://buildkite.com/vllm/ci/builds/33941/steps/canvas?sid=0199c1fa-d5a7-40a2-8f02-b45545380a3b, can you fix it?

fix in #26414

@antrec (Contributor, Author) commented Oct 9, 2025

> The test fails in https://buildkite.com/vllm/ci/builds/33941/steps/canvas?sid=0199c1fa-d5a7-40a2-8f02-b45545380a3b, can you fix it?
>
> fix in #26414

Thanks! So, there is nothing to do on my side?

The test passed locally for me before committing, but I did not manage to reproduce the exact CI setup on my MacBook (I did not use VLLM_USE_PRECOMPILED=1, and used torch==2.8.0 instead of torch==2.8.0+cpu), sorry if I missed something because of that.

@noooop (Collaborator) commented Oct 9, 2025

> > The test fails in https://buildkite.com/vllm/ci/builds/33941/steps/canvas?sid=0199c1fa-d5a7-40a2-8f02-b45545380a3b, can you fix it?
> >
> > fix in #26414
>
> Thanks! So, there is nothing to do on my side?
>
> The test passed locally for me before committing, but I did not manage to reproduce the exact CI setup on my MacBook (I did not use VLLM_USE_PRECOMPILED=1, and used torch==2.8.0 instead of torch==2.8.0+cpu), sorry if I missed something because of that.

There are minor differences in numerical precision across different devices. It’s okay; this has been fixed by #26466.
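
For context, tolerance-based comparison is the usual way to absorb such cross-device differences; a sketch (the atol/rtol values are illustrative, not the ones in vLLM's CI, and hf_logits/vllm_logits stand in for the compared tensors):

import torch

# allow small numerical differences instead of requiring exact equality
torch.testing.assert_close(vllm_logits, hf_logits, atol=1e-2, rtol=1e-2)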

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

Labels

documentation (Improvements or additions to documentation), new-model (Requests to new models), ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants