[Bug]: multilingual-e5-large embedding models should use mean pooling instead of last pooling

### Your current environment

I am using the Docker image for vLLM: `vllm/vllm-openai:v0.7.1`

### 🐛 Describe the bug

I launched an OpenAI-compatible inference server on a k8s cluster serving the `intfloat/multilingual-e5-large-instruct` model. This is an `XLMRobertaModel`, which is supposed to use mean pooling rather than last pooling. However, I confirmed that the embedding returned by the vLLM server matches the one I get by normalizing the last hidden state (i.e. last pooling). I thought this had been addressed by https://github.com/vllm-project/vllm/pull/9387, but apparently it has not.
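To make the difference concrete, here is a minimal sketch of the two pooling strategies in plain Python. The token vectors and attention mask are toy values chosen for illustration, not real model outputs, and the helper names (`mean_pool`, `last_pool`, `l2_normalize`) are hypothetical rather than vLLM or sentence-transformers APIs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_pool(hidden_states, attention_mask):
    """Average the hidden states of all non-padding tokens."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m]
    dim = len(hidden_states[0])
    return [sum(tok[i] for tok in kept) / len(kept) for i in range(dim)]


def last_pool(hidden_states, attention_mask):
    """Take the hidden state of the last non-padding token."""
    last_index = max(i for i, m in enumerate(attention_mask) if m)
    return hidden_states[last_index]


# Toy example: 4 token positions, hidden size 2, last position is padding.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 1, 0]

mean_embedding = l2_normalize(mean_pool(hidden_states, attention_mask))
last_embedding = l2_normalize(last_pool(hidden_states, attention_mask))

print("mean:", mean_embedding)
print("last:", last_embedding)
```

The two results differ for the same input, which is how I could tell which one the server output matches.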

The command I used to launch the server is `python -m vllm.entrypoints.openai.api_server --model /mnt/models/e5-large ...`, and the `/mnt/models/e5-large` directory looks like this:
```
❯ ls -laRX .
drwxr-xr-x    - jisoo  1 Apr 15:00 .
drwxr-xr-x    - jisoo  1 Apr 13:45 ..
drwxr-xr-x    - jisoo  1 Apr 15:00 1_Pooling
lrw-r--r--  690 jisoo  1 Apr 13:45 config.json
lrw-r--r-- 1.1G jisoo  1 Apr 13:45 model.safetensors
lrw-r--r--  349 jisoo  1 Apr 15:00 modules.json
lrw-r--r--   53 jisoo  1 Apr 15:00 sentence_xlm-roberta_config.json
lrw-r--r-- 5.1M jisoo  1 Apr 13:45 sentencepiece.bpe.model
lrw-r--r--  964 jisoo  1 Apr 13:45 special_tokens_map.json
lrw-r--r--  17M jisoo  1 Apr 13:45 tokenizer.json
lrw-r--r-- 1.2k jisoo  1 Apr 13:45 tokenizer_config.json

./1_Pooling:
drwxr-xr-x   - jisoo  1 Apr 15:00 .
drwxr-xr-x   - jisoo  1 Apr 15:00 ..
lrw-r--r-- 271 jisoo  1 Apr 15:00 config.json
```

`modules.json`:
```
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
```

`1_Pooling/config.json`:
```
{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false
}
```
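As a sanity check on the file above, the following sketch reads the same JSON and lists which `pooling_mode_*` flags are enabled. The helper `enabled_pooling_modes` is a hypothetical name for illustration; this is not how vLLM itself parses the file, only a quick way to confirm what the config declares.

```python
import json

# Contents of 1_Pooling/config.json, copied from above.
POOLING_CONFIG_TEXT = """
{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false
}
"""

PREFIX = "pooling_mode_"


def enabled_pooling_modes(config):
    """Return the names of all pooling_mode_* flags that are set to true."""
    return [
        key[len(PREFIX):]
        for key, value in config.items()
        if key.startswith(PREFIX) and value is True
    ]


config = json.loads(POOLING_CONFIG_TEXT)
modes = enabled_pooling_modes(config)
print(modes)
```

Only the mean-token flag is enabled, so the config asks for mean pooling.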

Is there something I am missing?
Thanks!

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
