
[Feature]: Add num_corrupted_request metric to V1 metrics system. #27301

@atalhens

Description


Currently, vLLM internally tracks a corrupted_requests_counter metric whenever a request produces invalid outputs (NaNs) due to model, engine, or hardware issues. However, this metric is not directly exposed to users in logs or Prometheus metrics.

Exposing this metric would allow users to:

  • Detect model instability or misbehaving custom models.
  • Monitor runtime/engine health in production clusters.
  • Quickly identify hardware or distributed inference issues affecting outputs.

Motivation & Problem

While NaN outputs are rare with well-tested models, they become critical for custom models in early development and can also arise from engine/runtime issues:

  • Models may have numerical instability.
  • Hardware issues are more likely to surface.

The codebase already detects corrupted requests (Request.is_output_corrupted) when the VLLM_COMPUTE_NANS_IN_LOGITS environment variable is enabled, but this diagnostic information is completely hidden from users: there are no metrics, no logging, and no monitoring.
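To make the proposal concrete, here is a minimal sketch (not existing vLLM code) of what exposing the counter through prometheus_client could look like. The metric name mirrors the issue title, and record_corrupted_request is a hypothetical placeholder for wherever the scheduler/engine observes Request.is_output_corrupted:

```python
# Minimal sketch (not existing vLLM code): a Prometheus counter for corrupted
# requests, incremented by a hypothetical hook wherever the engine observes
# Request.is_output_corrupted.
from prometheus_client import Counter

# Metric name mirrors the issue title; the final name is open for discussion.
num_corrupted_request = Counter(
    "vllm:num_corrupted_request",
    "Number of requests whose outputs contained NaNs.",
    labelnames=["model_name"],
)

def record_corrupted_request(model_name: str) -> None:
    """Hypothetical hook, called once per request detected as corrupted."""
    num_corrupted_request.labels(model_name=model_name).inc()
```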

Proposed Idea

The two approaches for corrupted-request metrics that I am considering are:

  1. Approach 1: CLI Config-Based (Current Implementation)
  • CLI flag: --include-corrupted-requests
  • Config: SchedulerConfig.include_corrupted_requests
  • Usage: vllm serve model --include-corrupted-requests

Pros: User-friendly, explicit control, follows existing vLLM CLI patterns.
Cons: Adds a new CLI argument and requires config changes.
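As a rough illustration of Approach 1 (again not existing vLLM code; argparse stands in for vLLM's own argument utilities), the wiring could look roughly like this:

```python
# Sketch of Approach 1: a new CLI flag feeding a SchedulerConfig field that
# gates whether the corrupted-request counter is exposed. The names mirror
# the proposal above and are not existing vLLM code.
import argparse
from dataclasses import dataclass

@dataclass
class SchedulerConfig:
    include_corrupted_requests: bool = False  # hypothetical new field

def add_cli_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    parser.add_argument(
        "--include-corrupted-requests",
        action="store_true",
        help="Expose the corrupted-request counter in logs and Prometheus.",
    )
    return parser

# Example: `vllm serve model --include-corrupted-requests` would end up with
# include_corrupted_requests=True in the scheduler config.
args = add_cli_args(argparse.ArgumentParser()).parse_args(
    ["--include-corrupted-requests"]
)
config = SchedulerConfig(include_corrupted_requests=args.include_corrupted_requests)
```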

I welcome suggestions and thoughts on this, and would love to contribute the implementation.

Alternatives

  1. Approach 2: Environment Variable-Based (Proposed Alternative)
  • Reuses the existing VLLM_COMPUTE_NANS_IN_LOGITS environment variable
  • Logic: When NaN detection is enabled, automatically expose corrupted metrics
  • Usage: VLLM_COMPUTE_NANS_IN_LOGITS=1 vllm serve model

Pros: Reuses existing infrastructure; no new CLI args.
Cons: Couples metrics exposure to NaN detection and gives less granular control.
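A minimal sketch of Approach 2 (not existing vLLM code; os.environ is used only to keep the snippet self-contained, whereas vLLM reads such flags through its envs module): register the counter only when NaN detection is already enabled.

```python
# Sketch of Approach 2: only register the corrupted-request counter when
# VLLM_COMPUTE_NANS_IN_LOGITS is enabled, so no new CLI flag is needed.
import os
from typing import Optional
from prometheus_client import Counter

def maybe_create_corrupted_counter() -> Optional[Counter]:
    if os.environ.get("VLLM_COMPUTE_NANS_IN_LOGITS", "0") != "1":
        return None  # NaN detection off -> metric not exposed
    return Counter(
        "vllm:num_corrupted_request",
        "Number of requests whose outputs contained NaNs.",
        labelnames=["model_name"],
    )

counter = maybe_create_corrupted_counter()
if counter is not None:
    # Called whenever a finished request is flagged as corrupted.
    counter.labels(model_name="my-model").inc()
```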

Thanks
Snehlata

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
