[Feature]: Generalize RoutingMethodType for broader MoE routing control

### 🚀 The feature, motivation and pitch

PR #27492 introduced `RoutingMethodType` to support different routing methods for the FP8 flashinfer TRTLLM MoE path (DeepSeekV3, Llama4, Renormalize, etc.).
Although it was implemented to support the Qwen3 and Qwen3-Next models, the review discussion revealed opportunities to use it more broadly across the
codebase to simplify MoE routing configuration.

### Motivation:

Currently, MoE routing behavior is controlled through multiple fragmented parameters (`scoring_func`, `renormalize`, `use_grouped_topk`, custom routing
functions, etc.). This creates several issues:

1. Lack of clarity: The routing method isn't explicitly defined in one place
2. Code duplication: Each model must explicitly specify routing parameters
3. Maintenance burden: Adding new routing methods requires updates across multiple locations
4. Tight coupling: Current implementation is tied to flashinfer's specific enum values

As noted by @mgoin:
"I like the idea of having a routing method type so we can reduce the need for hacks like checking the llama 4 custom routing function within the 
quant method... I think if we do this right, we can actually remove other arguments we have in FusedMoE such as renormalize."

### Proposed improvements:

1. Auto-derive routing type: Instead of requiring each model to explicitly set `routing_method_type`, automatically derive it from existing parameters (`scoring_func`, `renormalize`, `use_grouped_topk`, `top_k`, etc.) within `FusedMoE.__init__`.
2. Decouple from flashinfer: Make `RoutingMethodType` a vLLM-native abstraction that works across all fused MoE backends (not just flashinfer TRTLLM), with backend-specific mapping happening at the kernel level.
3. Simplify FusedMoE API: Remove redundant parameters like `renormalize`, and potentially `apply_router_weight_on_input`, by folding them into the routing type.
4. Support explicit override: Allow models to explicitly specify the routing type when auto-derivation isn't sufficient.
5. Router abstraction: Consider implementing router objects/functions that can be passed directly (as suggested by @bnellnm).
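
To make items 1 and 4 concrete, here is a minimal sketch of what auto-derivation with an explicit override could look like. Everything in it is an assumption for illustration: the `RoutingMethod` enum, its member names, the `derive_routing_method` helper, and especially the derivation rules are hypothetical and are not the existing vLLM or flashinfer definitions. The real mapping would need to be agreed on in review.

```python
from enum import Enum, auto
from typing import Optional


class RoutingMethod(Enum):
    """Hypothetical backend-agnostic routing type (names are illustrative)."""

    SOFTMAX_TOPK = auto()  # softmax scores, top-k, no renormalization
    RENORMALIZE = auto()   # softmax scores, top-k, renormalized weights
    DEEPSEEK_V3 = auto()   # sigmoid scores with grouped top-k
    LLAMA4 = auto()        # sigmoid scores with top-1
    CUSTOM = auto()        # anything the rules below do not recognize


def derive_routing_method(
    scoring_func: str,
    renormalize: bool,
    use_grouped_topk: bool,
    top_k: int,
    override: Optional[RoutingMethod] = None,
) -> RoutingMethod:
    """Derive a routing type from the existing FusedMoE-style parameters.

    The rules are assumed for this sketch only; they are not taken from vLLM.
    """
    # Item 4: an explicit override always wins over auto-derivation.
    if override is not None:
        return override
    # Item 1: otherwise infer the routing type from the legacy parameters.
    if use_grouped_topk and scoring_func == "sigmoid":
        return RoutingMethod.DEEPSEEK_V3
    if scoring_func == "sigmoid" and top_k == 1:
        return RoutingMethod.LLAMA4
    if scoring_func == "softmax":
        if renormalize:
            return RoutingMethod.RENORMALIZE
        return RoutingMethod.SOFTMAX_TOPK
    return RoutingMethod.CUSTOM
```

With something like this called from `FusedMoE.__init__`, a model that passes only the usual parameters would get a routing type without naming one, and a model with unusual routing could still pass the override.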

### Alternatives

Keep the current approach of using multiple discrete parameters (`scoring_func`, `renormalize`, etc.). However, this requires ongoing maintenance of mapping
logic scattered across quant methods and model code.

### Additional context

Related PR: #27492 - Initial implementation of `RoutingMethodType`

Code locations that would benefit:
- `vllm/model_executor/layers/fused_moe/config.py:RoutingMethodType` - Make backend-agnostic
- `vllm/model_executor/layers/fused_moe/layer.py:FusedMoE.__init__` - Add auto-derivation logic
- `vllm/model_executor/layers/quantization/fp8.py` - Simplify routing type usage
- `vllm/model_executor/models/qwen3_moe.py` - Should not need explicit `routing_method_type`
- `vllm/model_executor/models/qwen3_next.py` - Should not need explicit `routing_method_type`
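
For the backend-agnostic part, one possible shape is a vLLM-native enum plus a per-backend lookup applied at the kernel boundary. The sketch below is only an assumption of how that could be organized: the `RoutingMethod` enum, the backend names, the `to_backend_code` helper, and the integer codes are all placeholders and do not correspond to real flashinfer or vLLM values.

```python
from enum import Enum, auto


class RoutingMethod(Enum):
    """Hypothetical vLLM-native routing type, independent of any backend."""

    RENORMALIZE = auto()
    DEEPSEEK_V3 = auto()
    LLAMA4 = auto()


# Placeholder tables: each backend declares which routing types it supports
# and which backend-specific code it expects. The numbers are made up.
BACKEND_ROUTING_CODES = {
    "backend_a": {
        RoutingMethod.RENORMALIZE: 0,
        RoutingMethod.DEEPSEEK_V3: 1,
        RoutingMethod.LLAMA4: 2,
    },
    "backend_b": {
        RoutingMethod.RENORMALIZE: 10,
    },
}


def to_backend_code(backend: str, method: RoutingMethod) -> int:
    """Translate the native routing type into a backend-specific code."""
    try:
        return BACKEND_ROUTING_CODES[backend][method]
    except KeyError:
        # Fail loudly instead of silently falling back to another routing.
        raise NotImplementedError(
            f"{backend!r} does not support routing method {method.name}"
        ) from None
```

Keeping the translation in one table per backend would mean model code and quant methods only ever see the native enum, and an unsupported combination surfaces as an explicit error at kernel selection time.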

cc @bnellnm @jiahanc @pavanimajety 

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Feature]: Generalize RoutingMethodType for broader MoE routing control #28408
