@peishenyan
Contributor

Description

Support the local_window_size attribute of the GroupQueryAttention operator. The attribute enables sliding window attention and changes the attention mask pattern.

When local_window_size is not -1, a new attention mask is built as follows to apply the sliding window:

     condition_1 (old attn_mask) ---> CumSum (axis=3, exclusive=true, reversed=true)
          |                             |
          |                           Lesser <--- local_window_size
          |                             |
      LogicalAnd <----------------- condition_2
          |
    new attn_mask
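
To make the mask pattern above concrete, here is a minimal pure-Python sketch of the same three steps on a single 2-D slice (the last two axes of the 4-D mask, so the `axis=3` scan becomes a scan along each row). It is an illustration only, not the WebNN EP code from this PR; the helper names `exclusive_reverse_cumsum` and `apply_sliding_window` are hypothetical, and the mask is assumed to hold 0/1 values with 1 meaning "may attend".

```python
def exclusive_reverse_cumsum(row):
    """For each index j, return the sum of the entries strictly to the right of j.

    Mirrors CumSum with exclusive=true, reversed=true along the last axis.
    """
    out = [0] * len(row)
    running = 0
    for j in range(len(row) - 1, -1, -1):
        out[j] = running      # exclusive: do not count row[j] itself
        running += row[j]
    return out


def apply_sliding_window(attn_mask, local_window_size):
    """Return a new 0/1 mask restricted to the last `local_window_size` allowed keys.

    attn_mask is a list of rows (one per query position) of 0/1 values.
    A value of -1 is treated as "no sliding window" and returns a copy.
    """
    if local_window_size == -1:
        return [list(row) for row in attn_mask]
    new_mask = []
    for row in attn_mask:
        counts = exclusive_reverse_cumsum(row)
        # condition_2: fewer than local_window_size allowed keys lie to the right.
        # new attn_mask = condition_1 AND condition_2.
        new_mask.append(
            [int(bool(m) and c < local_window_size) for m, c in zip(row, counts)]
        )
    return new_mask


# A 4x4 causal mask: query i may attend to keys 0..i.
causal = [[1 if j <= i else 0 for j in range(4)] for i in range(4)]
windowed = apply_sliding_window(causal, 2)
for row in windowed:
    print(row)
```

With a window of 2, each query keeps only its two most recent allowed keys, so the rows printed are `[1, 0, 0, 0]`, `[1, 1, 0, 0]`, `[0, 1, 1, 0]` and `[0, 0, 1, 1]`. Positions to the right of the diagonal also pass the "less than" test (their count is 0), which is why the final LogicalAnd with the original mask is needed.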

Motivation and Context

add log info

temp

Support local_window_size for WebNN GQA
@peishenyan
Contributor Author

PTAL, thanks. @Honry

Contributor

@Honry Honry left a comment

Thanks, LGTM. Please also mention the need to use 'expand' for mask_shape_ones_shape_constant in your commit message.

@Honry
Contributor

Honry commented Nov 17, 2025

  • @fdwr, please take another look, thanks!

Contributor

@fdwr fdwr left a comment

:shipit:

@fdwr
Contributor

fdwr commented Nov 20, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline,Windows GPU WebGPU CI Pipeline,Windows OpenVINO CI Pipeline

@fdwr
Contributor

fdwr commented Nov 20, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@fdwr
Contributor

fdwr commented Nov 20, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

@fdwr
Contributor

fdwr commented Nov 20, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@fdwr
Contributor

fdwr commented Nov 20, 2025

/azp run Test Linux CUDA x64 Release,Test Linux TensorRT x64 Release,web_Debug / build_onnxruntime_web,web_Release / build_onnxruntime_web

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@fdwr
Contributor

fdwr commented Nov 20, 2025

/azp run Linux QNN CI Pipeline

@azure-pipelines

No pipelines are associated with this pull request.

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

3 participants