graph, benchdnn: support implicit causal mask in graph API #2330
@@ -146,6 +146,37 @@ DNNL_BACKEND_REGISTER_PATTERN_MATCHER_PASS(dnnl, float_sdp_fusion)
            return std::make_shared<sdp_base_t<>>();
        });

DNNL_BACKEND_REGISTER_PATTERN_MATCHER_PASS(dnnl, float_sdp_implicit_mask_fusion)
        .set_priority(21.0f)
        .set_engine_kind(engine_kind::cpu)
Why CPU?
The current implementation of gen_index only targets CPU.
Do you have any plans to implement it for GPU as well?
Yes, but most likely in a separate PR, and not targeting v3.7.
Given that this pattern should work for an optimized GPU version, I don't think it makes sense to limit the engine kind.
I'd expect it to return unimplemented later in the flow (it won't pick up any other patterns anyway).
The concern is that if we don't limit the engine kind here, then on GPU the unimplemented status is returned to the user only at the execution stage, which may be too late for them to make the necessary modifications.
In contrast, if we keep the current CPU-only restriction (and remove it once GPU support is implemented), unsupported partitions are reported earlier, at the get_partitions stage, which lets users handle them on their own sooner.
I've left a few comments, please incorporate as you see fit, thanks!
A few more edits suggested; please incorporate, thanks!
LGTM. I reviewed the API, the documentation, the interface folder, and part of the DNNL backend implementation.
Minor edits suggested. Rest looks good, thanks
GenIndex{#dev_guide_op_genindex}
================================

## General
- ## General
+ ## Overview
Thanks for the suggestion. Since the other operations also use "General", I suggest changing them all in a separate PR.
GreaterEqual{#dev_guide_op_greaterequal}
========================================

## General
- ## General
+ ## Overview
Description
This PR implements option 1.1 (top-left aligned causal mask, subgraph approach) from the RFC "graph api: support implicit causal mask in SDPA".
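
For readers unfamiliar with the idea, here is a minimal standalone sketch of what a top-left aligned causal mask computes when it is built from index generation, a greater-or-equal comparison, and a select. It is not the oneDNN implementation and does not use the graph API. The helper names `gen_index` and `apply_causal_mask` are hypothetical, and the decomposition into those three steps is an assumption based on the GenIndex and GreaterEqual operations this PR documents.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not a oneDNN API): for a row-major [rows x cols]
// matrix, returns each element's index along the chosen axis
// (axis 0 = row index, axis 1 = column index).
std::vector<int> gen_index(std::size_t rows, std::size_t cols, int axis) {
    std::vector<int> idx(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            idx[r * cols + c] = static_cast<int>(axis == 0 ? r : c);
    return idx;
}

// Hypothetical helper: applies a top-left aligned causal mask to a row-major
// [rows x cols] score matrix. Position (r, c) is kept when r >= c and is
// replaced with -infinity otherwise, so it vanishes after softmax.
std::vector<float> apply_causal_mask(const std::vector<float> &scores,
        std::size_t rows, std::size_t cols) {
    assert(scores.size() == rows * cols);
    const std::vector<int> row_idx = gen_index(rows, cols, 0);
    const std::vector<int> col_idx = gen_index(rows, cols, 1);
    const float neg_inf = -std::numeric_limits<float>::infinity();

    std::vector<float> out(scores.size());
    for (std::size_t i = 0; i < scores.size(); ++i) {
        const bool keep = row_idx[i] >= col_idx[i]; // greater-or-equal step
        out[i] = keep ? scores[i] : neg_inf; // select step
    }
    return out;
}
```

With a 2 x 3 score matrix, the first row keeps only column 0 and the second row keeps columns 0 and 1. The kept region is anchored at the top-left corner, which is what "top-left aligned" refers to when the query and key sequence lengths differ.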