[PASS] Layout transform pass #233

Merged: 4 commits, Jul 10, 2017
Conversation

ZihengJiang
Contributor

No description provided.

return false;
}

inline LayoutInfo GetLayout(const nnvm::OpMap<FTVMLayoutInfo>& layouts,
Member

How about adding a function called `CombineLayout` (vector -> vector)?

LayoutInfo olayout = GetLayout(olayouts, in, e.index);
LayoutInfo ilayout = GetLayout(ilayouts, n, idx);
if (IsPair(olayout, ilayout)) {
break;
Member

What about other inputs that might need a layout change? Is `break` the right way?

Contributor Author

Sorry, it should be `continue`.
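
To illustrate the point of this exchange: `break` leaves the loop at the first input whose layouts already pair up, so any later input that still needs a transform is never visited, while `continue` skips only that one input. The following is a minimal sketch; `Layout`, the simplified `IsPair`, and `InputsNeedingTransform` are hypothetical stand-ins and not the types or functions from this PR.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the PR's LayoutInfo: just a layout name.
using Layout = std::string;

// Hypothetical stand-in for IsPair: the produced layout already matches
// what the consumer expects, so no transform is needed for this input.
inline bool IsPair(const Layout& produced, const Layout& expected) {
  return produced == expected;
}

// Returns the indices of the inputs that still need a layout transform.
// Each element of `inputs` is (layout produced, layout expected).
inline std::vector<std::size_t> InputsNeedingTransform(
    const std::vector<std::pair<Layout, Layout>>& inputs, bool use_break) {
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    if (IsPair(inputs[i].first, inputs[i].second)) {
      if (use_break) break;  // bug: abandons every remaining input
      continue;              // fix: skips only this input
    }
    result.push_back(i);
  }
  return result;
}
```

With inputs `{NCHW, NCHW}` followed by `{NCHW, NCHW16c}`, the `break` variant reports nothing to transform, whereas the `continue` variant reports the second input.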

if (olayouts.count(e.node->op())) {
LayoutInfo layout = GetLayout(olayouts, e.node, e.index);
nnvm::NodePtr tnode =
CreateLayoutTransformNode(layout.src, layout.dst);
Member

Always assert that the output is transformed back to NCHW.

Member

This logic does not benefit from the transform cache. Consider changing it so that the output transform node is always created eagerly (from its producer), and each consumer chooses whether to consume it.
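
One possible reading of this suggestion, sketched under stated assumptions: the producer registers its output transform node once, and every consumer looks it up instead of creating its own. `Node`, `TransformCache`, `Register`, `Lookup`, and `SelectInput` are hypothetical names for illustration only; they are not NNVM types or the code that was merged.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified graph node.
struct Node {
  std::string name;
  std::vector<std::shared_ptr<Node>> inputs;
};
using NodePtr = std::shared_ptr<Node>;

// Cache key: (producer node, output index).
using EntryKey = std::pair<const Node*, unsigned>;

class TransformCache {
 public:
  // Producer side: eagerly create the transform node for one output.
  NodePtr Register(const NodePtr& producer, unsigned index,
                   const std::string& src, const std::string& dst) {
    auto tnode = std::make_shared<Node>();
    tnode->name = "layout_transform(" + src + "->" + dst + ")";
    tnode->inputs.push_back(producer);
    EntryKey key(producer.get(), index);
    cache_[key] = tnode;
    return tnode;
  }

  // Returns the cached transform node, or nullptr if none was registered.
  NodePtr Lookup(const NodePtr& producer, unsigned index) const {
    EntryKey key(producer.get(), index);
    auto it = cache_.find(key);
    return it == cache_.end() ? nullptr : it->second;
  }

 private:
  std::map<EntryKey, NodePtr> cache_;
};

// Consumer side: take the shared transform node if this consumer wants the
// transformed layout and one exists; otherwise read the producer directly.
inline NodePtr SelectInput(const TransformCache& cache, const NodePtr& producer,
                           unsigned index, bool wants_transformed) {
  if (wants_transformed) {
    if (NodePtr t = cache.Lookup(producer, index)) return t;
  }
  return producer;
}
```

In this sketch, two consumers that both want the transformed output receive the same transform node, which is the reuse the comment asks for.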

tnode->inputs.emplace_back(e);
transformed.emplace(e, nnvm::NodeEntry{tnode, 0, 0});
}
new_node->inputs[idx] = transformed.at(e);
Member

An if/else might be faster and clearer here.
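
A sketch of what the if/else form could look like, assuming the concern is the extra map lookup done by `at` after the insertion: a single `find` decides between reusing the cached entry and creating a new one. The `int` key, the `std::string` value, and `GetOrCreateTransform` are hypothetical simplifications of the `NodeEntry` map in the diff.

```cpp
#include <map>
#include <string>

// Returns the cached transform for `entry`, creating it on a miss.
// `creations` counts how many transforms were actually created.
inline const std::string& GetOrCreateTransform(
    std::map<int, std::string>& transformed, int entry, int* creations) {
  auto it = transformed.find(entry);
  if (it != transformed.end()) {
    // Hit: reuse the existing transform, no second lookup needed.
    return it->second;
  } else {
    // Miss: create the transform and return the freshly inserted value.
    ++*creations;
    return transformed
        .emplace(entry, "transform_of_" + std::to_string(entry))
        .first->second;
  }
}
```

Calling it twice with the same key creates the transform once and returns the same cached value both times.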

CreateLayoutTransformNode(layout.src, layout.dst);
tnode->inputs.emplace_back(nnvm::NodeEntry{new_node, i, 0});
transformed.emplace(
nnvm::NodeEntry{n, i, 0}, nnvm::NodeEntry{tnode, 0, 0});
Contributor Author

@ZihengJiang ZihengJiang Jul 9, 2017

One problem is the version field: if we create the transformed item eagerly, we have to assume the version is 0.

Member

We don't have to cache the `NodeEntry`; only the `Node` is needed. You can still copy the node and copy the entry index over.

Contributor Author

One node can have multiple output layouts, so one node corresponds to multiple transform nodes. So I think it should be a mapping from `NodeEntry` to `NodeEntry`.

Contributor Author

Never mind, I got it.

@ZihengJiang ZihengJiang merged commit 3212186 into apache:master Jul 10, 2017
@ZihengJiang ZihengJiang deleted the layout branch July 10, 2017 02:31
vinx13 pushed a commit to vinx13/tvm that referenced this pull request Mar 9, 2022
* add simplify

* remove simplify in auto complete
apivovarov added a commit to apivovarov/tvm that referenced this pull request Mar 30, 2022
gigiblender pushed a commit to gigiblender/tvm that referenced this pull request Oct 3, 2022
It may be useful for some passes to collapse chains of definitions, particularly after other compiler transformations that may reduce or simplify some expressions.

This pass takes chains of definitions and replaces references to later definitions with the original one. It works by checking `LookupBinding` for each var use-site and replacing the var with its definition if the definition was another var. (Note: This required updating `BlockBuilder` to also update its binding map for `MatchShape` nodes; that was arguably a bug.) Additionally, `MatchShape` bindings where the `LHS` and the `RHS` are guaranteed to match at compile time are canonicalized into ordinary `VarBinding`s.
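
The core idea of collapsing a chain of definitions can be sketched as follows. This is only an illustration under assumptions: variables are plain strings, the binding map holds only var-to-var bindings, and `ResolveVar` is a hypothetical helper, not the `LookupBinding` API mentioned in the commit message.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Follows var-to-var bindings (e.g. z = y, y = x) back to the original
// definition. Vars without a var-to-var binding are returned unchanged.
inline std::string ResolveVar(
    const std::unordered_map<std::string, std::string>& var_bindings,
    const std::string& var) {
  std::string current = var;
  // A chain can be at most as long as the map, so bounding the loop by the
  // map size also guards against accidental cycles in malformed input.
  for (std::size_t steps = 0; steps < var_bindings.size(); ++steps) {
    auto it = var_bindings.find(current);
    if (it == var_bindings.end()) break;
    current = it->second;
  }
  return current;
}
```

Given the bindings `y = x` and `z = y`, every use of `z` or `y` resolves to `x`, while `x` itself and unbound vars are left alone.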
junrushao pushed a commit to junrushao/tvm that referenced this pull request Oct 18, 2022
MasterJH5574 pushed a commit to MasterJH5574/tvm that referenced this pull request Nov 20, 2022
junrushao pushed a commit to junrushao/tvm that referenced this pull request Feb 8, 2023
yelite pushed a commit to yelite/tvm that referenced this pull request Feb 17, 2023
tqchen pushed a commit to tqchen/tvm that referenced this pull request May 29, 2023

The current codegen output `(half4)*(device uint*)A` tries to create an
`int32` number and then cast it to `half4`, which is not the expected
behavior.

As Metal supports the `uchar4` and `char4` types, we can use them directly to
solve that problem.

(cherry picked from commit 6198c7f)
LeiWang1999 added a commit to LeiWang1999/tvm that referenced this pull request Nov 8, 2024
* Merge TL Update

* submodule update

* Re-implement macro with sub function.

* lint fix

* Refactor tensor core memory allocation in MatmulFineGrainScheduler

- Adjusted the local fragment sizes for tensor core memory allocation in the MatmulFineGrainScheduler class.
- Updated the allocation sizes for A_local, B_local, and C_local variables based on the new fragment sizes.
- The changes ensure efficient memory utilization and improve performance.

Refactor tensor core memory allocation in MatmulDequantizeFineGrainedScheduler

- Modified the fragment sizes for tensor core memory allocation in the MatmulDequantizeFineGrainedScheduler class.
- Updated the allocation sizes for A_frag, B_frag, and C_frag variables based on the new fragment sizes.
- The changes optimize memory usage and enhance the efficiency of the dequantization process.

Refactor tensor core memory allocation in MatmulDequantizeWeightPropagationScheduler

- Adjusted the fragment sizes for tensor core memory allocation in the MatmulDequantizeWeightPropagationScheduler class.
- Updated the allocation sizes for A_frag, B_frag, B_dequantize_frag, and C_frag variables based on the new fragment sizes.
- The changes improve memory utilization and optimize the weight propagation process.

* Implement int4 tensorcore

* lint fix

* support uint2->uint4 fast dequantize

* Support int4 tensorcore decoding

* lint fix
2 participants