Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Vulkan][Codegen] Spir-V codegen, correct labels/blocks in WhileNode. #8013

Merged
merged 1 commit into from
May 11, 2021

Conversation

Lunderberg
Copy link
Contributor

Previously, the WhileNode assumes that evaluating the loop condition
will not introduce any additional labels. If this assumption is
violated, such as for a WhileNode whose condition is an if/else
statement, then the OpLoopMerge instruction appears in the wrong
block.

The unittest added exercises this code path, but doesn't yet trigger a
failure. Once spvValidate is enabled for all vulkan codegen, then
this unit test will catch the failure mode.

Previously, the WhileNode assumes that evaluating the loop condition
will not introduce any additional labels.  If this assumption is
violated, such as for a WhileNode whose condition is an if/else
statement, then the OpLoopMerge instruction appears in the wrong
block.

The unittest added exercises this code path, but doesn't yet trigger a
failure.  Once spvValidate is enabled for all vulkan codegen, then
this unit test will catch the failure mode.
@Lunderberg
Copy link
Contributor Author

Potential reviewers: @masahi

@masahi masahi merged commit 7b49744 into apache:main May 11, 2021
@masahi
Copy link
Member

masahi commented May 11, 2021

thanks @Lunderberg

@Lunderberg Lunderberg deleted the vulkan_while_if_labels branch May 11, 2021 16:08
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request May 19, 2021
…apache#8013)

Previously, the WhileNode assumes that evaluating the loop condition
will not introduce any additional labels.  If this assumption is
violated, such as for a WhileNode whose condition is an if/else
statement, then the OpLoopMerge instruction appears in the wrong
block.

The unittest added exercises this code path, but doesn't yet trigger a
failure.  Once spvValidate is enabled for all vulkan codegen, then
this unit test will catch the failure mode.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 17, 2021
…apache#8013)

Previously, the WhileNode assumes that evaluating the loop condition
will not introduce any additional labels.  If this assumption is
violated, such as for a WhileNode whose condition is an if/else
statement, then the OpLoopMerge instruction appears in the wrong
block.

The unittest added exercises this code path, but doesn't yet trigger a
failure.  Once spvValidate is enabled for all vulkan codegen, then
this unit test will catch the failure mode.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 17, 2021
…apache#8013)

Previously, the WhileNode assumes that evaluating the loop condition
will not introduce any additional labels.  If this assumption is
violated, such as for a WhileNode whose condition is an if/else
statement, then the OpLoopMerge instruction appears in the wrong
block.

The unittest added exercises this code path, but doesn't yet trigger a
failure.  Once spvValidate is enabled for all vulkan codegen, then
this unit test will catch the failure mode.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Sep 22, 2021
…_pass_infra.py

This tutorial registers a global layout transformation for conv2d for all
targets which is not well-formed. Later uses of conv2d in the tutorials
pick that layout up then assert fail in the conv2d type-relation.

Better would be to register a transform for an entirely fake target, but
that is beyond my current level of expertise.

In general our use of sphinx/sphinx_gallery for running and rendering the
tutorials is highly suspect since there is no inter-example isolation:
 - Examples using tensorflow will gobble up GPU memory and not give it back.
 - Any examples which use any of the (many!) global registration mechanisms
   need to ensure the registrant is safe across all tutorials.
I recall seeing a thread with the sphinx_gallery where they said they'd prefer
not to work on process-level isolation, but it's probably worth pinging again.

While digging into this I noticed we had a slicing cast in AlterOpLayout due
to a derived class of ObjectRef introducing virtuals. I moved the virtuals to
the corresponding Node classes. In this case we got away with it since the
ObjectRef happened to not get copied but we were on very thin ice.
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Sep 22, 2021
…_pass_infra.py

This tutorial registers a global layout transformation for conv2d for all
targets which is not well-formed. Later uses of conv2d in the tutorials
pick that layout up then assert fail in the conv2d type-relation.

Better would be to register a transform for an entirely fake target, but
that is beyond my current level of expertise.

In general our use of sphinx/sphinx_gallery for running and rendering the
tutorials is highly suspect since there is no inter-example isolation:
 - Examples using tensorflow will gobble up GPU memory and not give it back.
 - Any examples which use any of the (many!) global registration mechanisms
   need to ensure the registrant is safe across all tutorials.
I recall seeing a thread with the sphinx_gallery where they said they'd prefer
not to work on process-level isolation, but it's probably worth pinging again.

While digging into this I noticed we had a slicing cast in AlterOpLayout due
to a derived class of ObjectRef introducing virtuals. I moved the virtuals to
the corresponding Node classes. In this case we got away with it since the
ObjectRef happened to not get copied but we were on very thin ice.
junrushao pushed a commit that referenced this pull request Sep 23, 2021
…infra.py (#9076)

* BUG #8013: Remove register_alter_op_layout example from dev/use_pass_infra.py

This tutorial registers a global layout transformation for conv2d for all
targets which is not well-formed. Later uses of conv2d in the tutorials
pick that layout up then assert fail in the conv2d type-relation.

Better would be to register a transform for an entirely fake target, but
that is beyond my current level of expertise.

In general our use of sphinx/sphinx_gallery for running and rendering the
tutorials is highly suspect since there is no inter-example isolation:
 - Examples using tensorflow will gobble up GPU memory and not give it back.
 - Any examples which use any of the (many!) global registration mechanisms
   need to ensure the registrant is safe across all tutorials.
I recall seeing a thread with the sphinx_gallery where they said they'd prefer
not to work on process-level isolation, but it's probably worth pinging again.

While digging into this I noticed we had a slicing cast in AlterOpLayout due
to a derived class of ObjectRef introducing virtuals. I moved the virtuals to
the corresponding Node classes. In this case we got away with it since the
ObjectRef happened to not get copied but we were on very thin ice.

* [checkpoint] Woops, forgot there was an extra AlterOpLayout

I should have run locally, there goes 6hrs of CI.
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
…_pass_infra.py (apache#9076)

* BUG apache#8013: Remove register_alter_op_layout example from dev/use_pass_infra.py

This tutorial registers a global layout transformation for conv2d for all
targets which is not well-formed. Later uses of conv2d in the tutorials
pick that layout up then assert fail in the conv2d type-relation.

Better would be to register a transform for an entirely fake target, but
that is beyond my current level of expertise.

In general our use of sphinx/sphinx_gallery for running and rendering the
tutorials is highly suspect since there is no inter-example isolation:
 - Examples using tensorflow will gobble up GPU memory and not give it back.
 - Any examples which use any of the (many!) global registration mechanisms
   need to ensure the registrant is safe across all tutorials.
I recall seeing a thread with the sphinx_gallery where they said they'd prefer
not to work on process-level isolation, but it's probably worth pinging again.

While digging into this I noticed we had a slicing cast in AlterOpLayout due
to a derived class of ObjectRef introducing virtuals. I moved the virtuals to
the corresponding Node classes. In this case we got away with it since the
ObjectRef happened to not get copied but we were on very thin ice.

* [checkpoint] Woops, forgot there was an extra AlterOpLayout

I should have run locally, there goes 6hrs of CI.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…_pass_infra.py (apache#9076)

* BUG apache#8013: Remove register_alter_op_layout example from dev/use_pass_infra.py

This tutorial registers a global layout transformation for conv2d for all
targets which is not well-formed. Later uses of conv2d in the tutorials
pick that layout up then assert fail in the conv2d type-relation.

Better would be to register a transform for an entirely fake target, but
that is beyond my current level of expertise.

In general our use of sphinx/sphinx_gallery for running and rendering the
tutorials is highly suspect since there is no inter-example isolation:
 - Examples using tensorflow will gobble up GPU memory and not give it back.
 - Any examples which use any of the (many!) global registration mechanisms
   need to ensure the registrant is safe across all tutorials.
I recall seeing a thread with the sphinx_gallery where they said they'd prefer
not to work on process-level isolation, but it's probably worth pinging again.

While digging into this I noticed we had a slicing cast in AlterOpLayout due
to a derived class of ObjectRef introducing virtuals. I moved the virtuals to
the corresponding Node classes. In this case we got away with it since the
ObjectRef happened to not get copied but we were on very thin ice.

* [checkpoint] Woops, forgot there was an extra AlterOpLayout

I should have run locally, there goes 6hrs of CI.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants