Description
Thanks for participating in the TVM community! We use https://discuss.tvm.ai for any general usage questions and discussions. The issue tracker is used for actionable items such as feature proposals discussion, roadmaps, and bug tracking. You are always welcomed to post on the forum first 😸
Issues that are inactive for a period of time may get closed. We adopt this policy so that we won't lose track of actionable issues that may fall at the bottom of the pile. Feel free to reopen a new one if you feel there is an additional problem that needs attention when an old one gets closed.
Expected behavior
What you were expecting
-> The tutorial code e2e_opt_model.py should run to completion without errors.
Actual behavior
What actually happened
->
When TaskScheduler picks Task 3: "fused_conv2d9_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_relu4"
File "/home/ysy/Documents/open_source/tvm/source/src/tir/transforms/inject_software_pipeline.cc", line 1143, in tvm::tir::software_pipeline::PipelineInjector::VisitStmt_(tvm::tir::ForNode const*)
InternalError: Check failed: pipeline_stages.size() == original_order.size() (3 vs. 4) : PrimFunc "main" has original order ["", "", "", ""], but pipeline annotation is [0, 0, 3] with different size
Environment
Any environment details, such as: Operating System, TVM version, etc
->
- Ubuntu 22.04, Intel i7 13650hx, RTX 4060
- commit: 2d964b4 (0.21.dev0)
Steps to reproduce
Preferably a minimal script to cause the issue to occur.
-> Execute e2e_opt_model.py
🌟My analysis
Error point
The error occurs at inject_software_pipeline.cc:1133, while the VerifyGPUCode postprocessor is running:
auto pipeline_stages =
Downcast<Array<Integer>>(op->annotations.at(attr::software_pipeline_stage));
CHECK_EQ(pipeline_stages.size(), original_order.size())
As indicated by the error message, pipeline_stages.size() is 3 whereas original_order.size() is 4.
There are 4 blocks, while the annotation software_pipeline_stage has 3 elements.
Why?
Before VerifyGPUCode runs, RewriteReduction is executed. It decomposes the reduction block conv2d_nchw into a conv2d_nchw_init block and a conv2d_nchw_update block, which adds one new block.
This increases original_order.size() from 3 to 4. However, the software_pipeline_stage annotation is not updated to account for the added block, so pipeline_stages still has 3 entries.
This mismatch appears to cause the bug.
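The suspected sequence can be illustrated with a toy model in plain Python. This is only a sketch of the hypothesis above, not TVM code: the lists stand in for the loop body and its annotation, the function names are hypothetical, and the first two block names are made up (only conv2d_nchw and the annotation [0, 0, 3] come from the report).

```python
# Toy model only: plain lists stand in for the TIR loop body and its
# software_pipeline_stage annotation. None of these functions are TVM APIs.

def check_pipeline_annotation(blocks, stages):
    """Mimic the size check that fails in the error message."""
    if len(stages) != len(blocks):
        raise ValueError(
            "pipeline_stages.size() == original_order.size() "
            f"({len(stages)} vs. {len(blocks)})"
        )

def decompose_reduction(blocks, name):
    """Split `name` into `<name>_init` and `<name>_update`.

    The annotation is deliberately left untouched, which is the
    behaviour this issue suspects.
    """
    out = []
    for block in blocks:
        if block == name:
            out.extend([f"{name}_init", f"{name}_update"])
        else:
            out.append(block)
    return out

# Hypothetical names for the first two blocks; conv2d_nchw is from the report.
blocks = ["load_a", "load_b", "conv2d_nchw"]
stages = [0, 0, 3]  # annotation quoted in the error message

check_pipeline_annotation(blocks, stages)  # sizes match before the rewrite

blocks_after = decompose_reduction(blocks, "conv2d_nchw")
try:
    check_pipeline_annotation(blocks_after, stages)
    mismatch = None
except ValueError as err:
    mismatch = str(err)

print(blocks_after)
print(mismatch)
```

In this model the check passes before the decomposition and fails afterwards with the same "3 vs. 4" size mismatch as in the log.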
Potential solution
In my opinion, the CHECK_EQ only validates the expected invariant, namely that every block maps to a pipeline stage, so the real problem lies with RewriteReduction.
Shouldn't RewriteReduction extend the pipeline stage annotation after it adds the new block?
I tried to make this modification myself, but I struggled with the complexity of the optimization algorithm.
Could someone familiar with this code take it on? I'd appreciate your expertise.
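For discussion, the proposed update can be sketched in the same kind of toy model. This is an assumption-laden illustration, not a patch: the function name is hypothetical, it assumes the new init block should inherit the stage of the block it was split from, and it assumes a software_pipeline_order annotation (whose values are invented here) would need the same treatment. Whether those choices are correct for the real pass is exactly the open question.

```python
# Toy model only: shows one possible way to keep annotations in sync
# when a reduction block is split. Not TVM code.

def split_reduction_with_annotations(blocks, stages, order, name):
    """Split `name` into init/update and extend both annotations.

    Assumption: the init block inherits the stage of the original block
    and is ordered immediately before the update block.
    """
    k = blocks.index(name)
    new_blocks = blocks[:k] + [f"{name}_init", f"{name}_update"] + blocks[k + 1:]
    new_stages = stages[:k] + [stages[k], stages[k]] + stages[k + 1:]
    # Blocks ordered after the split block move back by one slot.
    shifted = [o + 1 if o > order[k] else o for o in order]
    new_order = shifted[:k] + [order[k], order[k] + 1] + shifted[k + 1:]
    return new_blocks, new_stages, new_order

blocks = ["load_a", "load_b", "conv2d_nchw"]  # first two names are hypothetical
stages = [0, 0, 3]                            # from the error message
order = [0, 1, 2]                             # hypothetical order annotation

new_blocks, new_stages, new_order = split_reduction_with_annotations(
    blocks, stages, order, "conv2d_nchw"
)
print(new_blocks)
print(new_stages)
print(new_order)
```

With the annotations extended this way, both lists have one entry per block again, so a size check like the one in inject_software_pipeline.cc would no longer trip in this model.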
Triage
Please refer to the list of label tags here to find the relevant tags and add them below in a bullet format (example below).
- tune:meta_schedule