[Bug][meta_schedule] Tutorial `e2e_opt_model.py` fails #18018

### Expected behavior

The tutorial script `e2e_opt_model.py` should run to completion without errors.

### Actual behavior

When the `TaskScheduler` picks Task 3 (`fused_conv2d9_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_relu4`), tuning fails with the following error:
```
  File "/home/ysy/Documents/open_source/tvm/source/src/tir/transforms/inject_software_pipeline.cc", line 1143, in tvm::tir::software_pipeline::PipelineInjector::VisitStmt_(tvm::tir::ForNode const*)
InternalError: Check failed: pipeline_stages.size() == original_order.size() (3 vs. 4) : PrimFunc "main" has original order ["", "", "", ""], but pipeline annotation is [0, 0, 3] with different size
```

### Environment

- Ubuntu 22.04, Intel Core i7-13650HX, RTX 4060
- **commit:** 2d964b4133aac2f92e4185b3f095df4eb3bf3a90 (0.21.dev0)

### Steps to reproduce

Run the tutorial script `e2e_opt_model.py`.

### 🌟My analysis
**Error point**
The error is raised at `inject_software_pipeline.cc:1133` while the `VerifyGPUCode` postprocessor is running:
```cpp
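auto pipeline_stages =
    Downcast<Array<Integer>>(op->annotations.at(attr::software_pipeline_stage));
// One stage entry is expected per block in the pipelined loop body.
CHECK_EQ(pipeline_stages.size(), original_order.size());  // error message stream omitted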
```
As the error message indicates, `pipeline_stages.size()` is 3 whereas `original_order.size()` is 4: the loop body contains 4 blocks, but the `software_pipeline_stage` annotation has only 3 elements.
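
To make the failing invariant concrete, here is a minimal pure-Python model of the check. It is not TVM code: the function name `check_pipeline_annotation` is hypothetical, and the inputs simply mirror the values printed in the error message above.

```python
def check_pipeline_annotation(block_names, pipeline_stages):
    """Model of the size check: one stage entry is required per block."""
    if len(pipeline_stages) != len(block_names):
        raise ValueError(
            f"original order {block_names} has {len(block_names)} blocks, "
            f"but pipeline annotation {pipeline_stages} "
            f"has {len(pipeline_stages)} entries"
        )


# Values taken from the error message: 4 (unnamed) blocks, 3 stage entries.
blocks = ["", "", "", ""]
stages = [0, 0, 3]

try:
    check_pipeline_annotation(blocks, stages)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

With these inputs the model rejects the annotation, which matches the `3 vs. 4` failure in the log.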

**Why?**
`RewriteReduction` runs before `VerifyGPUCode`. It decomposes the reduction block `conv2d_nchw` into a `conv2d_nchw_init` block and a `conv2d_nchw_update` block, which adds one block to the loop body.
This increases `original_order.size()` from 3 to 4, but the `software_pipeline_stage` annotation is not updated to account for the new block.
This stale annotation appears to be the cause of the bug.
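
The sequence of events can be sketched with plain Python lists. This is only a toy model of my understanding, not the real `RewriteReduction` implementation: the helper `decompose_reduction` and the block names `load_a` and `load_b` are hypothetical, while `conv2d_nchw` and the stage values `[0, 0, 3]` come from this issue.

```python
def decompose_reduction(blocks, index):
    """Split the block at `index` into `<name>_init` and `<name>_update`."""
    name = blocks[index]
    return blocks[:index] + [f"{name}_init", f"{name}_update"] + blocks[index + 1:]


# Hypothetical loop body before the reduction is rewritten: 3 blocks, 3 stages.
body_before = ["load_a", "load_b", "conv2d_nchw"]
annotation = {"software_pipeline_stage": [0, 0, 3]}

# The decomposition adds a block, but nothing touches the annotation.
body_after = decompose_reduction(body_before, 2)
stale = len(annotation["software_pipeline_stage"]) != len(body_after)
```

After the split, the body has 4 blocks while the annotation still has 3 entries, which is the state the size check later rejects.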

**Potential solution**
In my opinion, the `CHECK_EQ` is a legitimate sanity check: it verifies that every block can be mapped to a pipeline stage. The real problem lies in `RewriteReduction`.
Shouldn't `RewriteReduction` update the `software_pipeline_stage` annotation so that its size matches the block count after it adds a block?
I tried to make this change myself, but I could not get it working because of the complexity of the surrounding code.
Could someone familiar with this part of the codebase take a look? I would appreciate the help.
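
As a rough illustration of the kind of update I have in mind, the sketch below duplicates the stage entry of the block that gets split, so that the init and update blocks share a stage. This is an assumption on my part, not a confirmed fix: the helper `split_stage_annotation` is hypothetical, and I have not verified that giving the init block the same stage is semantically correct. A real fix would presumably also have to adjust the corresponding order annotation.

```python
def split_stage_annotation(pipeline_stages, index):
    """Return a new stage list with the entry at `index` duplicated.

    Assumption: the init and update halves of a decomposed reduction
    block inherit the stage of the original block.
    """
    stage = pipeline_stages[index]
    return pipeline_stages[:index] + [stage, stage] + pipeline_stages[index + 1:]


# Stage annotation from the error message; the reduction block is the last one.
stages_before = [0, 0, 3]
stages_after = split_stage_annotation(stages_before, 2)
```

With this update the annotation would have 4 entries, matching the 4 blocks seen by the size check.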

### Triage

* tune:meta_schedule

