[Bug][meta_schedule] Tutorial e2e_opt_model.py fails #18018

@vacu9708

Expected behavior

The tutorial script e2e_opt_model.py should run to completion without errors.

Actual behavior

Tuning fails when the TaskScheduler picks Task 3, "fused_conv2d9_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_relu4", with the following error:

  File "/home/ysy/Documents/open_source/tvm/source/src/tir/transforms/inject_software_pipeline.cc", line 1143, in tvm::tir::software_pipeline::PipelineInjector::VisitStmt_(tvm::tir::ForNode const*)
InternalError: Check failed: pipeline_stages.size() == original_order.size() (3 vs. 4) : PrimFunc "main" has original order ["", "", "", ""], but pipeline annotation is [0, 0, 3] with different size

Environment


  • Ubuntu 22.04, Intel i7-13650HX, RTX 4060
  • commit: 2d964b4 (0.21.dev0)

Steps to reproduce

Run the tutorial script e2e_opt_model.py.

🌟My analysis

Error point
The error is raised at inject_software_pipeline.cc:1133 while the VerifyGPUCode postprocessor is running:

auto pipeline_stages =
        Downcast<Array<Integer>>(op->annotations.at(attr::software_pipeline_stage));
CHECK_EQ(pipeline_stages.size(), original_order.size())

As the error message indicates, pipeline_stages.size() is 3 whereas original_order.size() is 4.
In other words, the loop body contains 4 blocks, but the software_pipeline_stage annotation has only 3 elements.
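
To make the mismatch concrete, here is a minimal plain-Python model of the size check. It is only an illustration, not TVM code: the function name check_pipeline_annotation is hypothetical, and the inputs are copied from the error message above (four blocks whose names print as empty strings, and the annotation [0, 0, 3]).

```python
def check_pipeline_annotation(block_names, pipeline_stages):
    """Hypothetical stand-in for the CHECK_EQ: one stage entry per block."""
    if len(pipeline_stages) != len(block_names):
        raise ValueError(
            "Check failed: pipeline_stages.size() == original_order.size() "
            f"({len(pipeline_stages)} vs. {len(block_names)})"
        )


# Values taken from the reported error message.
blocks = ["", "", "", ""]  # original order: four blocks
stages = [0, 0, 3]         # software_pipeline_stage: three entries

try:
    check_pipeline_annotation(blocks, stages)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

With these inputs the check raises, and the message contains the same "(3 vs. 4)" pair as the real failure.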

Why?
Before VerifyGPUCode runs, RewriteReduction is executed. It decomposes the reduction block conv2d_nchw into a conv2d_nchw_init block and a conv2d_nchw_update block, which adds one block to the loop body.
This increases original_order.size() from 3 to 4, but the software_pipeline_stage annotation is not updated to account for the new block.
This appears to be the cause of the bug.
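
The effect of the decomposition on the block count can be sketched in plain Python. This is a toy model under assumptions, not the actual RewriteReduction implementation: the helper decompose_reduction here is a hypothetical list operation, and the neighbouring block names load_a and load_b are placeholders, since the real names are not visible in the error message. Only conv2d_nchw and its _init/_update halves come from the analysis above.

```python
def decompose_reduction(blocks, name):
    """Toy model: replace one reduction block with its init and update halves."""
    result = []
    for block in blocks:
        if block == name:
            result.extend([f"{name}_init", f"{name}_update"])
        else:
            result.append(block)
    return result


# Hypothetical loop body before the rewrite (placeholder names for the
# two non-reduction blocks).
blocks_before = ["load_a", "load_b", "conv2d_nchw"]
stages = [0, 0, 3]  # annotation written when there were still 3 blocks

blocks_after = decompose_reduction(blocks_before, "conv2d_nchw")

print(len(blocks_before), len(blocks_after), len(stages))
```

After the rewrite the body has four blocks while the annotation still has three entries, which is the state the size check rejects.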

Potential solution
In my opinion, the CHECK_EQ only validates a legitimate invariant (every block must map to a pipeline stage), and the actual problem lies in RewriteReduction.
Shouldn't RewriteReduction update the pipeline stage annotation after it adds the block?
I tried to make this change myself, but I struggled with the complexity of the optimization algorithm.
Could someone familiar with this code take it on? I'd appreciate the help.
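
As a rough sketch of what such an update might look like, the snippet below duplicates the stage entry of the block that was split, so the annotation grows together with the block list. Everything here is an assumption for discussion: split_stage_annotation is a hypothetical helper, the index of the split block (2) is a guess, and whether the new init block should inherit the same stage is exactly the open question. The companion software_pipeline_order annotation would presumably need a similar adjustment, which this sketch does not attempt.

```python
def split_stage_annotation(pipeline_stages, split_index):
    """Hypothetical fix-up: give the newly created block the same stage
    as the block it was split from, keeping one entry per block."""
    stages = list(pipeline_stages)
    stages.insert(split_index + 1, stages[split_index])
    return stages


stages_before = [0, 0, 3]  # annotation from the error message
split_index = 2            # assumed position of conv2d_nchw in the loop body

stages_after = split_stage_annotation(stages_before, split_index)

print(stages_after)
```

Under these assumptions the annotation becomes [0, 0, 3, 3], matching a four-block body in length; it says nothing about whether that schedule is actually valid for the pipeline.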

Triage

  • tune:meta_schedule

Labels

  • needs-triage
  • type: bug