[Bug] Task initialization failed for one subgraph in resnet50 #13425

Open
shingjan opened this issue Nov 18, 2022 · 7 comments · Fixed by #13441
@shingjan

Actual behavior

tvm.tir.schedule.schedule.ScheduleError: Traceback (most recent call last):
  10: TVMFuncCall
  9: _ZN3tvm7runtime13PackedFuncObj
  8: tvm::runtime::TypedPackedFunc<void (tvm::meta_schedule::TaskScheduler, tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>)>::AssignTypedLambda<tvm::runtime::Registry::set_body_method<tvm::meta_schedule::TaskScheduler, tvm::meta_schedule::TaskSchedulerNode, void, tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>, void>(void (tvm::meta_schedule::TaskSchedulerNode::*)(tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>))::{lambda(tvm::meta_schedule::TaskScheduler, tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>)#1}>(tvm::runtime::Registry::set_body_method<tvm::meta_schedule::TaskScheduler, tvm::meta_schedule::TaskSchedulerNode, void, tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>, void>(void (tvm::meta_schedule::TaskSchedulerNode::*)(tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>))::{lambda(tvm::meta_schedule::TaskScheduler, tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  7: tvm::meta_schedule::GradientBasedNode::Tune(tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>)
  6: tvm::meta_schedule::TaskSchedulerNode::Tune(tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>)
  5: tvm::meta_schedule::PostOrderApplyNode::GenerateDesignSpace(tvm::IRModule const&)
  4: tvm::meta_schedule::MultiLevelTilingTensorCoreNode::Apply(tvm::tir::Schedule const&, tvm::tir::BlockRV const&)
  3: tvm::meta_schedule::MultiLevelTilingTensorCoreNode::ApplySubRules(std::vector<tvm::meta_schedule::State, std::allocator<tvm::meta_schedule::State> >)
  2: tvm::meta_schedule::MultiLevelTilingNode::AddWriteReuse(tvm::meta_schedule::State) const
  1: tvm::tir::TracedScheduleNode::ReverseComputeAt(tvm::tir::BlockRV const&, tvm::tir::LoopRV const&, bool, int)
  0: tvm::tir::ConcreteScheduleNode::ReverseComputeAt(tvm::tir::BlockRV const&, tvm::tir::LoopRV const&, bool, int) [clone .cold]
ScheduleError: An error occurred in the schedule primitive 'reverse-compute-at'.

Environment

TVM git hash b4d4b82

Steps to reproduce

Please refer to this gist here

Triage

  • needs-triage

cc: @zxybazh

@shingjan shingjan added the needs-triage and type: bug labels Nov 18, 2022
@masahi
Member

masahi commented Nov 18, 2022

Your repro didn't produce an error for me. It also uses the "llvm" target, while the error log comes from MultiLevelTilingTensorCoreNode.

@masahi masahi closed this as completed Nov 18, 2022
@masahi masahi reopened this Nov 18, 2022
@shingjan
Author

@masahi The LLVM target in the repro is the one that hits a segfault on git hash b4d4b82; the ScheduleError above comes from running the same repro with target("nvidia/nvidia-t4"). I am not sure where the segfault comes from, though. I have edited the repro to reflect the settings I used while tuning resnet50. The two target settings are sketched below.
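
Roughly, the two target settings in question are the following (a sketch only, not the full gist):

  import tvm

  # Sketch of the two targets discussed in this thread, not the full repro.
  # "llvm" is the setting that segfaults on b4d4b82; "nvidia/nvidia-t4" is the
  # setting that produces the reverse-compute-at ScheduleError above.
  llvm_target = tvm.target.Target("llvm")
  cuda_target = tvm.target.Target("nvidia/nvidia-t4")
  print(llvm_target.kind.name, cuda_target.kind.name)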

@zxybazh
Member

zxybazh commented Nov 19, 2022

Thanks for reporting this issue. I found that the segfault with llvm -num-cores=12 as the target is caused by the VNNI check function introduced in #13383: that PR didn't check whether the pointer to the PackedFunc itself is nullptr. As a workaround, you can add a check for this pointer.
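
If you just want to confirm from Python whether that PackedFunc is registered at all before the fix lands, a rough probe is below. The registered name is my assumption about what the C++ side looks up, so please double-check it against #13383; the real fix is still the nullptr check on the C++ side.

  import tvm

  # Assumed registry name for the VNNI detection function; verify against #13383.
  FUNC_NAME = "tvm.topi.x86.utils.target_has_vnni"

  # allow_missing=True returns None instead of raising when nothing is registered,
  # which is the state that makes the unchecked pointer on the C++ side segfault.
  func = tvm.get_global_func(FUNC_NAME, allow_missing=True)
  if func is None:
      # Importing the topi x86 utils module (assumed to carry the register_func
      # decorator) makes the function visible in the global registry.
      import tvm.topi.x86.utils  # noqa: F401
      func = tvm.get_global_func(FUNC_NAME, allow_missing=True)
  print("vnni check registered:", func is not None)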

@shingjan
Author

@zxybazh The PR for the patch is in: #13441. Will update if there are more details on the multi-level tiling failure.

@junrushao
Member

I'm confused by the discussion here:

@zxybazh
Member

zxybazh commented Nov 19, 2022

Sorry for the confusion. To clarify, there are two issues here, with different targets:

  1. The VNNI PR introduced a segfault that occurs when the VNNI detection function is not imported on the Python side; it is triggered when using an llvm kind of target in the script shared here. Related PRs: [MetaSchedule] Add from-target Defaults for x86 VNNI Targets #13383 and [Meta Schedule] Patch ICHECK for target_has_vnni to avoid seg fault #13441.
  2. There is another issue, triggered when using a cuda kind of target with the same script. It looks like an illegal reverse-compute-at (a minimal sketch of the primitive follows this list). I've talked to @shingjan and he will follow up later with more context on which TIR and reverse-compute-at decision caused it.
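
To illustrate what the primitive in item 2 does, here is a toy module, not the failing resnet50 TIR (that is still to be extracted): reverse_compute_at moves a consumer block under a loop of its producer and raises a ScheduleError like the one in the log when the requested placement is illegal. MultiLevelTilingNode::AddWriteReuse issues the same primitive when it attaches the write-cache block.

  import tvm
  from tvm.script import tir as T

  @tvm.script.ir_module
  class ToyModule:
      @T.prim_func
      def main(a: T.handle, c: T.handle) -> None:
          A = T.match_buffer(a, (128, 128), "float32")
          C = T.match_buffer(c, (128, 128), "float32")
          B = T.alloc_buffer((128, 128), "float32")
          for i, j in T.grid(128, 128):
              with T.block("B"):
                  vi, vj = T.axis.remap("SS", [i, j])
                  B[vi, vj] = A[vi, vj] * T.float32(2)
          for i, j in T.grid(128, 128):
              with T.block("C"):
                  vi, vj = T.axis.remap("SS", [i, j])
                  C[vi, vj] = B[vi, vj] + T.float32(1)

  sch = tvm.tir.Schedule(ToyModule)
  # Move consumer block "C" under the outer loop of its producer "B" (the legal
  # case). An illegal loop choice makes reverse_compute_at raise the
  # ScheduleError seen in the log above.
  outer, _ = sch.get_loops(sch.get_block("B"))
  sch.reverse_compute_at(sch.get_block("C"), outer)
  print(sch.mod.script())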

@zxybazh
Member

zxybazh commented Nov 22, 2022

Keeping this open for the cuda issue.

@zxybazh zxybazh reopened this Nov 22, 2022
@vinx13 vinx13 self-assigned this Dec 9, 2022