[Bug] Task initialization failed for one subgraph in resnet50 #13425

Open
shingjan opened this issue Nov 18, 2022 · 7 comments · Fixed by #13441
@shingjan

Actual behavior

tvm.tir.schedule.schedule.ScheduleError: Traceback (most recent call last):
  10: TVMFuncCall
  9: _ZN3tvm7runtime13PackedFuncObj
  8: tvm::runtime::TypedPackedFunc<void (tvm::meta_schedule::TaskScheduler, tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>)>::AssignTypedLambda<tvm::runtime::Registry::set_body_method<tvm::meta_schedule::TaskScheduler, tvm::meta_schedule::TaskSchedulerNode, void, tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>, void>(void (tvm::meta_schedule::TaskSchedulerNode::*)(tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>))::{lambda(tvm::meta_schedule::TaskScheduler, tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>)#1}>(tvm::runtime::Registry::set_body_method<tvm::meta_schedule::TaskScheduler, tvm::meta_schedule::TaskSchedulerNode, void, tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>, void>(void (tvm::meta_schedule::TaskSchedulerNode::*)(tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>))::{lambda(tvm::meta_schedule::TaskScheduler, tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  7: tvm::meta_schedule::GradientBasedNode::Tune(tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>)
  6: tvm::meta_schedule::TaskSchedulerNode::Tune(tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>)
  5: tvm::meta_schedule::PostOrderApplyNode::GenerateDesignSpace(tvm::IRModule const&)
  4: tvm::meta_schedule::MultiLevelTilingTensorCoreNode::Apply(tvm::tir::Schedule const&, tvm::tir::BlockRV const&)
  3: tvm::meta_schedule::MultiLevelTilingTensorCoreNode::ApplySubRules(std::vector<tvm::meta_schedule::State, std::allocator<tvm::meta_schedule::State> >)
  2: tvm::meta_schedule::MultiLevelTilingNode::AddWriteReuse(tvm::meta_schedule::State) const
  1: tvm::tir::TracedScheduleNode::ReverseComputeAt(tvm::tir::BlockRV const&, tvm::tir::LoopRV const&, bool, int)
  0: tvm::tir::ConcreteScheduleNode::ReverseComputeAt(tvm::tir::BlockRV const&, tvm::tir::LoopRV const&, bool, int) [clone .cold]
ScheduleError: An error occurred in the schedule primitive 'reverse-compute-at'.

Environment

TVM git hash b4d4b82

Steps to reproduce

Please refer to this gist here

Triage

  • needs-triage

cc: @zxybazh

@shingjan shingjan added the needs-triage and type: bug labels Nov 18, 2022
@masahi
Member

masahi commented Nov 18, 2022

Your repro didn't produce an error for me. It also uses the "llvm" target, while the error log comes from MultiLevelTilingTensorCoreNode.

@masahi masahi closed this as completed Nov 18, 2022
@masahi masahi reopened this Nov 18, 2022
@shingjan
Author

@masahi The LLVM target in the repro is the one that hits a segfault on git hash b4d4b82; the ScheduleError above comes from running the same repro with target("nvidia/nvidia-t4"). I am not sure where the segfault comes from, though. I have edited the repro to reflect the settings I used while tuning resnet50. The two target settings are sketched below.
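
Roughly, the two target settings in question are the following (a sketch only, not the full gist):

  import tvm

  # Sketch of the two targets discussed in this thread, not the full repro.
  # "llvm" is the setting that segfaults on b4d4b82; "nvidia/nvidia-t4" is the
  # setting that produces the reverse-compute-at ScheduleError above.
  llvm_target = tvm.target.Target("llvm")
  cuda_target = tvm.target.Target("nvidia/nvidia-t4")
  print(llvm_target.kind.name, cuda_target.kind.name)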

@zxybazh
Member

zxybazh commented Nov 19, 2022

Thanks for reporting this issue. I found that the segfault with llvm -num-cores=12 as the target is caused by the VNNI check function introduced in #13383: that PR didn't check whether the pointer to the PackedFunc itself is nullptr. As a workaround, you can add a check for this pointer.
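
If you just want to confirm from Python whether that PackedFunc is registered at all before the fix lands, a rough probe is below. The registered name is my assumption about what the C++ side looks up, so please double-check it against #13383; the real fix is still the nullptr check on the C++ side.

  import tvm

  # Assumed registry name for the VNNI detection function; verify against #13383.
  FUNC_NAME = "tvm.topi.x86.utils.target_has_vnni"

  # allow_missing=True returns None instead of raising when nothing is registered,
  # which is the state that makes the unchecked pointer on the C++ side segfault.
  func = tvm.get_global_func(FUNC_NAME, allow_missing=True)
  if func is None:
      # Importing the topi x86 utils module (assumed to carry the register_func
      # decorator) makes the function visible in the global registry.
      import tvm.topi.x86.utils  # noqa: F401
      func = tvm.get_global_func(FUNC_NAME, allow_missing=True)
  print("vnni check registered:", func is not None)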

@shingjan
Author

@zxybazh The PR for the patch is in: #13441. Will update if there are more details on the multi-level tiling failure.

@junrushao
Member

I'm confused by the discussion here:

@zxybazh
Member

zxybazh commented Nov 19, 2022

Sorry for the confusion. To clarify, there are two issues here, with different targets:

  1. The VNNI PR introduced a segfault that occurs when the VNNI detection function is not imported on the Python side; it is triggered when using an llvm kind of target in the script shared here. Related PRs: [MetaSchedule] Add from-target Defaults for x86 VNNI Targets #13383 and [Meta Schedule] Patch ICHECK for target_has_vnni to avoid seg fault #13441.
  2. There is another issue, triggered when using a cuda kind of target with the same script. It looks like an illegal reverse-compute-at (a minimal sketch of the primitive follows this list). I've talked to @shingjan and he will follow up later with more context on which TIR and reverse-compute-at decision caused it.
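
To illustrate what the primitive in item 2 does, here is a toy module, not the failing resnet50 TIR (that is still to be extracted): reverse_compute_at moves a consumer block under a loop of its producer and raises a ScheduleError like the one in the log when the requested placement is illegal. MultiLevelTilingNode::AddWriteReuse issues the same primitive when it attaches the write-cache block.

  import tvm
  from tvm.script import tir as T

  @tvm.script.ir_module
  class ToyModule:
      @T.prim_func
      def main(a: T.handle, c: T.handle) -> None:
          A = T.match_buffer(a, (128, 128), "float32")
          C = T.match_buffer(c, (128, 128), "float32")
          B = T.alloc_buffer((128, 128), "float32")
          for i, j in T.grid(128, 128):
              with T.block("B"):
                  vi, vj = T.axis.remap("SS", [i, j])
                  B[vi, vj] = A[vi, vj] * T.float32(2)
          for i, j in T.grid(128, 128):
              with T.block("C"):
                  vi, vj = T.axis.remap("SS", [i, j])
                  C[vi, vj] = B[vi, vj] + T.float32(1)

  sch = tvm.tir.Schedule(ToyModule)
  # Move consumer block "C" under the outer loop of its producer "B" (the legal
  # case). An illegal loop choice makes reverse_compute_at raise the
  # ScheduleError seen in the log above.
  outer, _ = sch.get_loops(sch.get_block("B"))
  sch.reverse_compute_at(sch.get_block("C"), outer)
  print(sch.mod.script())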

@zxybazh
Member

zxybazh commented Nov 22, 2022

Keeping this open for the cuda issue.

@zxybazh zxybazh reopened this Nov 22, 2022
@vinx13 vinx13 self-assigned this Dec 9, 2022