[Bug] Enabling USE_DNNL causes Sphinx to crash when processing from_oneflow.py and from_paddle.py for documents #12020

Open
huajsj opened this issue Jul 6, 2022 · 7 comments

Comments

@huajsj
Contributor

huajsj commented Jul 6, 2022

Expected behavior

What you were expecting
Setting USE_DNNL to ON or OFF should not affect document processing.

Actual behavior

What actually happened

With USE_DNNL set to ON, running "docker/bash.sh --env CI --env TVM_SHARD_INDEX --env TVM_NUM_SHARDS --env RUN_DISPLAY_URL --env PLATFORM tlcpack/ci-gpu:20220630-060117-558ba99c7 ./tests/scripts/task_python_docs.sh"
crashes while processing from_oneflow.py and from_paddle.py.

Environment

Any environment details, such as: Operating System, TVM version, etc
tlcpack/ci-gpu:20220630
tlcpack/ci-gpu:20220619

Steps to reproduce

Preferably a minimal script to cause the issue to occur.

mkdir ./build
cp ./cmake/config.cmake ./build/
echo 'set(USE_DNNL ON)' >> ./build/config.cmake
docker/bash.sh -it --env CI --env TVM_SHARD_INDEX --env TVM_NUM_SHARDS --env RUN_DISPLAY_URL --env PLATFORM tlcpack/ci-gpu:20220630-060117-558ba99c7
cd build
cmake ../
make
cd ../
./tests/scripts/task_python_docs.sh
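
The USE_DNNL toggle in the steps above could also be scripted, which makes it easier to flip between ON and OFF while bisecting. This is only a minimal sketch: `set_cmake_option` is a hypothetical helper, not part of TVM, and it rewrites an existing `set(NAME ...)` line in place rather than appending a second one as the steps do.

```python
import re
from pathlib import Path


def set_cmake_option(config_path, name, value):
    """Set `set(NAME VALUE)` in a CMake config file (hypothetical helper).

    Rewrites every existing uncommented `set(NAME ...)` line; if none
    exists, appends a new line at the end of the file.
    """
    path = Path(config_path)
    text = path.read_text()
    new_line = f"set({name} {value})"
    pattern = re.compile(
        rf"^[ \t]*set\(\s*{re.escape(name)}\s+[^)]*\)", re.MULTILINE
    )
    if pattern.search(text):
        # A callable replacement avoids backslash interpretation in new_line.
        text = pattern.sub(lambda _match: new_line, text)
    else:
        if text and not text.endswith("\n"):
            text += "\n"
        text += new_line + "\n"
    path.write_text(text)


# Example (path taken from the steps above):
# set_cmake_option("./build/config.cmake", "USE_DNNL", "ON")
```

After changing the option, cmake and make still have to be rerun for the setting to take effect.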

Debug information

docker/bash.sh -it --env CI --env TVM_SHARD_INDEX --env TVM_NUM_SHARDS --env RUN_DISPLAY_URL --env PLATFORM tlcpack/ci-gpu:20220630-060117-558ba99c7
cd _staging
gdb python3
set args -m sphinx -b html -d /workspace/docs/_build/doctrees   . /workspace/docs/_build/html
r
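
As a lighter-weight complement to gdb, the same Sphinx invocation could be run with Python's built-in `faulthandler` enabled, so that a Python-level traceback is printed if the process receives a fatal signal. The sketch below is an assumption about how one might wrap this; `run_with_faulthandler` and `describe_exit` are hypothetical helper names, and the signal lookup assumes a POSIX system.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(args, cwd=None):
    """Run `python -X faulthandler <args>` and return the exit code."""
    cmd = [sys.executable, "-X", "faulthandler", *args]
    return subprocess.run(cmd, cwd=cwd).returncode


def describe_exit(returncode):
    """Turn a subprocess return code into a short description.

    On POSIX, a negative return code means the child was killed by
    the signal with that number.
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


# Example, mirroring the gdb arguments above (run from the repo root):
# rc = run_with_faulthandler(
#     ["-m", "sphinx", "-b", "html",
#      "-d", "/workspace/docs/_build/doctrees",
#      ".", "/workspace/docs/_build/html"],
#     cwd="_staging",
# )
# print(describe_exit(rc))
```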

The crash can be seen in 'dlopen' while processing "from_oneflow.py".
After setting USE_DNNL to OFF and rebuilding, the issue goes away.
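
Since the crash is reported inside 'dlopen', one way to narrow it down might be to load each suspect shared library in a throwaway child interpreter, so that a crash does not take down the calling process. This is a rough sketch under that assumption: `try_dlopen` is a hypothetical helper and the library path in the example is a placeholder, not a path taken from this issue.

```python
import subprocess
import sys


def try_dlopen(lib_path, timeout=60.0):
    """Load lib_path with ctypes in a child interpreter; return its exit code.

    0 means the load succeeded. A positive code means the child raised
    a Python exception (for example OSError for a missing file). On POSIX,
    a negative code means the child was killed by a signal.
    """
    child_code = (
        "import ctypes, sys; "
        "ctypes.CDLL(sys.argv[1], mode=ctypes.RTLD_GLOBAL)"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code, lib_path],
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode


# Example with a placeholder path:
# print(try_dlopen("/path/to/libtvm.so"))
```

Loading a single library in isolation may not reproduce a crash that depends on the order in which several libraries are loaded, so a clean result here does not rule the library out.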

@huajsj
Contributor Author

huajsj commented Jul 6, 2022

@driazati

@areusch
Contributor

areusch commented Jul 11, 2022

@huajsj just curious why you needed USE_DNNL ON in your tutorial? is that closer to the use case for pipeline executor, or is it possible to demonstrate pipeline executor with just two llvm graphs?

@huajsj
Contributor Author

huajsj commented Jul 13, 2022

@areusch, thanks for the follow-up. Yes, BYOC should be the use case for the pipeline executor, which aims to bring different backends/hardware together for heterogeneous parallel execution and improved performance.

Besides DNNL, CUTLASS is another option for a BYOC backend. I am trying to see if I can bring up a CUTLASS example; if that still does not work, I will definitely go with the two-LLVM tutorial.

@huajsj
Contributor Author

huajsj commented Jul 18, 2022

After switching to CUTLASS+BYOC in PR 11557, the crash is gone, so this issue is no longer a blocker for PR 11557.

@billishyahao
Contributor

Hi @huajsj, did you observe the failure before PR #11638 was merged? Shall we rule that one out?

@areusch
Contributor

areusch commented Jul 25, 2022

@huajsj are you able to look at the question above?

@huajsj
Contributor Author

huajsj commented Jul 28, 2022

@areusch @billishyahao, thanks for the follow-up. I tried with the code from before PR #11638 and still saw the issue.

@areusch added the needs-triage label Oct 19, 2022
@driazati added the type: doc, frontend:oneflow, and frontend: paddlepaddle labels and removed the needs-triage label Nov 16, 2022