[Runtime][PipelineExecutor] Tutorial of using pipeline executor. #11557

Merged

masahi merged 39 commits into apache:main from huajsj:pipeline-tutorial

Jul 22, 2022

Contributor

huajsj commented Jun 3, 2022

RFC: https://github.com/apache/tvm-rfcs/blob/main/rfcs/0014-pipeline-executor.md
issue: #8596
Tutorial on using the pipeline executor, including the BYOC use case.

This tutorial needs "USE_PIPELINE_EXECUTOR" and "USE_DNNL_CODEGEN" enabled in config.cmake, with MKL-DNN installed. I am not sure whether "How To Guides" is a better fit.
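
For readers who want to script the flag change described above, here is a minimal sketch. It assumes config.cmake holds one `set(NAME VALUE)` line per option; the helper name `enable_cmake_options` and the sample file contents are hypothetical, and the demo runs on a throwaway file rather than a real TVM build directory.

```python
import re
import tempfile
from pathlib import Path


def enable_cmake_options(config_path, options):
    """Set each named option to ON in a config.cmake-style file."""
    text = Path(config_path).read_text()
    for name in options:
        # Match an existing "set(NAME <anything>)" line for this option.
        pattern = re.compile(
            r"^set\(\s*" + re.escape(name) + r"\s+[^)]*\)", re.MULTILINE
        )
        line = f"set({name} ON)"
        if pattern.search(text):
            text = pattern.sub(line, text)
        else:
            # Option not present yet: append it at the end of the file.
            text = text.rstrip("\n") + "\n" + line + "\n"
    Path(config_path).write_text(text)


# Demo on a temporary file (assumed contents, not the real config.cmake).
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.cmake"
    cfg.write_text("set(USE_PIPELINE_EXECUTOR OFF)\nset(USE_LLVM ON)\n")
    enable_cmake_options(cfg, ["USE_PIPELINE_EXECUTOR", "USE_DNNL_CODEGEN"])
    result = cfg.read_text()
    print(result)
```

The existing option is rewritten in place and the missing one is appended, so the other settings in the file are left untouched.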

cc @areusch, @masahi

huajsj mentioned this pull request

[RFC][Tracking Issue] Pipeline Executor For Compute graph pipeline #8596

Closed

15 tasks

github-actions bot requested a review from areusch

June 3, 2022 04:34

areusch reviewed

gallery/how_to/work_with_relay/using_with_pipeline_executor.py Outdated

+              # own splitting function logic.
+              import os
+              os.sys.path.append(os.path.abspath(os.environ["TVM_HOME"] + "/tests/python/relay"))

Contributor

areusch Jun 7, 2022

i think unfortunately right now this has to be done with relative paths. you can debug this with tests/scripts/ci.py docs i believe.

Contributor Author

huajsj Jun 9, 2022

@areusch, thanks for the follow-up. The path issue is fixed, but it seems the CI box either does not have DNNL enabled or does not have MKL-DNN installed, so the tutorial still cannot execute. To handle this, I put the BYOC part into a function and commented out the call to it, which avoids the DNNL execution error.

Contributor

areusch Jun 9, 2022

it should have whatever is enabled in ci_gpu, and that's determined partly by Dockerfile.ci_gpu and by tests/scripts/task_config_build_gpu.sh. you could propose a change there if you need something for your tutorial (just add to this PR).

Contributor Author

huajsj Jun 14, 2022

@areusch, the Jenkinsfile uses a fixed Docker image, so CI still runs the tutorial without applying the change in Dockerfile.ci_gpu.
I can see that the new GPU Docker image was uploaded to AWS ECR, but I cannot find it under tlcstaging on Docker Hub. What is the process for requesting that the new Docker image be uploaded so that my issue is fixed?

Contributor

areusch Jun 16, 2022

cc @driazati i think we need to set the ecr image repo to be public, or push those images to dockerhub. thoughts?

Member

driazati Jun 16, 2022

We could definitely do that and probably will soon. As a stopgap in the meantime, @huajsj, you can run the Docker build locally and pass the image to ci.py:

bash docker/build.sh ci_gpu --tag my_ci_gpu
python tests/scripts/ci.py docs --docker-image my_ci_gpu

Contributor Author

huajsj Jun 16, 2022

@driazati @areusch, thanks for the follow-up. I tried the local test and it works well. To make this PR green, do you think it is possible to merge PR 11744 first? It makes the DNNL installation change and is a dependency of this PR.

huajsj requested a review from areusch

June 9, 2022 02:56

huajsj force-pushed the pipeline-tutorial branch 4 times, most recently from f1187c0 to d10da4e Compare

June 14, 2022 07:40

driazati mentioned this pull request

[ci][docker gpu] Install dnnl in docker GPU. #11744

Merged

huajsj mentioned this pull request

[CI Image] Update ci_gpu #11774

Closed

7 tasks

Contributor

areusch commented Jun 21, 2022

blocked on #11774

huajsj force-pushed the pipeline-tutorial branch 2 times, most recently from 2691730 to 81f2d49 Compare

June 23, 2022 18:21

huajsj force-pushed the pipeline-tutorial branch 2 times, most recently from 0ca864b to 58c7e85 Compare

July 5, 2022 05:53

Contributor Author

huajsj commented Jul 6, 2022

blocked on #12020

huajsj added 14 commits

July 12, 2022 13:27


          [Runtime][PipelineExecutor] Tutorial of using pipeline executor.

8bf383e

Tutorial of using pipeline executor including the byoc use case.


          fix ci issue

6332de0


          document change.

cb49f99


          triger build

226fc58


          fix doc issue

031b3ad


          fix ci issue

d046177


          doc issue

8d01a7f


          fix ci issue

86cfbe4


          fix ci issue.

22788ba


          fix __file__ not found problem.

9a550fb

this is a known issue of sphinx-gallery
sphinx-gallery/sphinx-gallery#211


          fix byoc with dnnl issue

1b53258


          enable dnnl and pipeline executor

7757b1b


          trigger build

15db48a


          trigger build

3b02c9a

huajsj added 4 commits

July 12, 2022 13:27


          enable DNNL without pipeline

6640dd6


          remove dnnl and add cutlass

f5b61fd


          use cutlass with byoc

50a7eb9


          change into cutlass

0b30034

huajsj force-pushed the pipeline-tutorial branch from 63efbad to 0b30034 Compare

July 17, 2022 07:08

huajsj added 3 commits

July 17, 2022 15:03


          fix doc convention issue

873e027


          remove duplicate variable

73656af


          fix plint issue.

e4d8360

Contributor Author

huajsj commented Jul 18, 2022

@areusch @masahi , @driazati , the CI is green now.
The latest change is that we now use CUTLASS instead of DNNL in the BYOC use case, which fixed the Sphinx crash. All CI tests pass now. Please take a look.

Contributor Author

huajsj commented Jul 19, 2022

@masahi, please take a look.

masahi self-assigned this

masahi requested changes



          address review comments.

cfd2af2

masahi requested changes


huajsj added 4 commits

July 20, 2022 23:07


          address review comments

a1fc852


          fix bug.

60c8953


          polish the document

420e951


          fix plint issue

b998f12

masahi requested changes

View reviewed changes

gallery/how_to/work_with_relay/using_pipeline_executor.py Outdated

+              pipe_config[mod0].target = "llvm"
+              pipe_config[mod0].dev = tvm.cpu(0)
+              ###############################################################################
+              # Set the cpu afinity for control flow, for example using cpu 0 for control flow.

Member

masahi Jul 21, 2022

Please clarify what is meant by "control flow", and why we need to do this.

Contributor Author

huajsj Jul 21, 2022

When we run a backend such as CUTLASS with an executor, both the CPU and the GPU are involved in the execution. The CPU part is responsible for preparing data, pre/post-processing, transferring data between layers, and so on. I call this part the control flow.

With multiple backends, for example LLVM + CUTLASS in this tutorial, the two control flows compete for CPU resources and cause a lot of thread context switches or CPU migrations. This kind of resource competition slows down performance. With the affinity setting, we associate each backend with a particular CPU group to avoid that overhead.
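
To make the idea of tying work to a CPU group concrete, here is a minimal OS-level sketch. It is not what the TVM runtime does internally. It assumes the affinity string is a comma-separated list of CPU indices (the tutorial only shows single values such as "1"), the helper names are hypothetical, and the pinning call is available on Linux only.

```python
import os


def parse_cpu_affinity(spec):
    """Turn a string such as "0" or "0,2,3" into a set of CPU indices."""
    return {int(part) for part in spec.split(",") if part.strip()}


def pin_current_process(spec):
    """Pin the calling process to the CPUs named in spec, where supported.

    Returns the set of CPUs actually applied (empty if nothing was changed).
    """
    cpus = parse_cpu_affinity(spec)
    if hasattr(os, "sched_setaffinity"):  # present on Linux only
        # Only request CPUs the process is currently allowed to use.
        usable = cpus & os.sched_getaffinity(0)
        if usable:
            os.sched_setaffinity(0, usable)
        return usable
    return set()
```

Calling `pin_current_process("0")` in one worker and `pin_current_process("1")` in another would keep the two sets of host-side work on separate cores, which is the effect described above.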

Member

masahi Jul 21, 2022

"control flow" usually means if/else or for loop in TVM or in general. How about "host operations"?

This also doesn't sound like something most users should be concerned about. I suggest removing affinity stuff from the tutorial and set the default affinity inside some runtime function. If you require affinity control by users, please summarize and add what you said above to the tutorial with correct English.

gallery/how_to/work_with_relay/using_pipeline_executor.py Outdated Show resolved Hide resolved

gallery/how_to/work_with_relay/using_pipeline_executor.py Outdated

+              ###########################################
+              # Splitting the network into two subgraphs.
+              # -----------------------------------------
+              # It is an example that the graph splitting function comes from a unit test. User can create  a

Member

masahi Jul 21, 2022

The first sentence is broken and makes no sense..

Contributor Author

huajsj Jul 21, 2022

Changed to “The 'graph_split' function from a unit test is just an example. Users can create customized logic to split the graph.”


          address review comments.

1a930af

huajsj requested a review from masahi

July 21, 2022 14:59

masahi requested changes

View reviewed changes

gallery/how_to/work_with_relay/using_pipeline_executor.py Outdated

+              import inspect
+              import os
+              test_path = os.path.dirname(inspect.getfile(lambda: None))

Member

masahi Jul 21, 2022

I think you can simply use __file__ here instead of inspect. And rename test_path to tutorial_dir.

Contributor Author

huajsj Jul 22, 2022

Replaced "test_path" with "tutorial_dir".
The reason we use inspect instead of __file__ is that __file__ does not work with sphinx-gallery, which is used by the TVM docs.
huajsj@8d2bfc3
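
The workaround discussed here can be sketched as follows. The `inspect.getfile(lambda: None)` call is the one from the diff above; the wrapper function name and the `helpers` directory are hypothetical and only illustrate building a relative path from the result.

```python
import inspect
import os


def current_script_dir():
    """Directory of the running script, without relying on __file__."""
    # inspect.getfile on a lambda defined here reads the filename recorded
    # in the compiled code object, so it does not need __file__ to exist.
    return os.path.dirname(os.path.abspath(inspect.getfile(lambda: None)))


tutorial_dir = current_script_dir()
# Hypothetical relative location of helper modules, for illustration only.
helper_dir = os.path.join(tutorial_dir, "..", "helpers")
```

Wrapping the path in `os.path.abspath` is an addition to the original one-liner; it keeps the result usable even when the recorded filename is relative.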

gallery/how_to/work_with_relay/using_pipeline_executor.py Outdated

+              pipe_config[mod1].export_cc = "nvcc"
+              #################################################################################
+              # Set the cpu afinity for control flow, for example using cpu 1 for control flow.
+              pipe_config[mod1].cpu_affinity = "1"

Member

masahi Jul 21, 2022

pipe_config[mod1].cpu_affinity is written twice, here and at L166.

Contributor Author

huajsj Jul 22, 2022

removed.

gallery/how_to/work_with_relay/using_pipeline_executor.py Outdated

+              pipe_config[mod1].build_func = cutlass_build
+              pipe_config[mod1].export_cc = "nvcc"
+              #################################################################################
+              # Set the cpu afinity for control flow, for example using cpu 1 for control flow.

Member

masahi Jul 21, 2022

typo: afinity

Contributor Author

huajsj Jul 22, 2022

Removed the affinity setting; the tutorial now uses the default affinity logic of the TVM thread pool.

gallery/how_to/work_with_relay/using_pipeline_executor.py Outdated

+              pipe_config[mod1].cpu_affinity = "1"
+              pipe_config["input"]["data"].connect(pipe_config[mod0]["input"]["data"])
+              pipe_config[mod0]["output"][0].connect(pipe_config[mod1]["input"]["data_n_0"])
+              pipe_config[mod1]["output"]["0"].connect(pipe_config["output"][0])

Member

masahi Jul 21, 2022

Are these three lines related to affinity control? You should have another ######## before them and explain what they do.

I have to say, this is not a good API. For example, where do the names "data" and "data_n_0" come from? What is pipe_config[mod0]["output"][0]? And why do you use "0" at L178?

Contributor Author

huajsj Jul 22, 2022

These three lines connect the subgraphs to build the pipeline; they are not related to affinity. I added a detailed explanation.

"data" and "data_n_0" come from subgraphs, which is a list of subgraphs. Running print(subgraph[0]) and print(subgraph[1]) shows these names. If a name that does not exist is given here, the API throws an error.

pipe_config[mod0]["output"][0] means "the first output interface" of "mod0". The "0" at line 178 was a typo and is fixed.
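
The naming rule described above can be illustrated with a toy model. This is not TVM's PipelineConfig API: `ToyModule` and `connect` are hypothetical stand-ins that only show how outputs are addressed by index, inputs by name, and how an unknown name is rejected.

```python
class ToyModule:
    """Stand-in for one subgraph with named inputs and indexed outputs."""

    def __init__(self, name, inputs, num_outputs):
        self.name = name
        self.inputs = set(inputs)
        self.num_outputs = num_outputs


def connect(edges, src, out_idx, dst, in_name):
    """Record src.output[out_idx] -> dst.input[in_name], validating both ends."""
    if not 0 <= out_idx < src.num_outputs:
        raise IndexError(f"{src.name} has no output {out_idx}")
    if in_name not in dst.inputs:
        raise KeyError(f"{dst.name} has no input named {in_name!r}")
    edges.append((src.name, out_idx, dst.name, in_name))


# Input names mirror the ones in the thread: "data" and "data_n_0".
mod0 = ToyModule("mod0", inputs=["data"], num_outputs=1)
mod1 = ToyModule("mod1", inputs=["data_n_0"], num_outputs=1)

edges = []
connect(edges, mod0, 0, mod1, "data_n_0")  # valid: the name exists on mod1

try:
    connect(edges, mod0, 0, mod1, "data")  # invalid: mod1 has no "data" input
except KeyError as err:
    error_message = str(err)
```

Only the valid connection is recorded; the second call fails before the edge list is modified.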


          address review comments

7449ff7

masahi requested changes

View reviewed changes

Member

masahi left a comment

okay I think this is the last typo fix.



          address review comments

0dcc5bf

masahi approved these changes

View reviewed changes

masahi merged commit ecd3c88 into apache:main

AndrewZhaoLuo mentioned this pull request

TVM v0.10.0.rc0 Release Candidate Notes #12979

Closed

xinetzone pushed a commit to daobook/tvm that referenced this pull request


          [Runtime][PipelineExecutor] Tutorial of using pipeline executor. (apa…

4d73afe

…che#11557)

* [Runtime][PipelineExecutor]  Tutorial of using pipeline executor.

Tutorial of using pipeline executor including the byoc use case.

* fix ci issue

* document change.

* triger build

* fix doc issue

* fix ci issue

* doc issue

* fix ci issue

* fix ci issue.

* fix __file__ not found problem.

this is a known issue of sphinx-gallery
sphinx-gallery/sphinx-gallery#211

* fix byoc with dnnl issue

* enable dnnl and pipeline executor

* trigger build

* trigger build

* fix build issue

* trigger build

* oneflow cause crash, do test with change

* add sphinx skip

* plint

* remove from_oneflow change test.

* remove pipeline executor change for test

* plint

* enable DNNL and pipeline

* disable DNNL

* enable DNNL without pipeline

* remove dnnl and add cutlass

* use cutlass with byoc

* change into cutlass

* fix doc convention issue

* remove duplicate variable

* fix plint issue.

* address review comments.

* address review comments

* fix bug.

* polish the document

* fix plint issue

* address review comments.

* address review comments

* address review comments

mikeseven pushed a commit to mikeseven/tvm that referenced this pull request


          [Runtime][PipelineExecutor] Tutorial of using pipeline executor. (apa…

fe043bf

…che#11557)

