[ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores #4747

FrozenGene · 2020-01-20T08:49:52Z

Follow up #4344 . Previous PR solves OpenCV + TVM slow performance and AutoTVM Python CPU affinity problem. However, previous PR has two problem:

doesn't solve ARM BIG.LITTLE heterogeneous multicores.
For example, we have 2xA72 + 4xA53, previous pr will add all cpu to CPU SET even we call config_thread_pool to restrict TVM runs on 4 little cores. So, TVM_MASTER_THREAD could run A72 big core. This is not we want.
Solution: we should restrict master in the little cpus or big cpus according to users's setting.
Not all Linux variant OS has pthread_atfork. For example, Alibaba's AliOS doesn't have this api.
Solution: unified android and linux's way. We will call SetFullCpuAffinity if we don't have TVM_BIND_MASTER_THREAD.

@tqchen @yidawang @vinx13 @ajtulloch Could you help to review it? Thanks.

u99127 · 2020-01-20T09:46:38Z

Follow up #4344 . Previous PR solves OpenCV + TVM slow performance and AutoTVM Python CPU affinity problem. However, previous PR has two problem:

* doesn't solve ARM  BIG.LITTLE heterogeneous multicores.
  For example, we have 2xA72 + 4xA53, previous pr will add all cpu to CPU SET even we call `config_thread_pool` to restrict TVM runs on 4 little cores. So, TVM_MASTER_THREAD could run A72 big core. This is not we want.
  Solution: we should restrict master in the little cpus or big cpus according to users's setting.

This approach needs to be added as comments in the code explaining how this is expected to work. Further I suspect it is TVM_BIND_MASTER_THREAD that you refer to above ? What is the role of the TVM master thread ? Is it another thread that does compute or is it a "controlling" thread ?

* Not all Linux variant OS has `pthread_atfork`. For example, Alibaba's AliOS doesn't have this api.

pthread_atfork IIRC originally comes from the posix group from Issue 5 as documented here. Checking the other libcs suggest that all other mainstream Linux runtimes like glibc, musl and bionic support this and thus are we working around a limitation of AliOS here ?

  Solution: unified android and linux's way. We will call `SetFullCpuAffinity` if we don't have `TVM_BIND_MASTER_THREAD`.
@tqchen @yidawang @vinx13 @ajtulloch Could you help to review it? Thanks.

FrozenGene · 2020-01-20T10:28:04Z

This approach needs to be added as comments in the code explaining how this is expected to work. Further I suspect it is TVM_BIND_MASTER_THREAD that you refer to above ? What is the role of the TVM master thread ? Is it another thread that does compute or is it a "controlling" thread ?

TVM_BIND_MASTER_THREAD is related with exclude_worker0 , the default value of exclude_worker0 is false. https://github.com/apache/incubator-tvm/blob/master/include/tvm/runtime/threading_backend.h#L50. As the comment said, if it is false, we will make main thread compute worker0 and worker0 will not be launched in a new thread. Normally main thread doesn't have much to do. If TVM_BIND_MASTER_THREAD env var is set, we will bind the main thread be cpu core of sorted_order_[0] or sorted_order_[-1] (if we are in the little mode)

So previous solution is to let us not bind the master thread to be one specific core so that our master thread could run any cores. But previous solution omit one situation that ARM CPU has BIG.LITTLE, we can not add all cpus to CPU SET. It will result in main thread could run big core (even users specify tvm run little cores). Also could run little cores (even users specify tvm run big cores).

This PR solves this by restrict tvm master thread's cpu set. That is to say: if we are in the little mode, we should restrict master thread run only little cores. if we are in the big mode, we should restrict master thread run only big cores. Note: This could work well on x86 cpu too. Because x86 cpu doesn't have BIG.Little. our code implementation will use big_count_ to calculate the cpu cores by default. So master thread could run any cores.

I will add comment to explain a bit in code.

pthread_atfork IIRC originally comes from the posix group from Issue 5 as documented here. Checking the other libcs suggest that all other mainstream Linux runtimes like glibc, musl and bionic support this and thus are we working around a limitation of AliOS here ?

AliOS like some versions of Android, doesn't provide pthread_atfork. However, the key to solve this problem is doesn't restrict master thread bind to one specific core. So PR's solution could work fine on Linux / Android too. pthread_atfork is not a key or must.

tqchen · 2020-01-20T17:24:37Z

Overall I like the idea of binding master thread to the set of cores needed rather than a single one, whichi means free migration of master thread, would be great if we can document this idea in the code, let others to check.

Given that pthread_atfork is indeed available in most cases, we should put that as a separate item(perhaps using macro to detect the existence) if we still decides to use it.

For example, we can keep the existing behavior of binding master thread on linux, but use the new behavior on android

FrozenGene · 2020-01-20T17:49:05Z

Overall I like the idea of binding master thread to the set of cores needed rather than a single one, whichi means free migration of master thread, would be great if we can document this idea in the code, let others to check.

Newest code has provided one detail example to show which cores the master thread could run in the comment. Could check it and to see whether to have anything should supply.

Given that pthread_atfork is indeed available in most cases, we should put that as a separate item(perhaps using macro to detect the existence) if we still decides to use it.
For example, we can keep the existing behavior of binding master thread on linux, but use the new behavior on android

pthread_atfork is not a must for solving this problem in fact. The key is we should let master thread could do free migration on available cores. Previous pr wants to do but provide cpu cores set wrong as described. Existing behavior of linux is still has bug if we deploy it on one arm (has big little cpus) linux embed board. So we should resolve linux / android together in my opinion. :-)

src/runtime/threading_backend.cc

yidawang · 2020-01-24T19:21:11Z

FYI, we just notice that if we don't bind the master thread, which is the default behavior, the master thread may cause the contention issue as it is allocated to a seemingly idle logical thread under hyperthreading. So I would propose to at least have the master thread bound to the intended running cores by default. @anijain2305

FrozenGene · 2020-01-25T03:58:28Z

To clarify your meaning, as hyper thread will make the big_count_ is twice times compared with physical cores. For example, phisical cores are 4, hyper thread will make the concurrency number be 8 (big_count_ is 8 too). So for hyper thread, we wish master thread could run the physical cores (id: 0-3), not cpu id 4-7. 

…

------------------ Original ------------------ From: Yida Wang <notifications@github.com> Date: Sat,Jan 25,2020 3:21 AM To: apache/incubator-tvm <incubator-tvm@noreply.github.com> Cc: Zhao Wu <zhaowu@apache.org>, Author <author@noreply.github.com> Subject: Re: [apache/incubator-tvm] [ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores (#4747) FYI, we just notice that if we don't bind the master thread, which is the default behavior, the master thread may cause the contention issue as it is allocated to a seemingly idle logical thread under hyperthreading. So I would propose to at least have the master thread bound to the intended running cores by default. @anijain2305 — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

yidawang · 2020-01-25T08:43:49Z

To clarify your meaning, as hyper thread will make the big_count_ is twice times compared with physical cores. For example, phisical cores are 4, hyper thread will make the concurrency number be 8 (big_count_ is 8 too). So for hyper thread, we wish master thread could run the physical cores (id: 0-3), not cpu id 4-7.
…
------------------ Original ------------------ From: Yida Wang <notifications@github.com> Date: Sat,Jan 25,2020 3:21 AM To: apache/incubator-tvm <incubator-tvm@noreply.github.com> Cc: Zhao Wu <zhaowu@apache.org>, Author <author@noreply.github.com> Subject: Re: [apache/incubator-tvm] [ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores (#4747) FYI, we just notice that if we don't bind the master thread, which is the default behavior, the master thread may cause the contention issue as it is allocated to a seemingly idle logical thread under hyperthreading. So I would propose to at least have the master thread bound to the intended running cores by default. @anijain2305 — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

Yes

FrozenGene · 2020-01-25T08:50:57Z

Thanks for clarification. I will do it together in next code update.------------------ Original ------------------From: Yida Wang <notifications@github.com>Date: Sat,Jan 25,2020 4:44 PMTo: apache/incubator-tvm <incubator-tvm@noreply.github.com>Cc: Zhao Wu <zhaowu@apache.org>, Author <author@noreply.github.com>Subject: Re: [apache/incubator-tvm] [ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores (#4747) To clarify your meaning, as hyper thread will make the big_count_ is twice times compared with physical cores. For example, phisical cores are 4, hyper thread will make the concurrency number be 8 (big_count_ is 8 too). So for hyper thread, we wish master thread could run the physical cores (id: 0-3), not cpu id 4-7. … ------------------ Original ------------------ From: Yida Wang <notifications@github.com> Date: Sat,Jan 25,2020 3:21 AM To: apache/incubator-tvm <incubator-tvm@noreply.github.com> Cc: Zhao Wu <zhaowu@apache.org>, Author <author@noreply.github.com> Subject: Re: [apache/incubator-tvm] [ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores (#4747) FYI, we just notice that if we don't bind the master thread, which is the default behavior, the master thread may cause the contention issue as it is allocated to a seemingly idle logical thread under hyperthreading. So I would propose to at least have the master thread bound to the intended running cores by default. @anijain2305 — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe. Yes —You are receiving this because you authored the thread.Reply to this email directly, view it on GitHub, or unsubscribe.

FrozenGene · 2020-02-03T02:54:20Z

@yidawang Would you like to review it again? Thanks.

yidawang

Some minor naming/explanatory issues. Other than that, I am good to sign off.

src/runtime/threading_backend.cc

yidawang

LGTM. In addition, it would be better if you don't squash locally to leave only one commit every time in the PR. Having multiple commits would be helpful for reviewers to track your incremental change. Thanks!

tqchen · 2020-02-03T17:53:27Z

Thanks @FrozenGene @yidawang

FrozenGene force-pushed the thread_fix branch from 8d698a2 to 7e70988 Compare January 20, 2020 08:50

FrozenGene force-pushed the thread_fix branch 2 times, most recently from d8bbfb8 to 3cee66e Compare January 20, 2020 11:22

tqchen added the status: need review label Jan 20, 2020

yidawang suggested changes Jan 20, 2020

View reviewed changes

src/runtime/threading_backend.cc Outdated Show resolved Hide resolved

src/runtime/threading_backend.cc Outdated Show resolved Hide resolved

FrozenGene force-pushed the thread_fix branch 2 times, most recently from 3249215 to f2e54c6 Compare February 3, 2020 02:50

yidawang suggested changes Feb 3, 2020

View reviewed changes

src/runtime/threading_backend.cc Show resolved Hide resolved

src/runtime/threading_backend.cc Outdated Show resolved Hide resolved

FrozenGene force-pushed the thread_fix branch 2 times, most recently from d2368f9 to 38a4b53 Compare February 3, 2020 08:04

[ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores

1dec8a1

FrozenGene force-pushed the thread_fix branch from 38a4b53 to 1dec8a1 Compare February 3, 2020 08:05

yidawang approved these changes Feb 3, 2020

View reviewed changes

tqchen approved these changes Feb 3, 2020

View reviewed changes

tqchen merged commit bb1b7db into apache:master Feb 3, 2020

tqchen added status: accepted and removed status: need review labels Feb 3, 2020

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 26, 2020

[ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores (apache#4747)

cad94d1

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 28, 2020

[ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores (apache#4747)

eee9f26

zhiics pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2020

[ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores (apache#4747)

349e5ec

ZihengJiang mentioned this pull request Sep 17, 2020

TVM v0.7 Release Note Candidate #6486

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores #4747

[ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores #4747

FrozenGene commented Jan 20, 2020

u99127 commented Jan 20, 2020

FrozenGene commented Jan 20, 2020 •

edited

Loading

tqchen commented Jan 20, 2020 •

edited

Loading

FrozenGene commented Jan 20, 2020

yidawang commented Jan 24, 2020

FrozenGene commented Jan 25, 2020 via email

yidawang commented Jan 25, 2020

FrozenGene commented Jan 25, 2020 via email

FrozenGene commented Feb 3, 2020

yidawang left a comment

yidawang left a comment

tqchen commented Feb 3, 2020

[ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores #4747

[ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores #4747

Conversation

FrozenGene commented Jan 20, 2020

u99127 commented Jan 20, 2020

FrozenGene commented Jan 20, 2020 • edited Loading

tqchen commented Jan 20, 2020 • edited Loading

FrozenGene commented Jan 20, 2020

yidawang commented Jan 24, 2020

FrozenGene commented Jan 25, 2020 via email

yidawang commented Jan 25, 2020

FrozenGene commented Jan 25, 2020 via email

FrozenGene commented Feb 3, 2020

yidawang left a comment

Choose a reason for hiding this comment

yidawang left a comment

Choose a reason for hiding this comment

tqchen commented Feb 3, 2020

FrozenGene commented Jan 20, 2020 •

edited

Loading

tqchen commented Jan 20, 2020 •

edited

Loading