-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores #4747
Conversation
8d698a2
to
7e70988
Compare
This approach needs to be added as comments in the code explaining how this is expected to work. Further I suspect it is TVM_BIND_MASTER_THREAD that you refer to above ? What is the role of the TVM master thread ? Is it another thread that does compute or is it a "controlling" thread ?
pthread_atfork IIRC originally comes from the posix group from Issue 5 as documented here. Checking the other libcs suggest that all other mainstream Linux runtimes like glibc, musl and bionic support this and thus are we working around a limitation of AliOS here ?
|
TVM_BIND_MASTER_THREAD is related with So previous solution is to let us not bind the master thread to be one specific core so that our master thread could run any cores. But previous solution omit one situation that ARM CPU has BIG.LITTLE, we can not add all cpus to CPU SET. It will result in main thread could run big core (even users specify tvm run little cores). Also could run little cores (even users specify tvm run big cores). This PR solves this by restrict tvm master thread's cpu set. That is to say: if we are in the little mode, we should restrict master thread run only little cores. if we are in the big mode, we should restrict master thread run only big cores. Note: This could work well on x86 cpu too. Because x86 cpu doesn't have BIG.Little. our code implementation will use big_count_ to calculate the cpu cores by default. So master thread could run any cores. I will add comment to explain a bit in code.
AliOS like some versions of Android, doesn't provide pthread_atfork. However, the key to solve this problem is doesn't restrict master thread bind to one specific core. So PR's solution could work fine on Linux / Android too. pthread_atfork is not a key or must. |
d8bbfb8
to
3cee66e
Compare
Overall I like the idea of binding master thread to the set of cores needed rather than a single one, whichi means free migration of master thread, would be great if we can document this idea in the code, let others to check. Given that pthread_atfork is indeed available in most cases, we should put that as a separate item(perhaps using macro to detect the existence) if we still decides to use it. For example, we can keep the existing behavior of binding master thread on linux, but use the new behavior on android |
Newest code has provided one detail example to show which cores the master thread could run in the comment. Could check it and to see whether to have anything should supply.
pthread_atfork is not a must for solving this problem in fact. The key is we should let master thread could do free migration on available cores. Previous pr wants to do but provide cpu cores set wrong as described. Existing behavior of linux is still has bug if we deploy it on one arm (has big little cpus) linux embed board. So we should resolve linux / android together in my opinion. :-) |
FYI, we just notice that if we don't bind the master thread, which is the default behavior, the master thread may cause the contention issue as it is allocated to a seemingly idle logical thread under hyperthreading. So I would propose to at least have the master thread bound to the intended running cores by default. @anijain2305 |
To clarify your meaning, as hyper thread will make the big_count_ is twice times compared with physical cores. For example, phisical cores are 4, hyper thread will make the concurrency number be 8 (big_count_ is 8 too). So for hyper thread, we wish master thread could run the physical cores (id: 0-3), not cpu id 4-7.
…------------------ Original ------------------
From: Yida Wang <notifications@github.com>
Date: Sat,Jan 25,2020 3:21 AM
To: apache/incubator-tvm <incubator-tvm@noreply.github.com>
Cc: Zhao Wu <zhaowu@apache.org>, Author <author@noreply.github.com>
Subject: Re: [apache/incubator-tvm] [ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores (#4747)
FYI, we just notice that if we don't bind the master thread, which is the default behavior, the master thread may cause the contention issue as it is allocated to a seemingly idle logical thread under hyperthreading. So I would propose to at least have the master thread bound to the intended running cores by default. @anijain2305
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Yes |
Thanks for clarification. I will do it together in next code update.------------------ Original ------------------From: Yida Wang <notifications@github.com>Date: Sat,Jan 25,2020 4:44 PMTo: apache/incubator-tvm <incubator-tvm@noreply.github.com>Cc: Zhao Wu <zhaowu@apache.org>, Author <author@noreply.github.com>Subject: Re: [apache/incubator-tvm] [ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores (#4747)
To clarify your meaning, as hyper thread will make the big_count_ is twice times compared with physical cores. For example, phisical cores are 4, hyper thread will make the concurrency number be 8 (big_count_ is 8 too). So for hyper thread, we wish master thread could run the physical cores (id: 0-3), not cpu id 4-7.
…
------------------ Original ------------------ From: Yida Wang <notifications@github.com> Date: Sat,Jan 25,2020 3:21 AM To: apache/incubator-tvm <incubator-tvm@noreply.github.com> Cc: Zhao Wu <zhaowu@apache.org>, Author <author@noreply.github.com> Subject: Re: [apache/incubator-tvm] [ThreadPool] Solve ARM BIG.LITTLE heterogeneous multicores (#4747) FYI, we just notice that if we don't bind the master thread, which is the default behavior, the master thread may cause the contention issue as it is allocated to a seemingly idle logical thread under hyperthreading. So I would propose to at least have the master thread bound to the intended running cores by default. @anijain2305 — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
Yes
—You are receiving this because you authored the thread.Reply to this email directly, view it on GitHub, or unsubscribe.
|
3249215
to
f2e54c6
Compare
@yidawang Would you like to review it again? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor naming/explanatory issues. Other than that, I am good to sign off.
d2368f9
to
38a4b53
Compare
38a4b53
to
1dec8a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. In addition, it would be better if you don't squash locally to leave only one commit every time in the PR. Having multiple commits would be helpful for reviewers to track your incremental change. Thanks!
Thanks @FrozenGene @yidawang |
Follow up #4344 . Previous PR solves OpenCV + TVM slow performance and AutoTVM Python CPU affinity problem. However, previous PR has two problem:
For example, we have 2xA72 + 4xA53, previous pr will add all cpu to CPU SET even we call
config_thread_pool
to restrict TVM runs on 4 little cores. So, TVM_MASTER_THREAD could run A72 big core. This is not we want.Solution: we should restrict master in the little cpus or big cpus according to users's setting.
pthread_atfork
. For example, Alibaba's AliOS doesn't have this api.Solution: unified android and linux's way. We will call
SetFullCpuAffinity
if we don't haveTVM_BIND_MASTER_THREAD
.@tqchen @yidawang @vinx13 @ajtulloch Could you help to review it? Thanks.