Add CUDAPinnedPlace #9380
Conversation
size_t usable = paddle::platform::GpuMaxAllocSize() - fallback_alloc_size_;
size_t usable =
    paddle::platform::CUDAPinnedMaxAllocSize() - cuda_pinnd_alloc_size_;
It seems the default pinned-memory max size is determined by system settings; ulimit -l can be used to check the current locked-memory limit. I took a look at our current machines, and the default value is very small (64KB). To increase it, run ulimit -l [new size]. It's probably better to add a document describing this.
Thanks for your review!
Searching around the internet, I found that ulimit -l does affect the amount of memory we can mlock(), but it does not affect cudaMallocHost(), because the CUDA pinning allocator doesn't use mlock(); under the hood it uses mmap() with MAP_FIXED. The experiment is here.
So theoretically, the max size of pinned memory can be the max size of physical memory.
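For reference, a minimal sketch of that experiment (my own reconstruction, not the linked one; it assumes a CUDA-capable machine with a small locked-memory limit such as the 64KB default mentioned above):

// Build with: nvcc ulimit_vs_pinned.cu -o ulimit_vs_pinned
#include <sys/mman.h>

#include <cstdio>
#include <cstdlib>

#include <cuda_runtime.h>

int main() {
  const size_t size = 256 << 20;  // 256MB, far above a 64KB ulimit -l

  // mlock() is bounded by RLIMIT_MEMLOCK (ulimit -l), so with a small
  // limit this call is expected to fail.
  void* buf = malloc(size);
  if (mlock(buf, size) != 0) {
    perror("mlock failed (expected under a small ulimit -l)");
  } else {
    munlock(buf, size);
    printf("mlock succeeded\n");
  }
  free(buf);

  // cudaMallocHost() pins pages through the CUDA driver rather than
  // mlock(), so it is not bounded by RLIMIT_MEMLOCK and should succeed.
  void* pinned = nullptr;
  cudaError_t err = cudaMallocHost(&pinned, size);
  printf("cudaMallocHost: %s\n", cudaGetErrorString(err));
  if (err == cudaSuccess) cudaFreeHost(pinned);
  return 0;
}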
Thanks for the detailed information, quite useful!
paddle/fluid/memory/memory.cc
Outdated
auto* buddy_allocator = GetCUDAPinnedBuddyAllocator();
void* ptr = buddy_allocator->Alloc(size);

// if (ptr == nullptr) {
Can remove these comments.
Done
paddle/fluid/platform/cpu_info.cc
Outdated
@@ -27,6 +27,10 @@ DEFINE_double(fraction_of_cpu_memory_to_use, 1,
              "Default use 100% of CPU memory for PaddlePaddle,"
              "reserve the rest for page tables, etc");

DEFINE_double(fraction_of_cuda_pinned_memory_to_use, 0.5,
              "Default use 100% of CPU memory for PaddlePaddle,"
The flag description needs to be updated.
Done
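For reference, the corrected definition presumably reads something like this (my approximation; the exact wording in the merged code may differ):

DEFINE_double(fraction_of_cuda_pinned_memory_to_use, 0.5,
              "Default use 50% of CPU memory as the pinned memory for "
              "PaddlePaddle, reserve the rest for page tables, etc");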
paddle/fluid/platform/cpu_info.cc
Outdated
}

size_t CUDAPinnedMaxChunkSize() {
  // Allow to allocate the maximum chunk size is roughly 0.39% of CUDA_PINNED
Just say it's 1/256 of the total size.
Done
Awesome!
  // of host pinned allocation. Allocates too much would reduce
  // the amount of memory available to the underlying system for paging.
  size_t usable =
Is FLAGS_use_pinned_memory at line 58 still useful now?
Yes
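After the review above, CUDAPinnedMaxChunkSize presumably ends up roughly like this (a sketch based on the 1/256 figure, not the exact merged code; CUDAPinnedMaxAllocSize is the function shown in the diff earlier):

size_t CUDAPinnedMaxChunkSize() {
  // Allow the maximum chunk size to be 1/256 of the total pinned memory.
  return CUDAPinnedMaxAllocSize() / 256;
}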
}

TEST(CPUANDCUDAPinned, CPUAllocator) {
  test_pinned_memory<paddle::platform::CPUPlace>();
Do we need to assert that pinned memory is faster than ordinary pageable memory, e.g. on K-series GPUs?
Done
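As a rough illustration of such an assertion (a hypothetical standalone sketch, not the test added in this PR), the two transfer paths can be timed with CUDA events:

#include <cstdio>
#include <cstdlib>

#include <cuda_runtime.h>

// Returns the elapsed milliseconds for one host-to-device copy of size bytes.
static float TimeH2DCopy(void* dst, const void* src, size_t size) {
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start);
  cudaMemcpy(dst, src, size, cudaMemcpyHostToDevice);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  float ms = 0.f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms;
}

int main() {
  const size_t size = 64 << 20;  // 64MB
  void* pageable = malloc(size);
  void *pinned = nullptr, *device = nullptr;
  cudaMallocHost(&pinned, size);
  cudaMalloc(&device, size);

  float pageable_ms = TimeH2DCopy(device, pageable, size);
  float pinned_ms = TimeH2DCopy(device, pinned, size);
  // The pinned copy is normally faster: the driver can DMA directly from
  // pinned pages instead of staging pageable memory through its own buffer.
  printf("pageable: %.3f ms, pinned: %.3f ms\n", pageable_ms, pinned_ms);

  free(pageable);
  cudaFreeHost(pinned);
  cudaFree(device);
  return 0;
}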
size_t gpu_alloc_size_ =
    0;  // TODO(zcd): how to define the upper limit of CUDAPinnedMemory?
size_t fallback_alloc_size_ = 0;
size_t cuda_pinnd_alloc_size_ = 0;
Comments at line 24 should be modified.
Done, thanks!
paddle/fluid/memory/CMakeLists.txt
Outdated
if (WITH_GPU)
  nv_test(pinned_memory_test SRCS pinned_memory_test.cu DEPS place paddle_memory)
endif()
# if (WITH_GPU)
Why?
Just a question.
LGTM
#include "paddle/fluid/platform/gpu_info.h" | ||
#include "paddle/fluid/platform/place.h" | ||
|
||
// This unit test is an example comparing the performance between using pinned |
Maybe we should later move the benchmark tests to https://github.com/google/benchmark to save some CI time.
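Such a benchmark with that library might look like the following (a hypothetical sketch, not part of this PR):

#include <vector>

#include <benchmark/benchmark.h>
#include <cuda_runtime.h>

// Times host-to-device copies from a pageable buffer.
static void BM_PageableH2D(benchmark::State& state) {
  const size_t size = static_cast<size_t>(state.range(0));
  std::vector<char> host(size);
  void* device = nullptr;
  cudaMalloc(&device, size);
  for (auto _ : state) {
    cudaMemcpy(device, host.data(), size, cudaMemcpyHostToDevice);
  }
  cudaFree(device);
  state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) * size);
}
BENCHMARK(BM_PageableH2D)->Arg(64 << 20);

// Times the same copies from a pinned (cudaMallocHost) buffer.
static void BM_PinnedH2D(benchmark::State& state) {
  const size_t size = static_cast<size_t>(state.range(0));
  void *host = nullptr, *device = nullptr;
  cudaMallocHost(&host, size);
  cudaMalloc(&device, size);
  for (auto _ : state) {
    cudaMemcpy(device, host, size, cudaMemcpyHostToDevice);
  }
  cudaFreeHost(host);
  cudaFree(device);
  state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) * size);
}
BENCHMARK(BM_PinnedH2D)->Arg(64 << 20);

BENCHMARK_MAIN();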
fix #8728
related PR: #9216
CUDA pinned memory is different from CPU memory and GPU memory: physically it lives on the CPU side, but it can be accessed by both the CPU and the GPU. In the last PR, I added an argument, is_pinned_, to Tensor, but if a user is careless, a pinned-memory tensor could end up involved in the model computation, which would slow down training and be very difficult to track down. So after talking with @typhoonzero, I added CUDAPinnedPlace instead.
Note: Currently, pinned memory is only used for memory copying.
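A rough sketch of the intended usage (my approximation; the Alloc/Free/Copy signatures are modeled on fluid's memory module and may differ slightly in the merged code):

#include <cstring>

#include <cuda_runtime.h>

#include "paddle/fluid/memory/memcpy.h"
#include "paddle/fluid/memory/memory.h"
#include "paddle/fluid/platform/place.h"

// Stages a host-to-device transfer through pinned memory so the H2D copy
// can be issued asynchronously on the given stream. Returns the device
// buffer; the caller frees it with paddle::memory::Free(gpu, dst).
void* StagedCopyToGPU(const void* src, size_t size,
                      paddle::platform::CUDAPlace gpu, cudaStream_t stream) {
  paddle::platform::CUDAPinnedPlace pinned;

  // The staging buffer comes from the new CUDAPinnedPlace allocator.
  void* staging = paddle::memory::Alloc(pinned, size);
  void* dst = paddle::memory::Alloc(gpu, size);

  // CPU -> pinned is an ordinary memcpy; pinned -> GPU can overlap with
  // computation because the driver can DMA directly from pinned pages.
  std::memcpy(staging, src, size);
  paddle::memory::Copy(gpu, dst, pinned, staging, size, stream);

  // Wait for the copy before releasing the staging buffer.
  cudaStreamSynchronize(stream);
  paddle::memory::Free(pinned, staging);
  return dst;
}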
This PR's work: