Add CUDAPinnedPlace #9380
Conversation
size_t usable = paddle::platform::GpuMaxAllocSize() - fallback_alloc_size_;
size_t usable =
    paddle::platform::CUDAPinnedMaxAllocSize() - cuda_pinnd_alloc_size_;
It seems the default pinned-memory max size is determined by system settings; ulimit -l can be used to check the current locked-memory limit. I took a look at our current machines, and the default value is very small (64KB). To increase it, run ulimit -l [new size]. It's probably better to add a document describing this.
Thanks for your review!
Searching around the internet, I found that ulimit -l does affect the amount of memory we can mlock(), but it does not affect cudaMallocHost(), because the CUDA pinning allocator doesn't use mlock(); under the hood it uses mmap() with MAP_FIXED. The experiment is here.
So theoretically, the max size of pinned memory can be the max size of physical memory.
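For reference, a minimal sketch of that experiment (my own reconstruction, not the linked one; it assumes a CUDA-capable machine with a small locked-memory limit such as the 64KB default mentioned above):

// Build with: nvcc ulimit_vs_pinned.cu -o ulimit_vs_pinned
#include <sys/mman.h>

#include <cstdio>
#include <cstdlib>

#include <cuda_runtime.h>

int main() {
  const size_t size = 256 << 20;  // 256MB, far above a 64KB ulimit -l

  // mlock() is bounded by RLIMIT_MEMLOCK (ulimit -l), so with a small
  // limit this call is expected to fail.
  void* buf = malloc(size);
  if (mlock(buf, size) != 0) {
    perror("mlock failed (expected under a small ulimit -l)");
  } else {
    munlock(buf, size);
    printf("mlock succeeded\n");
  }
  free(buf);

  // cudaMallocHost() pins pages through the CUDA driver rather than
  // mlock(), so it is not bounded by RLIMIT_MEMLOCK and should succeed.
  void* pinned = nullptr;
  cudaError_t err = cudaMallocHost(&pinned, size);
  printf("cudaMallocHost: %s\n", cudaGetErrorString(err));
  if (err == cudaSuccess) cudaFreeHost(pinned);
  return 0;
}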
Thanks for the detailed information, quite useful!
paddle/fluid/memory/memory.cc
Outdated
auto* buddy_allocator = GetCUDAPinnedBuddyAllocator();
void* ptr = buddy_allocator->Alloc(size);

// if (ptr == nullptr) {
Can remove these comments.
Done
paddle/fluid/platform/cpu_info.cc
Outdated
@@ -27,6 +27,10 @@ DEFINE_double(fraction_of_cpu_memory_to_use, 1,
              "Default use 100% of CPU memory for PaddlePaddle,"
              "reserve the rest for page tables, etc");

DEFINE_double(fraction_of_cuda_pinned_memory_to_use, 0.5,
              "Default use 100% of CPU memory for PaddlePaddle,"
The flag description needs to be updated.
Done
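For reference, the corrected definition presumably reads something like this (my approximation; the exact wording in the merged code may differ):

DEFINE_double(fraction_of_cuda_pinned_memory_to_use, 0.5,
              "Default use 50% of CPU memory as the pinned memory for "
              "PaddlePaddle, reserve the rest for page tables, etc");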
paddle/fluid/platform/cpu_info.cc
Outdated
}

size_t CUDAPinnedMaxChunkSize() {
  // Allow to allocate the maximum chunk size is roughly 0.39% of CUDA_PINNED
Just say it's 1/256 of the total size.
Done
Awesome!
  // of host pinned allocation. Allocates too much would reduce
  // the amount of memory available to the underlying system for paging.
  size_t usable =
Is FLAGS_use_pinned_memory at line 58 still useful now?
Yes
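After the review above, CUDAPinnedMaxChunkSize presumably ends up roughly like this (a sketch based on the 1/256 figure, not the exact merged code; CUDAPinnedMaxAllocSize is the function shown in the diff earlier):

size_t CUDAPinnedMaxChunkSize() {
  // Allow the maximum chunk size to be 1/256 of the total pinned memory.
  return CUDAPinnedMaxAllocSize() / 256;
}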
}

TEST(CPUANDCUDAPinned, CPUAllocator) {
  test_pinned_memory<paddle::platform::CPUPlace>();
Do we need to assert that pinned memory is faster than ordinary pageable memory, e.g. on K-series GPUs?
Done
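As a rough illustration of such an assertion (a hypothetical standalone sketch, not the test added in this PR), the two transfer paths can be timed with CUDA events:

#include <cstdio>
#include <cstdlib>

#include <cuda_runtime.h>

// Returns the elapsed milliseconds for one host-to-device copy of size bytes.
static float TimeH2DCopy(void* dst, const void* src, size_t size) {
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start);
  cudaMemcpy(dst, src, size, cudaMemcpyHostToDevice);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  float ms = 0.f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms;
}

int main() {
  const size_t size = 64 << 20;  // 64MB
  void* pageable = malloc(size);
  void *pinned = nullptr, *device = nullptr;
  cudaMallocHost(&pinned, size);
  cudaMalloc(&device, size);

  float pageable_ms = TimeH2DCopy(device, pageable, size);
  float pinned_ms = TimeH2DCopy(device, pinned, size);
  // The pinned copy is normally faster: the driver can DMA directly from
  // pinned pages instead of staging pageable memory through its own buffer.
  printf("pageable: %.3f ms, pinned: %.3f ms\n", pageable_ms, pinned_ms);

  free(pageable);
  cudaFreeHost(pinned);
  cudaFree(device);
  return 0;
}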
size_t gpu_alloc_size_ =
    0;  // TODO(zcd): how to define the upper limit of CUDAPinnedMemory?
size_t fallback_alloc_size_ = 0;
size_t cuda_pinnd_alloc_size_ = 0;
Comments at line 24 should be modified.
Done, thanks!
paddle/fluid/memory/CMakeLists.txt
Outdated
if (WITH_GPU)
  nv_test(pinned_memory_test SRCS pinned_memory_test.cu DEPS place paddle_memory)
endif()
# if (WITH_GPU)
Why?
Just a question.
LGTM
#include "paddle/fluid/platform/gpu_info.h" | ||
#include "paddle/fluid/platform/place.h" | ||
|
||
// This unit test is an example comparing the performance between using pinned |
Maybe we should later move the benchmark tests to https://github.com/google/benchmark to save some CI time.
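Such a benchmark with that library might look like the following (a hypothetical sketch, not part of this PR):

#include <vector>

#include <benchmark/benchmark.h>
#include <cuda_runtime.h>

// Times host-to-device copies from a pageable buffer.
static void BM_PageableH2D(benchmark::State& state) {
  const size_t size = static_cast<size_t>(state.range(0));
  std::vector<char> host(size);
  void* device = nullptr;
  cudaMalloc(&device, size);
  for (auto _ : state) {
    cudaMemcpy(device, host.data(), size, cudaMemcpyHostToDevice);
  }
  cudaFree(device);
  state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) * size);
}
BENCHMARK(BM_PageableH2D)->Arg(64 << 20);

// Times the same copies from a pinned (cudaMallocHost) buffer.
static void BM_PinnedH2D(benchmark::State& state) {
  const size_t size = static_cast<size_t>(state.range(0));
  void *host = nullptr, *device = nullptr;
  cudaMallocHost(&host, size);
  cudaMalloc(&device, size);
  for (auto _ : state) {
    cudaMemcpy(device, host, size, cudaMemcpyHostToDevice);
  }
  cudaFreeHost(host);
  cudaFree(device);
  state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) * size);
}
BENCHMARK(BM_PinnedH2D)->Arg(64 << 20);

BENCHMARK_MAIN();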
fix #8728
related PR: #9216
CUDA pinned memory is different from CPU memory and GPU memory: physically it lives on the CPU side, but it can be accessed by both the CPU and the GPU. In the last PR, I added an argument, is_pinned_, to Tensor, but if a user is careless, a pinned-memory tensor could end up involved in the model computation, which would slow down training and be very difficult to track down. So after talking with @typhoonzero, I added CUDAPinnedPlace instead.
Note: Currently, pinned memory is only used for memory copying.
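A rough sketch of the intended usage (my approximation; the Alloc/Free/Copy signatures are modeled on fluid's memory module and may differ slightly in the merged code):

#include <cstring>

#include <cuda_runtime.h>

#include "paddle/fluid/memory/memcpy.h"
#include "paddle/fluid/memory/memory.h"
#include "paddle/fluid/platform/place.h"

// Stages a host-to-device transfer through pinned memory so the H2D copy
// can be issued asynchronously on the given stream. Returns the device
// buffer; the caller frees it with paddle::memory::Free(gpu, dst).
void* StagedCopyToGPU(const void* src, size_t size,
                      paddle::platform::CUDAPlace gpu, cudaStream_t stream) {
  paddle::platform::CUDAPinnedPlace pinned;

  // The staging buffer comes from the new CUDAPinnedPlace allocator.
  void* staging = paddle::memory::Alloc(pinned, size);
  void* dst = paddle::memory::Alloc(gpu, size);

  // CPU -> pinned is an ordinary memcpy; pinned -> GPU can overlap with
  // computation because the driver can DMA directly from pinned pages.
  std::memcpy(staging, src, size);
  paddle::memory::Copy(gpu, dst, pinned, staging, size, stream);

  // Wait for the copy before releasing the staging buffer.
  cudaStreamSynchronize(stream);
  paddle::memory::Free(pinned, staging);
  return dst;
}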
This PR's work: