Reduce memory copy during communication between trainer and pserver. #9271
@@ -237,6 +242,8 @@ def train_loop(exe, trainer_prog):
"TRAINING_ROLE",
"TRAINER") # get the training role: trainer/pserver

#print(debuger.pprint_program_codes(fluid.default_main_program().desc))
Remove comments.
Done.
@@ -251,6 +258,7 @@ def train_loop(exe, trainer_prog):
if not current_endpoint:
print("need env SERVER_ENDPOINT")
exit(1)
print("get_pserver_program")
Remove the print.
Done.
Not removed yet
@@ -0,0 +1,315 @@
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Why not use an argument to decide whether to run tf locally or not?
Done.
paddle/fluid/framework/threadpool.cc (outdated)
@@ -32,7 +32,8 @@ void ThreadPool::Init() {
// TODO(Yancey1989): specify the max threads number
int num_threads = std::thread::hardware_concurrency();
PADDLE_ENFORCE_GT(num_threads, 0);
threadpool_.reset(new ThreadPool(num_threads));
// threadpool_.reset(new ThreadPool(num_threads));
threadpool_.reset(new ThreadPool(1));
This change should be reverted.
Done.
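The hunk above sizes the pool from `std::thread::hardware_concurrency()` and enforces a positive result, while the temporary `ThreadPool(1)` would force every task onto one worker. As a rough illustration only, using plain `std::thread` rather than Paddle's `ThreadPool`, the sketch below picks a worker count defensively (the standard allows `hardware_concurrency()` to return 0 when the value is not computable) and shows that one worker serializes the work. `DefaultNumThreads` and `ParallelSum` are hypothetical helpers, not part of the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: worker count from hardware_concurrency(), which may
// legitimately return 0 ("not computable"); fall back to one worker then.
std::size_t DefaultNumThreads() {
  const unsigned int hc = std::thread::hardware_concurrency();
  return hc > 0 ? static_cast<std::size_t>(hc) : 1;
}

// Hypothetical helper: sums `data` by giving each worker one contiguous chunk.
// With num_threads == 1 all work runs on a single thread, which is the effect
// a hard-coded ThreadPool(1) has on every task submitted to the pool.
long long ParallelSum(const std::vector<int>& data, std::size_t num_threads) {
  if (num_threads == 0) num_threads = 1;
  std::vector<long long> partial(num_threads, 0);  // one slot per worker, no sharing
  std::vector<std::thread> workers;
  const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
  for (std::size_t t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min(data.size(), t * chunk);
    const std::size_t end = std::min(data.size(), begin + chunk);
    workers.emplace_back([&data, &partial, t, begin, end] {
      partial[t] = std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                   data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
    });
  }
  for (std::thread& w : workers) w.join();
  return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Every call returns the same sum regardless of the worker count; only how much of the work can overlap changes.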
::google::protobuf::io::ZeroCopyInputStream* contents() override {
  DeleteStream();
  stream_ = new (&space_) Reader(buffer_);
  return stream_;
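The `contents()` override above reuses one fixed block of storage (`space_`) for its reader via placement new, after `DeleteStream()` presumably destroys any previous reader. A minimal, self-contained sketch of that pattern, assuming a hypothetical `Reader` that only wraps a `std::string` (the real code wraps a gRPC buffer behind protobuf's `ZeroCopyInputStream`, which is not reproduced here); `ReaderHolder` is likewise a hypothetical name:

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical stand-in for a concrete reader over a buffer.
class Reader {
 public:
  explicit Reader(const std::string& buf) : buf_(buf) {}
  const std::string& data() const { return buf_; }

 private:
  const std::string& buf_;
};

// Owns inline storage for exactly one Reader and rebuilds it in place on each
// call, instead of heap-allocating a fresh reader every time.
class ReaderHolder {
 public:
  explicit ReaderHolder(std::string buffer) : buffer_(std::move(buffer)) {}
  ~ReaderHolder() { DeleteStream(); }
  ReaderHolder(const ReaderHolder&) = delete;             // storage must not be copied
  ReaderHolder& operator=(const ReaderHolder&) = delete;

  Reader* contents() {
    DeleteStream();                           // destroy the previous reader, if any
    stream_ = new (&space_) Reader(buffer_);  // construct into the reserved storage
    return stream_;
  }

 private:
  void DeleteStream() {
    if (stream_ != nullptr) {
      stream_->~Reader();  // explicit destructor call; the storage itself stays
      stream_ = nullptr;
    }
  }

  std::string buffer_;
  alignas(Reader) unsigned char space_[sizeof(Reader)];
  Reader* stream_ = nullptr;
};
```

Because the storage is a member, repeated calls hand back the same address, and the holder's destructor is responsible for destroying whichever reader is live.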
Why do we need so many wrappers? Just creating the ::grpc::GrpcBufferReader as a ZeroCopyInputStream would be simpler.
So make it not depend on any abstract type. The interface is simple enough to understand without this abstraction.
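I don't quite follow... The Parse function needs to accept arguments of two types, ByteBuffer and grpc_byte_buffer. Both can be converted to a ZeroCopyInputStream, but ZeroCopyInputStream itself cannot be used as the parameter type.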
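The parameter-type question can be illustrated without gRPC or protobuf. An abstract stream type cannot be taken by value, but it can be taken by reference, so one option is a thin `Parse` overload per buffer type that builds a concrete reader on the stack and delegates to shared code. In this sketch `ChunkStream` (loosely modeled on `ZeroCopyInputStream::Next(const void**, int*)`), `ContiguousReader`, `SliceReader`, `ParseFromStream`, and both `Parse` overloads are hypothetical stand-ins, and plain strings play the role of `ByteBuffer` and `grpc_byte_buffer` contents.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal interface, loosely modeled on ZeroCopyInputStream::Next.
class ChunkStream {
 public:
  virtual ~ChunkStream() = default;
  // Hands out the next chunk; returns false when no data is left.
  virtual bool Next(const void** data, int* size) = 0;
};

// Reader over a single contiguous buffer (hypothetical).
class ContiguousReader : public ChunkStream {
 public:
  explicit ContiguousReader(const std::string& buf) : buf_(buf) {}
  bool Next(const void** data, int* size) override {
    if (done_ || buf_.empty()) return false;
    done_ = true;
    *data = buf_.data();
    *size = static_cast<int>(buf_.size());
    return true;
  }

 private:
  const std::string& buf_;
  bool done_ = false;
};

// Reader over a list of slices, one chunk per non-empty slice (hypothetical).
class SliceReader : public ChunkStream {
 public:
  explicit SliceReader(const std::vector<std::string>& slices) : slices_(slices) {}
  bool Next(const void** data, int* size) override {
    while (index_ < slices_.size() && slices_[index_].empty()) ++index_;
    if (index_ >= slices_.size()) return false;
    const std::string& s = slices_[index_++];
    *data = s.data();
    *size = static_cast<int>(s.size());
    return true;
  }

 private:
  const std::vector<std::string>& slices_;
  std::size_t index_ = 0;
};

// Shared logic takes the abstract stream by reference, which is legal even
// though ChunkStream cannot be passed by value.
std::string ParseFromStream(ChunkStream& in) {
  std::string out;
  const void* data = nullptr;
  int size = 0;
  while (in.Next(&data, &size)) {
    out.append(static_cast<const char*>(data), static_cast<std::size_t>(size));
  }
  return out;
}

// One thin overload per buffer type, each wrapping its own concrete reader.
std::string Parse(const std::string& flat) {
  ContiguousReader reader(flat);
  return ParseFromStream(reader);
}

std::string Parse(const std::vector<std::string>& slices) {
  SliceReader reader(slices);
  return ParseFromStream(reader);
}
```

The overloads keep callers free of any abstract type, while the per-type readers stay small; whether that is simpler than exposing the reader directly is the trade-off being discussed in this thread.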
struct timeval t0_wait, t1_wait;
gettimeofday(&t0_wait, 0);
std::thread::id this_id = std::this_thread::get_id();
*/
Remove comments.
Done.
LGTM. Finishing the SerialiseTraits was a lot of work.
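gRPC C++ lets a custom message type use its own wire format by specializing `grpc::SerializationTraits`, which is presumably what "SerialiseTraits" refers to here. The sketch below shows only the traits pattern itself, with a hypothetical `WireTraits` template, a hypothetical `VarMessage` type, and an ad hoc length-prefixed layout; it does not use gRPC and does not reproduce the PR's serialization code or gRPC's signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical message: a variable name plus raw payload bytes.
struct VarMessage {
  std::string name;
  std::string payload;
};

// Primary template left undefined: only types with an explicit
// specialization can be serialized, so misuse fails at compile time.
template <class T>
struct WireTraits;

template <>
struct WireTraits<VarMessage> {
  // Layout: 4-byte little-endian name length, name bytes, payload bytes.
  static std::string Serialize(const VarMessage& msg) {
    std::string out;
    const std::uint32_t n = static_cast<std::uint32_t>(msg.name.size());
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<char>((n >> (8 * i)) & 0xFF));
    out += msg.name;
    out += msg.payload;
    return out;
  }

  // Returns false if the input is too short for the declared name length.
  static bool Deserialize(const std::string& bytes, VarMessage* msg) {
    if (bytes.size() < 4) return false;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
      n |= static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i])) << (8 * i);
    if (bytes.size() - 4 < n) return false;
    msg->name = bytes.substr(4, n);
    msg->payload = bytes.substr(4 + n);
    return true;
  }
};
```

Generic transport code can then call `WireTraits<T>::Serialize` and `WireTraits<T>::Deserialize` without knowing the concrete layout, which is the point of routing serialization through a traits specialization.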
@@ -251,6 +258,7 @@ def train_loop(exe, trainer_prog):
if not current_endpoint:
print("need env SERVER_ENDPOINT")
exit(1)
print("get_pserver_program")
Not removed yet
Done.
* commit '9c35b0dc1ba0ace5acf721685802a21045ea1249': (36 commits) Fix dist compile error (PaddlePaddle#9320) Fix bug for backward tanspiler when using parallel_do operator. (PaddlePaddle#9282) update fix transpiler bug Update index_en.rst (PaddlePaddle#9286) "fix mixed_vector bug" (PaddlePaddle#9319) Update index_en.rst (PaddlePaddle#9280) Adjust some contents in write_docs_en.rst for Contribue Documentation (PaddlePaddle#9147) CMake refine for HIP support. Fix CI. Reuduce memory copy when communication between trainer and pserver. (PaddlePaddle#9271) Modified build.sh and remove build_doc.sh fix doc Enhance device context pool (PaddlePaddle#9293) Device blobs are created only in training. Added testing attribute Shrink batch_norm_grad's inputs updates prepare and create op before run wip small fix initial commit ... # Conflicts: # cmake/external/eigen.cmake
No description provided.