for support ascvrq #62

Merged: 54 commits, Jun 6, 2024
Commits
8b21ef4
for support ascvrq
xymyeah Mar 22, 2024
6ce59ed
for support ascvrq
xymyeah Mar 22, 2024
062e7ea
fix bug
xymyeah Mar 26, 2024
fc44928
fix bug
xymyeah Mar 26, 2024
ca3c115
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 26, 2024
22fdf85
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 26, 2024
4182388
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 26, 2024
a3462e7
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 26, 2024
80e9100
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 26, 2024
892af60
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 26, 2024
6bbeb45
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 26, 2024
61e89b7
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 26, 2024
e960f46
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 27, 2024
82e5d29
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 27, 2024
8f9f0ea
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 27, 2024
0d0b1c8
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 27, 2024
645c27b
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 29, 2024
b69b948
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Mar 29, 2024
66506d1
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 1, 2024
9c83a59
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 1, 2024
94fce75
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 7, 2024
80190f4
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 7, 2024
7606148
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 15, 2024
d779503
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 15, 2024
5387f8b
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 15, 2024
e830827
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 15, 2024
42bf354
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 23, 2024
21f6cf0
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 23, 2024
2aa11b4
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 23, 2024
79d0290
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 23, 2024
00c38ba
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 26, 2024
812fe81
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 26, 2024
ab3dae2
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 28, 2024
f3f7736
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 28, 2024
29734b2
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 28, 2024
8bd8f7d
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Apr 28, 2024
3dfa33d
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah May 6, 2024
a5de133
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah May 6, 2024
52503c7
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah May 8, 2024
1c66502
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah May 8, 2024
01b1276
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah May 15, 2024
e1794a9
Merge branch 'paddlebox' into support_ascvrq
xymyeah Jun 4, 2024
57d98a1
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah May 15, 2024
29bbdde
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Jun 4, 2024
e25324d
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Jun 4, 2024
2da08db
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Jun 5, 2024
0b27dd5
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Jun 5, 2024
2556322
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Jun 5, 2024
777f054
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Jun 5, 2024
b95223b
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Jun 5, 2024
75b40b3
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Jun 5, 2024
ab6a191
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Jun 5, 2024
82a37ff
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Jun 5, 2024
ef9e3b3
Merge branch 'support_ascvrq' of https://github.com/paddlebox-xpu/Pad…
xymyeah Jun 6, 2024
2 changes: 1 addition & 1 deletion cmake/external/xpu.cmake
@@ -110,7 +110,7 @@ if (WITH_BOX_PS OR WITH_XPU_KP)
  CACHE STRING "" FORCE)
  #"https://klx-sdk-release-public.su.bcebos.com/xdnn/release/2.6.0.1/${XPU_XDNN_DIR_NAME}.tar.gz"
  set(XPU_XDNN_URL
- "https://klx-sdk-release-public.su.bcebos.com/xdnn_train/dev/paddlebox/20240408/${XPU_XDNN_DIR_NAME}.tar.gz"
+ "https://klx-sdk-release-public.su.bcebos.com/xdnn_train/dev/paddlebox/20240605/${XPU_XDNN_DIR_NAME}.tar.gz"
  CACHE STRING "" FORCE)
  set(SCALOPUS_URL
  "https://klx-sdk-release-public.su.bcebos.com/xdnn_train/dev/paddlebox/20230306/scalopus.tar.gz"
58 changes: 30 additions & 28 deletions paddle/fluid/framework/boxps_trainer.cc
@@ -89,7 +89,7 @@ void BoxPSTrainer::Initialize(const TrainerDesc& trainer_desc,
  }

  void BoxPSTrainer::InitOtherEnv(const ProgramDesc& main_program) {
- if (need_dump_field_ || need_dump_param_) {
+ if (need_dump_field_) {
  InitDumpEnv();
  }
  VLOG(3) << "init other env done.";
@@ -138,28 +138,29 @@ void BoxPSTrainer::DumpWork(int tid) {
  }
  }
  void BoxPSTrainer::InitDumpEnv() {
- queue_ = paddle::framework::MakeChannel<std::string>();
- // Only set dump channel on the last section
- for (int i = 0; i < thread_num_; ++i) {
- workers_[i]->SetChannelWriter(queue_.get());
- }
- // TODO(hutuxian): should make it as a config
- dump_futures_.clear();
- auto pool = GetDumpThreadPool(dump_thread_num_);
- for (int i = 0; i < dump_thread_num_; i++) {
- dump_futures_.emplace_back(pool->Run([this, i]() { this->DumpWork(i); }));
- }
- VLOG(0) << "init dump write file thread num=" << dump_thread_num_;
+ // queue_ = paddle::framework::MakeChannel<std::string>();
+ // // Only set dump channel on the last section
+ // for (int i = 0; i < thread_num_; ++i) {
+ // workers_[i]->SetChannelWriter(queue_.get());
+ // }
+ // // TODO(hutuxian): should make it as a config
+ // dump_futures_.clear();
+ // auto pool = GetDumpThreadPool(dump_thread_num_);
+ // for (int i = 0; i < dump_thread_num_; i++) {
+ // dump_futures_.emplace_back(pool->Run([this, i]() { this->DumpWork(i); }));
+ // }
+ // VLOG(0) << "init dump write file thread num=" << dump_thread_num_;
+ localfs_mkdir(dump_fields_path_);
  }
  // final dump env
  void BoxPSTrainer::FinalizeDumpEnv() {
- queue_->Close();
- for (auto& th : dump_futures_) {
- th.get();
- }
- dump_futures_.clear();
- queue_.reset();
- VLOG(0) << "finalize dump write file thread";
+ // queue_->Close();
+ // for (auto& th : dump_futures_) {
+ // th.get();
+ // }
+ // dump_futures_.clear();
+ // queue_.reset();
+ // VLOG(0) << "finalize dump write file thread";
  }

  inline std::vector<std::shared_ptr<paddle::framework::ThreadPool>>&
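The hunk above comments out the channel and writer-thread setup in `InitDumpEnv` and the matching shutdown in `FinalizeDumpEnv`, leaving only the `localfs_mkdir(dump_fields_path_)` call. For readers unfamiliar with the disabled pattern, the following is a minimal standard-library sketch of it: producers put strings into a closable queue, writer threads drain it, and shutdown closes the queue and then waits for the writers. `StringChannel` and `DumpRecords` are hypothetical names introduced for illustration only; they are not Paddle APIs and do not reproduce the behavior of `MakeChannel`, `GetDumpThreadPool`, or `DumpWork`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical closable string queue (an assumption, not Paddle's Channel).
class StringChannel {
 public:
  // Enqueue one record; returns false if the channel is already closed.
  bool Put(std::string s) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (closed_) return false;
      q_.push(std::move(s));
    }
    cv_.notify_one();
    return true;
  }

  // Block until a record is available, or return false once the channel
  // is closed and fully drained.
  bool Get(std::string* out) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
    if (q_.empty()) return false;
    *out = std::move(q_.front());
    q_.pop();
    return true;
  }

  // Reject further Put calls and wake every waiting reader.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::string> q_;
  bool closed_ = false;
};

// Start writer threads, feed them records, then close the channel and join.
// Returns how many records the writers consumed.
std::size_t DumpRecords(const std::vector<std::string>& records,
                        int writer_num) {
  assert(writer_num > 0);
  StringChannel channel;
  std::mutex count_mu;
  std::size_t consumed = 0;
  std::vector<std::thread> writers;
  for (int i = 0; i < writer_num; ++i) {
    writers.emplace_back([&] {
      std::string line;
      while (channel.Get(&line)) {
        // A real writer would append `line` to a dump file here.
        std::lock_guard<std::mutex> lock(count_mu);
        ++consumed;
      }
    });
  }
  for (const auto& r : records) channel.Put(r);
  channel.Close();                      // analogous to queue_->Close()
  for (auto& t : writers) t.join();     // analogous to waiting on the futures
  return consumed;
}
```

The ordering in the teardown matters: closing before joining lets each writer drain what is left and exit its loop, which is the same order the removed `FinalizeDumpEnv` body used.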
@@ -221,15 +222,15 @@ void BoxPSTrainer::InitTrainerEnv(const ProgramDesc& main_program,
  for (int i = 0; i < thread_num_; ++i) {
  wait_futures_.emplace_back(
  pool[i]->Run([this, i, &async_param_name, &main_program]() {
- auto this_worker =
+ auto this_worker =
  std::dynamic_pointer_cast<paddle::framework::BoxPSWorker>(
  workers_[i]);
- this_worker->SetRootScope(root_scope_);
- if (async_mode_) {
- this_worker->SetDenseTable(dense_table_.get());
- this_worker->SetAsyncParamName(async_param_name);
- }
- this_worker->CreateDeviceResource(main_program);
+ this_worker->SetRootScope(root_scope_);
+ if (async_mode_) {
+ this_worker->SetDenseTable(dense_table_.get());
+ this_worker->SetAsyncParamName(async_param_name);
+ }
+ this_worker->CreateDeviceResource(main_program);
  }));
  }
  RemoveOtherDeviceVars(main_program, root_scope_);
@@ -263,6 +264,7 @@ void BoxPSTrainer::RemoveOtherDeviceVars(const ProgramDesc& main_program,
  unpersist_var_names.insert(name);
  }
  }
+
  }
  VLOG(0) << "root scope remove_params size = " << unpersist_var_names.size();
  // 2. Get moment param
@@ -308,7 +310,7 @@ void BoxPSTrainer::Finalize() {
  // must be after train thread, otherwise the ps_buffer_ will be closed first
  dense_table_->Finalize();
  }
- if (need_dump_field_ || need_dump_param_) {
+ if (need_dump_field_) {
  FinalizeDumpEnv();
  }
  root_scope_->DropKids();
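Both `InitOtherEnv` and `Finalize` now test only `need_dump_field_`, so a configuration that sets just `need_dump_param_` no longer reaches `InitDumpEnv` or `FinalizeDumpEnv`. The sketch below illustrates that consequence and why the setup and teardown guards should share one predicate. `DumpFlags`, `TrainerSketch`, and `RunLifecycle` are hypothetical names used only for this illustration; they are not part of `BoxPSTrainer`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flags mirroring need_dump_field_ / need_dump_param_.
struct DumpFlags {
  bool need_dump_field = false;
  bool need_dump_param = false;
};

// Hypothetical trainer that only records which dump hooks ran.
class TrainerSketch {
 public:
  explicit TrainerSketch(DumpFlags flags) : flags_(flags) {}

  void InitOtherEnv() {
    if (DumpEnabled()) {
      dump_ready_ = true;
      calls_.push_back("InitDumpEnv");
    }
  }

  void Finalize() {
    if (DumpEnabled()) {
      // Tearing down a dump environment that was never set up would be a bug.
      assert(dump_ready_);
      dump_ready_ = false;
      calls_.push_back("FinalizeDumpEnv");
    }
  }

  const std::vector<std::string>& calls() const { return calls_; }

 private:
  // One predicate shared by setup and teardown, matching the new condition.
  bool DumpEnabled() const { return flags_.need_dump_field; }

  DumpFlags flags_;
  bool dump_ready_ = false;
  std::vector<std::string> calls_;
};

// Run the init/finalize pair and return the hooks that fired, in order.
std::vector<std::string> RunLifecycle(DumpFlags flags) {
  TrainerSketch trainer(flags);
  trainer.InitOtherEnv();
  trainer.Finalize();
  return trainer.calls();
}
```

In this sketch a field-only configuration runs both hooks, while a param-only configuration runs neither. If the two guards ever diverged, the teardown could run without a matching setup, which is the case the `assert` is there to catch.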