1938 remove unneeded calls to mpi wtime in trace and lb #1998

Merged

Conversation

stmcgovern
Contributor

@stmcgovern stmcgovern commented Oct 18, 2022

Closes #1938.
Pass the current time from the scheduler to LB and Trace contexts.
The approach has since changed: the scheduler now holds a current_sched_time_, and the runnable contexts that need a start time (here LB and Trace) query it from the scheduler. The main motivation was to avoid forcing every runnable context to take a time argument, even where one is inappropriate. The PR still targets the issue by reducing calls to MPI_Wtime, at the cost of more dereferences of theSched().
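A minimal sketch of the idea in the description, not the actual vt code: the scheduler reads the clock once per turn, and only the contexts that want a start time ask for the cached value. Every type here (SketchScheduler, SketchLBContext, SketchTraceContext, SketchOtherContext) and the accessor getCurrentSchedTime are hypothetical stand-ins; only the member name current_sched_time_ comes from the PR. The clock is injected so the number of reads can be counted.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-ins; none of these are the real vt types.
using TimeType = double;

class SketchScheduler {
public:
  // The clock is injected so the sketch is deterministic and reads are countable.
  explicit SketchScheduler(std::function<TimeType()> clock)
    : clock_(std::move(clock)) {}

  // Read the clock once per scheduler turn and remember the result.
  void beginTurn() { current_sched_time_ = clock_(); }

  // Contexts that need a start time ask for the cached value (assumed accessor name).
  TimeType getCurrentSchedTime() const { return current_sched_time_; }

private:
  std::function<TimeType()> clock_;
  TimeType current_sched_time_ = 0.0;
};

// Two contexts that want a start time...
struct SketchLBContext {
  TimeType start = 0.0;
  void begin(SketchScheduler const& sched) { start = sched.getCurrentSchedTime(); }
};

struct SketchTraceContext {
  TimeType start = 0.0;
  void begin(SketchScheduler const& sched) { start = sched.getCurrentSchedTime(); }
};

// ...and one that does not, and is not forced to accept one.
struct SketchOtherContext {
  void begin() {}
};

// Runs one scheduler turn and returns how many times the clock was read.
int clockReadsForOneTurn() {
  int reads = 0;
  SketchScheduler sched([&reads] { ++reads; return 42.0; });
  SketchLBContext lb;
  SketchTraceContext trace;
  SketchOtherContext other;

  sched.beginTurn();
  lb.begin(sched);
  trace.begin(sched);
  other.begin();

  // Both contexts observe the same cached start time.
  assert(lb.start == trace.start);
  return reads;
}
```

The trade-off named in the description shows up here as the extra indirection through the scheduler object in each begin call.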

@stmcgovern stmcgovern linked an issue Oct 18, 2022 that may be closed by this pull request
@github-actions

github-actions bot commented Oct 18, 2022

Pipelines results

PR tests (gcc-12, ubuntu, mpich)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-5, ubuntu, mpich)

Build for 5046d72 (2022-11-17 20:24:02 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-3.9, ubuntu, mpich)

Build for 5046d72 (2022-11-17 20:24:02 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-5.0, ubuntu, mpich)

Build for 5046d72 (2022-11-17 20:24:02 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-10, ubuntu, openmpi, no LB)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-6, ubuntu, mpich)

Build for 5046d72 (2022-11-17 20:24:02 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-9, ubuntu, mpich)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-13, alpine, mpich)

Build for 5046d72 (2022-11-17 20:24:02 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-9, ubuntu, mpich, zoltan)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-8, ubuntu, mpich, address sanitizer)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-11, ubuntu, mpich)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-12, ubuntu, mpich)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (intel icpx, ubuntu, mpich)

Build for 5046d72 (2022-11-17 20:24:02 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-13, ubuntu, mpich)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (nvidia cuda 10.1, ubuntu, mpich)

Build for 5046d72 (2022-11-17 20:24:02 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (nvidia cuda 11.0, ubuntu, mpich)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

nvcc_wrapper does not accept standard flags -std=c++1z since partial standard flags and standards after C++17 are not supported. nvcc_wrapper will use -std=c++14 instead. It is undefined behavior to use this flag. This should only be occurring during CMake configuration.
FAILED: tests/CMakeFiles/epoch_nompi.dir/Unity/unity_0_cxx.cxx.o 
/usr/bin/ccache /nvcc_wrapper/build/nvcc_wrapper -DJSON_USE_IMPLICIT_CONVERSIONS=1 -DVT_NO_COLOR_ENABLED -I/vt/tests/unit -I/vt/lib/CLI -I/vt/lib/json/include -I/vt/lib/brotli/c/include -Irelease -I/vt/src -isystem /vt/tests/extern/googletest/googletest/include -isystem /vt/tests/extern/googletest/googletest -isystem /vt/lib/fmt/include -isystem /vt/lib/EngFormat-Cpp/include -isystem /build/checkpoint/install/include -isystem /build/detector/install/include -Wno-deprecated-gpu-targets -O3 -DNDEBUG -fdiagnostics-color=always -Wall -pedantic -Wshadow -Wno-unknown-pragmas -Wsign-compare -ftemplate-backtrace-limit=100 -Werror -fPIC -std=c++1z -MD -MT tests/CMakeFiles/epoch_nompi.dir/Unity/unity_0_cxx.cxx.o -MF tests/CMakeFiles/epoch_nompi.dir/Unity/unity_0_cxx.cxx.o.d -o tests/CMakeFiles/epoch_nompi.dir/Unity/unity_0_cxx.cxx.o -c tests/CMakeFiles/epoch_nompi.dir/Unity/unity_0_cxx.cxx
/build/checkpoint/install/include/checkpoint/container/tuple_serialize.h(86): error

==> And there is more. Read log. <==

Build log


PR tests (clang-14, ubuntu, mpich)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (clang-10, ubuntu, mpich)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (gcc-11, ubuntu, mpich, json schema test)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (intel icpc, ubuntu, mpich)

Build for d46c6d1 (2023-02-15 01:01:05 UTC)

intel-cc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
==> And there is more. Read log. <==

Build log


PR tests (gcc-7, ubuntu, mpich, trace runtime, LB)

Build for 5046d72 (2022-11-17 20:24:02 UTC)

Compilation - successful

Testing - passed

Build log


PR tests (nvidia cuda 11.2, ubuntu, mpich)

Build for ( UTC)

nvcc_wrapper does not accept standard flags -std=c++1z since partial standard flags and standards after C++17 are not supported. nvcc_wrapper will use -std=c++14 instead. It is undefined behavior to use this flag. This should only be occurring during CMake configuration.
FAILED: tests/CMakeFiles/epoch_nompi.dir/Unity/unity_0_cxx.cxx.o 
/usr/bin/ccache /nvcc_wrapper/build/nvcc_wrapper -DJSON_USE_IMPLICIT_CONVERSIONS=1 -DVT_NO_COLOR_ENABLED -I/vt/tests/unit -I/vt/lib/CLI -I/vt/lib/json/include -I/vt/lib/brotli/c/include -Irelease -I/vt/src -isystem /vt/tests/extern/googletest/googletest/include -isystem /vt/tests/extern/googletest/googletest -isystem /vt/lib/fmt/include -isystem /vt/lib/EngFormat-Cpp/include -isystem /build/checkpoint/install/include -isystem /build/detector/install/include -Wno-deprecated-gpu-targets -O3 -DNDEBUG -fdiagnostics-color=always -Wall -pedantic -Wshadow -Wno-unknown-pragmas -Wsign-compare -ftemplate-backtrace-limit=100 -Werror -fPIC -std=c++1z -MD -MT tests/CMakeFiles/epoch_nompi.dir/Unity/unity_0_cxx.cxx.o -MF tests/CMakeFiles/epoch_nompi.dir/Unity/unity_0_cxx.cxx.o.d -o tests/CMakeFiles/epoch_nompi.dir/Unity/unity_0_cxx.cxx.o -c tests/CMakeFiles/epoch_nompi.dir/Unity/unity_0_cxx.cxx
/build/checkpoint/install/include/checkpoint/container/tuple_serialize.h(86): error: namespace "std" has no member "launder"

/build/checkpoint/install/include/checkpoint/container/tuple_serialize.h(86): error: type name is not allowed

/vt/src/vt/runnable/runnable.h(238): warning: constexpr if statements are a C++17 feature

/vt/src/vt/runnable/runnable.h(238): error: namespace "std" has no member "is_void_v"

/vt/src/vt/runnable/runnable.h(238): error: namespace "std" has no member "invoke_result_t"

/vt/src/vt/runnable/runnable.h(238): error: type name is not allowed

==> And there is more. Read log. <==

Build log


@stmcgovern stmcgovern marked this pull request as ready for review October 24, 2022 23:32
@stmcgovern stmcgovern force-pushed the 1938-remove-unneeded-calls-to-mpi_wtime-in-trace-and-lb branch from 57775d0 to 6ef8a04 Compare October 24, 2022 23:34
@codecov

codecov bot commented Oct 25, 2022

Codecov Report

Merging #1998 (9fe7f0a) into develop (54ad63d) will decrease coverage by 1.87%.
The diff coverage is 66.66%.

❗ Current head 9fe7f0a differs from pull request most recent head d46c6d1. Consider uploading reports for the commit d46c6d1 to get more accurate results


@@             Coverage Diff             @@
##           develop    #1998      +/-   ##
===========================================
- Coverage    84.86%   82.99%   -1.87%     
===========================================
  Files          720      731      +11     
  Lines        25674    25851     +177     
===========================================
- Hits         21788    21456     -332     
- Misses        3886     4395     +509     
| Impacted Files | Coverage Δ |
|---|---|
| src/vt/elm/elm_lb_data.h | 100.00% <ø> (ø) |
| src/vt/parameterization/parameterization.h | 100.00% <ø> (ø) |
| src/vt/scheduler/scheduler.cc | 67.87% <0.00%> (-0.84%) ⬇️ |
| src/vt/trace/trace.cc | 49.71% <ø> (ø) |
| src/vt/trace/trace.h | 100.00% <ø> (ø) |
| src/vt/rdma/state/rdma_state.cc | 37.06% <50.00%> (ø) |
| src/vt/context/runnable_context/lb_data.cc | 70.00% <100.00%> (+1.57%) ⬆️ |
| src/vt/context/runnable_context/trace.cc | 71.42% <100.00%> (+1.42%) ⬆️ |
| src/vt/elm/elm_lb_data.cc | 87.91% <100.00%> (-0.14%) ⬇️ |
| src/vt/runnable/invoke.h | 100.00% <100.00%> (ø) |
... and 134 more

@PhilMiller
Member

I would approve the first 3 commits of this PR as resolving 1938 (modulo my comments about default parameters). The latter two commits seem to be less motivated, and less clearly correct.

@stmcgovern
Contributor Author

Thanks for your comments @PhilMiller. I'll drop the last 2 commits and see about the default arguments.

@stmcgovern stmcgovern force-pushed the 1938-remove-unneeded-calls-to-mpi_wtime-in-trace-and-lb branch 2 times, most recently from 7f463fa to 31b91f3 Compare November 15, 2022 23:55
@stmcgovern stmcgovern force-pushed the 1938-remove-unneeded-calls-to-mpi_wtime-in-trace-and-lb branch from 31b91f3 to 5046d72 Compare November 17, 2022 20:24
@stmcgovern stmcgovern force-pushed the 1938-remove-unneeded-calls-to-mpi_wtime-in-trace-and-lb branch from 5046d72 to 5880acc Compare November 29, 2022 00:44
lifflander previously approved these changes Nov 29, 2022
Collaborator

@lifflander lifflander left a comment

Looks good!

Member

@PhilMiller PhilMiller left a comment

Frankly, I was much happier with the approach of just passing a time down the call stack.

This PR now adds a bunch of calls to timing::getCurrentTime, though I haven't checked whether they're in specifically cold paths.

This change in approach may have been discussed, but that discussion is supposed to be memorialized in comments here. I'll note that the PR description no longer quite matches the code.

Let's talk about this in the meeting this week.

@@ -258,7 +258,8 @@ void Scheduler::runSchedulerOnceImpl(bool msg_only) {
       }
     } else if (work_queue_.empty()) {
       if (curRT->needsCurrentTime()) {
-        runProgress(msg_only, timing::getCurrentTime());
+        current_sched_time_ = timing::getCurrentTime();
+        runProgress(msg_only, current_sched_time_);
Member

This may not be worth changing, but since runProgress is a sibling member, it doesn't need to take current_sched_time_ as an argument, since it can access it directly.

@stmcgovern
Contributor Author

> Frankly, I was much happier with the approach of just passing a time down the call stack.
>
> This PR now adds a bunch of calls to timing::getCurrentTime, though I haven't checked whether they're in specifically cold paths.
>
> This change in approach may have been discussed, but that discussion is supposed to be memorialized in comments here. I'll note that the PR description no longer quite matches the code.
>
> Let's talk about this in the meeting this week.

Thanks @PhilMiller. Yes, the original idea was to just pass the time down the call stack. Once the default arguments were removed (per your first review), a time argument had to be passed through to every runnable (void RunnableNew::run()). Aside from this complicating the RunnableNew interface, @lifflander remarked that not all runnables need a time. So the tactic changed: the scheduler stores a time, and the runnable contexts that need one (in particular LB and Trace) ask it for a start time.

theTrace()->beginProcessing called timing::getCurrentTime by default. Removing this implicit use required getting the current time explicitly at certain call sites.

I'll update the PR description to reflect these changes.
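To illustrate the point about the implicit default, here is a small sketch, assuming a counting clock in place of timing::getCurrentTime. The names sketchGetCurrentTime, beginProcessingWithDefault and beginProcessingExplicit are hypothetical and do not match the real theTrace()->beginProcessing signature. It shows that a defaulted time parameter hides one clock read per call, while an explicit parameter lets the caller read once and reuse the value.

```cpp
#include <cassert>

using TimeType = double;

// Hypothetical clock that counts how often it is read.
inline int& clockReads() {
  static int reads = 0;
  return reads;
}

inline TimeType sketchGetCurrentTime() {
  ++clockReads();
  return static_cast<TimeType>(clockReads());
}

// Before: the default argument hides a clock read at every call that omits the time.
inline TimeType beginProcessingWithDefault(TimeType time = sketchGetCurrentTime()) {
  return time;
}

// After: no default, so each call site must state where its time comes from.
inline TimeType beginProcessingExplicit(TimeType time) {
  return time;
}

// Returns how many clock reads the explicit form saves over two calls.
int hiddenVersusExplicitReads() {
  clockReads() = 0;
  beginProcessingWithDefault();  // hidden read
  beginProcessingWithDefault();  // another hidden read
  int const hidden = clockReads();
  assert(hidden == 2);

  clockReads() = 0;
  TimeType const now = sketchGetCurrentTime();  // one visible read
  beginProcessingExplicit(now);
  beginProcessingExplicit(now);
  int const explicit_reads = clockReads();

  return hidden - explicit_reads;
}
```

The cost of the explicit form is the one described above: call sites that previously relied on the default now have to obtain a time themselves.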

@@ -140,7 +140,7 @@ static trace::TraceProcessingTag BeginProcessingInvokeEvent() {
   const auto trace_event = theTrace()->messageCreation(trace_id, 0);
   const auto from_node = theContext()->getNode();

-  return theTrace()->beginProcessing(trace_id, 0, trace_event, from_node);
+  return theTrace()->beginProcessing(trace_id, 0, trace_event, from_node, timing::getCurrentTime());
Member

This may actually illustrate a case for why I like the 'pass the time as an argument to contexts' approach better - if we want to add LB attribution for methods called through invoke, we would want to share this timer call for the start time, rather than have it reference the scheduler's recorded time, or need another new interface for the divergent case.
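A rough sketch of the alternative argued for in this comment, under the assumption that both tracing and LB attribution want the same start time for an invoked method. SketchTrace, SketchLB, CountingClock and invokeWithSharedStart are all hypothetical names, not vt APIs. The clock is read once at the call site and the value is handed to each consumer as an argument.

```cpp
#include <cassert>
#include <functional>

using TimeType = double;

// Hypothetical consumers of a start time; not the real vt trace or LB classes.
struct SketchTrace {
  TimeType begin_time = -1.0;
  void beginProcessing(TimeType time) { begin_time = time; }
};

struct SketchLB {
  TimeType start_time = -1.0;
  void start(TimeType time) { start_time = time; }
};

// Hypothetical clock that counts its reads.
struct CountingClock {
  int reads = 0;
  TimeType now() {
    ++reads;
    return 100.0 + reads;
  }
};

// One clock read at the call site, shared by every consumer via arguments.
void invokeWithSharedStart(
  CountingClock& clock, SketchTrace& trace, SketchLB& lb,
  std::function<void()> const& work
) {
  assert(static_cast<bool>(work));
  TimeType const start = clock.now();
  trace.beginProcessing(start);
  lb.start(start);
  work();
}

// True if the work ran, the clock was read once, and both consumers agree.
bool sharedStartDemo() {
  CountingClock clock;
  SketchTrace trace;
  SketchLB lb;
  bool ran = false;
  invokeWithSharedStart(clock, trace, lb, [&ran] { ran = true; });
  return ran && clock.reads == 1 && trace.begin_time == lb.start_time;
}
```

In this shape the invoke path does not depend on whatever time the scheduler last recorded, which is the divergent case the comment is concerned about.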

@@ -62,6 +62,7 @@ void BaseUnit::execute() {
 #endif
   } else if (work_) {
     work_();
+    theSched()->setRecentTimeToStale();
Member

Should this be outside the if / else if, or duplicated in both arms? Why shouldn't the time be updated after we've run stuff in a thread?

Contributor Author

Yeah, it's in the run function (runnable.cc line 183). Should we just pull it out to here (outside the if/else)?

Contributor Author

Left as is, since the runnable branch will also need to get a finish time.
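One way to picture the staleness mechanism discussed in this thread is sketched below. This is an assumption about the semantics, not the vt implementation: the class StaleAwareClock and the accessor getRecentTime are hypothetical, and only the name setRecentTimeToStale appears in the diff. The cached time is reused until some work has run, after which the next query reads the clock again.

```cpp
#include <cassert>
#include <functional>
#include <utility>

using TimeType = double;

// Hypothetical cache: reuses the last clock reading until it is marked stale.
class StaleAwareClock {
public:
  explicit StaleAwareClock(std::function<TimeType()> clock)
    : clock_(std::move(clock)) {}

  // Refreshes from the clock only when the cached value is stale.
  TimeType getRecentTime() {
    if (stale_) {
      recent_time_ = clock_();
      stale_ = false;
    }
    return recent_time_;
  }

  // Called after work has run, since time has passed and the cache is outdated.
  void setRecentTimeToStale() { stale_ = true; }

private:
  std::function<TimeType()> clock_;
  TimeType recent_time_ = 0.0;
  bool stale_ = true;
};

// Returns the number of clock reads across two queries, some work, and one more query.
int clockReadsAroundWork() {
  int reads = 0;
  StaleAwareClock cache([&reads] { return static_cast<TimeType>(++reads); });

  TimeType const a = cache.getRecentTime();  // first query reads the clock
  TimeType const b = cache.getRecentTime();  // second query reuses the cache
  cache.setRecentTimeToStale();              // work ran, so the cache is outdated
  TimeType const c = cache.getRecentTime();  // reads the clock again

  bool const cached_ok = (a == b);
  bool const refreshed_ok = (c > b);
  assert(cached_ok && refreshed_ok);
  return (cached_ok && refreshed_ok) ? reads : -1;
}
```

Under this reading, the question raised above amounts to making sure every path that runs work, threaded or not, marks the cache stale exactly once.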

@PhilMiller
Member

I think I'm generally ok with this, and just need to see the couple comments addressed.

This does need to be run through the same fine-grain overhead benchmarks we used to identify time calls as a concern to begin with. I'd like to be confident that added calls to theSched() aren't losing us whatever ground we're gaining, or even a large fraction of it. If they are, then I'd push for accepting that every context is going to take a time argument, useful or not, and shifting the logic for when to check the clock and what time to pass down the stack more thoroughly into the scheduler.
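The kind of comparison asked for here could look roughly like the sketch below. It is not the fine-grain overhead benchmark the project used: it substitutes std::chrono::steady_clock for MPI_Wtime, and the names OverheadSample, readClockSeconds and measureOverhead are hypothetical. It times a loop that reads the clock on every iteration against a loop that reuses one cached reading.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical result record for the comparison.
struct OverheadSample {
  std::size_t iterations;
  double per_call_clock_ns;   // average cost when the clock is read every iteration
  double per_call_cached_ns;  // average cost when a cached time is reused
};

// Stand-in for a wall-clock call such as MPI_Wtime.
inline double readClockSeconds() {
  using clock = std::chrono::steady_clock;
  return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
}

OverheadSample measureOverhead(std::size_t iterations) {
  assert(iterations > 0);
  using clock = std::chrono::steady_clock;
  volatile double sink = 0.0;  // keeps the loops from being optimized away

  auto const t0 = clock::now();
  for (std::size_t i = 0; i < iterations; ++i) {
    sink = readClockSeconds();  // one clock read per iteration
  }
  auto const t1 = clock::now();

  double const cached = readClockSeconds();  // one read, reused below
  for (std::size_t i = 0; i < iterations; ++i) {
    sink = cached;
  }
  auto const t2 = clock::now();
  (void)sink;

  auto const to_ns = [](clock::duration d) {
    return std::chrono::duration<double, std::nano>(d).count();
  };
  double const n = static_cast<double>(iterations);
  return {iterations, to_ns(t1 - t0) / n, to_ns(t2 - t1) / n};
}
```

A call such as measureOverhead(1000000) gives two per-iteration averages to compare. The real question in this thread also includes the cost of going through theSched(), which this sketch does not model.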

@stmcgovern stmcgovern force-pushed the 1938-remove-unneeded-calls-to-mpi_wtime-in-trace-and-lb branch from e6bb911 to 44f3132 Compare February 14, 2023 16:53
@PhilMiller
Member

Some runs of the microbenchmark were getting ~30% reduction in time, directly attributable to reduced time spent in PMPI_Wtime. So, I think we're good on achieving the goal here.

@stmcgovern stmcgovern marked this pull request as ready for review February 14, 2023 23:20
@stmcgovern
Contributor Author

Improvements to the tests will be pursued in VT #2017

@lifflander lifflander merged commit 893ca59 into develop Feb 15, 2023
@nmm0 nmm0 mentioned this pull request May 11, 2023
Development

Successfully merging this pull request may close these issues.

Remove unneeded calls to MPI_Wtime in Trace and LB
3 participants