WIP --- Temporal reduction profile #776

Open
wants to merge 4 commits into
base: develop

burlen (Collaborator) commented Sep 8, 2023

Time each stage in the app.
This may need work/cleanup before merge.
Note that this info is already captured by the profiler.

burlen commented Sep 8, 2023

Fastest

perlmutter_kernel_profiling_Fastest

Average

perlmutter_kernel_profiling_Average

Slowest

perlmutter_kernel_profiling_Slowest

Takeaway: The temporal reduction is much faster on the GPU. I/O is slower and shows much more variability when the GPU is used. The timing captures everything within the execute of each stage.
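As an illustration of timing "everything within execute of each stage", here is a minimal sketch. It is not the code from this PR and not the project's profiler: `StageTimer`, `reduce_stage`, and `write_stage` are hypothetical names, and reporting the minimum, mean, and maximum over calls is only an assumption about what the Fastest/Average/Slowest plots summarize.

```python
import time
from collections import defaultdict
from statistics import mean


class StageTimer:
    """Accumulates wall-clock durations per stage name (hypothetical helper)."""

    def __init__(self):
        self.samples = defaultdict(list)

    def timed(self, stage_name, execute):
        """Wrap a stage's execute callable so each call is timed end to end."""
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return execute(*args, **kwargs)
            finally:
                # Everything that happens inside execute is included.
                self.samples[stage_name].append(time.perf_counter() - start)
        return wrapper

    def summary(self):
        """Return {stage: (fastest, average, slowest)} in seconds."""
        return {
            name: (min(vals), mean(vals), max(vals))
            for name, vals in self.samples.items()
        }


# Stand-in stages (assumptions); real reduce/writer stages would go here.
def reduce_stage(data):
    return sum(data)


def write_stage(value):
    return str(value)


timer = StageTimer()
timed_reduce = timer.timed("reduce", reduce_stage)
timed_write = timer.timed("write", write_stage)

for step in range(3):
    timed_write(timed_reduce(range(1000)))

for name, (fastest, average, slowest) in timer.summary().items():
    print(f"{name}: fastest={fastest:.6f}s average={average:.6f}s slowest={slowest:.6f}s")
```

Wrapping the call rather than instrumenting the stage body keeps the measurement boundary at the execute call itself, so time spent in I/O or device transfers made inside a stage is attributed to that stage.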

burlen force-pushed the temporal_reduction_profile branch from 218d905 to 02d3db8 on September 12, 2023 16:16

burlen commented Sep 12, 2023

varying steps per request (1 reduce thread, 1 writer thread)

steps_per_request_single_thread_1red_1wri

varying steps per request (4 reduce threads, 2 writer threads)

steps_per_request_single_thread_4red_2wri

burlen commented Sep 13, 2023

round 2: steps per request

I redid the tests, this time going to larger numbers of steps per request. The same patterns appear.

varying steps per request (1 reduce thread, 1 writer thread)

steps_per_request_single_thread_1red_1wri_789

varying steps per request (4 reduce threads, 2 writer threads)

steps_per_request_single_thread_4red_2wri_789

burlen commented Sep 13, 2023

single node with MPI

perlmutter_1_node_gpu_cpu_mpi_spr

burlen commented Sep 14, 2023

new vs old

perlmutter_1_node_gpu_cpu_mpi_spr_strm

burlen commented Sep 16, 2023

steps_per_request_single_thread_1red_1wri_cfs_scratch

Base automatically changed from temporal_reduction_multiple_steps_per_request to develop September 16, 2023 00:41

burlen commented Sep 19, 2023

steps_per_request_single_thread_cfs_lfs_nocomp
steps_per_request_single_thread_cfs_lfs_comp

burlen commented Sep 19, 2023

steps_per_request_single_thread_cfs_lfs_comp_nocomp
