feature: HPCToolkit data reduction + reading parallel profiles from single application run #141
base: develop
Conversation
Hi @draganaurosgrbic, I've left a couple of specific comments already, but there are some high-level things that should be addressed before I do a more thorough re-review.

- Please add comments to significant functions at the declaration level and sporadically throughout the code (even if internal) that would help another developer make changes in the correct place, especially functions like `_parse_context()`, `_read_profiles_metadata()`, `_read_summary_profile()`, and `_read_cct()`, where there is a lot of functionality. For example, it looks like `_parse_context()` is also where some of the data reduction happens - this would be important to comment and note for others or your future self.
- Please add more tests where you turn on and off more of the data reduction flags - e.g., `_parallel_profiles_mode`, filtering out MPI, OpenMP, CUDA. Ideally there would be at least one test for every single flag. I realize a lot of them have similar logic, but it would be best to be as thorough here as possible. At a minimum we should have one test for filtering out one of the library calls, and probably one for the parallel profiles mode.
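The kind of per-flag test coverage requested above could be sketched roughly as follows. Here `read_cct` is a hypothetical stand-in for `ht.GraphFrame.from_hpctoolkit_latest`, and the filtering logic is a toy model of the reader's exclude flags, not the actual implementation:

```python
# Toy stand-in for the reader: each exclude flag drops nodes whose names
# match that library's prefix. The flag names mirror the ones in this PR.
def read_cct(nodes, exclude_mpi_function_details=False,
             exclude_openmp_function_details=False,
             exclude_cuda_function_details=False):
    prefixes = []
    if exclude_mpi_function_details:
        prefixes.append("MPI_")
    if exclude_openmp_function_details:
        prefixes.append("omp_")
    if exclude_cuda_function_details:
        prefixes.append("cuda")
    return [n for n in nodes if not any(n.startswith(p) for p in prefixes)]

NODES = ["main", "MPI_Send", "omp_get_num_threads", "cudaMemcpy", "solve"]

# one test per flag: turning a flag on removes exactly that library's nodes
for flag, removed in [
    ("exclude_mpi_function_details", "MPI_Send"),
    ("exclude_openmp_function_details", "omp_get_num_threads"),
    ("exclude_cuda_function_details", "cudaMemcpy"),
]:
    result = read_cct(NODES, **{flag: True})
    assert removed not in result          # the targeted nodes are filtered
    assert "main" in result and "solve" in result  # others are untouched
```

In a real test suite these would naturally become `pytest.mark.parametrize` cases over the actual reader, one per flag, each asserting the expected nodes are absent from the resulting GraphFrame.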
As Dewi said, it would be nice to get as many unit tests as possible. And it would give us a baseline for what to expect from these arguments.
hatchet/graphframe.py
Outdated
min_percentage_of_application_time (int): minimum percentage of application time that nodes in the CCT must have to be imported in Hatchet
min_percentage_of_parent_time (int): minimum percentage of parent time that nodes in the CCT must have to be imported in Hatchet
directory_mapping (dict): Python dictionary that maps a file system location to a name
parallel_profiles_mode (bool): flag indicating whether the reader should extract parallel profiles from the database (true) or the summary profile (false)
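As a rough illustration of how the two percentage thresholds might interact, here is a toy predicate whose semantics are inferred from the docstring wording above, not from the reader code:

```python
# Assumed semantics: a node is kept only if its time is at least
# min_pct_app percent of total application time AND at least
# min_pct_parent percent of its parent's time.
def keep_node(node_time, parent_time, app_time,
              min_pct_app=0, min_pct_parent=0):
    return (node_time >= app_time * min_pct_app / 100
            and node_time >= parent_time * min_pct_parent / 100)

# a node using 3s of a 100s run, under a parent that uses 40s
assert keep_node(3, 40, 100, min_pct_app=1)          # 3% of app time >= 1%
assert not keep_node(3, 40, 100, min_pct_app=5)      # 3% < 5% of app time
assert not keep_node(3, 40, 100, min_pct_parent=10)  # 3/40 = 7.5% < 10%
```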
Running with `parallel_profiles_mode`:

```python
gf = ht.GraphFrame.from_hpctoolkit_latest(
    dirname="hatchet/tests/data/hpctoolkit-gamess",
    parallel_profiles_mode=True,
)
```
I get an error:

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[9], line 1
----> 1 gf = ht.GraphFrame.from_hpctoolkit_latest(
      2     dirname="hatchet/tests/data/hpctoolkit-gamess",
      3     # max_depth=10,
      4     parallel_profiles_mode=True,
      5 )

File ~/Documents/repos/hatchet/hatchet/graphframe.py:181, in GraphFrame.from_hpctoolkit_latest(dirname, directory_mapping, parallel_profiles_mode, max_depth, min_percentage_of_application_time, exclude_mpi_function_details, exclude_openmp_function_details, exclude_cuda_function_details, exclude_system_libraries_source_code, exclude_function_call_lines, exclude_no_source_code_instructions, exclude_instructions, exclude_non_function_nodes, label_function_nodes, metric_names, metric_scopes, summary_metrics, profile_ranks)
    159 # import this lazily to avoid circular dependencies
    160 from .readers.hpctoolkit_reader_latest import HPCToolkitReaderLatest
    162 return HPCToolkitReaderLatest(
    163     dirname,
    164     directory_mapping=directory_mapping,
    165     parallel_profiles_mode=parallel_profiles_mode,
    166     max_depth=max_depth,
    167     min_application_percentage_time=min_percentage_of_application_time,
    168     exclude_mpi_function_details=exclude_mpi_function_details,
    169     exclude_openmp_function_details=exclude_openmp_function_details,
    170     exclude_cuda_function_details=exclude_cuda_function_details,
    171     exclude_system_libraries_source_code=exclude_system_libraries_source_code,
    172     exclude_function_call_lines=exclude_function_call_lines,
    173     exclude_no_source_code_instructions=exclude_no_source_code_instructions,
    174     exclude_instructions=exclude_instructions,
    175     exclude_non_function_nodes=exclude_non_function_nodes,
    176     label_function_nodes=label_function_nodes,
    177     metric_names=metric_names,
    178     metric_scopes=metric_scopes,
    179     summary_metrics=summary_metrics,
    180     profile_ranks=profile_ranks,
--> 181 ).read()

File ~/Documents/repos/hatchet/hatchet/readers/hpctoolkit_reader_latest.py:781, in HPCToolkitReaderLatest.read(self)
    778 if self._parallel_profiles_mode:
    779     self._read_profiles_metadata()
--> 781 return self._read_cct()

File ~/Documents/repos/hatchet/hatchet/readers/hpctoolkit_reader_latest.py:669, in HPCToolkitReaderLatest._read_cct(self)
    660 (szChildren, pChildren, ctxId, entryPoint) = safe_unpack(
    661     "<QQLH",
    662     meta_db,
    (...)
    665     szEntryPoint,
    666 )
    668 if entryPoint == 1:
--> 669     self._total_execution_time = self._summary_profile[ctxId][
    670         self._time_metric
    671     ]
    673 node = self._store_cct_node(
    674     ctxId,
    675     {"type": "entry", "name": "entry"},
    676 )
    677 self._parse_context(
    678     pChildren,
    679     szChildren,
    680     node,
    681     meta_db,
    682 )

KeyError: 1195
```
If this is expected, can we add an error message for when parallel profiles are not available? Something like `parallel profiles are not available for this database`.
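A minimal sketch of such a guard, using the `_summary_profile`/`ctxId` names from the traceback; the helper function itself is hypothetical and not part of the reader:

```python
# Hypothetical guard for the KeyError above: before indexing the summary
# profile by context id, fail with a descriptive message instead of a
# bare KeyError on the context id.
def total_execution_time(summary_profile, ctx_id, time_metric):
    if ctx_id not in summary_profile:
        raise ValueError(
            "parallel profiles are not available for this database"
        )
    return summary_profile[ctx_id][time_metric]
```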
Force-pushed from 77abf3a to f11929d.
No description provided.