[WIP] Use file metadata to determine whether profiler config should be reloaded. #463

ndodda-amazon · 2021-03-16T07:39:32Z

Description of changes:

At each step, we need to determine whether the profiler config JSON has changed and, if so, reload the profiler config. Currently, we load the JSON into memory and compare the file contents to decide whether to reload. This may hurt performance at scale, because a JSON object is loaded into memory at every step.

This change replaces that check with an inspection of the last modified time in the file metadata. If the last modified time has changed, the file has changed and we reload the profiler config. No JSON is loaded into memory for this check; the tests verify that the config file is not read into memory when it has not been modified.
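To illustrate the idea, here is a minimal sketch of an mtime-gated reload. It is not the code in this PR: `ConfigReloader`, `load_if_modified`, and `read_count` are hypothetical names used only for this example, and the config contents are made up.

```python
import json
import os
import tempfile


class ConfigReloader:
    """Hypothetical helper: reload a JSON config only when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self.last_mtime = None  # mtime observed at the last actual load
        self.config = None
        self.read_count = 0  # how many times the file was really parsed

    def load_if_modified(self):
        """Return True if the file was (re)read, False if the read was skipped."""
        mtime = os.path.getmtime(self.path)  # metadata lookup only, no file read
        if mtime == self.last_mtime:
            return False
        with open(self.path) as f:
            self.config = json.load(f)
        self.last_mtime = mtime
        self.read_count += 1
        return True


# Usage: the second call skips the read because the mtime is unchanged.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "profiler_config.json")
    with open(config_path, "w") as f:
        json.dump({"num_steps": 3}, f)

    reloader = ConfigReloader(config_path)
    first = reloader.load_if_modified()
    second = reloader.load_if_modified()
```

The per-step cost drops to a single `stat` call when nothing has changed; the trade-off is that the check is only as precise as the filesystem's timestamp resolution.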

Style and formatting:

I have run pre-commit install to ensure that auto-formatting happens with every commit.

Issue number, if available

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov-io · 2021-03-16T08:07:49Z

Codecov Report

Merging #463 (70b65d0) into master (433348d) will decrease coverage by 15.49%.
The diff coverage is 100.00%.

❗ Current head 70b65d0 differs from pull request most recent head 76a6cab. Consider uploading reports for the commit 76a6cab to get more accurate results

@@             Coverage Diff             @@
##           master     #463       +/-   ##
===========================================
- Coverage   65.62%   50.13%   -15.50%     
===========================================
  Files         172      162       -10     
  Lines       13260    12919      -341     
===========================================
- Hits         8702     6477     -2225     
- Misses       4558     6442     +1884

Impacted Files	Coverage Δ
smdebug/profiler/profiler_config_parser.py	`68.66% <100.00%> (-15.80%)`	⬇️
smdebug/profiler/utils.py	`26.66% <100.00%> (-45.56%)`	⬇️
smdebug/pytorch/__init__.py	`0.00% <0.00%> (-100.00%)`	⬇️
smdebug/pytorch/singleton_utils.py	`0.00% <0.00%> (-100.00%)`	⬇️
smdebug/profiler/analysis/utils/merge_timelines.py	`0.00% <0.00%> (-97.52%)`	⬇️
...ug/profiler/analysis/utils/pandas_data_analysis.py	`0.00% <0.00%> (-97.37%)`	⬇️
smdebug/profiler/analysis/python_stats_reader.py	`0.00% <0.00%> (-94.29%)`	⬇️
...debug/profiler/analysis/python_profile_analysis.py	`0.00% <0.00%> (-90.91%)`	⬇️
smdebug/pytorch/collection.py	`0.00% <0.00%> (-90.00%)`	⬇️
...er/analysis/utils/python_profile_analysis_utils.py	`0.00% <0.00%> (-88.41%)`	⬇️
... and 49 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 433348d...76a6cab.

NihalHarish · 2021-03-16T23:27:43Z

tests/profiler/core/test_profiler_config_parser.py

    assert case_insensitive_profiler_config_parser.config.detailed_profiling_config.num_steps == 3


 def test_update_step_profiler_config_parser(


Can you run this test inside a loop to ensure that we do not have flaky behavior?

It would be better to simply use pytest.mark.parametrize, so that the test itself doesn't change; we just run it multiple times and ensure it passes each time.
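For reference, a sketch of what repeating a test with pytest.mark.parametrize could look like. The test name and body are placeholders rather than the real test in this PR, and the repeat count of 3 is an assumption.

```python
import pytest


@pytest.mark.parametrize("run", range(3))  # repeat count of 3 is an assumption
def test_placeholder_repeated(run):
    # `run` is unused; it only makes pytest collect and run the test 3 times.
    # The body stands in for the real, unchanged test logic.
    assert sorted([3, 1, 2]) == [1, 2, 3]
```

Each value of `run` is reported as a separate test case, so an intermittent failure shows up against a specific repetition.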

How many times do you suggest the test be run? 3?

Also, first and foremost, can you explain why this behavior might be flaky? The workflow seems to be clear cut:

1. Set up the profiler config JSON
2. Set up the profiler config parser object (which automatically loads the config)
3. Load the config again
4. Verify the last modified time has not changed
5. Replace the config JSON
6. Load the config again
7. Verify the last modified time has changed

TL;DR the calls to load_config are controlled, so we know exactly the JSON that is at the path before loading the config each time.
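One possible source of flakiness, offered here as an assumption rather than a confirmed diagnosis, is timestamp resolution: two writes in quick succession can end up with the same last modified time on some filesystems. The sketch below walks through the steps above and sets a distinct mtime explicitly with `os.utime` instead of sleeping. `write_json` and `bump_mtime` are hypothetical helpers, and reading the mtime stands in for "loading the config".

```python
import json
import os
import tempfile


def write_json(path, payload):
    with open(path, "w") as f:
        json.dump(payload, f)


def bump_mtime(path, seconds=2):
    """Push the file's mtime forward explicitly instead of sleeping."""
    st = os.stat(path)
    os.utime(path, (st.st_atime, st.st_mtime + seconds))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")

    # Steps 1-2: set up the config and record the mtime seen at first load.
    write_json(path, {"num_steps": 3})
    mtime_first_load = os.path.getmtime(path)

    # Steps 3-4: nothing was written in between, so the mtime is unchanged.
    unchanged = os.path.getmtime(path) == mtime_first_load

    # Step 5: replace the config, then force a strictly later mtime.
    write_json(path, {"num_steps": 5})
    bump_mtime(path)

    # Steps 6-7: the mtime now differs, so a reload would be triggered.
    changed = os.path.getmtime(path) != mtime_first_load
```

Setting the timestamp directly keeps the test independent of how fast the two writes happen.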

NihalHarish · 2021-03-16T23:29:54Z

smdebug/profiler/utils.py

+    """
+    Get the last time that the file at the given filepath was modified, in the form of a datetime object.
+    """
+    last_modified_time = Path(filepath).stat().st_mtime


Can you explain in comments the reasons for choosing this implementation?
A simple link to the documentation might also suffice.
https://docs.python.org/3/library/stat.html#stat.ST_MTIME

The advantage of relying on file metadata to determine whether the profiler config should be reloaded is stated in the PR description: we cut down on the number of times the JSON needs to be loaded into memory.

The choice of specifically using pathlib to get the file metadata was arbitrary - we can just as easily use os with os.path.getmtime(path). In fact, I probably will switch to os.path.getmtime(path) since it appears cleaner.
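A quick check that the two spellings read the same value. The temporary file is only for illustration, and the conversion to a datetime is optional, since comparing the raw floats is enough to detect a change.

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path

# Create a throwaway file to inspect.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"{}")
    path = f.name

mtime_pathlib = Path(path).stat().st_mtime  # pathlib spelling
mtime_os = os.path.getmtime(path)           # os.path spelling

# Optional: convert the float timestamp to a local-time datetime object.
as_datetime = datetime.fromtimestamp(mtime_os)

os.remove(path)
```

Both calls go through the same underlying `stat` lookup, so the choice between them is a matter of style.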

ndodda-amazon · 2021-03-17T03:44:14Z

tests/profiler/core/test_profiler_config_parser.py


-    # check that reloading the config when it has changed will update the config fields.
-    monkeypatch.setenv("SMPROFILER_CONFIG_PATH", new_step_profiler_config_parser_path)
+    shutil.copy(new_step_profiler_config_parser_path, step_profiler_config_parser_path)


Tests are currently failing because this call behaves differently on Linux and Unix (on Linux, it also copies the file metadata).

Will need to fix this by finding a better API for copying files or writing the JSON manually.
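For context, the Python documentation describes `shutil.copy` as copying file data and the permission mode, while `shutil.copy2` additionally attempts to preserve metadata such as timestamps. A small check like the following can show what a given platform actually does to the destination's mtime. The file names and the old timestamp are arbitrary values chosen for the demo.

```python
import os
import shutil
import tempfile

OLD = 1_000_000_000  # arbitrary old timestamp (year 2001), demo only

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "new_config.json")
    dst = os.path.join(tmp, "config.json")
    for p in (src, dst):
        with open(p, "w") as f:
            f.write("{}")
    os.utime(src, (OLD, OLD))  # make the source file look old

    shutil.copy(src, dst)  # data + permission bits
    mtime_after_copy = os.path.getmtime(dst)

    shutil.copy2(src, dst)  # also tries to preserve timestamps
    mtime_after_copy2 = os.path.getmtime(dst)
```

If the destination's mtime must always move forward, writing the JSON with `open` and `json.dump` is another option, since a fresh write updates the timestamp.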

ndodda-amazon · 2021-03-17T07:53:37Z

Closing for now: the root cause of the CI failures is flakiness on Linux, where the metadata of the profiler config JSON is not being updated. Will reopen once I've fixed this issue.

Use file metadata to determine whether profiler config has changed

cb4b9a8

reorder assert statements

716e234

NihalHarish suggested changes Mar 16, 2021


ndodda-amazon commented Mar 17, 2021


ndodda-amazon changed the title ~~Use file metadata to determine whether profiler config should be reloaded.~~ [WIP] Use file metadata to determine whether profiler config should be reloaded. Mar 17, 2021

ndodda-amazon closed this Mar 17, 2021

fix flakiness

70b65d0

ndodda-amazon reopened this Mar 17, 2021

ndodda-amazon added 2 commits March 17, 2021 03:07

increase sleep time

76a6cab

increase sleep time

221e78a

ndodda-amazon closed this Mar 17, 2021

ndodda-amazon mentioned this pull request Mar 17, 2021

Use file metadata to determine whether profiler config should be reloaded. #464

Open


3 participants
