Adding script to read the events from trace file #246

leleamol · 2020-05-11T21:30:15Z

Description of changes:

This is work in progress script to read the M and X type of trace events from tensorboard.

The changes include the tensorfboard file that can be used to test the script.
The script can:

read the X and M events from trace.
store the timestamps in nanoseconds
Return the events sorted either on start time or end time.
Return the events that are in progress at the given timestamp.
Return the events that are started and finished within the given range.

Style and formatting:

I have run pre-commit install to ensure that auto-formatting happens with every commit.

Issue number, if available

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov-io · 2020-05-11T21:44:14Z

Codecov Report

Merging #246 into master will decrease coverage by 24.77%.
The diff coverage is 88.43%.

@@             Coverage Diff             @@
##           master     #246       +/-   ##
===========================================
- Coverage   85.77%   60.99%   -24.78%     
===========================================
  Files          78       81        +3     
  Lines        5799     5946      +147     
===========================================
- Hits         4974     3627     -1347     
- Misses        825     2319     +1494

Impacted Files	Coverage Δ
smdebug/profiler/tf_profiler_parser.py	`81.03% <81.03%> (ø)`
smdebug/profiler/trace_event_file_parser.py	`93.10% <93.10%> (ø)`
smdebug/profiler/__init__.py	`100.00% <100.00%> (ø)`
smdebug/mxnet/__init__.py	`0.00% <0.00%> (-100.00%)`	⬇️
smdebug/tensorflow/__init__.py	`0.00% <0.00%> (-100.00%)`	⬇️
smdebug/mxnet/singleton_utils.py	`0.00% <0.00%> (-100.00%)`	⬇️
smdebug/tensorflow/singleton_utils.py	`0.00% <0.00%> (-100.00%)`	⬇️
smdebug/tensorflow/collection.py	`0.00% <0.00%> (-96.00%)`	⬇️
smdebug/tensorflow/callable_cache.py	`0.00% <0.00%> (-95.66%)`	⬇️
smdebug/tensorflow/keras.py	`1.67% <0.00%> (-92.37%)`	⬇️
... and 42 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80153e7...fd75207. Read the comment docs.

utils/tensorboard_parser.py

Vikas-kum · 2020-05-11T23:50:20Z

utils/tensorboard_parser.py

+                        self._start_time_known = True
+
+            if phase_type == "X":
+                start_time = (event["ts"] * 1000) - self._start_time_stamp  # In nano seconds


Look for unit key, if not there assume certain unit.
It is assuming that unit is always us,

utils/tensorboard_parser.py

Introduced base event parser and the child classes TFEventProfiler and SMTFEventProfiler

…to address the test failures.

… class.

Vikas-kum

Can you write test SMTFProfiler classs and please make sure the code coverage is close to 100.

Vikas-kum · 2020-05-12T22:37:28Z

smdebug/profiler/trace_event_file_parser.py

+
+        if phase_type == "X":
+            # In nano seconds
+            start_time = (event["ts"] - self._start_timestamp) * self._timescale_multiplier_for_ns


this should be +

smdebug/profiler/trace_event_file_parser.py

Vikas-kum · 2020-05-12T22:43:03Z

tests/profiler/test_tfprofiler_events.py

+    print(f"Number of events read = {num_trace_events}")
+    assert num_trace_events == 1663
+
+    event_list = t_events.get_events_at(15013686)  # nanoseconds


15013686 what is the unit of this ?

Vikas-kum · 2020-05-13T01:55:55Z

smdebug/profiler/trace_event_file_parser.py

+# First Party
+from smdebug.core.logger import get_logger
+
+TIMESCALE_MULTIPLIER = {"ns": 1000, "us": 1, "ms": 0.001, "s": 0.000001}


if tensorflow says displayTimeUnit="ns" , does the time reported there are in ns or some other unit?

I was under impression that displayTimeUnit tells what unit is for time in trace.json file. Individual reader can have its own _timescale_multiplier_for_ns which defines what to multiply to convert into ns time format. Example SMTFProfiler knows events are in us, so multiplier would be 1000 to convert to ns.
NS is time interval that reader normalizes all events to. This normalization will give us unique timeline view if there are timeline files from multiple sources.

Thinking this I kept
{"ns": 1, "us": 1000, "ms": 1000000, "s": 1000000000}

That means that if displayTimeUnit is "us", we will multiply by 1000 and so on.
Can you decribe and write mroe comments about displayTimeUnit and timescale multiplier.

@Vikas-kum As per the Trace event format https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/edit The 'ts' is always in uSecond granularity

"ts: The tracing clock timestamp of the event. The timestamps are provided at microsecond granularity.
"

In case of Tensorboard generated trace file, the 'displayTimeUnit' is specified to be 'ns'. This is a hint to display that for higher precision show the events on 'ns' scale. Therefore, we would have to multiply it by 1000, if we want to keep the same precision.

The short answer, as per the standard, the 'ts' and 'durations' will always be in uSeconds.
It will be up to us whether we want to convert into ns scale for precise calculations. IMO, we should always convert it to ns scale.

yeah in that case we should ignore displayTimeUnit, and always convert from us and default TIMESCLAE_MULTIPLIER to 1000 .

yeah in that case we should ignore displayTimeUnit, and always convert from us and default TIMESCLAE_MULTIPLIER to 1000 .

@Vikas-kum
Ok sounds good. I will remove the code for reading the displayTimeUnit and will always convert the timestamp values (the 'ts' field and start_time_since_epoch_in_micros) from microSeconds to nanoSeconds.
I will also cut down the sample json files for testing.

Vikas-kum

We should also cut down ip-..json file size to some thing below 100.
We don't need 20K lines big file.

…this file.

smdebug/profiler/trace_event_file_parser.py

smdebug/profiler/tf_profiler_parser.py

smdebug/profiler/trace_event_file_parser.py

smdebug/profiler/tf_profiler_parser.py

WIP adding script to read the events from tensorboard trace.

7e7aad6

leleamol requested a review from Vikas-kum May 11, 2020 21:58

Vikas-kum suggested changes May 11, 2020

View reviewed changes

Refactored code

1cd29cc

Introduced base event parser and the child classes TFEventProfiler and SMTFEventProfiler

Vikas-kum changed the title ~~[WIP] Aadding script to read the events from tensorboard trace.~~ [WIP] Adding script to read the events from trace file May 12, 2020

leleamol added 3 commits May 12, 2020 13:46

Added the test file for TFProfilerEvent class. Miscellaneous changes …

970e54f

…to address the test failures.

Added the test to CI

96e3e6b

fixed the error message that was accesing a varibale not declaring in…

b803b53

… class.

Vikas-kum reviewed May 12, 2020

View reviewed changes

smdebug/profiler/trace_event_file_parser.py Show resolved Hide resolved

leleamol added 2 commits May 12, 2020 16:40

Added the tests for smtpprofiler class

f7df7e2

Adding logger to the profiler.

4e20eaf

Vikas-kum reviewed May 13, 2020

View reviewed changes

Vikas-kum suggested changes May 13, 2020

View reviewed changes

Reduced the size of trace file and adjusted the test file related to …

fd75207

…this file.

Vikas-kum reviewed May 13, 2020

View reviewed changes

smdebug/profiler/trace_event_file_parser.py Outdated Show resolved Hide resolved

Vikas-kum suggested changes May 13, 2020

View reviewed changes

Removed the reference to TIMESCALE_MULTIPLIER

670c126

Vikas-kum changed the title ~~[WIP] Adding script to read the events from trace file~~ Adding script to read the events from trace file May 13, 2020

Vikas-kum approved these changes May 13, 2020

View reviewed changes

Vikas-kum merged commit 4cac480 into master May 13, 2020

Vikas-kum deleted the trace-events-reader branch May 13, 2020 18:54

Adding script to read the events from trace file #246

Adding script to read the events from trace file #246

Uh oh!

Conversation

leleamol commented May 11, 2020

Description of changes:

Style and formatting:

Issue number, if available

Uh oh!

codecov-io commented May 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Vikas-kum May 11, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Vikas-kum left a comment

Choose a reason for hiding this comment

Uh oh!

Vikas-kum May 12, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Vikas-kum May 12, 2020

Choose a reason for hiding this comment

Uh oh!

Vikas-kum May 13, 2020

Choose a reason for hiding this comment

Uh oh!

leleamol May 13, 2020

Choose a reason for hiding this comment

Uh oh!

Vikas-kum May 13, 2020

Choose a reason for hiding this comment

Uh oh!

leleamol May 13, 2020

Choose a reason for hiding this comment

Uh oh!

Vikas-kum left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov-io commented May 11, 2020 •

edited

Loading