-
Notifications
You must be signed in to change notification settings - Fork 82
Adding script to read the events from trace file #246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #246 +/- ##
===========================================
- Coverage 85.77% 60.99% -24.78%
===========================================
Files 78 81 +3
Lines 5799 5946 +147
===========================================
- Hits 4974 3627 -1347
- Misses 825 2319 +1494
Continue to review full report at Codecov.
|
utils/tensorboard_parser.py
Outdated
| self._start_time_known = True | ||
|
|
||
| if phase_type == "X": | ||
| start_time = (event["ts"] * 1000) - self._start_time_stamp # In nano seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look for unit key, if not there assume certain unit.
It is assuming that unit is always us,
Introduced base event parser and the child classes TFEventProfiler and SMTFEventProfiler
Vikas-kum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you write test SMTFProfiler classs and please make sure the code coverage is close to 100.
|
|
||
| if phase_type == "X": | ||
| # In nano seconds | ||
| start_time = (event["ts"] - self._start_timestamp) * self._timescale_multiplier_for_ns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be +
| print(f"Number of events read = {num_trace_events}") | ||
| assert num_trace_events == 1663 | ||
|
|
||
| event_list = t_events.get_events_at(15013686) # nanoseconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
15013686 what is the unit of this ?
| # First Party | ||
| from smdebug.core.logger import get_logger | ||
|
|
||
| TIMESCALE_MULTIPLIER = {"ns": 1000, "us": 1, "ms": 0.001, "s": 0.000001} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if tensorflow says displayTimeUnit="ns" , does the time reported there are in ns or some other unit?
I was under impression that displayTimeUnit tells what unit is for time in trace.json file. Individual reader can have its own _timescale_multiplier_for_ns which defines what to multiply to convert into ns time format. Example SMTFProfiler knows events are in us, so multiplier would be 1000 to convert to ns.
NS is time interval that reader normalizes all events to. This normalization will give us unique timeline view if there are timeline files from multiple sources.
Thinking this I kept
{"ns": 1, "us": 1000, "ms": 1000000, "s": 1000000000}
That means that if displayTimeUnit is "us", we will multiply by 1000 and so on.
Can you decribe and write mroe comments about displayTimeUnit and timescale multiplier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Vikas-kum As per the Trace event format https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/edit The 'ts' is always in uSecond granularity
"ts: The tracing clock timestamp of the event. The timestamps are provided at microsecond granularity.
"
In case of Tensorboard generated trace file, the 'displayTimeUnit' is specified to be 'ns'. This is a hint to display that for higher precision show the events on 'ns' scale. Therefore, we would have to multiply it by 1000, if we want to keep the same precision.
The short answer, as per the standard, the 'ts' and 'durations' will always be in uSeconds.
It will be up to us whether we want to convert into ns scale for precise calculations. IMO, we should always convert it to ns scale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah in that case we should ignore displayTimeUnit, and always convert from us and default TIMESCLAE_MULTIPLIER to 1000 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah in that case we should ignore displayTimeUnit, and always convert from us and default TIMESCLAE_MULTIPLIER to 1000 .
@Vikas-kum
Ok sounds good. I will remove the code for reading the displayTimeUnit and will always convert the timestamp values (the 'ts' field and start_time_since_epoch_in_micros) from microSeconds to nanoSeconds.
I will also cut down the sample json files for testing.
Vikas-kum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also cut down ip-..json file size to some thing below 100.
We don't need 20K lines big file.
Description of changes:
This is work in progress script to read the M and X type of trace events from tensorboard.
The changes include the tensorfboard file that can be used to test the script.
The script can:
Style and formatting:
I have run
pre-commit installto ensure that auto-formatting happens with every commit.Issue number, if available
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.