hparams: read from generic data APIs #3419
Conversation
Summary:
Until now, `context.data_provider` was provided unless `generic_data` was explicitly set to `false`. Now, it’s provided in all cases (except the legacy DB modes). This was always expected to be safe, and has been live by default since TensorBoard 1.15.0. This is needed to write plugins that unconditionally assume that a data provider is available.

A consequence of this is that `is_active`’s use of the experimental `list_plugins` method is now always on as well. This has been on by default for two months, but not yet in any release (other than the upcoming TensorBoard 2.2.0), so it’s slightly riskier, but it’s also expected to be safe.

Test Plan:
Run TensorBoard with `--generic_data false`. If pointing at an empty logdir, no plugins are shown as active. If pointing at a full logdir, all appropriate plugins are shown as active. The debugger plugin is still marked as active if `--debugger_port` is specified and as inactive if no relevant flags are given, so the fallback `is_active` calls are still working.

wchargin-branch: backend-always-data-provider
Summary:
Until now, the hparams plugin has been active only when hparams data is present *and* the scalars plugin is loaded and active. This cross-plugin dependency is unique and doesn’t fit well into the data provider world, so we remove it. The hparams plugin is now active iff hparams data is available, via the standard `data_plugin_names` API.

Correspondingly, we loosen the error handling in parts of the plugin that rely on the scalars plugin being active. As long as the scalars plugin is included in the TensorBoard instance, this should not be a problem: TensorBoard core loads all plugins before accepting requests to any of them. If the hparams plugin is used in an installation with no scalars plugin, the errors will now be 404s instead of 500s. This seems acceptable, as running without a scalars plugin is not a common pattern.

Test Plan:
The hparams plugin is still marked as active in logdirs that contain hparams data and inactive in logdirs that don’t.

wchargin-branch: hparams-data-provider-is-active
wchargin-source: 19dedbb932881bbdb471b4d3d9796be47d1e8956
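As a rough illustration of the `is_active` simplification (a sketch, not the actual plugin code; it assumes the `data_plugin_names` hook on `TBPlugin` behaves as described above):

```python
from tensorboard.plugins import base_plugin


class HParamsPlugin(base_plugin.TBPlugin):
    plugin_name = "hparams"

    def is_active(self):
        # No cross-plugin check against the scalars plugin anymore;
        # core consults `data_plugin_names` via the data provider.
        return False

    def data_plugin_names(self):
        # Hparams summaries are written under this plugin name, so the
        # dashboard is marked active iff such data exists in the logdir.
        return (self.plugin_name,)
```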
Summary:
The hparams plugin makes three kinds of queries against the multiplexer: listing hparams metadata, listing scalars metadata, and reading scalar points. This commit encapsulates each such operation in a method so that we can migrate them to use data provider APIs. For now, the underlying implementations are unchanged.

Test Plan:
Running `git grep '\.multiplexer' tensorboard/plugins/hparams/` shows only matches in `backend_context.py` and tests. Existing tests pass, and all three views of the hparams plugin, including embedded scalar charts and the data download links, still work with and without `--generic_data true` (though in all cases the plugin actually still reads directly from the multiplexer).

wchargin-branch: hparams-encapsulate-mux
wchargin-source: a583e4f940723bd01dc433eb83e24776b686dc44
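The encapsulation might look roughly like this (a sketch with hypothetical method names and module aliases; the commit message above names only the three operations, which at this stage still delegate to the multiplexer):

```python
from tensorboard.plugins.hparams import metadata
from tensorboard.plugins.scalar import metadata as scalar_metadata


class Context(object):
    """Mediates all data access for the hparams plugin."""

    def __init__(self, tb_context):
        self._multiplexer = tb_context.multiplexer

    def hparams_metadata(self, experiment_id):
        # Operation 1: list hparams summary metadata (run -> tag -> content).
        return self._multiplexer.PluginRunToTagToContent(metadata.PLUGIN_NAME)

    def scalars_metadata(self, experiment_id):
        # Operation 2: list scalars summary metadata for the metrics.
        return self._multiplexer.PluginRunToTagToContent(
            scalar_metadata.PLUGIN_NAME
        )

    def read_scalars(self, experiment_id, run, tag):
        # Operation 3: read the scalar points backing one metric series.
        return self._multiplexer.Tensors(run, tag)
```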
Summary:
All data access in the hparams plugin is now performed via methods that require an experiment ID. The ID is currently unused except for an assertion. In a subsequent change, these methods will switch to using a data provider instead of the multiplexer, at which point the experiment ID will be required.

Test Plan:
Unit tests suffice for the internals. To check the wiring at the plugin level, note that all three views of the hparams plugin, including embedded scalar charts and the download links, still render properly.

wchargin-branch: hparams-thread-eid
wchargin-source: c82e1e20f220436002f95bf8f9575839b26c9293
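Amending the sketch above, the interim state might look like this (the assertion is the only use of the ID for now):

```python
def hparams_metadata(self, experiment_id):
    # Threaded through by every caller, but only asserted for now;
    # the next change keys data provider reads on it.
    assert experiment_id is not None
    return self._multiplexer.PluginRunToTagToContent(metadata.PLUGIN_NAME)
```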
Summary:
This commit changes the data access methods in `hparams.backend_context` to read from the context’s data provider rather than directly from its multiplexer. All data access in the hparams plugin already goes through these methods.

Test Plan:
Ideally, existing tests would have sufficed, but the tests made heavy use of mocking and so required manual intervention. The updated tests pass, and an inspection of the various views of the hparams dashboard (including embedded scalar charts and data download links) checks out, even when TensorBoard core is patched to not provide a multiplexer on the context.

wchargin-branch: hparams-generic-data
wchargin-source: 13a548280e753556f7fdb1378040db17c78061f3
nfelt left a comment:
Thanks for getting this working!
| ) | ||
| for e in data | ||
| ] | ||
| return self._deprecated_multiplexer.Tensors(run, tag) |
nfelt:
Remove?
wchargin:
Yep, thanks!
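For context, here is the method under discussion as it can be reconstructed from the diff hunks in this review (a sketch: the imports, `downsample` cap, and method name are assumptions; the trailing `return self._deprecated_multiplexer.Tensors(run, tag)` is the dead line being removed):

```python
from tensorboard.backend.event_processing import event_accumulator
from tensorboard.data import provider
from tensorboard.plugins.scalar import metadata as scalar_metadata
from tensorboard.util import tensor_util


def read_scalars(self, experiment_id, run, tag):
    data = self._tb_context.data_provider.read_scalars(
        experiment_id=experiment_id,
        plugin_name=scalar_metadata.PLUGIN_NAME,
        downsample=5000,  # assumed cap, not taken from the diff
        run_tag_filter=provider.RunTagFilter(runs=[run], tags=[tag]),
    ).get(run, {}).get(tag)
    if data is None:
        raise KeyError("No scalar data for run=%r, tag=%r" % (run, tag))
    return [
        # Re-wrap each provider.ScalarDatum as a TensorEvent so that
        # existing callers are unchanged (vestigial; see discussion below).
        event_accumulator.TensorEvent(
            wall_time=e.wall_time,
            step=e.step,
            tensor_proto=tensor_util.make_tensor_proto(e.value),
        )
        for e in data
    ]
```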
| if data is None: | ||
| raise KeyError("No scalar data for run=%r, tag=%r" % (run, tag)) | ||
| return [ | ||
| event_accumulator.TensorEvent( |
nfelt:
Can we maybe file a cleanup GitHub issue to remove the vestigial bits of the multiplexer API from the hparams plugin, and/or from plugins generally after they're migrated? It would be nice to at least leave a TODO here, since it's kind of weird to still have TensorEvent used for this.
wchargin:
Yes, definitely. I was planning to do so right after these changes
(switch these functions to directly return the data provider results and
simultaneously migrate callers), but wanted to get these in first to
unblock some internal work. Filed #3425.
For what it’s worth, this is not a performance bottleneck. The
post-processing takes on the order of 1–5 microseconds. The call to
data_provider.read_scalars is slow because it calls list_scalars
internally, and list_scalars itself iterates over all the tensors to
generate metadata that we just throw away. It currently takes on the
order of 10 milliseconds, so that’s not good. And the hparams usage of
read_scalars is slow, too, because we call it once for each of the
O(|metrics| × |sessions|) time series instead of passing a single
RunTagFilter that represents the cross product, so that’s not good,
either. The data-munging here is negligible in comparison.
(I will see if I can speed up _list_scalars before merging this PR.)
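For illustration, the batched alternative would look something like this (a sketch; `sessions` and `metric_tags` are hypothetical stand-ins for the hparams plugin's session runs and metric tags):

```python
# One read for all O(|metrics| × |sessions|) series instead of one call
# per (session, metric) pair:
data = data_provider.read_scalars(
    experiment_id=experiment_id,
    plugin_name=scalar_metadata.PLUGIN_NAME,
    downsample=5000,  # assumed
    run_tag_filter=provider.RunTagFilter(runs=sessions, tags=metric_tags),
)
# Result maps run -> tag -> [ScalarDatum, ...] with step, wall_time, value.
```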
nfelt:
Yeah, I wasn't concerned at all about performance. Just seemed like a vestigial weirdness to leave behind - thanks for filing the issue!
| def setUp(self): | ||
| self._mock_tb_context = mock.create_autospec(base_plugin.TBContext) | ||
| self._mock_tb_context = base_plugin.TBContext() | ||
| self._mock_multiplexer = mock.create_autospec( |
nfelt:
Ditto here re: having a TODO to clean up the test code so that, if it's going to use mocks, it at least mocks the DataProvider APIs directly rather than mocking EventMultiplexer and then wrapping that in a MultiplexerDataProvider.
wchargin:
Will do.
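The cleanup being promised might look like this (a sketch; it assumes `TBContext` accepts a `data_provider` argument, as it does in this era of TensorBoard):

```python
from unittest import mock

from tensorboard.data import provider
from tensorboard.plugins import base_plugin


def setUp(self):
    # Mock the DataProvider API directly instead of wrapping a mocked
    # EventMultiplexer in a real MultiplexerDataProvider.
    self._mock_data_provider = mock.create_autospec(
        provider.DataProvider, instance=True
    )
    self._mock_data_provider.list_tensors.return_value = {}  # set per test
    self._tb_context = base_plugin.TBContext(
        data_provider=self._mock_data_provider
    )
```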
| def setUp(self): | ||
| self._mock_tb_context = mock.create_autospec(base_plugin.TBContext) | ||
| self._mock_tb_context = base_plugin.TBContext() | ||
| self._mock_multiplexer = mock.create_autospec( |
nfelt:
Ditto here re: TODO.
wchargin:
Will do.
wchargin left a comment:
Thanks for the review!
| ] | ||
| return self._deprecated_multiplexer.Tensors(run, tag) | ||
| def _find_experiment_tag(self, experiment_id): |
nfelt:
Above in this PR, hparams_metadata() will now resolve to an unfiltered list_tensors() call, but at this particular call site within _find_experiment_tag() we only want a single tag, so IMO it'd be nice to just call the data provider with an appropriate tag filter rather than post-filtering all tags. (Or, optionally, I guess hparams_metadata() could learn a tags argument, if you prefer.)
wchargin:
Good point; thanks. Added a run_tag_filter argument, which is a nice
simplification.
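Concretely, the filtered call matches the diff below (a sketch, with the `hparams_metadata` signature as described above):

```python
# Ask the data provider for just the experiment tag, rather than
# listing every hparams tag and post-filtering:
mapping = self.hparams_metadata(
    experiment_id,
    run_tag_filter=provider.RunTagFilter(tags=[metadata.EXPERIMENT_TAG]),
)
```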
Summary:
Prior to this change, `read_scalars` (resp. `read_tensors`) delegated to `list_scalars` (resp. `list_tensors`) to find the set of time series to read. This is slower than it might sound, because `list_scalars` itself needs to scan over all relevant `multiplexer.Tensors` to identify `max_step` and `max_wall_time`, which are thrown away by `read_scalars`. (That `list_scalars` needs this full scan at all is its own issue; ideally, these would be memoized onto the event multiplexer.)

When a `RunTagFilter` specifying a single run and tag is given, we optimize further by requesting individual `SummaryMetadata` rather than paring down `AllSummaryMetadata`.

Resolves a comment of @nfelt on #2980: <#2980 (comment)>

Test Plan:
When applied on top of #3419, `:list_session_groups_test` improves from taking 11.1 seconds to taking 6.6 seconds on my machine. This doesn’t seem to fully generalize; I see only ~13% speedups in a microbenchmark that hammers `read_scalars` on a logdir with all the demo data, but the improvement on that test is important.

wchargin-branch: data-read-without-list
wchargin-source: bc728c60dcb0039a6f802eaf154205b7161e4796
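A sketch of the single-series fast path described above, as it might sit inside `MultiplexerDataProvider` (the helper name and locals are assumptions):

```python
def _single_series_metadata(self, run_tag_filter):
    """Fast path: fetch one SummaryMetadata rather than all of them."""
    runs = run_tag_filter.runs
    tags = run_tag_filter.tags
    if runs is None or tags is None or len(runs) != 1 or len(tags) != 1:
        return None  # caller falls back to AllSummaryMetadata()
    (run,) = runs
    (tag,) = tags
    try:
        return {run: {tag: self._multiplexer.SummaryMetadata(run, tag)}}
    except KeyError:
        return {}  # no such time series
```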
wchargin-branch: hparams-encapsulate-mux wchargin-source: 118c7f3a1c7c29f5983b3ad2547669d9342e6091
wchargin-branch: hparams-thread-eid wchargin-source: 14bd159933cd83968f0a1fa891853695d7bcd954
wchargin-branch: hparams-generic-data wchargin-source: cdbc1c14f85d567c26ed174e14c5b5f7b46f8850
wchargin-branch: data-read-without-list wchargin-source: d768ced329672f2b307bd25681f111ebe1b929a8
wchargin-branch: hparams-encapsulate-mux wchargin-source: fb03f6591420b8a6f7f150dd4e30dd593d5f111e
wchargin-branch: hparams-thread-eid wchargin-source: f7768a2fe912914cc4dde2b22e8dd7be0d62912b
wchargin-branch: hparams-generic-data wchargin-source: d5f1853702bbf77dd0e072994806ce77b5bc775a
wchargin-branch: data-blob-sequence-tests wchargin-source: 664b9b53b60a76eacbd85ecca3335e62c172acf0
wchargin-branch: data-read-without-list wchargin-source: 674ec15dd632579245b7a4aed36a8cc654928123
wchargin-branch: hparams-encapsulate-mux wchargin-source: f1350f44bdb92a4f70430f30e54023c27ac42ca4
wchargin-branch: hparams-thread-eid wchargin-source: 518152451965bd3dadfe4737d888b9cdced56f85
wchargin-branch: hparams-generic-data wchargin-source: 77e550e62e8ee9efa7f57dfe660ca11e3ae048e8
wchargin-branch: hparams-generic-data wchargin-source: 5f8c33f41daaf25676b878ca375d1bd46125c3d8
The perf improvements from #3433 do have a similar effect in google3.
wchargin-branch: data-blob-sequence-tests wchargin-source: e657e58995373fa202f2ec50573c23c1aa56c947
wchargin-branch: data-read-without-list wchargin-source: 6fc6b606944cb4e40a36a95cc76221ce9471c3e3
wchargin-branch: hparams-generic-data wchargin-source: ffceb95a6b175d50bdc82c46ce0672f961283217 # Conflicts: # tensorboard/plugins/hparams/backend_context.py
| tags=[metadata.EXPERIMENT_TAG] | ||
| ), | ||
| ) | ||
| if not mapping: |
nfelt:
nit: next(iter(mapping.values())) seems slightly simpler than feeding the key back into the dict? Same below
wchargin:
Ah, yes; that’s nicer. Done.
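For concreteness, the nit in question (variable names hypothetical):

```python
# Before: look up the only key, then feed it back into the dict.
tag_to_content = mapping[next(iter(mapping))]
# After: take the only value directly.
tag_to_content = next(iter(mapping.values()))
```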
| run_tag_filter=provider.RunTagFilter( | ||
| tags=[metadata.EXPERIMENT_TAG] | ||
| ), | ||
| ) |
nfelt:
This momentarily confused me: the hparams notion of experiment vs. the experiment-ID notion. Perhaps it'd be slightly clearer to say "expect only a single run/tag pair with HParams experiment plugin data"?
wchargin:
Done, with slightly different wording because “HParams experiment plugin
data” took me more than one try to parse.
wchargin-branch: data-blob-sequence-tests wchargin-source: 317d4fc9ae0fb952360f5aa7a2f8c235ffc6b177
wchargin-branch: data-read-without-list wchargin-source: 3edb09cd5991191a6e5947388ad263770bbfb16b
wchargin-branch: hparams-generic-data wchargin-source: 8ec113f5dcba7ba0e84da91ab51a1f01dbe43794
wchargin-branch: data-read-without-list wchargin-source: e80cdf315b1ae6ed31e5e60f43361a4f1d45a0ee
wchargin-branch: hparams-generic-data wchargin-source: 93b60837e079f5a947297387e0a327780a278f01
wchargin-branch: hparams-generic-data wchargin-source: 15ce10b654500da9cb99cb4dbb89c96450e169bc
wchargin-branch: hparams-generic-data wchargin-source: f23d640e574813eb259e251f58c3992545cbdae0
Summary:
This commit changes the data access methods in
`hparams.backend_context` to read from the context’s data provider
rather than directly from its multiplexer. All data access in the
hparams plugin already goes through these methods.
Test Plan:
Ideally, existing tests would have sufficed, but the tests made heavy
use of mocking and so required manual intervention. The updated tests
pass, and an inspection of the various views of the hparams dashboard
(including embedded scalar charts and data download links) checks out,
even when TensorBoard core is patched to not provide a multiplexer on
the context.
wchargin-branch: hparams-generic-data