Conversation

@wchargin (Contributor)

Summary:
This commit changes the data access methods in `hparams.backend_context`
to read from the context’s data provider rather than directly from its
multiplexer. All data access in the hparams plugin already goes through
these methods.

Test Plan:
Ideally, existing tests would have sufficed, but the tests made heavy
use of mocking and so required manual intervention. The updated tests
pass, and an inspection of the various views of the hparams dashboard
(including embedded scalar charts and data download links) checks out,
even when TensorBoard core is patched to not provide a multiplexer on
the context.

wchargin-branch: hparams-generic-data
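
Concretely, the pattern this change adopts looks roughly like the
following sketch (method and attribute names are assumed for
illustration, not the verbatim plugin code):

```python
from tensorboard.plugins.hparams import metadata

def hparams_metadata(self, experiment_id):
    """Reads summary metadata for all hparams tags, as {run: {tag: ...}}."""
    # Query the context's data provider instead of its multiplexer.
    return self._tb_context.data_provider.list_tensors(
        experiment_id=experiment_id,
        plugin_name=metadata.PLUGIN_NAME,
    )
```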

Summary:
Until now, `context.data_provider` was provided unless `generic_data`
was explicitly set to `false`. Now, it’s provided in all cases (except
the legacy DB modes). This was always expected to be safe, and has been
live by default since TensorBoard 1.15.0.

This is needed to write plugins that unconditionally assume that a data
provider is available.

A consequence of this is that `is_active`’s use of the experimental
`list_plugins` method is now always on as well. This has been on by
default for two months, but not yet in any release (other than the
upcoming TensorBoard 2.2.0), so it’s slightly riskier, but it’s also
expected to be safe.

Test Plan:
Run TensorBoard with `--generic_data false`. If pointing at an empty
logdir, no plugins are shown as active. If pointing at a populated
logdir,
all appropriate plugins are shown as active. The debugger plugin is
still marked as active if `--debugger_port` is specified and as inactive
if no relevant flags are given, so the fallback `is_active` calls are
still working.

wchargin-branch: backend-always-data-provider
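
One thing this enables, sketched with a hypothetical plugin (not code
from this PR): constructors can take the data provider unconditionally,
with no multiplexer fallback branch.

```python
from tensorboard.plugins import base_plugin

class MyPlugin(base_plugin.TBPlugin):  # hypothetical example plugin
    plugin_name = "my_plugin"

    def __init__(self, context):
        # Safe after this change: `context.data_provider` is always
        # populated (outside the legacy DB modes).
        self._data_provider = context.data_provider
```
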
Summary:
Until now, the hparams plugin has been active only when hparams data is
present *and* the scalars plugin is loaded and active. This cross-plugin
dependency is unique and doesn’t fit well into the data provider world,
so we remove it. The hparams plugin is now active iff hparams data is
available, via the standard `data_plugin_names` API.

Correspondingly, we loosen the error handling in parts of the plugin
that rely on the scalars plugin being active. As long as the scalars
plugin is included in the TensorBoard instance, this should not be a
problem: TensorBoard core loads all plugins before accepting requests to
any of them. If the hparams plugin is used in an installation with no
scalars plugin, the errors will now be 404s instead of 500s. This seems
acceptable, as running without a scalars plugin is not a common pattern.
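
In code, the loosening amounts to something like this sketch
(hypothetical route fragment; TensorBoard serves requests via werkzeug,
so a `NotFound` surfaces as a 404):

```python
from werkzeug import exceptions

def _scalars_response(self, run, tag):
    data = self._read_scalars(run, tag)  # hypothetical helper
    if data is None:
        # Previously a 500; now a clean 404 when no scalars plugin
        # (and hence no scalar data) is present.
        raise exceptions.NotFound(
            "No scalar data for run=%r, tag=%r" % (run, tag)
        )
    return data
```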

Test Plan:
The hparams plugin is still marked as active in logdirs that contain
hparams data and inactive in logdirs that don’t.

wchargin-branch: hparams-data-provider-is-active
wchargin-source: 19dedbb932881bbdb471b4d3d9796be47d1e8956
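
Roughly, the standard mechanism works as in this simplified sketch
(signatures approximate, not TensorBoard core's exact code): a plugin is
active iff the data provider reports stored data under any of its
`data_plugin_names()`.

```python
def _plugin_is_active(plugin, data_provider, experiment_id):
    # Names of plugins for which the provider has stored data.
    active_names = frozenset(data_provider.list_plugins(experiment_id))
    return any(name in active_names for name in plugin.data_plugin_names())
```
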
Summary:
The hparams plugin makes three kinds of queries against the multiplexer:
listing hparams metadata, listing scalars metadata, and reading scalar
points. This commit encapsulates each such operation in a method so that
we can migrate them to use data provider APIs. For now, the underlying
implementations are unchanged.

Test Plan:
Running `git grep '\.multiplexer' tensorboard/plugins/hparams/` shows
only matches in `backend_context.py` and tests. Existing tests pass, and
all three views of the hparams plugin, including embedded scalar charts
and the data download links, still work with and without `--generic_data
true` (though in all cases the plugin actually still reads directly from
the multiplexer).

wchargin-branch: hparams-encapsulate-mux
wchargin-source: a583e4f940723bd01dc433eb83e24776b686dc44
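
The shape of the encapsulation, as a sketch (the method name is
illustrative): each multiplexer query gets its own method on the backend
context, so a later commit can swap the body without touching callers.

```python
def read_scalars(self, run, tag):
    """Reads the scalar events for one (run, tag) pair.

    For now this still delegates to the multiplexer; the body will
    later be replaced with a data provider call.
    """
    return self._deprecated_multiplexer.Tensors(run, tag)
```
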
Summary:
All data access in the hparams plugin is now performed via methods that
require an experiment ID. The ID is currently unused except for an
assertion. In a subsequent change, these methods will switch to using
a data provider instead of the multiplexer, at which point the
experiment ID will be required.

Test Plan:
Unit tests suffice for the internals. To check the wiring at the plugin
level, note that all three views of the hparams plugin, including
embedded scalar charts and the download links, still render properly.

wchargin-branch: hparams-thread-eid
wchargin-source: c82e1e20f220436002f95bf8f9575839b26c9293
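
A sketch of the intermediate state this leaves behind (names
illustrative): the experiment ID is accepted and sanity-checked now, so
the follow-up switch to a data provider is a body-only change.

```python
from tensorboard.plugins.scalar import metadata as scalar_metadata

def scalars_metadata(self, experiment_id):
    """Lists scalar metadata; `experiment_id` is checked but not yet used."""
    assert experiment_id is not None
    return self._deprecated_multiplexer.PluginRunToTagToContent(
        scalar_metadata.PLUGIN_NAME
    )
```
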
Summary:
This commit changes the data access methods in `hparams.backend_context`
to read from the context’s data provider rather than directly from its
multiplexer. All data access in the hparams plugin already goes through
these methods.

Test Plan:
Ideally, existing tests would have sufficed, but the tests made heavy
use of mocking and so required manual intervention. The updated tests
pass, and an inspection of the various views of the hparams dashboard
(including embedded scalar charts and data download links) checks out,
even when TensorBoard core is patched to not provide a multiplexer on
the context.

wchargin-branch: hparams-generic-data
wchargin-source: 13a548280e753556f7fdb1378040db17c78061f3

@nfelt (Contributor) left a comment:

Thanks for getting this working!

            )
            for e in data
        ]
        return self._deprecated_multiplexer.Tensors(run, tag)

nfelt (Contributor):

Remove?

wchargin (Contributor Author):

Yep, thanks!

        if data is None:
            raise KeyError("No scalar data for run=%r, tag=%r" % (run, tag))
        return [
            event_accumulator.TensorEvent(

nfelt (Contributor):

Can we maybe file a cleanup GitHub issue to remove the vestigial bits of the multiplexer API from the hparams plugin and/or from plugins generally after they're migrated? Would be nice to at least leave a TODO here, since it's kind of weird to still have `TensorEvent` used for this.

wchargin (Contributor Author):

Yes, definitely. I was planning to do so right after these changes
(switch these functions to directly return the data provider results and
simultaneously migrate callers), but wanted to get these in first to
unblock some internal work. Filed #3425.

For what it’s worth, this is not a performance bottleneck. The
post-processing takes on the order of 1–5 microseconds. The call to
`data_provider.read_scalars` is slow because it calls `list_scalars`
internally, and `list_scalars` itself iterates over all the tensors to
generate metadata that we just throw away. It currently takes on the
order of 10 milliseconds, so that’s not good. And the hparams usage of
`read_scalars` is slow, too, because we call it once for each of the
O(|metrics| × |sessions|) time series instead of passing a single
`RunTagFilter` that represents the cross product, so that’s not good,
either. The data-munging here is negligible in comparison.

(I will see if I can speed up `_list_scalars` before merging this PR.)
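
For reference, the “one call” fix would look roughly like this (a hedged
sketch; the function, variable names, and downsample cap are
illustrative):

```python
from tensorboard.data import provider

def read_metric_data(data_provider, experiment_id, session_runs, metric_tags):
    # One read_scalars call for the full cross product, instead of one
    # call per (session, metric) pair.
    return data_provider.read_scalars(
        experiment_id=experiment_id,
        plugin_name="scalars",
        downsample=1000,  # illustrative cap
        run_tag_filter=provider.RunTagFilter(
            runs=session_runs, tags=metric_tags
        ),
    )
```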

nfelt (Contributor):

Yeah, I wasn't concerned at all about performance. Just seemed like a vestigial weirdness to leave behind - thanks for filing the issue!

    def setUp(self):
-       self._mock_tb_context = mock.create_autospec(base_plugin.TBContext)
+       self._mock_tb_context = base_plugin.TBContext()
        self._mock_multiplexer = mock.create_autospec(

nfelt (Contributor):

Ditto here re: having a TODO to clean up the test code here so that if it's going to use mocks it at least mocks the DataProvider APIs directly rather than mocking EventMultiplexer and then wrapping that in MultiplexerDataProvider.
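
The suggested cleanup might look like this sketch (not this PR's test
code):

```python
import unittest
from unittest import mock

from tensorboard.data import provider
from tensorboard.plugins import base_plugin

class BackendContextTest(unittest.TestCase):
    def setUp(self):
        # Mock the DataProvider interface directly, rather than mocking
        # EventMultiplexer and wrapping it in MultiplexerDataProvider.
        self._mock_data_provider = mock.create_autospec(provider.DataProvider)
        self._mock_tb_context = base_plugin.TBContext(
            data_provider=self._mock_data_provider
        )
```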

wchargin (Contributor Author):

Will do.

    def setUp(self):
-       self._mock_tb_context = mock.create_autospec(base_plugin.TBContext)
+       self._mock_tb_context = base_plugin.TBContext()
        self._mock_multiplexer = mock.create_autospec(

nfelt (Contributor):

Ditto here re: TODO.

wchargin (Contributor Author):

Will do.

@wchargin (Contributor Author) left a comment:

Thanks for the review!

        ]
        return self._deprecated_multiplexer.Tensors(run, tag)

    def _find_experiment_tag(self, experiment_id):

nfelt (Contributor):

Above in this PR, `hparams_metadata()` will now resolve to an unfiltered `list_tensors()` call, but in this particular callsite within `_find_experiment_tag()` we only want a single tag, so IMO it'd be nice if we just call the data provider with an appropriate tag filter rather than post-filtering all tags. (Or optionally I guess `hparams_metadata()` could learn a `tags` argument, if you prefer.)

wchargin (Contributor Author):

Good point; thanks. Added a `run_tag_filter` argument, which is a nice
simplification.

Summary:
Prior to this change, `read_scalars` (resp. `read_tensors`) delegated to
`list_scalars` (resp. `list_tensors`) to find the set of time series to
read. This is slower than it might sound, because `list_scalars` itself
needs to scan over all relevant `multiplexer.Tensors` to identify
`max_step` and `max_wall_time`, which are thrown away by `read_scalars`.
(That `list_scalars` needs this full scan at all is its own issue;
ideally, these would be memoized onto the event multiplexer.)

When a `RunTagFilter` specifying a single run and tag is given, we
optimize further by requesting individual `SummaryMetadata` rather than
paring down `AllSummaryMetadata`.

Resolves a comment of @nfelt on #2980:
<#2980 (comment)>

Test Plan:
When applied on top of #3419, `:list_session_groups_test` improves from
taking 11.1 seconds to taking 6.6 seconds on my machine. This doesn’t
seem to fully generalize; I see only ~13% speedups in a microbenchmark
that hammers `read_scalars` on a logdir with all the demo data, but the
improvement on that test is important.

wchargin-branch: data-read-without-list
wchargin-source: bc728c60dcb0039a6f802eaf154205b7161e4796
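
The single-run, single-tag fast path could be sketched like so
(simplified; not the verbatim `MultiplexerDataProvider` code):

```python
def _summary_metadata(multiplexer, run_tag_filter):
    runs, tags = run_tag_filter.runs, run_tag_filter.tags
    if runs is not None and tags is not None and len(runs) == len(tags) == 1:
        # Fast path: one targeted lookup instead of a full scan.
        (run,) = runs
        (tag,) = tags
        try:
            return {run: {tag: multiplexer.SummaryMetadata(run, tag)}}
        except KeyError:
            return {}
    return multiplexer.AllSummaryMetadata()
```
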
wchargin added 16 commits March 25, 2020 18:15
wchargin-branch: hparams-encapsulate-mux
wchargin-source: 118c7f3a1c7c29f5983b3ad2547669d9342e6091
wchargin-branch: hparams-thread-eid
wchargin-source: 14bd159933cd83968f0a1fa891853695d7bcd954
wchargin-branch: hparams-generic-data
wchargin-source: cdbc1c14f85d567c26ed174e14c5b5f7b46f8850
Summary:
Follow-up to #2991. Fixes #3434.

Test Plan:
Tests pass as written.

wchargin-branch: data-blob-sequence-tests
wchargin-source: fbd3302933cb0c50609df970edf137202723c769
wchargin-branch: data-read-without-list
wchargin-source: d768ced329672f2b307bd25681f111ebe1b929a8
wchargin-branch: hparams-encapsulate-mux
wchargin-source: fb03f6591420b8a6f7f150dd4e30dd593d5f111e
wchargin-branch: hparams-thread-eid
wchargin-source: f7768a2fe912914cc4dde2b22e8dd7be0d62912b
wchargin-branch: hparams-generic-data
wchargin-source: d5f1853702bbf77dd0e072994806ce77b5bc775a
wchargin-branch: data-blob-sequence-tests
wchargin-source: 664b9b53b60a76eacbd85ecca3335e62c172acf0
wchargin-branch: data-read-without-list
wchargin-source: 674ec15dd632579245b7a4aed36a8cc654928123
wchargin-branch: hparams-encapsulate-mux
wchargin-source: f1350f44bdb92a4f70430f30e54023c27ac42ca4
wchargin-branch: hparams-thread-eid
wchargin-source: 518152451965bd3dadfe4737d888b9cdced56f85
wchargin-branch: hparams-generic-data
wchargin-source: 77e550e62e8ee9efa7f57dfe660ca11e3ae048e8
wchargin-branch: hparams-generic-data
wchargin-source: 5f8c33f41daaf25676b878ca375d1bd46125c3d8

@wchargin (Contributor Author):

The perf improvements from #3433 do have a similar effect in google3,
taking the average test run time (over 100 trials) to… 59.9 seconds!
That’s just a wee bit close to the timeout, so I’m bumping the test size
to moderate to avoid flakiness, while noting that doing less work (by
making only one `read_scalars` call) is the real solution.

@wchargin wchargin changed the base branch from wchargin-hparams-thread-eid to wchargin-data-read-without-list March 26, 2020 16:21
wchargin-branch: data-blob-sequence-tests
wchargin-source: e657e58995373fa202f2ec50573c23c1aa56c947
wchargin-branch: data-read-without-list
wchargin-source: 6fc6b606944cb4e40a36a95cc76221ce9471c3e3

# Conflicts:
#	tensorboard/plugins/hparams/backend_context.py
wchargin-branch: hparams-generic-data
wchargin-source: ffceb95a6b175d50bdc82c46ce0672f961283217
                tags=[metadata.EXPERIMENT_TAG]
            ),
        )
        if not mapping:

nfelt (Contributor):

nit: `next(iter(mapping.values()))` seems slightly simpler than feeding the key back into the dict? Same below

wchargin (Contributor Author):

Ah, yes; that’s nicer. Done.
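
For reference, the adopted pattern (variable name illustrative):

```python
# Take the sole value from a single-entry mapping directly, rather
# than feeding its only key back into the dict.
tag_to_content = next(iter(mapping.values()))
```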

            run_tag_filter=provider.RunTagFilter(
                tags=[metadata.EXPERIMENT_TAG]
            ),
        )

nfelt (Contributor):

This momentarily confused me between the hparams notion of an experiment and the experiment-ID notion of an experiment. Perhaps slightly clearer to say “expect only a single run/tag pair with HParams experiment plugin data”?

wchargin (Contributor Author):

Done, with slightly different wording because “HParams experiment plugin
data” took me more than one try to parse.

wchargin-branch: data-blob-sequence-tests
wchargin-source: 317d4fc9ae0fb952360f5aa7a2f8c235ffc6b177
wchargin-branch: data-read-without-list
wchargin-source: 3edb09cd5991191a6e5947388ad263770bbfb16b
wchargin-branch: hparams-generic-data
wchargin-source: 8ec113f5dcba7ba0e84da91ab51a1f01dbe43794
wchargin-branch: data-read-without-list
wchargin-source: e80cdf315b1ae6ed31e5e60f43361a4f1d45a0ee
wchargin-branch: hparams-generic-data
wchargin-source: 93b60837e079f5a947297387e0a327780a278f01
wchargin-branch: hparams-generic-data
wchargin-source: 15ce10b654500da9cb99cb4dbb89c96450e169bc
@wchargin wchargin changed the base branch from wchargin-data-read-without-list to master March 26, 2020 19:21
wchargin added a commit that referenced this pull request Mar 26, 2020
wchargin-branch: data-read-without-list
wchargin-branch: hparams-generic-data
wchargin-source: f23d640e574813eb259e251f58c3992545cbdae0
@wchargin wchargin merged commit 7b7486d into master Mar 26, 2020
@wchargin wchargin deleted the wchargin-hparams-generic-data branch March 26, 2020 19:56
bileschi pushed a commit to bileschi/tensorboard that referenced this pull request Apr 15, 2020
wchargin-branch: data-read-without-list
bileschi pushed a commit to bileschi/tensorboard that referenced this pull request Apr 15, 2020
wchargin-branch: hparams-generic-data
bileschi pushed a commit that referenced this pull request Apr 15, 2020
wchargin-branch: data-read-without-list
bileschi pushed a commit that referenced this pull request Apr 15, 2020
wchargin-branch: hparams-generic-data