data: add tensor support to multiplexer provider #2980
Conversation
Summary:
This commit specifies the `list_tensors` and `read_tensors` methods on the data provider interface. These methods are optional for now (i.e., not decorated with `abc.abstractmethod`) for compatibility reasons, but we’ll make them required soon.

Test Plan:
Unit tests included.

wchargin-branch: data-tensors-interface
wchargin-source: 457a308946d36b67adfac5bcea7aabb78d926de5
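As a rough sketch, the “optional for now” pattern described above can look like the following. (The method signatures here are assumptions modeled on the scalar counterparts, not the exact TensorBoard interface.)

```python
import abc


class DataProvider(abc.ABC):
    @abc.abstractmethod
    def list_scalars(self, experiment_id, plugin_name, run_tag_filter=None):
        """Required: concrete providers must implement this."""

    # Optional for now: not decorated with @abc.abstractmethod, so existing
    # subclasses can still be instantiated. Callers hit a runtime error only
    # if they actually invoke the method on a provider that lacks it.
    def list_tensors(self, experiment_id, plugin_name, run_tag_filter=None):
        raise NotImplementedError()

    def read_tensors(
        self, experiment_id, plugin_name, run_tag_filter=None, downsample=None
    ):
        raise NotImplementedError()
```

Once the methods become `@abc.abstractmethod`, any subclass that has not implemented them will fail at instantiation time rather than at call time.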
Summary:
This commit implements the new `list_tensors` and `read_tensors` methods for the data provider implementation backed by the event multiplexer.

Test Plan:
Unit tests included.

wchargin-branch: data-tensors-mux
wchargin-source: da45903e563afe8d9198a16a7ccade02b74ba378
Summary:
We now raise `errors.NotFoundError` rather than manually propagating an error message and status code. The response code was previously 400 Bad Request, but 404 Not Found is more appropriate and is consistent with the scalars plugin.

Test Plan:
Requesting `/data/plugin/histograms/histograms?run=foo&tag=bar` yields an error with the same message as before (but now with a “Not found:” prefix), and the histograms and distributions dashboards both work.

wchargin-branch: histograms-notfound
wchargin-source: 6e2644e5910c8d9ff1a939d328787a372d320dfe
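The shape of that change, as a minimal sketch: this `NotFoundError` is a hypothetical stand-in for the class in TensorBoard’s `errors` module, and `histograms_for` is an invented handler, not the plugin’s real code.

```python
class NotFoundError(Exception):
    """Stand-in for a typed error that the HTTP layer maps to 404."""

    http_status_code = 404

    def __str__(self):
        # The shared machinery prefixes the message for the client.
        return "Not found: " + (self.args[0] if self.args else "")


def histograms_for(data, run, tag):
    # Instead of hand-building a 400 response with an ad-hoc message,
    # raise the typed error and let shared error handling produce a 404.
    try:
        return data[run][tag]
    except KeyError:
        raise NotFoundError("No histogram tag %r for run %r" % (tag, run))
```

The benefit is that every plugin raising the same typed error gets a consistent status code and message format for free.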
wchargin-branch: histograms-notfound wchargin-source: 83fe89e2899b9dfb84f7bd952381981a37653c74
Summary:
Analogous to 2fe5301, but for histograms and distributions.

Test Plan:
Tests updated. Manually confirmed that runs like `text_demo` continue to appear in the run selectors for the histogram and distributions dashboards even though they have no relevant data.

wchargin-branch: histograms-omit-empty
wchargin-source: 660918175855a7e84147253237864f2accfb2424
wchargin-branch: data-tensors-interface wchargin-source: e88f46105418e9155d08ed312338c4080827f987
wchargin-branch: data-tensors-mux wchargin-source: bc80a39f66e93bcc01acff82c893993670a06c49
    return self._read(convert_scalar_event, index)

    def list_tensors(self, experiment_id, plugin_name, run_tag_filter=None):
I generally don't think copy/paste is a bad thing, but do we expect these implementations (`list_tensors` and `read_tensors`) to differ much beyond `TensorTimeSeries` and `convert_tensor_event`?
I expect list_tensors and read_tensors to remain similar to their
scalar counterparts, though the scalar counterparts could gain
additional features, like “list scalar time series with tag name
accuracy whose moving average is above 0.999”, that don’t make sense
for general tensor time series. (This is why we chose to separate scalar
time series out in the first place, even though the basic functionality
is subsumed by tensor time series.)
In such a future, I’d be happy to inline self._read and self._list
with appropriate changes. If you’d prefer that these be inlined today,
that’s also fine with me. I’m not exactly sure what you’re asking, so
let me know if you’d like any changes.
    self.assertItemsEqual(result.keys(), ["pictures"])
    self.assertItemsEqual(result["pictures"].keys(), ["purple", "green"])
    for run in result:
optional: It may be easier to assert by forming an expected map of `TensorTimeSeries` and just doing `assertEqual`, since we implemented `__eq__`.
Indeed, but I wanted to use `np.testing.assert_equal` for better error messages than “giant_numpy_array_1 != giant_numpy_array_2”.
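For illustration (the data here is invented, echoing the `"pictures"`/`"purple"` names from the test above): `np.testing.assert_equal` recurses into containers and NumPy arrays and, on failure, reports which elements differ, rather than just printing both arrays’ reprs.

```python
import numpy as np

expected = {"pictures": {"purple": np.arange(6).reshape(2, 3)}}
actual = {"pictures": {"purple": np.arange(6).reshape(2, 3)}}

# Passes silently when everything matches; on a mismatch, the error
# message pinpoints the differing positions and values.
np.testing.assert_equal(actual, expected)
```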
stephanwlee left a comment
Both answers were 👍
wchargin-branch: data-tensors-interface wchargin-source: e6fb932cf828c9dc4481c5d1700c0ca52973129d
wchargin-branch: data-tensors-mux wchargin-source: f335f1109804b7dfc3ec187b66aa94ee41546489
wchargin-branch: data-tensors-mux wchargin-source: 224a87bb9e8f7e49e836c698bfccd375bc6f7692
        experiment_id, plugin_name, run_tag_filter=run_tag_filter
    )

    def convert_scalar_event(event):
No real need to change this, but FWIW I'd be kind of inclined not to inline these: even though they're only used within single methods, they don't actually close over any state, so this somewhat needlessly redefines the helper each time we call the API. In general I'm more of a fan of nested functions than the style guide is, but they do have some downsides, like worse stack traces.
Fair enough; done in #2994.
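The tradeoff being discussed, sketched with invented event dicts (not the real converter): a nested helper that closes over no state is rebuilt as a fresh function object on every call, while a module-level helper is defined once.

```python
# Nested style: `convert` is re-created each time read_scalars_nested is
# called, even though it uses nothing from the enclosing scope.
def read_scalars_nested(events):
    def convert(event):
        return (event["step"], event["value"])

    return [convert(e) for e in events]


# Module-level style: the helper is defined once at import time, and its
# name shows up cleanly in stack traces.
def _convert_scalar_event(event):
    return (event["step"], event["value"])


def read_scalars_flat(events):
    return [_convert_scalar_event(e) for e in events]
```

Both produce identical results; the difference is purely where the helper lives and how often it is (re)defined.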
    # is already downsampled. We could downsample on top of the existing
    # sampling, which would be nice for testing.
    del downsample  # ignored for now
    index = self.list_tensors(
I guess I had no objection in the original PR, but isn't it kind of inefficient to implement the read operation in terms of listing? The list operation already iterates over all the tensors once in order to determine max step and wall time, but then the read operation doesn't return that metadata at all (it's just discarded) so the computation is wasted, and we have to iterate over the tensors again a second time to get their actual values.
Performance-wise it's probably not the end of the world, but it seems suboptimal. (The same argument we had for the TB.dev backend might also suggest that we don't really need to return max step and wall time all the time; if we made those optional, then listing could be made more efficient and the reuse wouldn't result in as much redundant iteration, although it's still a little duplicative even then.)
Yes, agreed. I went with this because it was convenient and because, as
you say, it’s not the end of the world (due to downsampling, these
queries are all bounded). For a data provider implementation where this
actually required an extra RPC, I of course would make a different call.
I’d be happy to accept a change that streamlined this, but it’s not high
enough on my priorities for me to do so myself, unless you feel strongly
about it.
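A toy model of the inefficiency under discussion (simplified names and data structures, not the actual multiplexer code): `read_tensors` calls `list_tensors`, which already scans every event to compute max step and wall time, and then `read_tensors` discards that metadata and scans the events a second time.

```python
class ToyProvider:
    def __init__(self, events):
        # events: {(run, tag): [(step, wall_time, value), ...]}
        self._events = events

    def list_tensors(self):
        # First full pass: compute max_step and max_wall_time per series.
        index = {}
        for key, evs in self._events.items():
            index[key] = (
                max(step for step, _, _ in evs),
                max(wt for _, wt, _ in evs),
            )
        return index

    def read_tensors(self):
        # The listing's metadata is thrown away; only the keys are kept.
        # Then every event is iterated a second time to get the values.
        index = self.list_tensors()
        return {key: [v for _, _, v in self._events[key]] for key in index}
```

With downsampling bounding the number of events, the double scan is cheap in-process; it would matter much more if listing required a separate RPC.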
Summary:
Prior to this change, `read_scalars` (resp. `read_tensors`) delegated to `list_scalars` (resp. `list_tensors`) to find the set of time series to read. This is slower than it might sound, because `list_scalars` itself needs to scan over all relevant `multiplexer.Tensors` to identify `max_step` and `max_wall_time`, which are thrown away by `read_scalars`. (That `list_scalars` needs this full scan at all is its own issue; ideally, these would be memoized onto the event multiplexer.) When a `RunTagFilter` specifying a single run and tag is given, we optimize further by requesting individual `SummaryMetadata` rather than paring down `AllSummaryMetadata`.

Resolves a comment of @nfelt on #2980: <#2980 (comment)>

Test Plan:
When applied on top of #3419, `:list_session_groups_test` improves from taking 11.1 seconds to taking 6.6 seconds on my machine. This doesn’t seem to fully generalize; I see only ~13% speedups in a microbenchmark that hammers `read_scalars` on a logdir with all the demo data, but the improvement on that test is important.

wchargin-branch: data-read-without-list
wchargin-source: bc728c60dcb0039a6f802eaf154205b7161e4796
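The single-run, single-tag fast path can be sketched like this. (This `build_index` helper and its plain-dict metadata map are hypothetical; the real code queries the event multiplexer’s `SummaryMetadata` accessors.)

```python
def build_index(all_metadata, runs=None, tags=None):
    """all_metadata: {run: {tag: metadata}}. runs/tags mimic a RunTagFilter."""
    if runs is not None and len(runs) == 1 and tags is not None and len(tags) == 1:
        # Fast path: look up the one requested entry directly, instead of
        # fetching everything and paring it down.
        (run,), (tag,) = tuple(runs), tuple(tags)
        meta = all_metadata.get(run, {}).get(tag)
        return {run: {tag: meta}} if meta is not None else {}
    # Slow path: filter the full metadata map.
    result = {}
    for run, tag_to_meta in all_metadata.items():
        if runs is not None and run not in runs:
            continue
        kept = {t: m for t, m in tag_to_meta.items() if tags is None or t in tags}
        if kept:
            result[run] = kept
    return result
```

The fast path matters because the common hot query (one run, one tag, as in the hparams `:list_session_groups_test` case) then avoids touching metadata for every other series.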
Summary:
This commit implements the new `list_tensors` and `read_tensors` methods for the data provider implementation backed by the event multiplexer.
Test Plan:
Unit tests included.
wchargin-branch: data-tensors-mux