
Conversation

@wchargin (Contributor):

Summary:
This commit implements the new list_tensors and read_tensors methods
for the data provider implementation backed by the event multiplexer.

Test Plan:
Unit tests included.

wchargin-branch: data-tensors-mux

Summary:
This commit specifies the `list_tensors` and `read_tensors` methods on
the data provider interface. These methods are optional for now (i.e.,
not decorated with `abc.abstractmethod`) for compatibility reasons, but
we’ll make them required soon.

Test Plan:
Unit tests included.

wchargin-branch: data-tensors-interface
wchargin-source: 457a308946d36b67adfac5bcea7aabb78d926de5
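For orientation, here is a minimal sketch of what this adds to the interface (the signatures follow the review diff further down; the placement of the `downsample` parameter and the `NotImplementedError` bodies are assumptions, not copied from the source):

```python
class DataProvider(object):
    """Sketch of the relevant portion of the data provider interface."""

    def list_tensors(self, experiment_id, plugin_name, run_tag_filter=None):
        # Optional for now: deliberately not an `abc.abstractmethod`, so
        # existing implementations keep working until this becomes required.
        raise NotImplementedError()

    def read_tensors(
        self, experiment_id, plugin_name, downsample=None, run_tag_filter=None
    ):
        # Optional for now, for the same compatibility reason.
        raise NotImplementedError()
```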
Summary:
We now raise `errors.NotFoundError` rather than manually propagating an
error message and status code.

The response code was previously 400 Bad Request, but 404 Not Found is
more appropriate, and is consistent with the scalars plugin.

Test Plan:
Requesting `/data/plugin/histograms/histograms?run=foo&tag=bar` yields
an error with the same message as before (but now with a “Not found:”
prefix), and the histograms and distributions dashboards both work.

wchargin-branch: histograms-notfound
wchargin-source: 6e2644e5910c8d9ff1a939d328787a372d320dfe
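A minimal sketch of the pattern this commit adopts (the helper name, the multiplexer call, and the error message are illustrative, not the actual histograms plugin code):

```python
from tensorboard import errors


def _histogram_events(multiplexer, run, tag):
    """Fetch histogram events for (run, tag), or raise a public 404."""
    try:
        return multiplexer.Tensors(run, tag)
    except KeyError:
        # Previously the plugin built the error payload by hand and returned
        # 400 Bad Request; raising NotFoundError lets the shared error
        # handling emit a 404 whose message gets a "Not found:" prefix.
        raise errors.NotFoundError(
            "No histogram tag %r for run %r" % (tag, run)
        )
```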
Summary:
Analogous to 2fe5301, but for histograms and distributions.

Test Plan:
Tests updated. Manually confirmed that runs like `text_demo` continue to
appear in the run selectors for the histograms and distributions
dashboards even though they have no relevant data.

wchargin-branch: histograms-omit-empty
wchargin-source: 660918175855a7e84147253237864f2accfb2424
@wchargin requested a review from @stephanwlee on December 3, 2019 at 00:01.

```python
        return self._read(convert_scalar_event, index)

    def list_tensors(self, experiment_id, plugin_name, run_tag_filter=None):
```
Contributor:

I generally don't think copy/paste is a bad thing, but do we expect these implementations (`list_tensors` and `read_tensors`) to differ much beyond `TensorTimeSeries` and `convert_tensor_event`?

Contributor (Author):

I expect `list_tensors` and `read_tensors` to remain similar to their
scalar counterparts, though the scalar counterparts could gain
additional features, like “list scalar time series with tag name
`accuracy` whose moving average is above 0.999”, that don’t make sense
for general tensor time series. (This is why we chose to separate scalar
time series out in the first place, even though their basic functionality
is subsumed by tensor time series.)

In such a future, I’d be happy to inline `self._read` and `self._list`
with appropriate changes. If you’d prefer that these be inlined today,
that’s also fine with me. I’m not exactly sure what you’re asking, so
let me know if you’d like any changes.


```python
        self.assertItemsEqual(result.keys(), ["pictures"])
        self.assertItemsEqual(result["pictures"].keys(), ["purple", "green"])
        for run in result:
```
Contributor:

Optional: it may be easier to assert by forming an expected map of `TensorTimeSeries` and just doing `assertEqual`, since we implemented `__eq__`.

Contributor (Author):

Indeed, but I wanted to use `np.testing.assert_equal` for better error
messages than “giant_numpy_array_1 != giant_numpy_array_2”.
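A toy example of the two assertion styles (the `_Series` class is a stand-in for `TensorTimeSeries`; the names and data are made up, not taken from the actual test):

```python
import unittest

import numpy as np


class _Series(object):
    """Stand-in for a value type like TensorTimeSeries that defines __eq__."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __eq__(self, other):
        return isinstance(other, _Series) and np.array_equal(
            self.values, other.values
        )

    def __repr__(self):
        return "_Series(%r)" % (self.values,)


class AssertionStyles(unittest.TestCase):
    def test_numpy_aware_comparison(self):
        expected = {"pictures": {"purple": _Series([1, 2, 3])}}
        actual = {"pictures": {"purple": _Series([1, 2, 3])}}
        # With __eq__ defined, assertEqual works, but a failure prints the
        # repr of both whole maps. Comparing the underlying arrays with
        # np.testing.assert_equal instead reports exactly which elements
        # differ.
        self.assertEqual(expected, actual)
        np.testing.assert_equal(
            actual["pictures"]["purple"].values,
            expected["pictures"]["purple"].values,
        )
```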

@stephanwlee (Contributor) left a comment:

Both answers were 👍

@wchargin changed the base branch from wchargin-data-tensors-interface to master on December 3, 2019 at 03:38.
@wchargin merged commit 9cb8ab5 into master on December 3, 2019.
@wchargin deleted the wchargin-data-tensors-mux branch on December 3, 2019 at 04:26.
```python
            experiment_id, plugin_name, run_tag_filter=run_tag_filter
        )

        def convert_scalar_event(event):
```
Contributor:

No real need to change this, but FWIW I'd be kind of inclined not to define these inline: even though they're only used within single methods, they don't actually close over any state, so this somewhat needlessly redefines the helper each time we call the API. In general I'm more of a fan of nested functions than the style guide is, but they do have some downsides, like worse stack traces.

Contributor (Author):

Fair enough; done in #2994.
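A small illustration of the trade-off under discussion (the class names and event fields are placeholders, not the real provider code or event structure):

```python
# Nested style: the helper is re-created on every call, even though it
# closes over no state from the enclosing method.
class _NestedHelpers(object):
    def read_scalars(self, events):
        def convert_scalar_event(event):
            return (event.wall_time, event.step, event.value)

        return [convert_scalar_event(e) for e in events]


# Module-level style (defined once, as in the follow-up change), with
# slightly clearer stack traces on failure.
def _convert_scalar_event(event):
    return (event.wall_time, event.step, event.value)


class _TopLevelHelpers(object):
    def read_scalars(self, events):
        return [_convert_scalar_event(e) for e in events]
```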

```python
        # is already downsampled. We could downsample on top of the existing
        # sampling, which would be nice for testing.
        del downsample  # ignored for now
        index = self.list_tensors(
```
Contributor:

I guess I had no objection in the original PR, but isn't it kind of inefficient to implement the read operation in terms of listing? The list operation already iterates over all the tensors once in order to determine the max step and wall time, but the read operation doesn't return that metadata at all (it's just discarded), so the computation is wasted, and we have to iterate over the tensors a second time to get their actual values.

Performance-wise it's probably not the end of the world, but it seems suboptimal. (The same argument we had for the TB.dev backend might also suggest that we don't really need to return the max step and wall time every time; if we made those optional, then listing could be made more efficient and the reuse wouldn't result in as much redundant iteration, although it's still a little duplicative even then.)

Contributor (Author):

Yes, agreed. I went with this because it was convenient and because, as
you say, it’s not the end of the world (due to downsampling, these
queries are all bounded). For a data provider implementation where this
actually required an extra RPC, I of course would make a different call.

I’d be happy to accept a change that streamlined this, but it’s not high
enough on my priorities for me to do so myself, unless you feel strongly
about it.

wchargin added a commit that referenced this pull request Dec 4, 2019
Summary:
Per suggestion of @nfelt in post-review comment on #2980. This way, the
functions are only defined once.

Test Plan:
Unit tests pass.

wchargin-branch: mux-toplevel-read-helpers
wchargin added a commit that referenced this pull request Mar 26, 2020
Summary:
Prior to this change, `read_scalars` (resp. `read_tensors`) delegated to
`list_scalars` (resp. `list_tensors`) to find the set of time series to
read. This is slower than it might sound, because `list_scalars` itself
needs to scan over all relevant `multiplexer.Tensors` to identify
`max_step` and `max_wall_time`, which are thrown away by `read_scalars`.
(That `list_scalars` needs this full scan at all is its own issue;
ideally, these would be memoized onto the event multiplexer.)

When a `RunTagFilter` specifying a single run and tag is given, we
optimize further by requesting individual `SummaryMetadata` rather than
paring down `AllSummaryMetadata`.

Resolves a comment of @nfelt on #2980:
<#2980 (comment)>

Test Plan:
When applied on top of #3419, `:list_session_groups_test` improves from
taking 11.1 seconds to taking 6.6 seconds on my machine. This doesn’t
seem to fully generalize; I see only ~13% speedups in a microbenchmark
that hammers `read_scalars` on a logdir with all the demo data, but the
improvement on that test is important.

wchargin-branch: data-read-without-list
wchargin-source: bc728c60dcb0039a6f802eaf154205b7161e4796
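A hedged sketch of the shape of that change (simplified and not the actual implementation; `SummaryMetadata` and `AllSummaryMetadata` are the multiplexer methods named in the commit message, while the `runs`/`tags` accessors on the filter and the `plugin_data.plugin_name` check are assumptions):

```python
def _index(multiplexer, plugin_name, run_tag_filter):
    """Map run -> tag -> SummaryMetadata for the requested plugin."""
    runs = run_tag_filter.runs if run_tag_filter else None
    tags = run_tag_filter.tags if run_tag_filter else None
    if runs is not None and tags is not None and len(runs) == 1 and len(tags) == 1:
        # Fast path: one targeted lookup instead of scanning every run and tag.
        (run,) = runs
        (tag,) = tags
        try:
            metadata = multiplexer.SummaryMetadata(run, tag)
        except KeyError:
            return {}
        if metadata.plugin_data.plugin_name != plugin_name:
            return {}
        return {run: {tag: metadata}}
    # Slow path: fetch the full index and pare it down.
    result = {}
    for run, tag_to_metadata in multiplexer.AllSummaryMetadata().items():
        if runs is not None and run not in runs:
            continue
        for tag, metadata in tag_to_metadata.items():
            if tags is not None and tag not in tags:
                continue
            if metadata.plugin_data.plugin_name != plugin_name:
                continue
            result.setdefault(run, {})[tag] = metadata
    return result
```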