data: add tensor support to multiplexer provider #2980
Conversation
Summary:
This commit specifies the `list_tensors` and `read_tensors` methods on the data provider interface. These methods are optional for now (i.e., not decorated with `abc.abstractmethod`) for compatibility reasons, but we’ll make them required soon.

Test Plan:
Unit tests included.

wchargin-branch: data-tensors-interface
wchargin-source: 457a308946d36b67adfac5bcea7aabb78d926de5
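As a rough sketch, the “optional for now” pattern described above can look like the following. (The method signatures here are assumptions modeled on the scalar counterparts, not the exact TensorBoard interface.)

```python
import abc


class DataProvider(abc.ABC):
    @abc.abstractmethod
    def list_scalars(self, experiment_id, plugin_name, run_tag_filter=None):
        """Required: concrete providers must implement this."""

    # Optional for now: not decorated with @abc.abstractmethod, so existing
    # subclasses can still be instantiated. Callers hit a runtime error only
    # if they actually invoke the method on a provider that lacks it.
    def list_tensors(self, experiment_id, plugin_name, run_tag_filter=None):
        raise NotImplementedError()

    def read_tensors(
        self, experiment_id, plugin_name, run_tag_filter=None, downsample=None
    ):
        raise NotImplementedError()
```

Once the methods become `@abc.abstractmethod`, any subclass that has not implemented them will fail at instantiation time rather than at call time.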
Summary:
This commit implements the new `list_tensors` and `read_tensors` methods for the data provider implementation backed by the event multiplexer.

Test Plan:
Unit tests included.

wchargin-branch: data-tensors-mux
wchargin-source: da45903e563afe8d9198a16a7ccade02b74ba378
Summary:
We now raise `errors.NotFoundError` rather than manually propagating an error message and status code. The response code was previously 400 Bad Request, but 404 Not Found is more appropriate and is consistent with the scalars plugin.

Test Plan:
Requesting `/data/plugin/histograms/histograms?run=foo&tag=bar` yields an error with the same message as before (but now with a “Not found:” prefix), and the histograms and distributions dashboards both work.

wchargin-branch: histograms-notfound
wchargin-source: 6e2644e5910c8d9ff1a939d328787a372d320dfe
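The shape of that change, as a minimal sketch: this `NotFoundError` is a hypothetical stand-in for the class in TensorBoard’s `errors` module, and `histograms_for` is an invented handler, not the plugin’s real code.

```python
class NotFoundError(Exception):
    """Stand-in for a typed error that the HTTP layer maps to 404."""

    http_status_code = 404

    def __str__(self):
        # The shared machinery prefixes the message for the client.
        return "Not found: " + (self.args[0] if self.args else "")


def histograms_for(data, run, tag):
    # Instead of hand-building a 400 response with an ad-hoc message,
    # raise the typed error and let shared error handling produce a 404.
    try:
        return data[run][tag]
    except KeyError:
        raise NotFoundError("No histogram tag %r for run %r" % (tag, run))
```

The benefit is that every plugin raising the same typed error gets a consistent status code and message format for free.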
wchargin-branch: histograms-notfound wchargin-source: 83fe89e2899b9dfb84f7bd952381981a37653c74
Summary:
Analogous to 2fe5301, but for histograms and distributions.

Test Plan:
Tests updated. Manually confirmed that runs like `text_demo` continue to appear in the run selectors for the histogram and distributions dashboards even though they have no relevant data.

wchargin-branch: histograms-omit-empty
wchargin-source: 660918175855a7e84147253237864f2accfb2424
wchargin-branch: data-tensors-interface wchargin-source: e88f46105418e9155d08ed312338c4080827f987
wchargin-branch: data-tensors-mux wchargin-source: bc80a39f66e93bcc01acff82c893993670a06c49
    return self._read(convert_scalar_event, index)

    def list_tensors(self, experiment_id, plugin_name, run_tag_filter=None):
I generally don't think copy/paste is a bad thing, but do we expect these implementations (`list_tensors` and `read_tensors`) to differ much beyond `TensorTimeSeries` and `convert_tensor_event`?
I expect list_tensors and read_tensors to remain similar to their
scalar counterparts, though the scalar counterparts could gain
additional features, like “list scalar time series with tag name
accuracy whose moving average is above 0.999”, that don’t make sense
for general tensor time series. (This is why we chose to separate scalar
time series out in the first place, even though the basic functionality
is subsumed by tensor time series.)
In such a future, I’d be happy to inline self._read and self._list
with appropriate changes. If you’d prefer that these be inlined today,
that’s also fine with me. I’m not exactly sure what you’re asking, so
let me know if you’d like any changes.
    self.assertItemsEqual(result.keys(), ["pictures"])
    self.assertItemsEqual(result["pictures"].keys(), ["purple", "green"])
    for run in result:
optional: It may be easier to assert by forming an expected map of `TensorTimeSeries` and just doing `assertEqual`, since we implemented `__eq__`.
Indeed, but I wanted to use `np.testing.assert_equal` for better error messages than “giant_numpy_array_1 != giant_numpy_array_2”.
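For illustration (the data here is invented, echoing the `"pictures"`/`"purple"` names from the test above): `np.testing.assert_equal` recurses into containers and NumPy arrays and, on failure, reports which elements differ, rather than just printing both arrays’ reprs.

```python
import numpy as np

expected = {"pictures": {"purple": np.arange(6).reshape(2, 3)}}
actual = {"pictures": {"purple": np.arange(6).reshape(2, 3)}}

# Passes silently when everything matches; on a mismatch, the error
# message pinpoints the differing positions and values.
np.testing.assert_equal(actual, expected)
```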
stephanwlee left a comment
Both answers were 👍
wchargin-branch: data-tensors-interface wchargin-source: e6fb932cf828c9dc4481c5d1700c0ca52973129d
wchargin-branch: data-tensors-mux wchargin-source: f335f1109804b7dfc3ec187b66aa94ee41546489
wchargin-branch: data-tensors-mux wchargin-source: 224a87bb9e8f7e49e836c698bfccd375bc6f7692
        experiment_id, plugin_name, run_tag_filter=run_tag_filter
    )

    def convert_scalar_event(event):
No real need to change this, but FWIW I'd be kind of inclined not to inline these: even though they're only used within single methods, they don't actually close over any state, so this somewhat needlessly redefines the helper each time we call the API. In general I'm more of a fan of nested functions than the style guide is, but they do have some downsides, like worse stack traces.
Fair enough; done in #2994.
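The tradeoff being discussed, sketched with invented event dicts (not the real converter): a nested helper that closes over no state is rebuilt as a fresh function object on every call, while a module-level helper is defined once.

```python
# Nested style: `convert` is re-created each time read_scalars_nested is
# called, even though it uses nothing from the enclosing scope.
def read_scalars_nested(events):
    def convert(event):
        return (event["step"], event["value"])

    return [convert(e) for e in events]


# Module-level style: the helper is defined once at import time, and its
# name shows up cleanly in stack traces.
def _convert_scalar_event(event):
    return (event["step"], event["value"])


def read_scalars_flat(events):
    return [_convert_scalar_event(e) for e in events]
```

Both produce identical results; the difference is purely where the helper lives and how often it is (re)defined.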
    # is already downsampled. We could downsample on top of the existing
    # sampling, which would be nice for testing.
    del downsample  # ignored for now
    index = self.list_tensors(
I guess I had no objection in the original PR, but isn't it kind of inefficient to implement the read operation in terms of listing? The list operation already iterates over all the tensors once in order to determine max step and wall time, but then the read operation doesn't return that metadata at all (it's just discarded) so the computation is wasted, and we have to iterate over the tensors again a second time to get their actual values.
Performance-wise it's probably not the end of the world, but it seems suboptimal. (The same argument we had for the TB.dev backend might also suggest that we don't really need to return max step and wall time all the time; if we made those optional, then listing could be made more efficient and the reuse wouldn't result in as much redundant iteration, although it's still a little duplicative even then.)
Yes, agreed. I went with this because it was convenient and because, as
you say, it’s not the end of the world (due to downsampling, these
queries are all bounded). For a data provider implementation where this
actually required an extra RPC, I of course would make a different call.
I’d be happy to accept a change that streamlined this, but it’s not high
enough on my priorities for me to do so myself, unless you feel strongly
about it.
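A toy model of the inefficiency under discussion (simplified names and data structures, not the actual multiplexer code): `read_tensors` calls `list_tensors`, which already scans every event to compute max step and wall time, and then `read_tensors` discards that metadata and scans the events a second time.

```python
class ToyProvider:
    def __init__(self, events):
        # events: {(run, tag): [(step, wall_time, value), ...]}
        self._events = events

    def list_tensors(self):
        # First full pass: compute max_step and max_wall_time per series.
        index = {}
        for key, evs in self._events.items():
            index[key] = (
                max(step for step, _, _ in evs),
                max(wt for _, wt, _ in evs),
            )
        return index

    def read_tensors(self):
        # The listing's metadata is thrown away; only the keys are kept.
        # Then every event is iterated a second time to get the values.
        index = self.list_tensors()
        return {key: [v for _, _, v in self._events[key]] for key in index}
```

With downsampling bounding the number of events, the double scan is cheap in-process; it would matter much more if listing required a separate RPC.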
Summary:
Prior to this change, `read_scalars` (resp. `read_tensors`) delegated to `list_scalars` (resp. `list_tensors`) to find the set of time series to read. This is slower than it might sound, because `list_scalars` itself needs to scan over all relevant `multiplexer.Tensors` to identify `max_step` and `max_wall_time`, which are thrown away by `read_scalars`. (That `list_scalars` needs this full scan at all is its own issue; ideally, these would be memoized onto the event multiplexer.) When a `RunTagFilter` specifying a single run and tag is given, we optimize further by requesting individual `SummaryMetadata` rather than paring down `AllSummaryMetadata`.

Resolves a comment of @nfelt on #2980: <#2980 (comment)>

Test Plan:
When applied on top of #3419, `:list_session_groups_test` improves from taking 11.1 seconds to taking 6.6 seconds on my machine. This doesn’t seem to fully generalize; I see only ~13% speedups in a microbenchmark that hammers `read_scalars` on a logdir with all the demo data, but the improvement on that test is important.

wchargin-branch: data-read-without-list
wchargin-source: bc728c60dcb0039a6f802eaf154205b7161e4796
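The single-run, single-tag fast path can be sketched like this. (This `build_index` helper and its plain-dict metadata map are hypothetical; the real code queries the event multiplexer’s `SummaryMetadata` accessors.)

```python
def build_index(all_metadata, runs=None, tags=None):
    """all_metadata: {run: {tag: metadata}}. runs/tags mimic a RunTagFilter."""
    if runs is not None and len(runs) == 1 and tags is not None and len(tags) == 1:
        # Fast path: look up the one requested entry directly, instead of
        # fetching everything and paring it down.
        (run,), (tag,) = tuple(runs), tuple(tags)
        meta = all_metadata.get(run, {}).get(tag)
        return {run: {tag: meta}} if meta is not None else {}
    # Slow path: filter the full metadata map.
    result = {}
    for run, tag_to_meta in all_metadata.items():
        if runs is not None and run not in runs:
            continue
        kept = {t: m for t, m in tag_to_meta.items() if tags is None or t in tags}
        if kept:
            result[run] = kept
    return result
```

The fast path matters because the common hot query (one run, one tag, as in the hparams `:list_session_groups_test` case) then avoids touching metadata for every other series.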
Summary:
This commit implements the new `list_tensors` and `read_tensors` methods for the data provider implementation backed by the event multiplexer.
Test Plan:
Unit tests included.
wchargin-branch: data-tensors-mux