-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Scalars: Multiplex fetch (one tag, many runs). #3835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
(haven’t reviewed patch, but assuming it does what it says on the tin—) |
34ee321 to
8fbebca
Compare
|
Haha @wchargin I think you just volunteered to review ;) |
|
Ah, sorry, need to fix tests. Something weird happened with my local tests so I didn't trust it, but I see the CI also failed for obvious reasons. Will do today or tomorrow, depending. |
tensorboard/plugins/scalar/tf_scalar_dashboard/tf-scalar-card.html
Outdated
Show resolved
Hide resolved
tensorboard/plugins/scalar/tf_scalar_dashboard/tf-scalar-card.html
Outdated
Show resolved
Hide resolved
| this.requestManager.request(this.getDataLoadUrl(datum)); | ||
| return (datum) => { | ||
| const dataLoadUrl = this.getDataLoadUrl(datum); | ||
| var url; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| // A function that takes a datum and returns a string URL for fetching | ||
| // data. | ||
| // data. Optionally, returns a tuple of (URL, postdata). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, sure.
tensorboard/plugins/scalar/tf_scalar_dashboard/tf-scalar-card.html
Outdated
Show resolved
Hide resolved
wchargin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second pass. Assuming that nothing new comes up once these are
addressed, I think that this will look good to me. Let me know when the
corresponding internal changes (as discussed) are ready.
tensorboard/components/tf_dashboard_common/data-loader-behavior.ts
Outdated
Show resolved
Hide resolved
tensorboard/components/tf_dashboard_common/data-loader-behavior.ts
Outdated
Show resolved
Hide resolved
| for run in all_scalars: | ||
| scalars = all_scalars.get(run, {}).get(tag, []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The fallback in all_scalars.get(run, {}) is superfluous, since
we iterate for run in all_scalars. The fallback in ….get(tag, [])
is, I think, also superfluous, since run won’t appear in the output at
all if it doesn’t have data for tag (since your RunTagFilter has a
singleton tag axis). So this could just be all_scalars[run][tag],
which I think is clearer because the current version suggests codepaths
that cannot actually exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks. Done
| response = server.post( | ||
| "/data/plugin/scalars/scalarsmulti", | ||
| data={ | ||
| "runs": json.dumps([self._RUN_WITH_SCALARS]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test; thanks! Maybe include a run that doesn’t have scalar data
and/or a run that doesn’t have any data to test those edge cases, since
it’d be easy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works; thanks! I actually meant that it would suffice to just set
runs to [_RUN_WITH_SCALARS, _RUN_WITH_HISTOGRAMS] and keep the same
expected output. (That way, you’re also actually testing the multi-run
functionality of the /scalars_multirun route.) But having multiple
test cases is also perfectly fine with me if you prefer that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned up a bit (doing both what you said and multiple test cases).
davidsoergel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks-- all super helpful!
| for run in all_scalars: | ||
| scalars = all_scalars.get(run, {}).get(tag, []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks. Done
tensorboard/components/tf_dashboard_common/data-loader-behavior.ts
Outdated
Show resolved
Hide resolved
| response = server.post( | ||
| "/data/plugin/scalars/scalarsmulti", | ||
| data={ | ||
| "runs": json.dumps([self._RUN_WITH_SCALARS]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
wchargin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me modulo inline.
Feel free to send corresponding internal changes to me for review.
| # Note we do not raise an error if data for a given run was not | ||
| # found; we just omit the run that case. | ||
| if scalars: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What’s the purpose of this if scalars check? We know that scalars is
a list of ScalarDatum values (e.g., it can’t be None), so this omits
the run if it has empty data. But then a missing time series is
indistinguishable from a time series that is simply empty.
It’s (currently) true that in both the multiplexer data provider and
TensorBoard.dev we don’t keep track of empty time series, but there’s no
such restriction in the data provider model,* so I don’t see why we
should go out of our way to lose information here. Can we just drop the
if-condition?
Sorry if I was unclear earlier; what I meant to convey is that the
structure of the data provider methods already has a simple, lossless
way to keep track of which time series exist and whether they have data,
and that plugin code should try to stick to that when possible. In most
cases this should generally amount to just doing the straightforward
thing without extra conditions.
* If you want a motivating example, imagine some tuning service that
knows ahead of time what metrics you want to track.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I cleaned up this case-- thanks for clarifying. I see now that what confused me is line 109 in the single-run case, which (per your other comment) suggests impossible states, e.g. the run is populated but the tag is not. IIUC, that one would also be clearer as
if !all_scalars:
raise NotFoundError(...)
scalars = all_scalars[run][tag]
(Not changing it here, though, so as not to pollute the PR.)
| response = server.post( | ||
| "/data/plugin/scalars/scalarsmulti", | ||
| data={ | ||
| "runs": json.dumps([self._RUN_WITH_SCALARS]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works; thanks! I actually meant that it would suffice to just set
runs to [_RUN_WITH_SCALARS, _RUN_WITH_HISTOGRAMS] and keep the same
expected output. (That way, you’re also actually testing the multi-run
functionality of the /scalars_multirun route.) But having multiple
test cases is also perfectly fine with me if you prefer that.
|
Unfortunately this implementation breaks the caching in data_loader_behavior.ts. Setting aside for now. |
Summary: As of this patch, a `tf-scalar-card` can make just one network request to fetch its data, instead of one request per time series (i.e., one request per run, since each scalar chart renders a single tag). This reduces network overhead, improves throughput due to higher request concurrency, and offers the opportunity for data providers to more efficiently request the data in batch. This is implemented with a new POST route `/scalars_multirun`, since the list of run names may be long. The frontend is configured to batch requests to at most 64 runs at once, so the multiplexing is bounded. This only affects the scalars plugin. Other scalar chart sources, like PR curves, custom scalars, and the hparams metrics views, are not affected. Supersedes #3835, with the same idea and same general backend approach, but using the frontend APIs enabled by #4045. Test Plan: On the hparams demo with four charts showing, each fetching 50 runs, we now make only four requests as opposed to 200. On a Google-internal networked data provider, this improves end-to-end time (measured from “first spinner appears” to “last spinner disappears”) by about 50%, from 22±1 seconds to 11±1 seconds. (Before this patch, the network requests were being queued all the way to the end of the test period.) Changing the batch size to 4 and then running on a dataset with 14 runs shows that the requests are properly batched, including the last one with just 2 runs. Testing hparams, custom scalars, and PR curves shows that they continue to work, even when multiple time series are requested. wchargin-branch: scalars-mux-runs
|
Superseded by #4050. |
This dramatically speeds up scalars dashboard loading when there are many runs, and reduces load on our hosted services as well.