
Conversation

@davidsoergel (Member)

This dramatically speeds up scalars dashboard loading when there are many runs, and reduces load on our hosted services as well.

wchargin added the theme:performance (Performance, scalability, large data sizes, slowness, etc.) label on Jul 13, 2020
@wchargin (Contributor)

(haven’t reviewed patch, but assuming it does what it says on the tin—)
this is awesome, thanks! Was planning to do this in the next few weeks,
but even better to have it in earlier. :-)

davidsoergel force-pushed the multiplex-scalar-fetch branch from 34ee321 to 8fbebca on July 13, 2020 22:00
@davidsoergel (Member Author)

Haha @wchargin I think you just volunteered to review ;)

davidsoergel requested a review from wchargin on July 13, 2020 22:17
@davidsoergel (Member Author)

Ah, sorry, need to fix tests. Something weird happened with my local tests so I didn't trust it, but I see the CI also failed for obvious reasons. Will do today or tomorrow, depending.

this.requestManager.request(this.getDataLoadUrl(datum));
return (datum) => {
const dataLoadUrl = this.getDataLoadUrl(datum);
var url;
Member Author

Done


// A function that takes a datum and returns a string URL for fetching
// data. Optionally, returns a tuple of (URL, postdata).
Member Author

Yep, sure.

davidsoergel requested a review from wchargin on July 15, 2020 23:00
@wchargin (Contributor) left a comment

Second pass. Assuming that nothing new comes up once these are
addressed, I think that this will look good to me. Let me know when the
corresponding internal changes (as discussed) are ready.

Comment on lines 134 to 135
for run in all_scalars:
    scalars = all_scalars.get(run, {}).get(tag, [])
Contributor

nit: The fallback in all_scalars.get(run, {}) is superfluous, since
we iterate for run in all_scalars. The fallback in ….get(tag, [])
is, I think, also superfluous, since run won’t appear in the output at
all if it doesn’t have data for tag (since your RunTagFilter has a
singleton tag axis). So this could just be all_scalars[run][tag],
which I think is clearer because the current version suggests codepaths
that cannot actually exist.
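
For context, a minimal sketch of why direct indexing is safe here (this is not the patch itself: the data_provider handle, the experiment/runs/tag variables, and the downsample count are assumed, and the exact read_scalars signature varies by TensorBoard version):

from tensorboard.data import provider as provider_lib

# Sketch: read a single tag across a set of runs.
all_scalars = data_provider.read_scalars(
    experiment_id=experiment,
    plugin_name="scalars",
    downsample=1000,
    run_tag_filter=provider_lib.RunTagFilter(runs=runs, tags=[tag]),
)

# The result maps run -> tag -> [ScalarDatum, ...]. With a singleton tag
# axis, any run present in the mapping also has an entry for `tag`, so:
for run in all_scalars:
    scalars = all_scalars[run][tag]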

Member Author

Good point, thanks. Done

response = server.post(
    "/data/plugin/scalars/scalarsmulti",
    data={
        "runs": json.dumps([self._RUN_WITH_SCALARS]),
Contributor

Nice test; thanks! Maybe include a run that doesn’t have scalar data
and/or a run that doesn’t have any data to test those edge cases, since
it’d be easy?

Member Author

Done.

Contributor

This works; thanks! I actually meant that it would suffice to just set
runs to [_RUN_WITH_SCALARS, _RUN_WITH_HISTOGRAMS] and keep the same
expected output. (That way, you’re also actually testing the multi-run
functionality of the /scalars_multirun route.) But having multiple
test cases is also perfectly fine with me if you prefer that.
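
For concreteness, a minimal sketch of that variant (the run constants are taken from the excerpt above; the tag form parameter and the _SCALAR_TAG constant are assumptions, not verified against the final patch):

response = server.post(
    "/data/plugin/scalars/scalarsmulti",
    data={
        # Hypothetical tag parameter; the histogram-only run contributes no
        # scalar series, so the expected output is unchanged.
        "tag": self._SCALAR_TAG,
        "runs": json.dumps(
            [self._RUN_WITH_SCALARS, self._RUN_WITH_HISTOGRAMS]
        ),
    },
)
self.assertEqual(200, response.status_code)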

Member Author

Cleaned up a bit (doing both what you said and multiple test cases).

@davidsoergel (Member Author) left a comment

Thanks-- all super helpful!


davidsoergel requested a review from wchargin on July 16, 2020 04:24
@wchargin (Contributor) left a comment

Looks good to me modulo inline.

Feel free to send corresponding internal changes to me for review.

Comment on lines 136 to 138
# Note we do not raise an error if data for a given run was not
# found; we just omit the run in that case.
if scalars:
Contributor

What’s the purpose of this if scalars check? We know that scalars is
a list of ScalarDatum values (e.g., it can’t be None), so this omits
the run if it has empty data. But then a missing time series is
indistinguishable from a time series that is simply empty.

It’s (currently) true that in both the multiplexer data provider and
TensorBoard.dev we don’t keep track of empty time series, but there’s no
such restriction in the data provider model,* so I don’t see why we
should go out of our way to lose information here. Can we just drop the
if-condition?

Sorry if I was unclear earlier; what I meant to convey is that the
structure of the data provider methods already has a simple, lossless
way to keep track of which time series exist and whether they have data,
and that plugin code should try to stick to that when possible. In most
cases this should generally amount to just doing the straightforward
thing without extra conditions.

* If you want a motivating example, imagine some tuning service that
knows ahead of time what metrics you want to track.
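
As a concrete sketch of that lossless shape (variable names assumed; not the final patch), dropping the condition keeps an empty-but-present series distinct from a missing one:

# A run with no series for `tag` simply never appears in `all_scalars`;
# a run whose series exists but is empty maps to [].
body = {
    run: [(x.wall_time, x.step, x.value) for x in all_scalars[run][tag]]
    for run in all_scalars
}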

Member Author

OK, I cleaned up this case-- thanks for clarifying. I see now that what confused me is line 109 in the single-run case, which (per your other comment) suggests impossible states, e.g. the run is populated but the tag is not. IIUC, that one would also be clearer as

if not all_scalars:
    raise NotFoundError(...)
scalars = all_scalars[run][tag]

(Not changing it here, though, so as not to pollute the PR.)


@davidsoergel (Member Author)

Unfortunately this implementation breaks the caching in data_loader_behavior.ts. Setting aside for now.

wchargin added a commit that referenced this pull request Aug 20, 2020
Summary:
As of this patch, a `tf-scalar-card` can make just one network request
to fetch its data, instead of one request per time series (i.e., one
request per run, since each scalar chart renders a single tag). This
reduces network overhead, improves throughput due to higher request
concurrency, and offers the opportunity for data providers to more
efficiently request the data in batch.

This is implemented with a new POST route `/scalars_multirun`, since the
list of run names may be long. The frontend is configured to batch
requests to at most 64 runs at once, so the multiplexing is bounded.

This only affects the scalars plugin. Other scalar chart sources, like
PR curves, custom scalars, and the hparams metrics views, are not
affected.

Supersedes #3835, with the same idea and same general backend approach,
but using the frontend APIs enabled by #4045.

Test Plan:
On the hparams demo with four charts showing, each fetching 50 runs, we
now make only four requests as opposed to 200. On a Google-internal
networked data provider, this improves end-to-end time (measured from
“first spinner appears” to “last spinner disappears”) by about 50%, from
22±1 seconds to 11±1 seconds. (Before this patch, the network requests
were being queued all the way to the end of the test period.)

Changing the batch size to 4 and then running on a dataset with 14 runs
shows that the requests are properly batched, including the last one
with just 2 runs.

Testing hparams, custom scalars, and PR curves shows that they continue
to work, even when multiple time series are requested.

wchargin-branch: scalars-mux-runs
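
To make the batching concrete, a client-side sketch in Python (the real frontend does this in TypeScript via the RequestManager; the endpoint path, form parameters, and response shape here are assumptions):

import json
import requests

BATCH_LIMIT = 64  # matches the batch size described in the summary

def fetch_scalars_multirun(base_url, tag, runs):
    """POST one request per chunk of at most BATCH_LIMIT runs."""
    results = {}
    for start in range(0, len(runs), BATCH_LIMIT):
        batch = runs[start : start + BATCH_LIMIT]
        resp = requests.post(
            base_url + "/data/plugin/scalars/scalars_multirun",
            data={"tag": tag, "runs": json.dumps(batch)},
        )
        resp.raise_for_status()
        # Assumed response shape: run name -> list of [wall_time, step, value].
        results.update(resp.json())
    return results
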
@wchargin (Contributor)

Superseded by #4050.

wchargin closed this on Aug 24, 2020
wchargin deleted the multiplex-scalar-fetch branch on August 24, 2020 at 19:55