Skip to content

Conversation

@wchargin
Copy link
Contributor

@wchargin wchargin commented Feb 26, 2020

Summary:
When the server sends a PluginControl message in the initial
handshake, we now only send data for plugins specified therein.

Test Plan:
Unit tests included. As end-to-end tests:

  • When running against prod (which does not send plugin_control),
    experiment upload is unchanged. Info logs now include “Skipping time
    series […] with unsupported plugin name 'histograms'”, just once per
    time series.

  • When running against a local server that serves scalars-only
    explicitly—

    plugin_control {
      allowed_plugins: "scalars"
    }
    

    —the behavior is also the same.

  • When running against a local server that serves “no plugins allowed”
    explicitly—

    plugin_control {
    }
    

    —nothing is uploaded, and the logs include “Skipping…” messages for
    scalar time series.

wchargin-branch: uploader-plugincontrol

Summary:
This enables us to flag features on the server side, so that we can
stage releases, and so that we can roll back in a pinch without
requiring an expedited PyPI push and end user version upgrade.

Test Plan:
No new behavior in this change. The protos still build.

wchargin-branch: uploader-plugincontrol-protos
wchargin-source: cc6c5c6e096bbba2c24cda7ba5b2b97cb142e8fd
Summary:
When the server sends a `PluginControl` message in the initial
handshake, we now only send data for plugins specified therein.

Test Plan:
Unit tests included. DO NOT SUBMIT until end-to-end tested on both old
and new servers.

wchargin-branch: uploader-plugincontrol
wchargin-source: 8d294c291041b688c5cc42cd19d3f6e6fef8fc78
wchargin-branch: uploader-plugincontrol
wchargin-source: 046c05ee3cf246ce38f9c4674e53077eecb3f8f9
@wchargin
Copy link
Contributor Author

Server-side change (Google-internal): http://cl/297460920

wchargin-branch: uploader-plugincontrol-protos
wchargin-source: 4f801e62f79e9be1f54d473337da1ca092239184
wchargin-branch: uploader-plugincontrol
wchargin-source: e32537bff8875fb6609c6fc2977d8d637fe2f12d
wchargin-branch: uploader-plugincontrol-protos
wchargin-source: 929b65552dfeb9db6e7c5767a8ce1f253b3c3852
wchargin-branch: uploader-plugincontrol-protos
wchargin-source: 929b65552dfeb9db6e7c5767a8ce1f253b3c3852
wchargin-branch: uploader-plugincontrol
wchargin-source: 75c92834307a679ced4afad62aadfdbc127f8df7
@wchargin wchargin changed the base branch from wchargin-uploader-plugincontrol-protos to master February 27, 2020 20:45
wchargin-branch: uploader-plugincontrol
wchargin-source: 73b86407637afc795a8753ffc874c485f84134c1
Summary:
Tests create uploaders and request senders with dependency-injected
stubs. As we add more such stubs, the test code bloats and requires
broad changes. By centralizing helpers, we can inject defaults that are
selectively overridden.

Test Plan:
Existing tests suffice.

wchargin-branch: uploader-test-di-helpers
wchargin-source: b5da05e5125eea1248b67841b80a3094b0d9d5a4
@wchargin wchargin changed the base branch from master to wchargin-uploader-test-di-helpers February 28, 2020 00:42
wchargin-branch: uploader-plugincontrol
wchargin-source: b76b10849e0372570992b700b00842ad5d2bf502

# Conflicts:
#	tensorboard/uploader/uploader_test.py
wchargin-branch: uploader-plugincontrol
wchargin-source: b76b10849e0372570992b700b00842ad5d2bf502
@wchargin
Copy link
Contributor Author

(Made this patch a bunch shorter and cleaner by pulling out #3304, which
facilitates #3305 as well.)

# If later events arrive with a mismatching plugin_name, they are
# ignored with a warning.
metadata = self._tag_metadata.get(time_series_key)
first_in_time_series = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mechanism of detecting 'first time' is a little entangled hard to follow. How about maintaining a collection of plugin_names_seen and testing if plugin_name in plugin_names_seen:?

Related, the 'first time' gets reset for every _upload_once(). Is that desirable, or should we maintain a singleton plugin_names_seen (or similar) that persists over cycles? In this case you'd change the error message too, of course, so it's not time-series-specific but just appears once per plugin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mechanism of detecting 'first time' is a little entangled hard to
follow. How about maintaining a collection of plugin_names_seen and
testing if plugin_name in plugin_names_seen:?

We warn once per time series, not once per plugin. So rather than a
collection of “plugin names seen”, we maintain a collection of “time
series seen”—but this is just the key set of self._tag_metadata.

Related, the 'first time' gets reset for every _upload_once().

Hmm; I’m not able to reproduce that? We create a _BatchedRequestSender
only once, at create_experiment time, and the “already seen?” source
of truth is on its self._tag_metadata instance attribute, which is
never cleared.

It may help to look at the log output to address both questions. The
output looks like:

directory_loader.py:131] Loading data from path /HOMEDIR/tensorboard_data/mnist/lr_1E-03,conv=1,fc=2/events.out.tfevents.1563406327.HOSTNAME
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'input/image/0') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'input/image/1') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'input/image/2') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'conv/weights') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'conv/biases') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'conv/activations') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc1/weights') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc1/biases') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc1/activations') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc1/relu') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc2/weights') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc2/biases') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc2/activations') with unsupported plugin name 'histograms'
uploader.py:548] Trying request of 8154 bytes
uploader.py:555] Upload for 1 runs (8154 bytes) took 0.254 seconds
uploader.py:548] Trying request of 8166 bytes
uploader.py:555] Upload for 1 runs (8166 bytes) took 0.311 seconds
directory_loader.py:131] Loading data from path /HOMEDIR/tensorboard_data/mnist/lr_1E-03,conv=2,fc=2/events.out.tfevents.1563406405.HOSTNAME
uploader.py:345] Skipping time series ('lr_1E-03,conv=2,fc=2', 'input/image/0') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=2,fc=2', 'input/image/1') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=2,fc=2', 'input/image/2') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=2,fc=2', 'conv1/weights') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=2,fc=2', 'conv1/biases') with unsupported plugin name 'histograms'

Note that each time series (run-tag combination) appears only once, even
though we process the some time series in multiple upload cycles. For
instance, the second upload cycle here doesn’t print any “Skipping…”
messages, and the messages in the third upload cycle are for new time
series (conv=1 vs. conv=2).

@wchargin wchargin changed the base branch from wchargin-uploader-test-di-helpers to master February 28, 2020 21:10
wchargin-branch: uploader-plugincontrol
wchargin-source: e23a7157fccccc5badb0531cdae3fc41e5e8c993

# Conflicts:
#	tensorboard/uploader/uploader_test.py
wchargin-branch: uploader-plugincontrol
wchargin-source: e23a7157fccccc5badb0531cdae3fc41e5e8c993
Copy link
Contributor Author

@wchargin wchargin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that my comment below addresses your comment without any need
for code change, so I’ll plan to merge this once CI passes. If I’ve
misunderstood, feel free to correct me either before or after merging
and I will fix it to your satisfaction.

# If later events arrive with a mismatching plugin_name, they are
# ignored with a warning.
metadata = self._tag_metadata.get(time_series_key)
first_in_time_series = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mechanism of detecting 'first time' is a little entangled hard to
follow. How about maintaining a collection of plugin_names_seen and
testing if plugin_name in plugin_names_seen:?

We warn once per time series, not once per plugin. So rather than a
collection of “plugin names seen”, we maintain a collection of “time
series seen”—but this is just the key set of self._tag_metadata.

Related, the 'first time' gets reset for every _upload_once().

Hmm; I’m not able to reproduce that? We create a _BatchedRequestSender
only once, at create_experiment time, and the “already seen?” source
of truth is on its self._tag_metadata instance attribute, which is
never cleared.

It may help to look at the log output to address both questions. The
output looks like:

directory_loader.py:131] Loading data from path /HOMEDIR/tensorboard_data/mnist/lr_1E-03,conv=1,fc=2/events.out.tfevents.1563406327.HOSTNAME
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'input/image/0') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'input/image/1') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'input/image/2') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'conv/weights') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'conv/biases') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'conv/activations') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc1/weights') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc1/biases') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc1/activations') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc1/relu') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc2/weights') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc2/biases') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=1,fc=2', 'fc2/activations') with unsupported plugin name 'histograms'
uploader.py:548] Trying request of 8154 bytes
uploader.py:555] Upload for 1 runs (8154 bytes) took 0.254 seconds
uploader.py:548] Trying request of 8166 bytes
uploader.py:555] Upload for 1 runs (8166 bytes) took 0.311 seconds
directory_loader.py:131] Loading data from path /HOMEDIR/tensorboard_data/mnist/lr_1E-03,conv=2,fc=2/events.out.tfevents.1563406405.HOSTNAME
uploader.py:345] Skipping time series ('lr_1E-03,conv=2,fc=2', 'input/image/0') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=2,fc=2', 'input/image/1') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=2,fc=2', 'input/image/2') with unsupported plugin name 'images'
uploader.py:345] Skipping time series ('lr_1E-03,conv=2,fc=2', 'conv1/weights') with unsupported plugin name 'histograms'
uploader.py:345] Skipping time series ('lr_1E-03,conv=2,fc=2', 'conv1/biases') with unsupported plugin name 'histograms'

Note that each time series (run-tag combination) appears only once, even
though we process the some time series in multiple upload cycles. For
instance, the second upload cycle here doesn’t print any “Skipping…”
messages, and the messages in the third upload cycle are for new time
series (conv=1 vs. conv=2).

@wchargin wchargin merged commit f969c0d into master Feb 28, 2020
@wchargin wchargin deleted the wchargin-uploader-plugincontrol branch February 28, 2020 22:12
bileschi pushed a commit to bileschi/tensorboard that referenced this pull request Mar 3, 2020
Summary:
When the server sends a `PluginControl` message in the initial
handshake, we now only send data for plugins specified therein.

Test Plan:
Unit tests included. As end-to-end tests:

  - When running against prod (which does not send `plugin_control`),
    experiment upload is unchanged. Info logs now include “Skipping time
    series […] with unsupported plugin name 'histograms'”, just once per
    time series.

  - When running against a local server that serves scalars-only
    explicitly—

    ```
    plugin_control {
      allowed_plugins: "scalars"
    }
    ```

    —the behavior is also the same.

  - When running against a local server that serves “no plugins allowed”
    explicitly—

    ```
    plugin_control {
    }
    ```

    —nothing is uploaded, and the logs include “Skipping…” messages for
    scalar time series.

wchargin-branch: uploader-plugincontrol
@bileschi bileschi mentioned this pull request Mar 3, 2020
nfelt pushed a commit that referenced this pull request Mar 4, 2020
Summary:
When the server sends a `PluginControl` message in the initial
handshake, we now only send data for plugins specified therein.

Test Plan:
Unit tests included. As end-to-end tests:

  - When running against prod (which does not send `plugin_control`),
    experiment upload is unchanged. Info logs now include “Skipping time
    series […] with unsupported plugin name 'histograms'”, just once per
    time series.

  - When running against a local server that serves scalars-only
    explicitly—

    ```
    plugin_control {
      allowed_plugins: "scalars"
    }
    ```

    —the behavior is also the same.

  - When running against a local server that serves “no plugins allowed”
    explicitly—

    ```
    plugin_control {
    }
    ```

    —nothing is uploaded, and the logs include “Skipping…” messages for
    scalar time series.

wchargin-branch: uploader-plugincontrol
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants