audio: add generic data support #3514

wchargin · 2020-04-14T03:27:05Z

Summary:
This patch teaches the audio plugin to support generic data providers.

This requires a few more list_blob_sequences queries than we’d like,
due to the current structure of the frontend. Some have been avoided by
pushing the content type of the waveform into the query string. A bit of
care is required to ensure that a malicious request cannot turn this
into an XSS hole; we do so by simply whitelisting a (singleton) set of
audio content types, which means that the safety properties are
unchanged by this patch.

Test Plan:
The audio dashboard now works even if the multiplexer is removed. The
audio dashboard doesn’t have any “extras” (data download links, etc.),
so just checking the basic functionality suffices. Unit tests updated.

wchargin-branch: audio-generic

Summary: Most versions of the audio summary APIs only support writing audio data, but the TB 1.x `tensorboard.summary.audio` ops also support attaching a “label” (Markdown text) to each audio clip. This feature is not present in any version of the TensorFlow summary APIs, including `tf.contrib`, and is not present in the TB 2.x `tensorboard.summary.audio` API. It hasn’t been widely adopted, and doesn’t integrate nicely in its current form with upcoming efforts like migrating the audio dashboard to use the generic data APIs. This commit causes labels to no longer be read by TensorBoard. Labels are still written by the TB 1.x summary ops if specified, but now print a warning. Because we do not change the write paths, it is important that `audio.metadata` *not* set `DATA_CLASS_BLOB_SEQUENCE`, because otherwise audio summaries will not be touched by `dataclass_compat`. Tests for `dataclass_compat` cover this. We don’t think that this feature has wide adoption. If this change gets significant pushback, we can look into restoring labels with a different implementation, likely as a parallel tensor time series. Test Plan: Unit tests updated. As a manual test, TensorBoard still works on both legacy audio data and audio data in the new form (that’s not actually written to disk yet), and simply does not display any labels. wchargin-branch: audio-no-labels

Summary: We initially used `dataclass_compat` to perform filtering of large graphs as a stopgap mechanism. This commit moves that filtering into the uploader, which is the only surface in which it’s actually used. As a result, `dataclass_compat` no longer takes extra arguments and so can be moved into `EventFileLoader` in a future change. Test Plan: Unit tests added to the uploader for the small graph, large graph, and corrupt graph cases. wchargin-branch: uploader-graph-filtering wchargin-source: 00af6ebe200fe60aacbc23f468eb44126d2d33f7

Summary: The `data_compat` and `dataclass_compat` transforms are now effected by the `EventFileLoader` class. The event accumulator and uploader therefore no longer need to effect these transformations manually. Test Plan: Unit tests that use mocking have been updated to apply compat transforms manually. Using the uploader with `--plugin scalars,graphs` still works as intended. wchargin-branch: efl-compat-transforms wchargin-source: d5b86d851f5dff6f24ddc8cfac52c740ab8af1b3

Summary: The `dataclass_compat` transform dispatches on plugin name, which is stored in the summary metadata. But the semantics of summary metadata is that it need only be set in the first event of each time series. Thus, until now, events past the first step of a time series have not been modified by `dataclass_compat`. This has been fine because all transformations have been either (a) only metadata transforms (which don’t matter after the first step anyway) or (b) tensor transforms to hparams data (which only ever has one step anyway). But an upcoming change will need to transform every step of each audio time series, so we need to identify which summaries have audio data even when they don’t have plugin name set on the same event. To do so, we let `dataclass_compat` update a stateful map from tag name to the first `SummaryMetadata` seen for that time series (scoped to a particular run). Test Plan: Some unit tests included now; the tests for migration of audio summaries will be more complete because they have more interesting things to test. wchargin-branch: dataclass-compat-initial-metadata wchargin-source: 364d3b33f32452e8d3ac9468e6bcca4de876963a

wchargin-branch: uploader-graph-filtering wchargin-source: 9179acc76158380d9f254df1d0a0aae1ec82df6c

wchargin-branch: efl-compat-transforms wchargin-source: 4fa593a34147c7b05c7f4c55f3ce9279242a7414

wchargin-branch: audio-no-labels wchargin-source: 7c7844bff26668d32d692ef7faa0dba3ecbaf138 # Conflicts: # tensorboard/dataclass_compat.py

wchargin-branch: audio-no-labels wchargin-source: 7c7844bff26668d32d692ef7faa0dba3ecbaf138

Summary: This patch teaches the audio plugin to support generic data providers. This requires a few more `list_blob_sequences` queries than we’d like, due to the current structure of the frontend. Some have been avoided by pushing the content type of the waveform into the query string. A bit of care is required to ensure that a malicious request cannot turn this into an XSS hole; we do so by simply whitelisting a (singleton) set of audio content types, which means that the safety properties are unchanged by this patch. Test Plan: With `--generic_data true`, the audio dashboard now works even if the multiplexer is removed. The audio dashboard doesn’t have any “extras” (data download links, etc.), so just checking the basic functionality suffices. Unit tests also updated. wchargin-branch: audio-generic wchargin-source: 5b223026c6cb8e22bfe5d8d9654459f61fa3a45e

wchargin-branch: uploader-graph-filtering wchargin-source: b89f6da9cd740080aefecec49194752b0de85dd4

wchargin-branch: efl-compat-transforms wchargin-source: 8abfd3ec5c7394e7b1930b4bf42d2de816e67da4 # Conflicts: # tensorboard/uploader/uploader.py

wchargin-branch: efl-compat-transforms wchargin-source: 8abfd3ec5c7394e7b1930b4bf42d2de816e67da4

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: cfd81441162803b6bab104bc76d93c3b00e8c381

wchargin-branch: audio-no-labels wchargin-source: eb55e6cf2dcb2dc50a10201296e04c3a030bca3d

wchargin-branch: audio-generic wchargin-source: 0d4ed8098d4fdd3e7353ce81e45237abed8ffd0b

Summary: Fixes an oversight in #3507: we can’t assert that the raw bytes are what was expected because the code under test does a proto serialization roundtrip, which is permitted to permute keys. Test Plan: Tests still pass, and this should fix an internal sync error. wchargin-branch: uploader-test-proto-equal

wchargin-branch: uploader-graph-filtering wchargin-source: 30f0764698d66cdaf481c797dc56bf6a34ae3ae0

wchargin-branch: efl-compat-transforms wchargin-source: f18ea5648a96010b48237fe7743e5f5d0d535acb

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: 9f516c3b25569b4e007bb929bb456541e5433f52

wchargin-branch: audio-no-labels wchargin-source: 7a379545e5195047d46d162de36606433a04cf18

wchargin-branch: audio-generic wchargin-source: fc3b8e088f8b71b092396755275fb8a895755dba

wchargin-branch: uploader-graph-filtering wchargin-source: 63adea8ca2f0a85bd9f061aee72435f04d1d56d8 # Conflicts: # tensorboard/uploader/uploader_test.py

wchargin-branch: uploader-graph-filtering wchargin-source: 63adea8ca2f0a85bd9f061aee72435f04d1d56d8

wchargin-branch: efl-compat-transforms wchargin-source: 4fdc21c37611fab6a00384e9e824cd60302f6def

wchargin-branch: efl-compat-transforms wchargin-source: 124d838ae368a9d9c9f061a39a72056554c89790 # Conflicts: # tensorboard/uploader/uploader.py # tensorboard/uploader/uploader_test.py

wchargin-branch: efl-compat-transforms wchargin-source: 124d838ae368a9d9c9f061a39a72056554c89790

wchargin-branch: efl-compat-transforms wchargin-source: 41caa7efc7790458360d47924ce80e0d8f0ef248

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: 9bceb864dae1a69432349daf80ebb4db6e19278b

wchargin-branch: audio-no-labels wchargin-source: f0ba8b1511c55b4596450e49d7dbe69d2170fdcc

wchargin-branch: audio-generic wchargin-source: be1113153b02927ea35b953598e5f37866028801

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: e87f73c1f892413fb38f913c3da0db230f9d8c61 # Conflicts: # tensorboard/backend/event_processing/event_file_loader.py # tensorboard/backend/event_processing/plugin_event_accumulator_test.py # tensorboard/uploader/uploader_test.py

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: e87f73c1f892413fb38f913c3da0db230f9d8c61

wchargin-branch: audio-no-labels wchargin-source: 3ef10438c3ca65d6cff434e277b72807fd27cf36

wchargin-branch: audio-generic wchargin-source: a8e5986d9788550a07672f1bf203d27d8c289a50

nfelt · 2020-04-16T23:23:22Z

tensorboard/plugins/audio/audio_plugin.py

-        return bool(
-            self._multiplexer.PluginRunToTagToContent(metadata.PLUGIN_NAME)
-        )
+        return False


maybe add the # `list_plugins` as called by TB core suffices comment here as well?

nfelt · 2020-04-16T23:38:31Z

tensorboard/plugins/audio/audio_plugin.py

+        experiment = plugin_util.experiment_id(request.environ)
+        mime_type = request.args["content_type"]
+        if mime_type not in _ALLOWED_MIME_TYPES:
+            raise errors.InvalidArgumentError(


Thanks for catching this possible issue! Can we add a test that exercises this to ensure that we refuse to return audio as, say, text/javascript?

I think the Werkzeug client get() API takes a content_type parameter so this should be straightforward.

In the longer term it would be nice if we didn't have to rely on the client behavior for this. Perhaps we should add a per-blob content_type onto the blob data model? We would have to go around adding it to the various data providers and being sure to set it correctly on upload, but it seems worth doing to properly handle cases like this.

Sure: done, I think, though I didn’t understand what you meant about the
Werkzeug client API taking a content_type parameter. (Isn’t that for
the content type of a POST request body?)

Perhaps we should add a per-blob content_type onto the blob data
model?

Perhaps. Presumably the source of truth would be a new field on the
SummaryMetadata proto (string content_type = 5;), required to be
homogeneous within a time series across all steps and indices, and
defaulting to application/octet-stream if empty for backward
compatibility? I’ll keep it in mind.

Ah, I was confused and misread this and somehow thought we were getting the content type from a header that the browser was populating (even though yes, that isn't a thing for GET requests) rather than the query parameter. Never mind :P

Re: per-blob content types, yeah, I guess it could have a field in SummaryMetadata, although it's perhaps a little odd to have a field that is returned per-value stored in the per-tag SummaryMetadata (even if for now we don't anticipate allowing it to vary per-value).

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: ba0efc9d5159b2f74982e63b6ca884c7f2380495

wchargin-branch: audio-no-labels wchargin-source: dc72cdd9d4dd9383e196b556b6a91917cfd9b7dd # Conflicts: # tensorboard/dataclass_compat.py

wchargin-branch: audio-no-labels wchargin-source: dc72cdd9d4dd9383e196b556b6a91917cfd9b7dd

wchargin-branch: audio-generic wchargin-source: 2ec65f28a77e56f5cf5c786d7020f652a2cdde36

nfelt · 2020-04-17T00:58:46Z

tensorboard/plugins/audio/audio_plugin.py

+        experiment = plugin_util.experiment_id(request.environ)
+        mime_type = request.args["content_type"]
+        if mime_type not in _ALLOWED_MIME_TYPES:
+            raise errors.InvalidArgumentError(


Ah, I was confused and misread this and somehow thought we were getting the content type from a header that the browser was populating (even though yes, that isn't a thing for GET requests) rather than the query parameter. Never mind :P

Re: per-blob content types, yeah, I guess it could have a field in SummaryMetadata, although it's perhaps a little odd to have a field that is returned per-value stored in the per-tag SummaryMetadata (even if for now we don't anticipate allowing it to vary per-value).

wchargin-branch: audio-no-labels wchargin-source: adbc9272e97c6ba2a70d84cf055baaab7d166f25 # Conflicts: # tensorboard/dataclass_compat.py

wchargin-branch: audio-no-labels wchargin-source: adbc9272e97c6ba2a70d84cf055baaab7d166f25

wchargin-branch: audio-generic wchargin-source: cfcd8b069dc99defbedfe71f93f5444ad62d66da

wchargin-branch: audio-generic wchargin-source: ee6bb07e2331836178cd4329427286f69abe43ad # Conflicts: # tensorboard/plugins/audio/audio_plugin.py

wchargin-branch: audio-generic wchargin-source: ee6bb07e2331836178cd4329427286f69abe43ad

Summary: This patch teaches the audio plugin to support generic data providers. This requires a few more `list_blob_sequences` queries than we’d like, due to the current structure of the frontend. Some have been avoided by pushing the content type of the waveform into the query string. A bit of care is required to ensure that a malicious request cannot turn this into an XSS hole; we do so by simply whitelisting a (singleton) set of audio content types, which means that the safety properties are unchanged by this patch. Test Plan: The audio dashboard now works even if the multiplexer is removed. The audio dashboard doesn’t have any “extras” (data download links, etc.), so just checking the basic functionality suffices. Unit tests updated. wchargin-branch: audio-generic

wchargin added 9 commits April 9, 2020 11:15

[update patch]

61ab333

wchargin-branch: uploader-graph-filtering wchargin-source: 9179acc76158380d9f254df1d0a0aae1ec82df6c

[update diffbase]

45b02e7

wchargin-branch: efl-compat-transforms wchargin-source: 4fa593a34147c7b05c7f4c55f3ce9279242a7414

[update diffbase]

76843be

wchargin-branch: audio-no-labels wchargin-source: 7c7844bff26668d32d692ef7faa0dba3ecbaf138 # Conflicts: # tensorboard/dataclass_compat.py

[update patch]

a12f351

wchargin-branch: audio-no-labels wchargin-source: 7c7844bff26668d32d692ef7faa0dba3ecbaf138

wchargin added the plugin:audio label Apr 14, 2020

googlebot added the cla: yes label Apr 14, 2020

wchargin added 19 commits April 14, 2020 01:18

[update patch]

8a4247c

wchargin-branch: uploader-graph-filtering wchargin-source: b89f6da9cd740080aefecec49194752b0de85dd4

[bump ci]

aac18b4

wchargin-branch: uploader-graph-filtering wchargin-source: b89f6da9cd740080aefecec49194752b0de85dd4

[update diffbase]

8ef2886

wchargin-branch: efl-compat-transforms wchargin-source: 8abfd3ec5c7394e7b1930b4bf42d2de816e67da4 # Conflicts: # tensorboard/uploader/uploader.py

[update patch]

f159fa5

wchargin-branch: efl-compat-transforms wchargin-source: 8abfd3ec5c7394e7b1930b4bf42d2de816e67da4

[update diffbase]

6d11b7c

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: cfd81441162803b6bab104bc76d93c3b00e8c381

[update diffbase]

6c2d8ae

wchargin-branch: audio-no-labels wchargin-source: eb55e6cf2dcb2dc50a10201296e04c3a030bca3d

[update diffbase]

7ea1d0c

wchargin-branch: audio-generic wchargin-source: 0d4ed8098d4fdd3e7353ce81e45237abed8ffd0b

[update diffbase]

d17e9f9

wchargin-branch: uploader-graph-filtering wchargin-source: 30f0764698d66cdaf481c797dc56bf6a34ae3ae0

[update diffbase]

84fd685

wchargin-branch: efl-compat-transforms wchargin-source: f18ea5648a96010b48237fe7743e5f5d0d535acb

[update diffbase]

ff15dc7

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: 9f516c3b25569b4e007bb929bb456541e5433f52

[update diffbase]

a0edd31

wchargin-branch: audio-no-labels wchargin-source: 7a379545e5195047d46d162de36606433a04cf18

[update diffbase]

1a04730

wchargin-branch: audio-generic wchargin-source: fc3b8e088f8b71b092396755275fb8a895755dba

[update diffbase]

94a974c

wchargin-branch: uploader-graph-filtering wchargin-source: 63adea8ca2f0a85bd9f061aee72435f04d1d56d8 # Conflicts: # tensorboard/uploader/uploader_test.py

[update patch]

2ac4935

wchargin-branch: uploader-graph-filtering wchargin-source: 63adea8ca2f0a85bd9f061aee72435f04d1d56d8

[update diffbase]

7c33f3e

wchargin-branch: efl-compat-transforms wchargin-source: 4fdc21c37611fab6a00384e9e824cd60302f6def

[update diffbase]

a9adf93

wchargin-branch: efl-compat-transforms wchargin-source: 124d838ae368a9d9c9f061a39a72056554c89790 # Conflicts: # tensorboard/uploader/uploader.py # tensorboard/uploader/uploader_test.py

[update patch]

8f469cc

wchargin-branch: efl-compat-transforms wchargin-source: 124d838ae368a9d9c9f061a39a72056554c89790

[update diffbase]

ac58c6d

wchargin-branch: efl-compat-transforms wchargin-source: 41caa7efc7790458360d47924ce80e0d8f0ef248

wchargin added 4 commits April 16, 2020 10:49

[update diffbase]

61cc397

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: 9bceb864dae1a69432349daf80ebb4db6e19278b

[update patch]

524d0bc

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: 9bceb864dae1a69432349daf80ebb4db6e19278b

[update diffbase]

94e1afe

wchargin-branch: audio-no-labels wchargin-source: f0ba8b1511c55b4596450e49d7dbe69d2170fdcc

[update diffbase]

748938a

wchargin-branch: audio-generic wchargin-source: be1113153b02927ea35b953598e5f37866028801

wchargin requested a review from nfelt April 16, 2020 17:49

wchargin added 4 commits April 16, 2020 15:38

[update patch]

d553605

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: e87f73c1f892413fb38f913c3da0db230f9d8c61

[update diffbase]

83b3404

wchargin-branch: audio-no-labels wchargin-source: 3ef10438c3ca65d6cff434e277b72807fd27cf36

[update diffbase]

254eb40

wchargin-branch: audio-generic wchargin-source: a8e5986d9788550a07672f1bf203d27d8c289a50

nfelt approved these changes Apr 16, 2020

View reviewed changes

wchargin added 5 commits April 16, 2020 17:32

[update patch]

3d1d73f

wchargin-branch: dataclass-compat-initial-metadata wchargin-source: ba0efc9d5159b2f74982e63b6ca884c7f2380495

[update diffbase]

0a6d49f

wchargin-branch: audio-no-labels wchargin-source: dc72cdd9d4dd9383e196b556b6a91917cfd9b7dd # Conflicts: # tensorboard/dataclass_compat.py

[update patch]

c1da800

wchargin-branch: audio-no-labels wchargin-source: dc72cdd9d4dd9383e196b556b6a91917cfd9b7dd

[update diffbase]

12af0db

wchargin-branch: audio-generic wchargin-source: 2ec65f28a77e56f5cf5c786d7020f652a2cdde36

[update patch]

800a3b1

wchargin-branch: audio-generic wchargin-source: 2ec65f28a77e56f5cf5c786d7020f652a2cdde36

nfelt approved these changes Apr 17, 2020

View reviewed changes

wchargin added 3 commits April 16, 2020 17:59

[update diffbase]

2b095ce

wchargin-branch: audio-no-labels wchargin-source: adbc9272e97c6ba2a70d84cf055baaab7d166f25 # Conflicts: # tensorboard/dataclass_compat.py

[update patch]

fd201f6

wchargin-branch: audio-no-labels wchargin-source: adbc9272e97c6ba2a70d84cf055baaab7d166f25

[update diffbase]

ba64453

wchargin-branch: audio-generic wchargin-source: cfcd8b069dc99defbedfe71f93f5444ad62d66da

wchargin changed the base branch from wchargin-audio-no-labels to master April 17, 2020 01:44

wchargin added 2 commits April 16, 2020 18:45

[update diffbase]

1cc2861

wchargin-branch: audio-generic wchargin-source: ee6bb07e2331836178cd4329427286f69abe43ad # Conflicts: # tensorboard/plugins/audio/audio_plugin.py

[update patch]

28a8148

wchargin-branch: audio-generic wchargin-source: ee6bb07e2331836178cd4329427286f69abe43ad

wchargin merged commit b85fd87 into master Apr 17, 2020

wchargin deleted the wchargin-audio-generic branch April 17, 2020 03:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

audio: add generic data support #3514

audio: add generic data support #3514

Uh oh!

wchargin commented Apr 14, 2020 •

edited

Loading

Uh oh!

nfelt Apr 16, 2020

Uh oh!

wchargin Apr 17, 2020

Uh oh!

nfelt Apr 16, 2020

Uh oh!

wchargin Apr 17, 2020

Uh oh!

nfelt Apr 17, 2020

Uh oh!

nfelt Apr 17, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

audio: add generic data support #3514

audio: add generic data support #3514

Uh oh!

Conversation

wchargin commented Apr 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nfelt Apr 16, 2020

Choose a reason for hiding this comment

Uh oh!

wchargin Apr 17, 2020

Choose a reason for hiding this comment

Uh oh!

nfelt Apr 16, 2020

Choose a reason for hiding this comment

Uh oh!

wchargin Apr 17, 2020

Choose a reason for hiding this comment

Uh oh!

nfelt Apr 17, 2020

Choose a reason for hiding this comment

Uh oh!

nfelt Apr 17, 2020

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

wchargin commented Apr 14, 2020 •

edited

Loading