uploader: inline graph filtering from dataclass_compat
#3510
Changes from all commits
```diff
@@ -25,16 +25,21 @@
 import grpc
 import six

 from google.protobuf import message
 from tensorboard.compat.proto import graph_pb2
 from tensorboard.compat.proto import summary_pb2
 from tensorboard.compat.proto import types_pb2
 from tensorboard.uploader.proto import write_service_pb2
 from tensorboard.uploader.proto import experiment_pb2
 from tensorboard.uploader import logdir_loader
 from tensorboard.uploader import util
 from tensorboard import data_compat
 from tensorboard import dataclass_compat
 from tensorboard.backend import process_graph
 from tensorboard.backend.event_processing import directory_loader
 from tensorboard.backend.event_processing import event_file_loader
 from tensorboard.backend.event_processing import io_wrapper
 from tensorboard.plugins.graph import metadata as graphs_metadata
 from tensorboard.plugins.scalar import metadata as scalar_metadata
 from tensorboard.util import grpc_util
 from tensorboard.util import tb_logging
```
```diff
@@ -425,12 +430,11 @@ def _run_values(self, run_to_events):
         for (run_name, events) in six.iteritems(run_to_events):
             for event in events:
                 v2_event = data_compat.migrate_event(event)
-                dataclass_events = dataclass_compat.migrate_event(
-                    v2_event, experimental_filter_graph=True
-                )
-                for dataclass_event in dataclass_events:
-                    if dataclass_event.summary:
-                        for value in dataclass_event.summary.value:
+                events = dataclass_compat.migrate_event(v2_event)
+                events = _filter_graph_defs(events)
+                for event in events:
+                    if event.summary:
+                        for value in event.summary.value:
                             yield (run_name, event, value)
```
```diff
@@ -833,3 +837,41 @@ def _varint_cost(n):
         result += 1
         n >>= 7
     return result
+
+
+def _filter_graph_defs(events):
+    for e in events:
+        for v in e.summary.value:
+            if (
+                v.metadata.plugin_data.plugin_name
+                != graphs_metadata.PLUGIN_NAME
+            ):
+                continue
+            if v.tag == graphs_metadata.RUN_GRAPH_NAME:
```
Member

This seems unnecessarily restrictive. In practice, for now, we have only run-level graphs anyway, but IIRC we expect to allow … (I know this is a straight refactor of code that already ignored those cases, but still.)
Contributor (Author)

As you note, I'm porting existing code. Op graphs and conceptual graphs use different plugin names …
Member

Oh right, I forgot about the different plugin names. So this condition is protecting us from data that is associated with the graph plugin, but under a different tag. There is currently no such thing, and if something arises in the future, we can't predict whether it should be filtered or not, so now I agree the condition is correct.
```diff
+                data = list(v.tensor.string_val)
+                filtered_data = [_filtered_graph_bytes(x) for x in data]
+                filtered_data = [x for x in filtered_data if x is not None]
+                if filtered_data != data:
+                    new_tensor = tensor_util.make_tensor_proto(
+                        filtered_data, dtype=types_pb2.DT_STRING
+                    )
+                    v.tensor.CopyFrom(new_tensor)
+        yield e
+
+
+def _filtered_graph_bytes(graph_bytes):
+    try:
+        graph_def = graph_pb2.GraphDef().FromString(graph_bytes)
+    # The reason for the RuntimeWarning catch here is b/27494216, whereby
+    # some proto parsers incorrectly raise that instead of DecodeError
+    # on certain kinds of malformed input. Triggering this seems to require
+    # a combination of mysterious circumstances.
+    except (message.DecodeError, RuntimeWarning):
+        logger.warning(
+            "Could not parse GraphDef of size %d. Skipping.", len(graph_bytes),
+        )
+        return None
+    # Use the default filter parameters:
+    # limit_attr_size=1024, large_attrs_key="_too_large_attrs"
+    process_graph.prepare_graph_for_ui(graph_def)
+    return graph_def.SerializeToString()
```
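For context, the `_varint_cost` helper whose closing lines appear as hunk context above computes how many bytes a non-negative integer occupies in base-128 varint encoding. A self-contained sketch consistent with the visible tail (the loop structure before those lines is an assumption):

```python
def varint_cost(n):
    """Byte length of non-negative n encoded as a base-128 varint,
    as used by the protobuf wire format (7 payload bits per byte)."""
    result = 1  # every value, including 0, needs at least one byte
    n >>= 7
    while n:
        result += 1
        n >>= 7
    return result


# Boundary checks: 7 bits fit in one byte, 8 bits need two, and so on.
print(varint_cost(0))      # 1
print(varint_cost(127))    # 1
print(varint_cost(128))    # 2
print(varint_cost(16384))  # 3
```

This is what lets the uploader estimate a request's serialized size before sending it.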
```diff
@@ -33,6 +33,7 @@
 import tensorflow as tf

 from google.protobuf import message
 from tensorboard.uploader.proto import experiment_pb2
 from tensorboard.uploader.proto import scalar_pb2
 from tensorboard.uploader.proto import write_service_pb2
```
```diff
@@ -359,6 +360,67 @@ def test_upload_skip_large_blob(self):
         self.assertEqual(0, mock_rate_limiter.tick.call_count)
         self.assertEqual(1, mock_blob_rate_limiter.tick.call_count)
+
+    def test_filter_graphs(self):
```
Member

Thanks for cleaning up and simplifying these tests! However, I am a little skeptical of this design because the tests are no longer independent. Can you factor out some common setup but keep the tests separate (while retaining most of the simplicity)? Also, why did you remove the explicit tests for …?

Finally, FYI #3509 is outstanding. NP, I'll merge it, assuming you submit first. The behavior of the empty graph case may actually change, depending on how you address the "corrupt graph" case above.
Contributor (Author)

The tests do fail independently: Python's `unittest` reports each failing `subTest` separately, so one failure does not mask the others. Thus:

```python
import unittest


class Test(unittest.TestCase):
    def test(self):
        x = 2
        with self.subTest("should be one"):
            self.assertEqual(x, 1)
        with self.subTest("but also two"):
            self.assertEqual(x, 2)
        with self.subTest("and somehow three"):
            self.assertEqual(x, 3)


if __name__ == "__main__":
    unittest.main()
```

It's true that any side effects in one sub-test would be visible in the others. I'm not sure that I fully understand your concern; is this satisfactory?

Mock-based tests are fragile, and the one proposed in #3508 is no … The proposed test covers that the code does what we expect it to if a … It's always tradeoffs in the end. It is my opinion that the test does …

Thanks.
Member

OK for now, for expediency. For the record, my concern is that the setup code is not generic; it includes setup specific to the tests (e.g., the values and order of the bytes examples). If I wanted to add another subtest, perhaps the one from #3509, I'd have to change the setup code here. I can see this both ways (i.e., I could also add more tests that use the same examples, without changing the setup). The fact that currently parts of the setup correspond 1:1 with specific tests makes it look more entangled than it necessarily is.
```diff
+        # Three graphs: one short, one long, one corrupt.
+        bytes_0 = _create_example_graph_bytes(123)
+        bytes_1 = _create_example_graph_bytes(9999)
+        # invalid (truncated) proto: length-delimited field 1 (0x0a) of
+        # length 0x7f specified, but only len("bogus") = 5 bytes given
+        # <https://developers.google.com/protocol-buffers/docs/encoding>
+        bytes_2 = b"\x0a\x7fbogus"
+
+        logdir = self.get_temp_dir()
+        for (i, b) in enumerate([bytes_0, bytes_1, bytes_2]):
+            run_dir = os.path.join(logdir, "run_%04d" % i)
+            event = event_pb2.Event(step=0, wall_time=123 * i, graph_def=b)
+            with tb_test_util.FileWriter(run_dir) as writer:
+                writer.add_event(event)
+
+        limiter = mock.create_autospec(util.RateLimiter)
+        limiter.tick.side_effect = [None, AbortUploadError]
+        mock_client = _create_mock_client()
+        uploader = _create_uploader(
+            mock_client,
+            logdir,
+            logdir_poll_rate_limiter=limiter,
+            allowed_plugins=[
+                scalars_metadata.PLUGIN_NAME,
+                graphs_metadata.PLUGIN_NAME,
+            ],
+        )
+        uploader.create_experiment()
+
+        with self.assertRaises(AbortUploadError):
+            uploader.start_uploading()
+
+        actual_blobs = []
+        for call in mock_client.WriteBlob.call_args_list:
+            requests = call[0][0]
+            actual_blobs.append(b"".join(r.data for r in requests))
+
+        actual_graph_defs = []
+        for blob in actual_blobs:
+            try:
+                actual_graph_defs.append(graph_pb2.GraphDef.FromString(blob))
+            except message.DecodeError:
+                actual_graph_defs.append(None)
+
+        with self.subTest("graphs with small attr values should be unchanged"):
+            expected_graph_def_0 = graph_pb2.GraphDef.FromString(bytes_0)
+            self.assertEqual(actual_graph_defs[0], expected_graph_def_0)
+
+        with self.subTest("large attr values should be filtered out"):
+            expected_graph_def_1 = graph_pb2.GraphDef.FromString(bytes_1)
+            del expected_graph_def_1.node[1].attr["large"]
+            expected_graph_def_1.node[1].attr["_too_large_attrs"].list.s.append(
+                b"large"
+            )
+            requests = list(mock_client.WriteBlob.call_args[0][0])
+            self.assertEqual(actual_graph_defs[1], expected_graph_def_1)
+
+        with self.subTest("corrupt graphs should be skipped"):
+            self.assertLen(actual_blobs, 2)
+
     def test_upload_server_error(self):
         mock_client = _create_mock_client()
         mock_rate_limiter = mock.create_autospec(util.RateLimiter)
```
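The deliberately corrupt `bytes_2 = b"\x0a\x7fbogus"` in the test above is truncated in a precise way. Decoding its first two bytes by hand (plain Python, no protobuf dependency) shows why a parser must reject it:

```python
data = b"\x0a\x7fbogus"

# A protobuf field key is a varint of (field_number << 3) | wire_type.
key = data[0]
field_number = key >> 3  # 0x0a >> 3 == 1
wire_type = key & 0x07   # 0x0a & 7 == 2, i.e. length-delimited

# For wire type 2, the next varint gives the payload length.
declared_length = data[1]  # 0x7f == 127
payload = data[2:]         # only b"bogus", 5 bytes

print(field_number, wire_type)        # 1 2
print(declared_length, len(payload))  # 127 5
```

The declared length (127) exceeds the available payload (5 bytes), so parsing fails with `DecodeError`, which is exactly the path `_filtered_graph_bytes` handles by returning `None`.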
Member

Please don't shadow `events` (and `event` below); that is confusing and error-prone. `dataclass_events` was fine imho.
events(andeventbelow)-- that is confusing and error-prone.dataclass_eventswas fine imho.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experience, this form of shadowing is less error-prone than the
alternative, because it is never possible to accidentally refer to the
old value. Case in point: prior to this change, the code was subtly
incorrect, because the
yieldexpression referred toeventratherthan
dataclass_event. (This would be observable if a dataclasstransformation were to affect the wall time or step—certainly allowed,
and perfectly conceivable for conceptually run-level summaries like
graphs or hparams.)
Shadowing makes that kind of error structurally impossible, and, as you
correctly note, this shadowing is entirely lexical.
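The stale-reference bug this thread describes is easy to reproduce in miniature. The sketch below is hypothetical (not the uploader's actual code): `migrate` rewrites an event's step, and the variant with distinct names accidentally yields the outer, pre-migration `event`:

```python
from dataclasses import dataclass, replace


@dataclass
class Event:
    step: int
    value: str


def migrate(event):
    # Pretend migration bumps the step, as a dataclass transformation
    # affecting step or wall_time conceivably could.
    yield replace(event, step=event.step + 100)


def values_distinct_names(events):
    for event in events:
        for migrated in migrate(event):
            # Bug: the old name is still in scope, so it is easy to
            # reference the pre-migration event by mistake.
            yield (event.step, migrated.value)


def values_shadowed(events):
    for event in events:
        for event in migrate(event):  # rebinding hides the stale value
            yield (event.step, event.value)


events = [Event(step=1, value="a")]
print(list(values_distinct_names(events)))  # [(1, 'a')]   stale step
print(list(values_shadowed(events)))        # [(101, 'a')] migrated step
```

Both positions in the thread are visible here: shadowing rules out the stale reference, at the cost of reusing one name for two meanings.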
Member

Just for what it's worth, all of this goes away in the subsequent PR anyway (https://github.com/tensorflow/tensorboard/pull/3511/files#diff-64ad30888691c0bf32cae63247f4ca5c).
Member

I remain opposed to any shadowing whatsoever, but don't need to argue this case since it's going away in #3511.

(Eep, the former bug does suck, though; thanks for the fix.)
(Eep, the former bug does suck, though; thanks for the fix).