rust: support legacy TF 1.x histogram summaries #4740
Conversation
wchargin left a comment:
> I manually generated an organic TF 1.x histogram summary

You can use gs://tensorboard-bench-logs/mnist here, too. Can confirm that it shows histograms at this commit but not before it, even with your core tensor support.
```rust
dim: vec![
    pb::tensor_shape_proto::Dim {
        size: num_buckets as i64,
        ..Default::default()
    },
    pb::tensor_shape_proto::Dim {
        size: 3,
        ..Default::default()
    },
],
..Default::default()
}),
```
Optional: this is just `tensor_shape(&[num_buckets as i64, 3])` if you promote `fn tensor_shape` from a test helper to a real helper.
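For reference, a minimal sketch of what such a promoted helper might look like, inferred from the `Dim` construction in the diff above; the actual test helper's signature may differ:

```rust
// Hypothetical helper: build a TensorShapeProto from a list of dimension
// sizes, mirroring the hand-written `Dim { size, ..Default::default() }`
// pattern in the diff. Assumes the same `pb` proto module; not the PR's code.
fn tensor_shape(dims: &[i64]) -> pb::TensorShapeProto {
    pb::TensorShapeProto {
        dim: dims
            .iter()
            .map(|&size| pb::tensor_shape_proto::Dim {
                size,
                ..Default::default()
            })
            .collect(),
        ..Default::default()
    }
}
```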
True, but if you don't mind, I think I'll leave it as just a test helper for now, since it's not all that bad written out. I'm also feeling a little burned from sinking a fair amount of time earlier into generalizing this into a generic `TensorProto` construction utility (as we'd discussed, with the extension trait etc.), only to emerge from a pile of yak hair and decide that there really wasn't enough boilerplate: I was ending up with net more complexity and lines of code from writing this utility and its tests than I was saving in the first place.

(Not that I necessarily object to such a utility at some point, especially if we end up with more of these manipulations, to be clear; I'm just feeling meh about it at this stage.)
Sure, sounds good. Appreciate the awareness of the yak density and happy
that you feel comfortable calling that out.
```rust
// bucket right edges; the first bucket's left edge is assumed to be -DBL_MAX and
// subsequent left edges are defined as the right edge of the preceding bucket.
//
// Our conversion logic in data_compat.py however disobeys this and instead sets the
```
Sigh—my bad, probably. Put it on the list of “ways in which the
histogram transformation pipeline makes zero sense”.
No worries, I wasn't trying to cast aspersions :) I too have known about this quirk for a long time without fixing it, and I keep forgetting exactly what the quirk was, so I figured I'd just write it down here for posterity. Maybe 2021 will be the year we actually revisit histogram binning...
```rust
let num_buckets = hp.bucket.len();
let bucket_edges = || hp.bucket_limit.iter().take(num_buckets - 1).copied();
```
Shall we check that `bucket.len() == bucket_limit.len()` and signal data loss otherwise? Or at least take `num_buckets` to be the shorter of the two lengths? As written, it looks like `bucket_lefts` could be much shorter than `bucket_counts`, which would make the output shape incorrect.

Also, if `hp.bucket.len() == 0`, the `num_buckets - 1` underflows (panic in debug mode), so let's do something nicer in that case? Maybe worth a test (up to you).

In case you're interested, my sketch of this routine took quite a different approach (imperative instead of streamy). I think that I like your idea to use `tensor_content`, since at least cloning a `Bytes` is cheap compared to cloning a `Vec<f64>`.
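To make the suggestion concrete, here is a rough, self-contained sketch of guarding both edge cases, using stand-in types rather than the real protos; this is not the PR's code:

```rust
/// Stand-ins for the real protobuf types, for illustration only.
struct HistogramProto {
    bucket: Vec<f64>,       // bucket counts
    bucket_limit: Vec<f64>, // bucket right edges
}

/// Hypothetical error type signaling unusable input data.
#[derive(Debug)]
struct DataLoss;

/// Returns the interior bucket edges, guarding the two edge cases above:
/// mismatched lengths signal data loss, and `saturating_sub` avoids the
/// debug-mode underflow panic when there are zero buckets.
fn interior_edges(hp: &HistogramProto) -> Result<&[f64], DataLoss> {
    let num_buckets = hp.bucket.len();
    if num_buckets != hp.bucket_limit.len() {
        return Err(DataLoss);
    }
    Ok(&hp.bucket_limit[..num_buckets.saturating_sub(1)])
}
```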
Fair enough; updated to require that the lengths match and to support the empty case (thanks for pointing that out), and added tests for each.

Re: streaminess, what's the fun of having zero 𝜖-cost abstractions if we aren't gonna use 'em? 🦀 But I do personally find it easier to read than the offset-arithmetic version.

I went with `tensor_content` mostly to match what `tf.make_tensor_proto` will produce, FWIW.
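As an aside, a sketch of the encoding being matched here: `tensor_content` carries the raw f64 bytes in native byte order (little-endian on typical platforms), which is what `tf.make_tensor_proto` emits for a double tensor. Illustrative only; the PR's actual packing code may differ:

```rust
/// Pack f64 values into raw bytes for a TensorProto's `tensor_content`
/// field, using native byte order. (Illustrative sketch, not the PR's code.)
fn pack_tensor_content(values: &[f64]) -> Vec<u8> {
    let mut content = Vec::with_capacity(values.len() * 8);
    for v in values {
        content.extend_from_slice(&v.to_ne_bytes());
    }
    content
}
```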
Lovely; thanks! Yeah, I’m perfectly happy with the streams here. I’m
part of the “use streams judiciously” crowd, and this looks like a
pretty easy sell. It’s definitely more readable than mine.
```rust
    ],
    ..Default::default()
}),
tensor_content: tensor_content,
```
nit: Clippy says to pun this (`tensor_content,`).
Done.
```rust
    bucket: vec![0.0, 10.0, 20.0, 20.0, 10.0, 0.0],
    ..Default::default()
};
let v = EventValue::Summary(SummaryValue(Box::new(Value::Histo(hp.clone()))));
```
nit: Clippy says these clones are useless (`hp.clone()` → `hp`, and below).
Done, must have been vestigial.
nfelt left a comment:
PTAL
```rust
let bucket_rights = bucket_edges().chain(iter::once(hp.max));
// Skip the last `bucket_limit`; it gets replaced by `hp.max`. It's okay to ignore
// the edge case at 0 since `.zip()` will stop immediately in that case anyway.
let bucket_edges = &hp.bucket_limit[..usize::saturating_sub(num_buckets, 1)];
```
For general reference, this can be `num_buckets.saturating_sub(1)` if you want, but I don't mind it like this. No action required.
Ack, thanks for the observation. I guess I didn't really think about whether `num_buckets` is already `usize`. Will leave as-is just to avoid re-pushing only for this.
This implements support in Rustboard for converting legacy TF 1.x histogram summaries that use `HistogramProto` into the k-by-3 tensor format that TensorBoard expects. We adopt the same logic as `data_compat.py` for consistency, even though the semantics are a bit dubious.

Test plan: unit tests, plus I manually generated an organic TF 1.x histogram summary and confirmed that it now A) shows up in Rustboard-backed TB and B) looks the same as in slow-loading TB.
Part of #4422 tensor support sub-task.
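For readers following along, here is an illustrative, self-contained sketch of the k-by-3 conversion this PR performs, following the `data_compat.py` semantics discussed in the review above: each row is `[left_edge, right_edge, count]`, with the first left edge taken from `hp.min` and the last right edge from `hp.max`. Stand-in types; not the PR's exact code:

```rust
use std::iter;

/// Stand-in for the real `HistogramProto`, for illustration only.
struct HistogramProto {
    min: f64,
    max: f64,
    bucket_limit: Vec<f64>, // bucket right edges
    bucket: Vec<f64>,       // bucket counts
}

/// Converts a legacy histogram into k-by-3 rows of [left, right, count].
fn to_k_by_3(hp: &HistogramProto) -> Vec<[f64; 3]> {
    let num_buckets = hp.bucket.len();
    // Interior edges: drop the final `bucket_limit`, which `hp.max` replaces.
    // `saturating_sub` keeps the empty case well defined.
    let edges = &hp.bucket_limit[..num_buckets.saturating_sub(1)];
    let lefts = iter::once(hp.min).chain(edges.iter().copied());
    let rights = edges.iter().copied().chain(iter::once(hp.max));
    lefts
        .zip(rights)
        .zip(hp.bucket.iter().copied())
        .map(|((left, right), count)| [left, right, count])
        .collect()
}
```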