147 changes: 144 additions & 3 deletions tensorboard/data/server/data_compat.rs
@@ -19,6 +19,7 @@ use bytes::Bytes;
use prost::Message;
use std::convert::TryInto;
use std::fmt::Debug;
use std::iter;

use crate::commit::{BlobSequenceValue, DataLoss, ScalarValue};
use crate::proto::tensorboard as pb;
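
The one-line change in this first hunk pulls in `std::iter`, which the histogram conversion below uses to build the per-bucket left/right edge sequences by chaining a single-element iterator onto the interior bucket edges. A minimal standalone sketch of that pattern (toy values, not taken from the diff):

    use std::iter;

    fn main() {
        let (min, max) = (-1.0f64, 1.0);
        let interior_edges = [-0.5f64, 0.0, 0.5];
        // Left edges: `min` followed by the interior edges.
        let lefts: Vec<f64> = iter::once(min).chain(interior_edges.iter().copied()).collect();
        // Right edges: the interior edges followed by `max`.
        let rights: Vec<f64> = interior_edges.iter().copied().chain(iter::once(max)).collect();
        assert_eq!(lefts, vec![-1.0, -0.5, 0.0, 0.5]);
        assert_eq!(rights, vec![-0.5, 0.0, 0.5, 1.0]);
    }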
@@ -92,9 +93,10 @@ impl EventValue {

/// Consumes this event value and enriches it into a tensor.
///
-/// This supports summaries with `tensor` populated.
-//
-// TODO(#4422): support conversion of other summary types to tensors.
+/// This supports:
+///
+/// - summaries with `tensor` populated;
+/// - summaries with TensorFlow 1.x `histogram`.
pub fn into_tensor(self, _metadata: &pb::SummaryMetadata) -> Result<pb::TensorProto, DataLoss> {
let value_box = match self {
EventValue::GraphDef(_) => return Err(DataLoss),
@@ -103,6 +105,57 @@
};
match *value_box {
pb::summary::value::Value::Tensor(tp) => Ok(tp),
pb::summary::value::Value::Histo(hp) => {
// Migrate legacy TF 1.x HistogramProto to TensorProto. The "spec" in summary.proto
// says `bucket` and `bucket_limit` are parallel arrays encoding bucket counts and
// bucket right edges; the first bucket's left edge is assumed to be -DBL_MAX and
// subsequent left edges are defined as the right edge of the preceding bucket.
//
// Our conversion logic in data_compat.py, however, disobeys this and instead sets the
// leftmost and rightmost edges to the `min` and `max` values, respectively. This
// will result in the outermost buckets having left edge > right edge if they were
// originally empty. Apparently the histogram visualization can't handle buckets
// extending to -/+ DBL_MAX, but can handle buckets of negative width?
//
// For consistency with the status quo, we replicate this questionable logic here.

Review comment (Contributor):
Sigh—my bad, probably. Put it on the list of “ways in which the histogram transformation pipeline makes zero sense”.

Reply (Contributor Author):
No worries; I wasn't trying to cast aspersions :) I too have known about this quirk for a long time without fixing it, and I keep forgetting exactly what the quirk was, so I figured I'd just write it down here for posterity. Maybe 2021 will be the year we actually revisit histogram binning...

if hp.bucket.len() != hp.bucket_limit.len() {
return Err(DataLoss);
}
let num_buckets = hp.bucket.len();
// Skip the last `bucket_limit`; it gets replaced by `hp.max`. It's okay to ignore
// the zero-bucket edge case, since `.zip()` will stop immediately in that case anyway.
let bucket_edges = &hp.bucket_limit[..usize::saturating_sub(num_buckets, 1)];
Review comment (wchargin, Contributor, Mar 6, 2021):
For general reference, this can be `num_buckets.saturating_sub(1)` if you want, but I don’t mind it like this. No action required.

Reply (Contributor Author):
Ack, thanks for the observation. I guess I didn't really think about whether `num_buckets` is already `usize`. Will leave as-is just to avoid re-pushing only for this.
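
As a small aside on that suggestion: the fully qualified form and the method form are the same call, and both saturate at zero rather than underflowing. A minimal standalone check (not part of the diff):

    fn main() {
        let num_buckets: usize = 0;
        // Both spellings saturate at zero instead of panicking on underflow,
        // which is what makes the empty-histogram case safe here.
        assert_eq!(num_buckets.saturating_sub(1), 0);
        assert_eq!(usize::saturating_sub(num_buckets, 1), 0);
    }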

let bucket_lefts = iter::once(hp.min).chain(bucket_edges.iter().copied());
let bucket_rights = bucket_edges.iter().copied().chain(iter::once(hp.max));
let bucket_counts = hp.bucket.iter().copied();
let tensor_content = bucket_lefts
.zip(bucket_rights)
.zip(bucket_counts)
.flat_map(|((l, r), v)| vec![l, r, v])
.map(f64::to_le_bytes)
.collect::<Vec<_>>()
.concat()
.into();

Ok(pb::TensorProto {
dtype: pb::DataType::DtDouble.into(),
tensor_shape: Some(pb::TensorShapeProto {
dim: vec![
pb::tensor_shape_proto::Dim {
size: num_buckets as i64,
..Default::default()
},
pb::tensor_shape_proto::Dim {
size: 3,
..Default::default()
},
],
..Default::default()
}),
tensor_content,
..Default::default()
})
}
_ => Err(DataLoss),
}
}

Review comment on lines +143 to +154 (Contributor):
Optional: this is just `tensor_shape(&[num_buckets as i64, 3])` if you promote `fn tensor_shape` from a test helper to a real helper.

Reply (Contributor Author):
True, but if you don't mind, I think I'll leave it as just a test helper for now, since it's not all that bad written out. I'm also feeling a little burned from sinking a fair amount of time earlier into generalizing this into a generic TensorProto construction utility (as we'd discussed, with the extension trait etc.) before emerging from a pile of yak hair and deciding that there really wasn't enough boilerplate after all: the utility and its tests cost net more complexity and lines of code than they saved in the first place.

(Not that I necessarily object to such a utility at some point, especially if we end up with more of these manipulations, to be clear; just feeling meh about it at this stage.)

Reply (Contributor):
Sure, sounds good. Appreciate the awareness of the yak density, and happy that you feel comfortable calling that out.
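
To make the `[num_buckets, 3]` byte layout above concrete, here is a minimal standalone sketch, with a hypothetical `Histogram` struct standing in for `pb::HistogramProto`; it flattens buckets the same way and decodes the bytes back into (left, right, count) triples:

    use std::convert::TryInto;
    use std::iter;

    // Hypothetical stand-in for `pb::HistogramProto`; illustration only.
    struct Histogram {
        min: f64,
        max: f64,
        bucket_limit: Vec<f64>,
        bucket: Vec<f64>,
    }

    // Flatten buckets into little-endian bytes, mirroring the logic above.
    fn flatten(hp: &Histogram) -> Vec<u8> {
        let num_buckets = hp.bucket.len();
        let edges = &hp.bucket_limit[..num_buckets.saturating_sub(1)];
        let lefts = iter::once(hp.min).chain(edges.iter().copied());
        let rights = edges.iter().copied().chain(iter::once(hp.max));
        lefts
            .zip(rights)
            .zip(hp.bucket.iter().copied())
            .flat_map(|((l, r), v)| vec![l, r, v])
            .flat_map(|x| f64::to_le_bytes(x).to_vec())
            .collect()
    }

    fn main() {
        let hp = Histogram {
            min: -1.0,
            max: 1.0,
            bucket_limit: vec![0.0, f64::MAX],
            bucket: vec![1.0, 3.0],
        };
        // Decode back into f64s to check the (left, right, count) row layout.
        let vals: Vec<f64> = flatten(&hp)
            .chunks_exact(8)
            .map(|c| f64::from_le_bytes(c.try_into().unwrap()))
            .collect();
        assert_eq!(vals, vec![-1.0, 0.0, 1.0, 0.0, 1.0, 3.0]);
    }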
@@ -287,6 +340,7 @@ impl SummaryValue {
(_, Value::SimpleValue(_)) => blank(plugin_names::SCALARS, pb::DataClass::Scalar),
(_, Value::Image(_)) => tf1x_image_metadata(),
(_, Value::Audio(_)) => tf1x_audio_metadata(),
(_, Value::Histo(_)) => blank(plugin_names::HISTOGRAMS, pb::DataClass::Tensor),
(Some(mut md), _) => {
// Use given metadata, but first set data class based on plugin name, if known.
match md.plugin_data.as_ref().map(|pd| pd.plugin_name.as_str()) {
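
Regarding the new `Value::Histo` dispatch line in this hunk: the `blank` helper (defined elsewhere in this file, not shown in the diff) builds metadata carrying only a plugin name and a data class. A rough standalone sketch of that shape, using hypothetical stand-in types for the prost-generated protos:

    // Hypothetical stand-ins for the generated proto types; illustration only.
    #[derive(Debug, Default)]
    struct PluginData {
        plugin_name: String,
    }

    #[derive(Debug, Default)]
    struct SummaryMetadata {
        plugin_data: Option<PluginData>,
        data_class: DataClass,
    }

    #[derive(Debug, Default)]
    enum DataClass {
        #[default]
        Unknown,
        Scalar,
        Tensor,
    }

    // A `blank`-style helper: metadata with only plugin name and data class set.
    fn blank(plugin_name: &str, data_class: DataClass) -> SummaryMetadata {
        SummaryMetadata {
            plugin_data: Some(PluginData {
                plugin_name: plugin_name.to_string(),
            }),
            data_class,
        }
    }

    fn main() {
        let md = blank("histograms", DataClass::Tensor);
        assert_eq!(md.plugin_data.unwrap().plugin_name, "histograms");
    }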
@@ -719,6 +773,93 @@ mod tests {
Ok(tp)
);
}

#[test]
fn test_enrich_tf1x_histogram() {
let hp = pb::HistogramProto {
min: -1.999,
max: 1.999,
num: 60.0,
bucket_limit: vec![-2.0, -1.0, 0.0, 1.0, 2.0, f64::MAX],
bucket: vec![0.0, 10.0, 20.0, 20.0, 10.0, 0.0],
..Default::default()
};
let v = EventValue::Summary(SummaryValue(Box::new(Value::Histo(hp))));
assert_eq!(
v.into_tensor(&blank("histogram", pb::DataClass::Tensor)),
Ok(pb::TensorProto {
dtype: pb::DataType::DtDouble.into(),
tensor_shape: Some(tensor_shape(&[6, 3])),
tensor_content: {
#[rustfmt::skip]
let b = to_le_bytes![
-1.999, -2.0, 0.0,
-2.0, -1.0, 10.0,
-1.0, 0.0, 20.0,
0.0, 1.0, 20.0,
1.0, 2.0, 10.0,
2.0, 1.999, 0.0f64
];
b
},
..Default::default()
})
);
}
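
The `to_le_bytes!` and `tensor_shape` helpers used in these expectations are test utilities defined elsewhere in this module and not shown in the diff. A hypothetical sketch of what the byte-flattening macro could look like (details assumed; the real helper may differ, e.g. by returning `Bytes`):

    // Hypothetical sketch of a `to_le_bytes!`-style helper: flatten f64 values
    // into one little-endian byte buffer.
    macro_rules! to_le_bytes {
        ($($x:expr),* $(,)?) => {{
            let mut out: Vec<u8> = Vec::new();
            $(out.extend_from_slice(&f64::to_le_bytes($x));)*
            out
        }};
    }

    fn main() {
        let b = to_le_bytes![-1.0, 1.0, 2.0f64];
        assert_eq!(b.len(), 24); // three f64s, eight bytes each
    }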

#[test]
fn test_enrich_tf1x_histogram_single_bucket() {
let hp = pb::HistogramProto {
min: -1.0,
max: 1.0,
num: 2.0,
bucket_limit: vec![f64::MAX],
bucket: vec![2.0],
..Default::default()
};
let v = EventValue::Summary(SummaryValue(Box::new(Value::Histo(hp))));
assert_eq!(
v.into_tensor(&blank("histogram", pb::DataClass::Tensor)),
Ok(pb::TensorProto {
dtype: pb::DataType::DtDouble.into(),
tensor_shape: Some(tensor_shape(&[1, 3])),
tensor_content: to_le_bytes![-1.0, 1.0, 2.0f64],
..Default::default()
})
);
}

#[test]
fn test_enrich_tf1x_histogram_empty() {
let hp = pb::HistogramProto::default();
let v = EventValue::Summary(SummaryValue(Box::new(Value::Histo(hp))));
assert_eq!(
v.into_tensor(&blank("histogram", pb::DataClass::Tensor)),
Ok(pb::TensorProto {
dtype: pb::DataType::DtDouble.into(),
tensor_shape: Some(tensor_shape(&[0, 3])),
..Default::default()
})
);
}

#[test]
fn test_enrich_tf1x_histogram_mismatched_field_lengths() {
for (limit_len, bucket_len) in &[(0, 1), (1, 0), (1, 2), (2, 1)] {
let hp = pb::HistogramProto {
bucket_limit: vec![0.0; *limit_len],
bucket: vec![0.0; *bucket_len],
..Default::default()
};
let v = EventValue::Summary(SummaryValue(Box::new(Value::Histo(hp.clone()))));
assert_eq!(
v.into_tensor(&blank("histogram", pb::DataClass::Tensor)),
Err(DataLoss),
"expected error converting proto {:?}",
hp
);
}
}
}

mod blob_sequences {