rust: implement data_compat.rs basic tensor support #4734
Conversation
wchargin
left a comment
Nice; thanks! Only request is to remove nPMI special-casing if we can;
if that doesn’t work for some reason, happy to keep it. (I do want to
support it; I just think that we might get it for free.)
```rust
pub const TEXT: &str = "text";
pub const PR_CURVES: &str = "pr_curves";
pub const HPARAMS: &str = "hparams";
pub const MESH: &str = "mesh";
```
Oh, interesting. I hadn’t planned to accommodate mesh because it
wasn’t obvious to me that the data fit well into the existing storage
classes. It’s not clearly Tensor because the data may be arbitrarily
large, and plausibly over our recommended limits. It’s not clearly
BlobSequence because it’s the wrong shape—though we could faithfully
reinterpret-cast without too much trouble, since all dimensions but one
are constant.
But I’m happy to just make it Tensor as you do here. I’m not sure how
widely used it is in conjunction with environments that would actually
care about the size limit (e.g., TensorBoard.dev uploads), so it may
just be a non-issue. As you note, the plugin backend still uses the
multiplexer, but we can fix that.
I guess tensor seemed like the most obvious mapping, but it's true I hadn't really considered this deeply. Since this doesn't work anyway until the plugin is converted, I'll just omit it for now and we can cross that bridge when we get there.
FWIW though, we may want to reassess the exact semantics we've assigned to the data classes, since text also doesn't really meet the bounded-size criterion. Possibly what we really want is something more like:
- "scalar", stored as f32;
- "bounded-size tensor with fixed-size elements", stored as an inline TensorProto;
- "bounded-size tensor with unbounded-size elements", stored the way blob sequences are stored, where each element can be individually retrieved by index via its own RPC;
- and then some new thing for "unbounded-size tensor", where we only support essentially a streaming read of the entire tensor, chunked into roughly-constant-RPC-response-size batches to avoid excessive overhead when the individual elements are small but numerous?
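The four storage classes proposed above could be sketched as a toy Rust enum, keyed on whether the element count and the per-element size are bounded. Everything here (`StorageClass`, `classify`) is a hypothetical illustration, not anything that exists in TensorBoard:

```rust
/// Hypothetical taxonomy of storage classes from the discussion above.
#[derive(Debug, PartialEq)]
enum StorageClass {
    /// Single f32 per step (e.g., scalars).
    Scalar,
    /// Bounded-size tensor with fixed-size elements, stored inline.
    InlineTensor,
    /// Bounded-length sequence of unbounded-size elements, each
    /// individually retrievable by index (like blob sequences today).
    BlobSequence,
    /// Unbounded-size tensor, readable only as a chunked stream.
    StreamedTensor,
}

/// Toy classifier for non-scalar data: pick a class from whether the
/// element count and per-element size are bounded.
fn classify(count_bounded: bool, element_size_bounded: bool) -> StorageClass {
    match (count_bounded, element_size_bounded) {
        (true, true) => StorageClass::InlineTensor,
        (true, false) => StorageClass::BlobSequence,
        (false, _) => StorageClass::StreamedTensor,
    }
}
```

Under this framing, text would land in `BlobSequence` rather than `InlineTensor`, which is the tension being discussed.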
Yeah—in the past, I think, we’d agreed that it was “reasonable” to
assume that text would be small, even though it’s not bounded. The
unbounded-size tensor is also approximately what I was thinking, as
“basically sugar for a length-1 blob sequence, but we handle the
serialization for you”. From that lens, I don’t know that we actually
need a separate data type—emit_tensor(..., as_blob=True) or something
might suffice.
Happy to leave as is for now.
True, I guess it basically is just sugar for a length-1 blob sequence, so it could be persisted as such. But I think we'd probably need a new data type at the DataProvider level if we wanted to both handle the Tensor serialization and also support a streaming read, right?
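As a rough illustration of the "sugar for a length-1 blob sequence" idea: serialize the whole tensor into one blob and wrap it in a one-element sequence. The `Tensor` struct and the byte layout below are made-up stand-ins for illustration, not TensorBoard's actual types or wire format:

```rust
/// Stand-in tensor type (not TensorBoard's real TensorProto).
struct Tensor {
    shape: Vec<usize>,
    data: Vec<f32>,
}

/// Serialize a tensor into a single blob and return it as a
/// length-1 blob sequence. Toy layout: rank, then each dim as
/// little-endian u64, then the data as little-endian f32s.
fn emit_tensor_as_blob(t: &Tensor) -> Vec<Vec<u8>> {
    let mut blob = Vec::new();
    blob.extend((t.shape.len() as u64).to_le_bytes());
    for &dim in &t.shape {
        blob.extend((dim as u64).to_le_bytes());
    }
    for &x in &t.data {
        blob.extend(x.to_le_bytes());
    }
    vec![blob] // a length-1 blob sequence
}
```

A streaming read on top of this would then need chunking at the RPC layer, which is where the "new data type at the DataProvider level" question comes in.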
```rust
pub const PR_CURVES: &str = "pr_curves";
pub const HPARAMS: &str = "hparams";
pub const MESH: &str = "mesh";
pub const NPMI: &str = "npmi";
```
I think nPMI data has always had DATA_CLASS_TENSOR at rest. Can
we get away with dropping the special case here?
Fair enough, dropped.
```rust
let md = pb::SummaryMetadata {
    plugin_data: Some(PluginData {
        plugin_name: "rando".to_string(),
        content: Bytes::from_static(b"preserved!"),
        ..Default::default()
    }),
    data_class: pb::DataClass::Tensor.into(),
    ..Default::default()
};
```
nit: We do have blank_with_plugin_content (per your suggestion), in
case you’d like to use that? Also suffices for the next test case, since
you can pass DataClass::Unknown.
(per your suggestion)
Done, lol.
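For reference, here is a minimal stand-in for what a `blank_with_plugin_content`-style test helper might look like. The struct definitions below only mimic the shape of the real generated proto types; the actual helper lives in TensorBoard's Rust codebase:

```rust
/// Stand-ins for the generated proto types (not the real ones).
#[derive(Debug, Default, PartialEq)]
struct PluginData {
    plugin_name: String,
    content: Vec<u8>,
}

#[derive(Debug, Default, PartialEq)]
struct SummaryMetadata {
    plugin_data: Option<PluginData>,
    data_class: i32,
}

/// Build metadata with only the plugin name, data class, and plugin
/// content set, leaving everything else defaulted.
fn blank_with_plugin_content(
    plugin_name: &str,
    data_class: i32,
    content: &[u8],
) -> SummaryMetadata {
    SummaryMetadata {
        plugin_data: Some(PluginData {
            plugin_name: plugin_name.to_string(),
            content: content.to_vec(),
        }),
        data_class,
    }
}
```

Passing an "unknown" data-class value covers the next test case too, which is the point of reusing the helper.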
```rust
/// This supports summaries with `tensor` populated.
//
// TODO(#4422): support conversion of other summary types to tensors.
pub fn into_tensor(self, _metadata: &pb::SummaryMetadata) -> Result<pb::TensorProto, DataLoss> {
```
Curious what the thought process is behind taking _metadata here?
I omitted it from into_scalar because it’s not actually needed, but
that does make the signatures different and makes the RunLoader
integration ever so slightly more awkward. I don’t feel strongly either
way, just wondering about the rationale.
The thought process was "I mistakenly figured I was going to need it later (to handle empty hparams conversions) and then realized I didn't but wasn't sure it was worth taking it back out."
I did notice it's omitted from into_scalar(). I'm fine doing the same thing here if you think it's a bit cleaner to avoid unused parameters when they aren't required by an interface.
I mistakenly figured I was going to need it later (to handle empty hparams conversions)
Me, too. :-)
No preference.
Ack. Will leave as-is in that case.
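The signature question above, in miniature: keep the (currently unused) metadata parameter with an underscore prefix, or drop it as `into_scalar` does. The types below are toy substitutes for the real proto types in `data_compat.rs`:

```rust
/// Toy stand-ins for the real proto types.
struct SummaryMetadata;
struct TensorProto(Vec<f32>);
#[derive(Debug)]
struct DataLoss;

struct TensorValue(Option<Vec<f32>>);

impl TensorValue {
    /// Variant that keeps the metadata parameter, as the PR does; the
    /// leading underscore marks it as deliberately unused for now.
    fn into_tensor(self, _metadata: &SummaryMetadata) -> Result<TensorProto, DataLoss> {
        self.0.map(TensorProto).ok_or(DataLoss)
    }
}
```

Keeping the parameter makes the `into_*` signatures uniform at a call site like RunLoader, at the cost of an unused argument; dropping it is the reverse trade.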
```rust
#[test]
fn test_metadata_tensor_without_dataclass() {
    for plugin_name in vec![
```
Clippy says:

```
warning: useless use of `vec!`
   --> data_compat.rs:678:32
    |
678 |       for plugin_name in vec![
    |  ________________________________^
679 | |         plugin_names::HISTOGRAMS,
680 | |         plugin_names::TEXT,
681 | |         plugin_names::PR_CURVES,
... |
684 | |         plugin_names::NPMI,
685 | |     ] {
    | |_____________^
    |
    = note: `#[warn(clippy::useless_vec)]` on by default
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_vec
help: you can use a slice directly
    |
678 |     for plugin_name in &[plugin_names::HISTOGRAMS,
679 |                          plugin_names::TEXT,
680 |                          plugin_names::PR_CURVES,
681 |                          plugin_names::HPARAMS,
682 |                          plugin_names::MESH,
683 |                          plugin_names::NPMI] {
    |
```
(In case you’re wondering, the reason that these checks don’t show up
inline is that our Cargo package is not at repository root.)
Done, thanks.
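The fix in miniature: iterating a slice literal directly avoids the heap allocation that clippy flags with `vec!`. The plugin-name strings below are just illustrative:

```rust
/// Collect the lengths of some plugin names, iterating a slice
/// literal (no allocation) rather than a `vec![...]`.
fn collect_lengths() -> Vec<usize> {
    let mut lengths = Vec::new();
    for plugin_name in &["histograms", "text", "pr_curves"] {
        lengths.push(plugin_name.len());
    }
    lengths
}
```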
Summary: This patch replaces the multiplexer code in the mesh plugin with data provider code. It is not gated behind a flag. We treat all mesh data as tensors, since that's most expedient at this time and the local data providers (multiplexer and RustBoard) don't actually have any size constraints. See also discussion in #4734.

Test Plan: The dashboard still works with the standard demo data, both with and without `--load_fast`.

wchargin-branch: mesh-generic
wchargin-source: 2a995b31efea5c58dee9d05fc7e66bbdb833a32e
This wires up `data_compat.rs` handling for tensors, as part of the tensor support subtask of #4422, which includes:

- `initial_metadata()` special-cases for tensor-dataclass plugins
- `into_tensor()` for enriching tensor values
- calling `into_tensor()` from within RunLoader

I also adjusted some naming of other tests within `data_compat.rs` to be (IMO) slightly more precise and clear now that we have the new tensor test methods.

Note that we don't yet handle A) events with no summary value that should nonetheless be treated as having implicit Tensor values (the use case being hparams metadata persistence), and B) legacy histogram data stored in `HistogramProto`. I've left those for later PRs.

Test plan: I confirmed that with this change and the preceding RPC PR, we can now render the "normal" tensor dashboards. In particular, what now works is: text, histograms/distributions for data written with the tensor-based histogram summary APIs, PR curves, and NPMI (the visualization seems somewhat broken, at least with the demo data, but it's equivalently broken with and without `--load_fast` now). Hparams doesn't work yet per the note above. The mesh plugin hasn't been DataProviderized; I included it for completeness anyway but it's untested.
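The `initial_metadata()` special-casing described above boils down to a mapping from plugin name to data class for summaries written before data-class annotations existed. The sketch below is a hypothetical reduction, not the real `data_compat.rs` code, and the exact plugin list evolved during review (the mesh and nPMI special cases were dropped):

```rust
/// Stand-in for the generated proto enum.
#[derive(Debug, PartialEq)]
enum DataClass {
    Unknown,
    Scalar,
    Tensor,
}

/// Assign a data class to legacy summaries based on plugin name.
/// Plugins we don't recognize stay Unknown and pass through as-is.
fn data_class_for_plugin(plugin_name: &str) -> DataClass {
    match plugin_name {
        "scalars" => DataClass::Scalar,
        "histograms" | "text" | "pr_curves" => DataClass::Tensor,
        _ => DataClass::Unknown,
    }
}
```

An unknown plugin (like the `"rando"` name in the test quoted earlier) keeps its metadata preserved verbatim rather than being coerced into a data class.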