rust: implement blob sequence RPCs and client #4563
Conversation
Summary: The `Commit` structure now has a `blob_sequences` store in addition to its `scalars` store. For now, we only actually load run graphs into memory (since it’s trivial, and everything else is merely easy). Test Plan: Unit tests included. There are no user-facing changes to the RPC layer. Loading an MNIST dataset with scalars, graphs, and images no longer prints warnings about `graph_def`s, and still serves scalars properly. wchargin-branch: rust-load-graphs wchargin-source: a16be4c845d05f74034e5a038439c210d8b3e937
Summary: The existing `ListPlugins` RPC now explores blob sequence time series for plugin names, in addition to scalar time series. This patch also expands the `commit::test_data` builders to support blob sequences. Test Plan: Unit tests updated. As an end-to-end test, pointing TensorBoard at a directory with scalars and graphs now causes the graphs dashboard to appear active in addition to the scalars dashboard. The dashboard doesn’t yet work, since `ListBlobSequences`, et al. aren’t implemented. wchargin-branch: rust-listplugins-blobs wchargin-source: fbab637da1881c51f9b5559abb97141b58fe0387
wchargin-branch: rust-listplugins-blobs wchargin-source: 366fe5ecc221b2101e0190146b1d9e4e33de305e
Summary: The [`tokio-stream`] crate makes it easier to work with async streams, and in particular to iterate over them. This is useful for test code that needs to iterate over the responses of server-streaming RPC like `ReadBlob`. [`tokio-stream`]: https://crates.io/crates/tokio-stream Test Plan: It builds: `bazel build //third_party/rust:tokio_stream`. wchargin-branch: rust-dep-tokio-stream wchargin-source: f106742eeeb3113fd968232c973715f9ecb269c0
…s' and 'origin/wchargin-rust-dep-tokio-stream' into HEAD
Summary: We implement `ListBlobSequences`, `ReadBlobSequences`, and `ReadBlob` for the Rust data server, and the corresponding data provider methods for the Python data provider. Run graphs now work end to end, and getting more blobs to work is a matter of implementing data compat transformations on the Rust side. Test Plan: Unit tests included on both ends (solid ones on the Rust side, mocking ones on the Python side). Running TensorBoard with `--load_fast` now shows a working graphs plugin. wchargin-branch: rust-blob-sequence-rpcs wchargin-source: 3870627314e30a8d5ede4cb004b887ab1e1f68ba
Note: this is diff-based against … To reviewers: the diff is rather larger than I usually like to send. If …
wchargin-branch: rust-load-graphs wchargin-source: 33d56123cfe3da7c79b5a6bbfa9951159038ebcb
wchargin-branch: rust-listplugins-blobs wchargin-source: 1d0bdb4ae5370b1b94c30e81ec333c5e3ca1e0eb
wchargin-branch: rust-blob-sequence-rpcs wchargin-source: 92bef146c06286fc2d5d7d7bb3427a594fdfae36
wchargin-branch: rust-blob-sequence-rpcs wchargin-source: cb548c2370622404b8b4921e3dd9e6164f0da8e2
stephanwlee
left a comment
It looks good but I want to hear your response before the rubber stamp.
tensorboard/data/grpc_provider.py
Outdated
```python
tags = {}
result[run_entry.run_name] = tags
for tag_entry in run_entry.tags:
    ts = tag_entry.metadata
```
what does `ts` stand for? Tags? Or does it mean Tag Series?
“Time series”. Can rename to something else if you prefer?
No AI required. Is it an anti-pattern to name variables in a descriptive/verbose manner?
lol :-) I think I was in Golang headspace when I first wrote this, and
in Golang the answer to your question is “yes”. I’ve embiggened a few of
the more obscure variable names.
```rust
let runs = self.read_runs()?;

let mut res: data::ReadBlobSequencesResponse = Default::default();
for (run, data) in runs.iter() {
    if !run_filter.want(run) {
        continue;
    }
    let data = data
        .read()
        .map_err(|_| Status::internal(format!("failed to read run data for {:?}", run)))?;
    let mut run_res: data::read_blob_sequences_response::RunEntry = Default::default();
    for (tag, ts) in &data.blob_sequences {
        if !tag_filter.want(tag) {
            continue;
        }
        if plugin_name(&ts.metadata) != Some(&want_plugin) {
            continue;
        }
```
Not saying repetition is bad, so please let me know if this is unsound.
I imagine the run/tag filtering over `RunData` is quite common across `list_{blob,scalar,tensor}`. Would it make sense to create an iterable (if that's idiomatic) that yields data filtered at the very least by run and tags?
Yeah; I’ve been playing around with this in my head but haven’t actually
sketched it out. I agree both that repetition isn’t intrinsically bad
and that it feels like this common behavior could be factored out to
save a bunch of lines. It’s made a bit tricky because the response
types (like `data::read_blob_sequences_response::RunEntry`) all differ,
and RPCs access different commit fields (e.g., `&data.blob_sequences`).

I also agree that some kind of iterable would make sense: something like

```rust
for item in self.iter_data(rtf, plugin_filter, |rd| &rd.blob_sequences) {
    let (run, tag, time_series) = item?; // might return "failed to read..."
    // ...
}
// given:
fn iter_data<T, F: FnMut(&RunData) -> T>(
    &self,
    rtf: RunTagFilter,
    pf: PluginFilter,
    extract: F,
) -> impl Iterator<Item = Result<(Run, Tag, T), Status>>;
```

with the catch that inside the loop we wouldn’t have `run_res` like we
do right now.
I was planning to wait until we have the three data classes implemented
before trying to pull things out. What do you think? Any suggestions?
No immediate suggestion. I will let you decide what should be done at places where we do `read_runs` and use `run_tag_filter`.
```rust
let blob_refs = (0..value.len())
    .map(|i| {
        let bk = BlobKey {
            experiment_id: Cow::Borrowed(eid),
```
Interesting; I should have asked this question before, but when does it turn into a mutable (`to_mut`)? It is not 100% obvious to me from reading blob_key.rs.
Asking now is fine! :-) Depending on exactly what you’re asking, I think
maybe the answer is “when we call `bk.to_string()` below”? but let me
provide some more context.

Blob keys have string fields defined as `Cow<'a, str>` because we can
get blob keys in two different ways. First, in `ReadBlobSequences`, we
synthesize blob keys from the runs and tags that we already have in
memory. Second, in `ReadBlob`, we parse a blob key from user input.

When we parse a blob key from a base64 string, we have to allocate new
memory for the eid/run/tag: no way around it, since they’re not stored
anywhere else. But when we’re synthesizing a blob key, we’d like to
avoid cloning the eid/run/tag, since we already have references to them.
This is where the `Cow<'a, str>` comes in: you can think of it as
“either an owned string or a borrowed string, plus a bit indicating
which is which so that we know whether to free memory when this is
dropped”.

So on this line, we create a blob key that just points into the existing
string structures. Then, when we create the blob reference below, we
encode it all as base64-over-JSON, which allocates a new `String`.

Maybe this is what you meant by “turn into mutable (`to_mut`)”, or maybe
not; does this help? If not, can you clarify?
Thanks for the explanation. I think you are answering most of my question. However, my confusion really lies with the basics of string manipulation. For instance, when invoking `bk.to_string()`, I suspect it would be "concatenating" `run`, `tag`, `index`, and `step`, and when doing so, why would you have to "clone"? I definitely see why you would have to allocate new space to store the new serialized string, but I am not sure why that's perceived as cloning, for example, `run` and others.
> when invoking `bk.to_string()`, I suspect it would be "concatenating" `run`, `tag`, `index`, and `step`, and when doing so, why would you have to "clone"? I definitely see why you would have to allocate a new space to store the new serialized string but I am not sure why that's perceived as cloning, for example, `run` and others.
Right. You don’t have to clone `run`, but you do have to memcpy its
contents into a new serialized string.

By “when we’re synthesizing a blob key, we’d like to avoid cloning the
eid/run/tag”, I mean that we don’t want to do this:

```rust
let bk = BlobKey {
    experiment_id: eid.clone(),
    run: run.clone(),
    // ...
};
```

—which would be a literal clone—and so instead we use `Cow<'a, str>`s
so that the blob key can just borrow the data.
wchargin-branch: rust-blob-sequence-rpcs wchargin-source: 361768949089e75849ff4468bb7a6e6aaa2d0eca
thanks for your quick review of this large PR, and the good questions!
Summary:
We implement `ListBlobSequences`, `ReadBlobSequences`, and `ReadBlob`
for the Rust data server, and the corresponding data provider methods
for the Python data provider. Run graphs now work end to end, and
getting more blobs to work is a matter of implementing data compat
transformations on the Rust side.

Test Plan:
Unit tests included on both ends (solid ones on the Rust side, mocking
ones on the Python side). Running TensorBoard with `--load_fast` now
shows a working graphs plugin.

wchargin-branch: rust-blob-sequence-rpcs