Skip to content

Conversation

@wchargin
Copy link
Contributor

@wchargin wchargin commented Feb 1, 2021

Summary:
Previously, LogdirLoader and RunLoader were both hard-tied to the
native OS filesystem, via walkdir::WalkDir and std::fs::File,
respectively. This patch introduces a Logdir trait that abstracts over
listing and reading event files. A DiskLogdir implementation recovers
the current behavior (mostly; see below), and we can slot in adapters
for more filesystems as desired.

At this point, it’s convenient to drop the hyper-precise semantics
around non-UTF-8 run name collisions, which only occur if you have two
runs whose names are invalid Unicode and are equal after lossy
conversion. We originally handled this precisely because Rust made it
easy to do so, but it’s never come up as an issue in the real world.
It’s no longer quite so convenient to handle, so we cull the complexity.

Test Plan:
Existing unit tests suffice, and a --load_fast end-to-end test still
checks out.

wchargin-branch: rust-logdir-trait

Summary:
Previously, `LogdirLoader` and `RunLoader` were both hard-tied to the
native OS filesystem, via `walkdir::WalkDir` and `std::fs::File`,
respectively. This patch introduces a `Logdir` trait that abstracts over
listing and reading event files. A `DiskLogdir` implementation recovers
the current behavior (mostly; see below), and we can slot in adapters
for more filesystems as desired.

At this point, it’s convenient to drop the hyper-precise semantics
around non-UTF-8 run name collisions, which only occur if you have two
runs whose names are invalid Unicode and are equal after lossy
conversion. We originally handled this precisely because Rust made it
easy to do so, but it’s never come up as an issue in the real world.
It’s no longer quite so convenient to handle, so we cull the complexity.

Test Plan:
Existing unit tests suffice, and a `--load_fast` end-to-end test still
checks out.

wchargin-branch: rust-logdir-trait
wchargin-source: 7535f5d9c7997b111e80edee5be9f1726f8fbe6b
@wchargin wchargin added the core:rustboard //tensorboard/data/server/... label Feb 1, 2021
@google-cla google-cla bot added the cla: yes label Feb 1, 2021
@wchargin wchargin requested a review from nfelt February 1, 2021 19:45
wchargin-branch: rust-logdir-trait
wchargin-source: ee5495623979251b83234ee51f6c92ae46a3643d
/// don't warn about the directory again on every load cycle.
/// Event files within each run should be emitted in chronological order. Canonically, a file
/// is an event file if its basename contains [`EVENT_FILE_BASENAME_INFIX`] as a substring.
fn discover(&self) -> io::Result<HashMap<Run, Vec<PathBuf>>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably fine for now, but may eventually want this to be an iterator so that we don't necessarily need to block postprocessing of the run discovery results on a full scan (since those can be quite expensive).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I was also wondering whether it might be a good fit to go full
async here:

#[async_trait::async_trait]
pub async trait Logdir {
    /// A stream of discovered event files and associated runs.
    type Discovery: Stream<Item = Result<(Run, PathBuf)>>;
    /// Type of output stream for reading event files under this logdir.
    type File: tokio::AsyncRead;

    /// Finds event files... they should be grouped by run... etc.
    async fn discover(&self) -> Self::Discovery;
    /// Opens an event file... it can be read asynchronously... etc.
    async fn open(&self, path: &Path) -> io::Result<Self::File>;
}

That way, with just 8 reload threads, you could have n runs reloading
at any given time, with n − 8 of them waiting on network. One downside
is that this approach gives us some nice implicit throttling to ensure
that we’re not hammering the user’s network stack or GCS account. I’m
also not certain about the lifetimes—I suspect that we can make it work
out without needing unsafe, but I’m not certain.

So yeah, agreed, and let’s keep it as is for now.

(since those can be quite expensive)

For reference, I can list 33K objects in about 3 seconds. So yes, slow,
but imho manageable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sure, SGTM to consider at a future point if it becomes more pressing.

///
/// [`OsStr::to_string_lossy`]: std::ffi::OsStr::to_string_lossy
collided_relpaths: HashSet<PathBuf>,
/// The `path` should be one of the values returned by a previous call to [`Self::discover`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spirit of making the compiler do these kinds of things for us, what do you think about wrapping the returned PathBuf in a newtype so that users of Logdir are actually compelled to go through the abstraction rather than being tempted to poke at the filesystem themselves?

Could even convert open() into a method on the type, which would avoid the need to pass the whole Logdir down into RunLoader.

It would require a decent amount of tweaking to RunLoader, so might make sense as a separate PR (introducing the "openable pathbuf" type in RunLoader, then obtaining it from Logdir in this PR).

Thoughts?

Copy link
Contributor Author

@wchargin wchargin Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, yes. I spent a while thinking about this. I think the most direct
interpretation of your suggestion is:

pub struct EventFileBuf(pub PathBuf);
pub struct EventFile(pub Path);
pub trait Logdir { /* s:Path:EventFile:g */ }

The catch here is that Path is dynamically sized, so we get into the
same issue with newtyping str. You can handle PathBuf fine, but
Path is actually what we care about since it’s the argument to open.

One alternate approach is:

pub trait Logdir {
    type Path;
    type PathBuf: Borrow<Path>;
    type File: Read;

    // `discover` returns `Self::PathBuf`
    // `open` takes `&Self::Path`
}

…and I think that this can be made to work (haven’t tested). But then
the trouble is that it becomes hard to have dynamic logdirs. We’re going
to want something like

enum UserLogdir {
    Disk(DiskLogdir),
    Gcs(GcsLogdir),
}

impl UserLogdir {
    /// Parse `--logdir` into a logdir, honoring `gs://` if given.
    fn new(path: String) -> Option<Self>;
}

impl Logdir for UserLogdir {
    // ...
}

let logdir = UserLogdir::new(opts.logdir)?;
let loader = LogdirLoader::new(&commit, logdir);

But then the issue is that UserLogdir has no good implementations for
Path and PathBuf, since depending on the variant, the paths may be
either <DiskLogdir as Logdir>::Path or <GcsLogdir as Logdir>::Path.
Having a common substrate like PathBuf is valuable. (As an aside, you
could well argue in favor of String over PathBuf. I thought about
this for a while, too. Both have their merits. I went with PathBuf
since PathBuf: From<String> but not the other way around.)

One way to resolve this is to use a continuation-passing trick that
I came up with a while ago for reservoir sampling in a strictly linear
type system:

trait Logdir {
    /// Type of output stream (as before).
    type File: Read;
    /// Callback to open a particular file.
    type Opener: Open<Self::File>;

    /// Discovers all event files and gives you handles to open them.
    fn discover(&self) -> HashMap<Run, Vec<Self::Open>>;
    // (no separate `open` method)
}
trait Open<F> {
    /// Attempts to open a file.
    fn open(self) -> io::Result<F>;
}
// (^isomorphic to `FnOnce() -> io::Result<F>`, but is less magical and
// can be manually implemented)

The idea here is that you avoid having to pass an argument back by just
keeping it in the Opener data the whole time.

Then you can still implement a dispatching logdir:

enum UserLogdir {
    Disk(DiskLogdir),
    Gcs(GcsLogdir),
}

enum UserFile {
    Disk(<DiskLogdir as Logdir>::File),
    Gcs(<GcsLogdir as Logdir>::File),
}
impl Read for UserFile {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            UserFile::Disk(f) => f.read(buf),
            UserFile::Gcs(f) => f.read(buf),
        }
    }
}

enum UserOpener {
    Disk(<DiskLogdir as Logdir>::Opener),
    Gcs(<GcsLogdir as Logdir>::Opener),
}
impl Open<UserFile> for UserOpener {
    fn open(self) -> io::Result<UserFile> {
        match self {
            UserOpener::Disk(o) => o.open().map(UserFile::Disk),
            UserOpener::Gcs(o) => o.open().map(UserFile::Gcs),
        }
    }
}

impl Logdir for UserLogdir {
    type File = UserFile;
    type Opener = UserOpener;

    fn discover(&self) -> HashMap<Run, Vec<Self::Opener>> {
        match self {
            UserLogdir::Disk(x) => x.discover().map(UserOpener::Disk),
            UserLogdir::Gcs(x) => x.discover().map(UserOpener::Gcs),
            // (for brevity, pretend that `map` actually works like that)
        }
    }
}

(And this can easily be macroed to hell and back if we need more variants.)

I think that this works. (I haven’t tried it.) It’s certainly a
powerful use of the type system. But I also think it’s a moderate amount
of conceptual overhead. I’d be happy to return to it if we decide that
we want it.

One super modest thing that we could do: adopt (my interpretation of)
your original suggestion by just replacing Path with PathBuf
everywhere:

pub struct EventFileBuf(pub PathBuf);

pub trait Logdir {
    type File: Read;
    fn discover(&self) -> HashMap<Run, Vec<EventFileBuf>>;
    fn open(&self, path: &EventFileBuf) -> io::Result<Self::File>;
}

I avoided this until now because you generally want to accept slices
rather than owned buffers, since it’s more flexible. But in this case
maybe it’s fine, since the event file handles are meant to be opaque,
anyway. This doesn’t prevent you from passing in foreign paths, be
they from other instances of the same type, from other types, or just
synethetically generated. But maybe it waggles its finger strongly
enough in the correct direction to be worth it.

Will think about it and get back to you in the morning. :-)

edit: Done. Included in this PR; it only adds a few lines of diff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for the detailed thoughts! I think what I had in mind was really most like your type Opener approach above, since if we make open() a method then we no longer need to ensure that we're feeding back an object from the same original Logdir instance.

I'm not sure I follow the parts about how we need a dispatching UserLogdir though? Roughly speaking if I imagine Logdir as a Java-style interface then it feels like code written against Logdir should be able to somehow do the dispatching based on the actual implementation at runtime, rather than handwriting the dispatch code ourselves? I am not sure about how that translates into Rust (trait objects..??) but it feels like there ought to be a way.

Anyway, this is fine for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I think there are roughly two ways to implement a “dynamic”
logdir that can delegate to multiple implementations:

  • delegation over a closed set of implementations, known at compile
    time (maybe, “branched dynamic dispatch”);
  • delegation over an open set of implementations, with some kind of
    virtual method table (maybe, “vtable dynamic dispatch”).

The UserLogdir that I sketched above, and implemented in #4646, is a
dynamic logdir over a closed set. You can also implement a dynamic
logdir over an open set, but it may involve more explicit boxing than
you’d think. Diving in…

A helpful guiding principle is, “how would I actually compile this?” For
instance, if you could have ld: &dyn Logdir, then you would expect to
be able to call ld.open(my_path). But the return type of this, and
thus the calling convention, depends on the specific Logdir
implementation, so we actually need ld: &dyn Logdir<File = F> for some
type F. And there is no F for which DiskLogdir and gcs::Logdir
both implement Logdir<File = F>; their associated types are different.

So you want to first box this up at the file level:

use std::io::Read;

// slightly simplified, but keeps the important structure
trait Logdir {
    type File: Read;
    fn discover(&self) -> String;
    fn open(&self, p: &str) -> Self::File;
}

struct FileBoxedLogdir<F, L>
    where F: Read, L: Logdir<File = F>,
{
    inner: L,
}

impl<F, L> Logdir for FileBoxedLogdir<F, L>
    where F: Read + 'static, L: Logdir<File = F>
{
    type File = Box<dyn Read>;
    fn discover(&self) -> String { self.inner.discover() }
    fn open(&self, p: &str) -> Self::File {
        Box::new(self.inner.open(p))
    }
}

Now FileBoxedLogdir<_, DiskLogdir> and FileBoxedLogdir<_, GcsLogdir>
both implement Logdir<File = Box<dyn Read>>. But these are still
distinct types, so we can’t (e.g.) put them in a Vec<_> together. They
may have different representations: a FileBoxedLogdir<_, L> has the
same memory layout as L, and DiskLogdir and GcsLogdir may well
have different structure. So, having boxed the files, we must next box
the logdirs themselves:

struct DynamicLogdir {
    inner: Box<dyn Logdir<File = Box<dyn Read>>>,
}

impl Logdir for DynamicLogdir {
    type File = Box<dyn Read>;
    fn discover(&self) -> String { self.inner.discover() }
    fn open(&self, p: &str) -> Self::File { self.inner.open(p) }
}

fn from_disk(x: DiskLogdir) -> DynamicLogdir {
    let file_boxed: FileBoxedLogdir<_, DiskLogdir> = FileBoxedLogdir { inner: x };
    let boxed: Box<dyn Logdir<File = Box<dyn Read>>> = Box::new(file_boxed);
    DynamicLogdir { inner: boxed }
}
fn from_gcs(x: GcsLogdir) -> DynamicLogdir {
    let file_boxed: FileBoxedLogdir<_, GcsLogdir> = FileBoxedLogdir { inner: x };
    let boxed: Box<dyn Logdir<File = Box<dyn Read>>> = Box::new(file_boxed);
    DynamicLogdir { inner: boxed }
}

(Playground.)

Now, a DynamicLogdir really is a single, concrete type, which we’ve
used to erase the type parameters all the way down. We can reasonably
speak about a LogdirLoader<DynamicLogdir>, and instantiate one that
ultimately delegates to either a DiskLogdir or a GcsLogdir.

For me, it was a useful exercise to mentally trace the execution flow
when you invoke ld.open("foo") for ld: DynamicLogdir.

Here’s a problem. We want to share our logdirs across threads, because
many runs may call ld.open at once. Now, DiskLogdir and GcsLogdir
are both Sync. But DynamicLogdir is not. Why? Because you could
have a Logdir implementation that is not sync, and you’d still be
able to wrap that in a DynamicLogdir. Since the set of implementations
is open, there’s no way to guarantee that an—erased!—DynamicLogdir has
the thread-safety properties that we want.

How can we fix this? Add requirements to DynamicLogdir:

struct DynamicLogdir {
    inner: Box<dyn Logdir<File = Box<dyn Read + Send + Sync>>>,
}

Once we’ve done this, DynamicLogdirs can still wrap our two impls of
interest, but couldn’t wrap a non-Send/Sync impl, so the compiler
will infer that DynamicLogdir: Send + Sync as well. This works, but
with the closed set of implementations, the compiler can infer that
automatically, and we don’t need to think about Send and Syncness at
all. Likewise for, e.g., Debug, or other auto traits that you want
your implementations to share.

All else equal, I think that I prefer the closed implementations for
this reason. There are just fewer concepts to keep track of.

Also of note re: type Opener: object safety. Again, ask, “how would
I actually compile this?” Not all traits can reasonably be boxed. But
I’ll leave this rabbit hole for another day.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two more reasons in favor of using a closed enum rather than trait
objects:

  1. Right now, we only need <L as Logdir>::File to be io::Read and
    Sync and Send. But I might want to require io::Seek as well.
    At that point, you can't use dyn Read + Seek + ... because a trait
    object must be bounded by at most one non-auto trait. I presume
    that this is so that the compiler doesn’t have to form arbitrary
    combinations of vtables. We could get around this by defining a
    marker trait that requires both Read and Seek and nothing else,
    with blanket implementation:

    use std::io::{Read, Seek};
    
    /// Lower bound of `Read` and `Seek` to form a single vtable.
    trait ReadSeek: Read + Seek {}
    impl<T: Read + Seek + ?Sized> ReadSeek for T {}
    
    // (now require `dyn ReadSeek + Send + Sync` trait objects...)

    …so this isn’t a show-stopper, but it is another inconvenience.

  2. We’ve discussed how using Box<dyn Trait>s requires us to figure
    out appropriate trait bounds. In particular, there must be a
    single set of trait bounds, which neuters a powerful aspect of the
    trait system: conditional implementations. For example, BufReader
    type `std::io::BufReader has these implementations:

    struct BufReader<R> { inner: R, ... };
    
    // It's readable if the underlying type is.
    impl<R: Read> Read for BufReader<R> { /* ... */ }
    
    // It's seekable if the underlying type is.
    impl<R: Seek> Seek for BufReader<R> { /* ... */ }
    
    // Furthermore, additional methods are available if the underlying
    // type is seekable.
    impl<R: Seek> BufReader<R> {
        pub fn seek_relative(&mut self, ...) { /* ... */ }
    }

    Contrast to (e.g.) Go’s bufio.Reader, which wraps the Go
    equivalent of a Box<dyn Read> and therefore is never Seek,
    even if the underlying stream is. To get this flexibility with trait
    objects, you need a combinatorial number of new reader types.

    How does this apply to us? I imagine defining functionality to save
    and restore the loader state, so that restarting TensorBoard can
    instantly pick up where it left off. This is naturally implemented
    as something like:

    // You can reload from a `LogdirLoader` of any streams.
    impl<L: Logdir> LogdirLoader<'_, L>
        where <L as Logdir>::File: Read
    {
        fn reload(&mut self, ...) { /* ... */ }
    }
    
    // You can save and restore only if your streams are seekable.
    impl<L: Logdir> LogdirLoader<'_, L>
        where <L as Logdir::File>: Read + Seek
    {
        fn save(&mut self, ...) -> Cache { /* ... */ }
        fn restore(c: Cache) -> Self { /* ... */ }
    }

Copy link
Contributor Author

@wchargin wchargin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! no patch update yet.

/// don't warn about the directory again on every load cycle.
/// Event files within each run should be emitted in chronological order. Canonically, a file
/// is an event file if its basename contains [`EVENT_FILE_BASENAME_INFIX`] as a substring.
fn discover(&self) -> io::Result<HashMap<Run, Vec<PathBuf>>>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I was also wondering whether it might be a good fit to go full
async here:

#[async_trait::async_trait]
pub async trait Logdir {
    /// A stream of discovered event files and associated runs.
    type Discovery: Stream<Item = Result<(Run, PathBuf)>>;
    /// Type of output stream for reading event files under this logdir.
    type File: tokio::AsyncRead;

    /// Finds event files... they should be grouped by run... etc.
    async fn discover(&self) -> Self::Discovery;
    /// Opens an event file... it can be read asynchronously... etc.
    async fn open(&self, path: &Path) -> io::Result<Self::File>;
}

That way, with just 8 reload threads, you could have n runs reloading
at any given time, with n − 8 of them waiting on network. One downside
is that this approach gives us some nice implicit throttling to ensure
that we’re not hammering the user’s network stack or GCS account. I’m
also not certain about the lifetimes—I suspect that we can make it work
out without needing unsafe, but I’m not certain.

So yeah, agreed, and let’s keep it as is for now.

(since those can be quite expensive)

For reference, I can list 33K objects in about 3 seconds. So yes, slow,
but imho manageable.

///
/// [`OsStr::to_string_lossy`]: std::ffi::OsStr::to_string_lossy
collided_relpaths: HashSet<PathBuf>,
/// The `path` should be one of the values returned by a previous call to [`Self::discover`].
Copy link
Contributor Author

@wchargin wchargin Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, yes. I spent a while thinking about this. I think the most direct
interpretation of your suggestion is:

pub struct EventFileBuf(pub PathBuf);
pub struct EventFile(pub Path);
pub trait Logdir { /* s:Path:EventFile:g */ }

The catch here is that Path is dynamically sized, so we get into the
same issue with newtyping str. You can handle PathBuf fine, but
Path is actually what we care about since it’s the argument to open.

One alternate approach is:

pub trait Logdir {
    type Path;
    type PathBuf: Borrow<Path>;
    type File: Read;

    // `discover` returns `Self::PathBuf`
    // `open` takes `&Self::Path`
}

…and I think that this can be made to work (haven’t tested). But then
the trouble is that it becomes hard to have dynamic logdirs. We’re going
to want something like

enum UserLogdir {
    Disk(DiskLogdir),
    Gcs(GcsLogdir),
}

impl UserLogdir {
    /// Parse `--logdir` into a logdir, honoring `gs://` if given.
    fn new(path: String) -> Option<Self>;
}

impl Logdir for UserLogdir {
    // ...
}

let logdir = UserLogdir::new(opts.logdir)?;
let loader = LogdirLoader::new(&commit, logdir);

But then the issue is that UserLogdir has no good implementations for
Path and PathBuf, since depending on the variant, the paths may be
either <DiskLogdir as Logdir>::Path or <GcsLogdir as Logdir>::Path.
Having a common substrate like PathBuf is valuable. (As an aside, you
could well argue in favor of String over PathBuf. I thought about
this for a while, too. Both have their merits. I went with PathBuf
since PathBuf: From<String> but not the other way around.)

One way to resolve this is to use a continuation-passing trick that
I came up with a while ago for reservoir sampling in a strictly linear
type system:

trait Logdir {
    /// Type of output stream (as before).
    type File: Read;
    /// Callback to open a particular file.
    type Opener: Open<Self::File>;

    /// Discovers all event files and gives you handles to open them.
    fn discover(&self) -> HashMap<Run, Vec<Self::Open>>;
    // (no separate `open` method)
}
trait Open<F> {
    /// Attempts to open a file.
    fn open(self) -> io::Result<F>;
}
// (^isomorphic to `FnOnce() -> io::Result<F>`, but is less magical and
// can be manually implemented)

The idea here is that you avoid having to pass an argument back by just
keeping it in the Opener data the whole time.

Then you can still implement a dispatching logdir:

enum UserLogdir {
    Disk(DiskLogdir),
    Gcs(GcsLogdir),
}

enum UserFile {
    Disk(<DiskLogdir as Logdir>::File),
    Gcs(<GcsLogdir as Logdir>::File),
}
impl Read for UserFile {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            UserFile::Disk(f) => f.read(buf),
            UserFile::Gcs(f) => f.read(buf),
        }
    }
}

enum UserOpener {
    Disk(<DiskLogdir as Logdir>::Opener),
    Gcs(<GcsLogdir as Logdir>::Opener),
}
impl Open<UserFile> for UserOpener {
    fn open(self) -> io::Result<UserFile> {
        match self {
            UserOpener::Disk(o) => o.open().map(UserFile::Disk),
            UserOpener::Gcs(o) => o.open().map(UserFile::Gcs),
        }
    }
}

impl Logdir for UserLogdir {
    type File = UserFile;
    type Opener = UserOpener;

    fn discover(&self) -> HashMap<Run, Vec<Self::Opener>> {
        match self {
            UserLogdir::Disk(x) => x.discover().map(UserOpener::Disk),
            UserLogdir::Gcs(x) => x.discover().map(UserOpener::Gcs),
            // (for brevity, pretend that `map` actually works like that)
        }
    }
}

(And this can easily be macroed to hell and back if we need more variants.)

I think that this works. (I haven’t tried it.) It’s certainly a
powerful use of the type system. But I also think it’s a moderate amount
of conceptual overhead. I’d be happy to return to it if we decide that
we want it.

One super modest thing that we could do: adopt (my interpretation of)
your original suggestion by just replacing Path with PathBuf
everywhere:

pub struct EventFileBuf(pub PathBuf);

pub trait Logdir {
    type File: Read;
    fn discover(&self) -> HashMap<Run, Vec<EventFileBuf>>;
    fn open(&self, path: &EventFileBuf) -> io::Result<Self::File>;
}

I avoided this until now because you generally want to accept slices
rather than owned buffers, since it’s more flexible. But in this case
maybe it’s fine, since the event file handles are meant to be opaque,
anyway. This doesn’t prevent you from passing in foreign paths, be
they from other instances of the same type, from other types, or just
synethetically generated. But maybe it waggles its finger strongly
enough in the correct direction to be worth it.

Will think about it and get back to you in the morning. :-)

edit: Done. Included in this PR; it only adds a few lines of diff.

wchargin-branch: rust-logdir-trait
wchargin-source: a97ab9996a1a87b98951441ab44c1461879f9a2c

# Conflicts:
#	tensorboard/data/server/bench.rs
#	tensorboard/data/server/cli.rs
#	tensorboard/data/server/logdir.rs
wchargin-branch: rust-logdir-trait
wchargin-source: a97ab9996a1a87b98951441ab44c1461879f9a2c
wchargin-branch: rust-logdir-trait
wchargin-source: 3a683c56a8013e9e46389ff53891a794b661cd43
wchargin-branch: rust-logdir-trait
wchargin-source: bdad59cb2dee154d63518e05bfa0d252301e8c83
Copy link
Contributor Author

@wchargin wchargin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed; PTAL. Merge had a bunch of churn; the salient diff is:
971c2c2

@wchargin wchargin requested a review from nfelt February 2, 2021 09:04
wchargin-branch: rust-logdir-trait
wchargin-source: 15bddd7be7ef775d415d2fecf835814bc35e27c8
wchargin-branch: rust-logdir-trait
wchargin-source: 15bddd7be7ef775d415d2fecf835814bc35e27c8
/// don't warn about the directory again on every load cycle.
/// Event files within each run should be emitted in chronological order. Canonically, a file
/// is an event file if its basename contains [`EVENT_FILE_BASENAME_INFIX`] as a substring.
fn discover(&self) -> io::Result<HashMap<Run, Vec<PathBuf>>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sure, SGTM to consider at a future point if it becomes more pressing.

///
/// [`OsStr::to_string_lossy`]: std::ffi::OsStr::to_string_lossy
collided_relpaths: HashSet<PathBuf>,
/// The `path` should be one of the values returned by a previous call to [`Self::discover`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for the detailed thoughts! I think what I had in mind was really most like your type Opener approach above, since if we make open() a method then we no longer need to ensure that we're feeding back an object from the same original Logdir instance.

I'm not sure I follow the parts about how we need a dispatching UserLogdir though? Roughly speaking if I imagine Logdir as a Java-style interface then it feels like code written against Logdir should be able to somehow do the dispatching based on the actual implementation at runtime, rather than handwriting the dispatch code ourselves? I am not sure about how that translates into Rust (trait objects..??) but it feels like there ought to be a way.

Anyway, this is fine for now.

@wchargin wchargin merged commit 5c2b955 into master Feb 4, 2021
@wchargin wchargin deleted the wchargin-rust-logdir-trait branch February 4, 2021 00:52
@wchargin wchargin mentioned this pull request Feb 16, 2021
34 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes core:rustboard //tensorboard/data/server/...

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants