-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
This is a generic tracking issue for improvements to TensorBoard's Python filesystem abstraction. (This doesn't impact Rustboard, which today should be the read path for regular users, but there are still users who have to use the Python read path because they can't use Rustboard, and the write path for TensorBoard's summary APIs to write without TensorFlow is still entirely in Python.)
Currently, TensorBoard's Python code relies on the TensorFlow filesystem API via tf.io.gfile, which has built-in support for a number of protocols besides local disk. When TensorFlow is not available, we fall back to a stub implementation of the tf.io.gfile API in tensorboard/compat/tensorflow_stub/io/gfile.py, which today only supports local disk and S3. The stub implementation achieves some reuse between filesystem implementations by having an intermediate set of methods that both implementations expose, but this abstraction A) isn't even defined other than by virtue of common methods, i.e. there's no abstract base class or spec, and B) has a number of issues.
Here are a few of the issues with that abstraction:
- underspecified behavior in a number of places (e.g. what characters glob supports), which makes it harder to guarantee much in the way of consistency across implementations
- little clarity/documentation about how filenames are treated as bytes vs strings
- no "file" object at all, which is mostly ok for remote filesystems where everything is an RPC anyway, but is significantly worse for filesystems where there actually might be local state between, say, multiple reads or writes to a single file (e.g. local file descriptors, buffers, caches, etc.). Right now we ignore all of that, so it's conceivable that in cases we end up with quadratic behavior where each read at offset N is independent and must first seek past all the previous data.
- no
walk()method for filesystems, just a function, as pointed out in gfile: add support for fsspec filesystems #5248 (comment), so filesystems aren't able to take advantage of fast paths they might support
We already have a goal (bottom of #3666) to ultimately migrate away from the stub approach and instead properly inject a filesystem abstraction into places that do I/O. As part of that, we should consider revisiting this abstraction and formalizing it properly. One possibility is to just adopt the existing fsspec abstraction, since it's a whole project around this idea: https://filesystem-spec.readthedocs.io/en/latest/
If we did that, we'd likely still want to be able to use the TF filesystem APIs when available (so that existing TensorBoard users who also have TensorFlow and thus benefit from built-in support for cloud filesystems don't experience a regression), so one open question would be determining whether it's possible to shoehorn the TF filesystem support into this setup.