Skip to content

Replace tf_record_iterator with a non-deprecated way to read TFRecords #1711

@nfelt

Description

@nfelt

TensorBoard currently relies on tf.pywrap_tensorflow.PyRecordReader_New() to read tfrecords files (aka event files). Not only is this bad because it's not strictly speaking a public API symbol, it's also being hidden in TF 2.0.

There is a reimplementation of record reading logic in pure python in our tensorflow_stub here:
https://github.com/tensorflow/tensorboard/blob/3f055ba9c057ad2fbacfbd2c61b6e27471a51396/tensorboard/compat/tensorflow_stub/pywrap_tensorflow.py

I haven't benchmarked this, but I expect that it's considerably slower than the C++ version from TensorFlow. If so and if we can't get the performance to be comparable to the C++ version, then when TF is available we should prefer the C++ code. To use that code, however, we'll need to expose it under a real public python API.

What exists today is:

An iterator API isn't powerful enough for us because we need the ability to read to the end of a file and then wait for new data, preserving our read offset. As far as I can tell, all the above iterator approaches will abort iteration when they reach the end of the file, so reading new data would require re-reading all the previously consumed data. Also, I believe we currently rely on being able to handle cases where a record isn't fully flushed (e.g. the file appears to end in a corrupted record, but it's really just that the data isn't all there yet, and the appropriate response is to wait and retry), which the iterator APIs also don't allow.

So it's likely that we would need to basically create a new TFRecordReader. I think it would make the most sense to live at tf.io.TFRecordReader alongside the writer; we'd just have to clearly message that most users will actually want the TFRecordDataset (and that this reader is somewhat different from the old tf.TFRecordReader).

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions