runner.SimpleDatasetProvider

View source on GitHub

Builds a tf.data.Dataset from a list of files.

Inherits From: DatasetProvider

runner.SimpleDatasetProvider(
    file_pattern: Optional[str] = None,
    *,
    filenames: Optional[Sequence[str]] = None,
    shuffle_filenames: bool = False,
    interleave_fn: Callable[..., tf.data.Dataset],
    examples_shuffle_size: Optional[int] = None
)

This SimpleDatasetProvider builds a tf.data.Dataset as follows:

- The object is initialized with a list of filenames. For convenience, a file pattern can be specified instead, which will be expanded to a sorted list.
- The filenames are sharded between replicas according to the InputContext (order matters).
- Filenames are shuffled per replica (if requested).
- The files in each shard are interleaved after being read by the interleave_fn.
- Examples are shuffled (if requested), auto-prefetched, and returned for use in one replica of the trainer.
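For illustration, here is a minimal sketch of constructing the provider over TFRecord files of serialized tf.Example protos; the file pattern is hypothetical and not part of this page:

```python
import tensorflow as tf
from tensorflow_gnn import runner

# Assumed layout: TFRecord shards containing serialized tf.Example protos.
provider = runner.SimpleDatasetProvider(
    file_pattern="/tmp/graph_examples/train-*.tfrecord",  # hypothetical path
    # interleave_fn maps one filename to a Dataset of serialized examples.
    interleave_fn=tf.data.TFRecordDataset,
    examples_shuffle_size=10_000,
)
```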

Args

file_pattern: A file pattern, to be expanded by tf.io.gfile.glob and sorted into the list of all filenames.
filenames: A list of all filenames, specified explicitly. This argument is mutually exclusive with file_pattern.
shuffle_filenames: If enabled, filenames will be shuffled after sharding between replicas, before any file reads. Through interleaving, some files may be read in parallel; the details are auto-tuned for throughput.
interleave_fn: A callback that receives a single filename and returns a tf.data.Dataset with the tf.Example values from that file.
examples_shuffle_size: An optional buffer size for example shuffling.
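As a sketch of the explicit-filenames alternative (paths are again hypothetical), the same provider can be built from a list instead of a pattern, with per-replica filename shuffling enabled:

```python
import tensorflow as tf
from tensorflow_gnn import runner

# Hypothetical files; any explicit list of paths works here.
filenames = tf.io.gfile.glob("/tmp/graph_examples/train-*.tfrecord")
provider = runner.SimpleDatasetProvider(
    filenames=filenames,            # mutually exclusive with file_pattern
    shuffle_filenames=True,         # shuffled per replica, after sharding
    interleave_fn=tf.data.TFRecordDataset,
    examples_shuffle_size=10_000,
)
```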

Methods

get_dataset

View source

get_dataset(
    context: tf.distribute.InputContext
) -> tf.data.Dataset

Gets a tf.data.Dataset for one replica, as described by context.
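A hedged sketch of how get_dataset is typically invoked under a tf.distribute strategy; the choice of MirroredStrategy and the file pattern are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow_gnn import runner

provider = runner.SimpleDatasetProvider(
    file_pattern="/tmp/graph_examples/train-*.tfrecord",  # hypothetical path
    interleave_fn=tf.data.TFRecordDataset,
)

strategy = tf.distribute.MirroredStrategy()
# distribute_datasets_from_function passes each replica's InputContext,
# so every replica receives its own shard of files.
distributed_ds = strategy.distribute_datasets_from_function(provider.get_dataset)
```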