Builds a tf.data.Dataset from a list of files.
Inherits From: DatasetProvider
runner.SimpleDatasetProvider(
    file_pattern: Optional[str] = None,
    *,
    filenames: Optional[Sequence[str]] = None,
    shuffle_filenames: bool = False,
    interleave_fn: Callable[..., tf.data.Dataset],
    examples_shuffle_size: Optional[int] = None
)
This SimpleDatasetProvider builds a tf.data.Dataset as follows:
- The object is initialized with a list of filenames. For convenience, a file pattern can be specified instead, which will be expanded to a sorted list.
- The filenames are sharded between replicas according to the InputContext (order matters).
- Filenames are shuffled per replica (if requested).
- The files in each shard are interleaved after being read by the interleave_fn.
- Examples are shuffled (if requested), auto-prefetched, and returned for use in one replica of the trainer.
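The steps above can be approximated with plain tf.data calls. The following is a minimal sketch under stated assumptions, not the library's source: sketch_get_dataset and fake_reader are hypothetical names, and the toy reader yields the filename itself so the example runs without any files on disk. In real use the interleave_fn would typically be a reader such as tf.data.TFRecordDataset.

```python
import tensorflow as tf


def sketch_get_dataset(filenames, context, interleave_fn,
                       shuffle_filenames=False, examples_shuffle_size=None):
    """Hypothetical re-creation of the documented steps (not the library code)."""
    # A file pattern would be expanded beforehand, e.g. with
    # sorted(tf.io.gfile.glob(pattern)); here the list is used as given.
    ds = tf.data.Dataset.from_tensor_slices(list(filenames))
    # Shard filenames between input pipelines. Every replica must start from
    # the same filename order, otherwise the shards would overlap.
    ds = ds.shard(context.num_input_pipelines, context.input_pipeline_id)
    if shuffle_filenames:
        # Shuffle only the filenames that belong to this shard.
        ds = ds.shuffle(len(filenames))
    # Read each file with interleave_fn and interleave the resulting examples.
    ds = ds.interleave(interleave_fn, num_parallel_calls=tf.data.AUTOTUNE)
    if examples_shuffle_size is not None:
        ds = ds.shuffle(examples_shuffle_size)
    return ds.prefetch(tf.data.AUTOTUNE)


def fake_reader(filename):
    """Toy stand-in for a file reader: yields the filename as its only example."""
    return tf.data.Dataset.from_tensors(filename)


names = ["a.tfrecord", "b.tfrecord", "c.tfrecord", "d.tfrecord"]
shards = []
for pipeline_id in range(2):
    ctx = tf.distribute.InputContext(num_input_pipelines=2,
                                     input_pipeline_id=pipeline_id)
    ds = sketch_get_dataset(names, ctx, fake_reader)
    shards.append(sorted(x.numpy().decode() for x in ds))
```

With two input pipelines, each shard receives every second filename from the shared list, which is why the order of the list matters.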
get_dataset(
    context: tf.distribute.InputContext
) -> tf.data.Dataset
Gets the tf.data.Dataset for one replica, as determined by context.
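A function with this signature is what tf.distribute.Strategy.distribute_datasets_from_function expects, so the strategy can supply the InputContext itself. The sketch below illustrates that hand-off with a hypothetical stand-in get_dataset (it is not the provider's method) and a single-device strategy. The global batch size of 4 and the range dataset are assumptions chosen only to keep the example self-contained.

```python
import tensorflow as tf


def get_dataset(context: tf.distribute.InputContext) -> tf.data.Dataset:
    """Hypothetical stand-in with the same signature as the documented method."""
    # Derive the per-replica batch size from an assumed global batch size of 4.
    per_replica_batch = context.get_per_replica_batch_size(4)
    ds = tf.data.Dataset.range(8)
    # Keep only the elements that belong to this input pipeline.
    ds = ds.shard(context.num_input_pipelines, context.input_pipeline_id)
    return ds.batch(per_replica_batch)


strategy = tf.distribute.OneDeviceStrategy("/cpu:0")
# The strategy calls get_dataset once per input pipeline with a suitable context.
distributed = strategy.distribute_datasets_from_function(get_dataset)

batches = []
for batch in distributed:
    # experimental_local_results returns a tuple with one value per local replica.
    (local,) = strategy.experimental_local_results(batch)
    batches.append(local.numpy().tolist())
```

With a real provider, the bound method provider.get_dataset would presumably be passed in place of the stand-in.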