runner.SimpleDatasetProvider

View source on GitHub

Builds a tf.data.Dataset from a list of files.

Inherits From: DatasetProvider

runner.SimpleDatasetProvider(
    file_pattern: Optional[str] = None,
    *,
    filenames: Optional[Sequence[str]] = None,
    shuffle_filenames: bool = False,
    interleave_fn: Callable[..., tf.data.Dataset],
    examples_shuffle_size: Optional[int] = None
)

This SimpleDatasetProvider builds a tf.data.Dataset as follows:

- The object is initialized with a list of filenames. For convenience, a file pattern can be specified instead, which will be expanded to a sorted list.
- The filenames are sharded between replicas according to the InputContext (order matters).
- Filenames are shuffled per replica (if requested).
- The files in each shard are interleaved after being read by the interleave_fn.
- Examples are shuffled (if requested), auto-prefetched, and returned for use in one replica of the trainer.
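For illustration, here is a minimal sketch of constructing the provider over TFRecord files of serialized tf.Example protos; the file pattern is hypothetical and not part of this page:

```python
import tensorflow as tf
from tensorflow_gnn import runner

# Assumed layout: TFRecord shards containing serialized tf.Example protos.
provider = runner.SimpleDatasetProvider(
    file_pattern="/tmp/graph_examples/train-*.tfrecord",  # hypothetical path
    # interleave_fn maps one filename to a Dataset of serialized examples.
    interleave_fn=tf.data.TFRecordDataset,
    examples_shuffle_size=10_000,
)
```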

Args

file_pattern: A file pattern, to be expanded by tf.io.gfile.glob and sorted into the list of all filenames.
filenames: A list of all filenames, specified explicitly. This argument is mutually exclusive with file_pattern.
shuffle_filenames: If enabled, filenames will be shuffled after sharding between replicas, before any file reads. Through interleaving, some files may be read in parallel; the details are auto-tuned for throughput.
interleave_fn: A callback that receives a single filename and returns a tf.data.Dataset with the tf.Example values from that file.
examples_shuffle_size: An optional buffer size for example shuffling.
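As a sketch of the explicit-filenames alternative (paths are again hypothetical), the same provider can be built from a list instead of a pattern, with per-replica filename shuffling enabled:

```python
import tensorflow as tf
from tensorflow_gnn import runner

# Hypothetical files; any explicit list of paths works here.
filenames = tf.io.gfile.glob("/tmp/graph_examples/train-*.tfrecord")
provider = runner.SimpleDatasetProvider(
    filenames=filenames,            # mutually exclusive with file_pattern
    shuffle_filenames=True,         # shuffled per replica, after sharding
    interleave_fn=tf.data.TFRecordDataset,
    examples_shuffle_size=10_000,
)
```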

Methods

get_dataset

View source

get_dataset(
    context: tf.distribute.InputContext
) -> tf.data.Dataset

Gets a tf.data.Dataset for one replica, as described by context.
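A hedged sketch of how get_dataset is typically invoked under a tf.distribute strategy; the choice of MirroredStrategy and the file pattern are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow_gnn import runner

provider = runner.SimpleDatasetProvider(
    file_pattern="/tmp/graph_examples/train-*.tfrecord",  # hypothetical path
    interleave_fn=tf.data.TFRecordDataset,
)

strategy = tf.distribute.MirroredStrategy()
# distribute_datasets_from_function passes each replica's InputContext,
# so every replica receives its own shard of files.
distributed_ds = strategy.distribute_datasets_from_function(provider.get_dataset)
```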