[DataPipe] Ensure all DataPipes Meet Testing Requirements #106

Open
@NivekT

🚀 Feature

We have many tests for existing DataPipes (both in PyTorch Core and TorchData). However, over time, they have become less organized. Moreover, as the testing requirements expand, older DataPipes may not have tests to cover the newly added requirements.

This issue aims to track the status of tests for all DataPipes.

Motivation

We want to ensure that test coverage for all DataPipes is complete, to reduce bugs and unexpected behavior.

Alternative

We should also create testing templates for IterDataPipe and MapDataPipe that can be widely applied.

IterDataPipe Tracker

X - Done
NA - Not Applicable
Blank - Not Done/Unclear

Test definitions:
Functional - unit test to ensure that the DataPipe works properly with various input arguments
Reset - DataPipe can be reset/restarted after being read
__len__ - the __len__ method is implemented whenever possible (or explicitly not implemented)
Serializable - DataPipe is serializable
Graph (future) - can be traversed as part of a DataPipe graph
Snapshot (future) - can be saved/loaded as a checkpoint/snapshot
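
As a rough illustration of what a shared template for the Reset, __len__, and Serializable checks could look like, here is a minimal sketch. It is not TorchData's actual test code: the helper names (`check_reset`, `check_len`, `check_serializable`) are hypothetical, and a plain Python `range` object stands in for an IterDataPipe only because it is re-iterable, sized, and picklable.

```python
import pickle


def check_reset(dp, n_read=2):
    """Reset: after a partial read, a new iterator starts from the first element."""
    it = iter(dp)
    partial = [next(it) for _ in range(n_read)]  # read only part of the pipe
    full = list(dp)                              # then start over
    return full[:n_read] == partial and list(dp) == full


def check_len(dp):
    """__len__: the reported length matches the number of yielded elements."""
    return len(dp) == len(list(dp))


def check_serializable(dp):
    """Serializable: a pickle round trip yields the same elements."""
    restored = pickle.loads(pickle.dumps(dp))
    return list(restored) == list(dp)


# Assumption: `range(5)` is only a stand-in for a real DataPipe instance.
dp = range(5)
results = {
    "reset": check_reset(dp),
    "len": check_len(dp),
    "serializable": check_serializable(dp),
}
print(results)
```

A real template would presumably also cover DataPipes that deliberately do not implement __len__, for example by asserting that `len(dp)` raises `TypeError`.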

Name Module Functional Test Reset __len__ Serializable (Picklable) Graph Snapshot
Batcher Core X X X X
Collator Core X X X X
Concater Core X X X X
Demultiplexer Core X X X X
FileLister Core X X X X
FileOpener Core X X X X
Filter Core X X X X
Forker Core X X X X
Grouper Core X X X
IterableWrapper Core X X X X
Mapper Core X X X X
Multiplexer Core X X X X
RoutedDecoder Core X X X X
Sampler Core X X X X
Shuffler Core X X X X
StreamReader Core X X X X
UnBatcher Core X X X
Zipper Core X X X X
BucketBatcher Data X X X X
CSVDictParser Data X X X X
CSVParser Data X X X X
Cycler Data X X X X
DataFrameMaker Data X X X X
Decompressor Data X X X X
Enumerator Data X X X X
FlatMapper Data X X X X
FSSpecFileLister Data X X X X
FSSpecFileOpener Data X X X X
FSSpecSaver Data X X X X
GDriveReader Data X X X X
HashChecker Data X X X X
Header Data X X X X
HttpReader Data X X X X
InMemoryCacheHolder Data X X X X
IndexAdder Data X X X X
IoPathFileLister Data X X X X
IoPathFileOpener Data X X X X
IoPathSaver Data X X X X
IterKeyZipper Data X X X X
JsonParser Data X X X X
LineReader Data X X X X
MapKeyZipper Data X X X X
OnDiskCacheHolder Data X X X X
OnlineReader Data X X X X
ParagraphAggregator Data X X X X
ParquetDataFrameLoader Data X X X X
RarArchiveLoader Data X X X X
Rows2Columnar Data X X X X
SampleMultiplexer Data X X X X
Saver Data X X X X
TarArchiveLoader Data X X X X
UnZipper Data X X X X
XzFileLoader Data X X X X
ZipArchiveLoader Data X X X X

MapDataPipe Tracker

X - Done
NA - Not Applicable
Blank - Not Done/Unclear

Name Module Functional Test __len__ Serializable (Picklable) Graph Snapshot
Batcher Core X X
Concater Core X X
Mapper Core X X X
SequenceWrapper Core X X X
Shuffler Core X X
Zipper Core X X
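
For map-style DataPipes, the __len__ and Serializable columns could be checked through index access rather than iteration. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, integer indices `0..len-1` are assumed, and a plain Python list stands in for a MapDataPipe.

```python
import pickle


def check_map_len(dp):
    """__len__: indices 0..len-1 are retrievable and index len is out of range."""
    n = len(dp)
    for i in range(n):
        dp[i]  # must not raise
    try:
        dp[n]
    except (IndexError, KeyError):
        return True
    return False


def check_map_serializable(dp):
    """Serializable: a pickle round trip preserves the length and every element."""
    restored = pickle.loads(pickle.dumps(dp))
    return len(restored) == len(dp) and all(
        restored[i] == dp[i] for i in range(len(dp))
    )


# Assumption: a list is only a stand-in for a real MapDataPipe instance.
dp = ["a", "b", "c"]
map_results = {
    "len": check_map_len(dp),
    "serializable": check_map_serializable(dp),
}
print(map_results)
```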

cc: @ejguan @VitalyFedyunin @NivekT
