This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
This is part 2 of the Gluon Data API extensions and fixes. It focuses mainly on speeding up the current data loading pipeline built on the Gluon dataset and dataloader.
Motivation
The current data loading pipeline is the major bottleneck for many training tasks. We can summarize the entire flow as:
Dataset.__getitem__ -> Transform.__call__()/forward() -> Batchify -> (optional: communicate through shared_mem) -> split_and_load(ctxs) -> <training on GPUs>
There are several performance concerns:
The performance of Python dataset/transform functions is unsatisfactory.
It is not easy to use multithreading to speed up data loading because of the global interpreter lock (GIL).
Python multiprocessing is slow and error-prone, and shared memory implementations differ considerably across operating systems (e.g., it is easy to run out of shared memory if it is not managed carefully).
There is currently no memory planning for batchify, which causes frequent allocation/deallocation of large chunks of memory when the batch size is big.
Batchify followed by split and load can be optimized into partial_batchify.
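The last point can be illustrated with a minimal pure-Python sketch. The names `split_indices`, `stack`, `batchify_then_split`, and `partial_batchify` are hypothetical and are not MXNet APIs; plain lists stand in for NDArrays. The idea is only that slicing the samples per device before stacking avoids building one full batch that is then copied apart again.

```python
def split_indices(num_samples, num_ctx):
    """Return (start, stop) ranges dividing num_samples as evenly as possible."""
    base, extra = divmod(num_samples, num_ctx)
    ranges, start = [], 0
    for i in range(num_ctx):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def stack(samples):
    """Stand-in for a real batchify function: copy samples into one batch."""
    return [list(s) for s in samples]


def batchify_then_split(samples, num_ctx):
    # Builds the full batch first, then copies each slice out again.
    batch = stack(samples)
    return [batch[a:b] for a, b in split_indices(len(batch), num_ctx)]


def partial_batchify(samples, num_ctx):
    # Stacks each per-device slice directly; no intermediate full batch.
    return [stack(samples[a:b]) for a, b in split_indices(len(samples), num_ctx)]


samples = [[i, i + 1] for i in range(5)]
parts = partial_batchify(samples, num_ctx=2)
```

Both functions return the same per-device batches; the second simply skips the intermediate copy, which is where a backend implementation would be expected to save memory traffic.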
Proposal
To alleviate these problems, I propose a hybrid solution, that is, to
provide C++ Datasets that cover the most common use cases
from gluon.data.dataset import TupleDataset, ImageSequenceDataset, ArrayDataset

# as long as TupleDataset, ImageSequenceDataset, ArrayDataset are supported by backend
dataset = TupleDataset([ImageSequenceDataset(img_paths), ArrayDataset(image_labels)])
# dataset is an image classification dataset that is fully supported in C++
# with TupleDataset we can combine as many datasets as needed

# a C++ backed Dataset can have a magic __handle__ method to return the C++ handle for reference
class TupleDataset:
    def __init__(self, datasets):
        # getattr with a default avoids AttributeError on pure-Python datasets
        if all([callable(getattr(dataset, '__handle__', None)) for dataset in datasets]):
            # all supported by backend
            self._tuple_dataset = check_call(_LIB.MXTupleDatasetCreate([dataset.__handle__() for dataset in datasets]))
        else:
            self._tuple_dataset = None

    def __handle__(self):
        return self._tuple_dataset
provide common C++ batchify functions that are split- and context-aware. Batchify with a memory planner is TBD.
provide a C++ MultithreadingDataLoader which inherits the same arguments as gluon.data.DataLoader but uses MXNet internal multithreading rather than Python multiprocessing.
fall back to Python multiprocessing whenever
the dataset is not fully supported by the backend (e.g., there are custom Python datasets)
the Transform is not fully hybridizable
the Batchify is not fully supported by the backend
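The fallback conditions above amount to checking that every component exposes a backend handle. Below is a minimal pure-Python sketch of that check. The classes `BackendDataset` and `PythonDataset` and the helpers `backend_handle` and `fully_supported` are hypothetical names for illustration, and the integer handles are placeholders for whatever the C++ backend would return.

```python
class BackendDataset:
    """Hypothetical stand-in for a dataset implemented in C++."""

    def __init__(self, handle):
        self._handle = handle

    def __handle__(self):
        return self._handle


class PythonDataset:
    """A custom dataset with no backend counterpart."""

    def __init__(self, items):
        self._items = items

    def __getitem__(self, idx):
        return self._items[idx]

    def __len__(self):
        return len(self._items)


def backend_handle(obj):
    """Return the backend handle of obj, or None if it has none."""
    getter = getattr(obj, '__handle__', None)
    return getter() if callable(getter) else None


def fully_supported(*components):
    """True only if every component exposes a non-None backend handle."""
    return all(backend_handle(c) is not None for c in components)


all_backend = fully_supported(BackendDataset(1), BackendDataset(2))
mixed = fully_supported(BackendDataset(1), PythonDataset([1, 2, 3]))
```

In this sketch a single custom Python component is enough to make `fully_supported` return False, which corresponds to taking the multiprocessing fallback path.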
Users will continue to use the existing gluon.data.DataLoader, and the conversion will be applied automatically:
loader = gluon.data.DataLoader(hybrid_dataset.transform(hybrid_transform), batch_size=32, batchify_fn=hybrid_batchify)

class DataLoader:
    def __init__(self, dataset, ...):
        if isinstance(dataset, _LazyTransformDataset) and is_hybrid(dataset._transform) and is_hybrid(dataset) and is_hybrid(batchify_fn):
            self._mt_dataloader = check_call(_LIB.MXMultiThreadDataLoaderCreate(...))
        else:
            # make sure the attribute exists on the fallback path
            self._mt_dataloader = None

    def __iter__(self):
        if self._mt_dataloader:
            return self._mt_dataloader
        else:
            # fallback to single thread normal dataloader or multiprocessing dataloader
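The control flow above can be mocked in plain Python. This is only a sketch under stated assumptions: `SketchLoader` and its `use_threads` flag are hypothetical, and a `ThreadPoolExecutor` stands in for the backend loader. Python threads remain subject to the GIL, so this shows the dispatch and the identical output of both paths, not the speedup the proposal expects from MXNet's internal threads.

```python
from concurrent.futures import ThreadPoolExecutor


class SketchLoader:
    """Hypothetical loader that picks a code path at construction time."""

    def __init__(self, dataset, batch_size, use_threads, num_workers=2):
        self._dataset = dataset
        self._batch_size = batch_size
        self._use_threads = use_threads
        self._num_workers = num_workers

    def _batches_of_indices(self):
        n = len(self._dataset)
        for start in range(0, n, self._batch_size):
            yield range(start, min(start + self._batch_size, n))

    def __iter__(self):
        if self._use_threads:
            # Stand-in for the backend multithreaded loader.
            with ThreadPoolExecutor(max_workers=self._num_workers) as pool:
                for indices in self._batches_of_indices():
                    # pool.map preserves the order of the indices.
                    yield list(pool.map(self._dataset.__getitem__, indices))
        else:
            # Fallback: fetch samples one by one in the calling thread.
            for indices in self._batches_of_indices():
                yield [self._dataset[i] for i in indices]


data = [x * x for x in range(7)]
threaded = list(SketchLoader(data, batch_size=3, use_threads=True))
fallback = list(SketchLoader(data, batch_size=3, use_threads=False))
```

Both paths yield the same batches in the same order, which is the property that lets the choice of loader stay invisible to the user.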
With this change, MXNet 2.0 will get a smooth transition to mixed data loaders. Please comment with specific examples that this proposal fails to accommodate.