This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-141] Add tutorial Gluon Datasets and DataLoaders #10251

Merged: 4 commits merged into apache:master on Apr 2, 2018


thomelane
Contributor

Description

Intro to Datasets and DataLoaders.
Using your own data with the included Dataset objects (including RecordIO format).
Using your own data with custom Dataset objects.
Wrappers for converting between DataLoaders and DataIters.

Checklist

N/A. Added single markdown file.

Comments

https://issues.apache.org/jira/browse/MXNET-141

@thomelane thomelane requested a review from szha as a code owner March 26, 2018 21:00
@Ishitori
Contributor

Looks fine to me, @thomelane

@piiswrong
Contributor

  1. DataLoader has a num_workers argument that allows data loading to be parallelized. This is very important, and we should introduce it to users.
  2. RecordIO is pretty hard to use, and we are considering phasing it out. I don't think we should recommend it to users here.
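
To illustrate the first point, here is a minimal sketch of what a `num_workers`-style option buys you: batches are prepared concurrently instead of one at a time. This is not Gluon's implementation. It uses only the Python standard library and a thread pool for simplicity, and `SquaresDataset`, `load_batch` and `iterate_batches` are hypothetical names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


class SquaresDataset:
    """Hypothetical dataset: sample i is the pair (i, i * i)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        # In a real pipeline, file reads, decoding and augmentation happen here.
        return idx, idx * idx


def load_batch(dataset, indices):
    """Fetch the samples for one batch and split them into data and labels."""
    samples = [dataset[i] for i in indices]
    data = [sample[0] for sample in samples]
    labels = [sample[1] for sample in samples]
    return data, labels


def iterate_batches(dataset, batch_size, num_workers=0):
    """Yield batches in order, optionally preparing them with a worker pool."""
    indices = list(range(len(dataset)))
    batches = [indices[i:i + batch_size]
               for i in range(0, len(indices), batch_size)]
    if num_workers == 0:
        # Serial path: each batch is built only when the consumer asks for it.
        for batch in batches:
            yield load_batch(dataset, batch)
        return
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() runs load_batch concurrently but returns results in input order.
        for result in pool.map(lambda b: load_batch(dataset, b), batches):
            yield result


serial = list(iterate_batches(SquaresDataset(10), batch_size=4, num_workers=0))
parallel = list(iterate_batches(SquaresDataset(10), batch_size=4, num_workers=2))
```

Both calls produce the same batches in the same order; only the way the work is scheduled differs. The benefit shows up when `__getitem__` is expensive, for example when it decodes and augments images.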

@thomelane
Contributor Author

@piiswrong Agreed on num_workers. I'll add commentary on it and use it in the examples.
One question about RecordIO, though: doesn't it give better performance? And if so, is there a replacement planned or implemented?

@piiswrong
Contributor

It shouldn't matter that much, especially when you are randomly accessing with shuffle.

…ved Gluon DataLoader to Module DataIter wrapper.
@thomelane
Contributor Author

@piiswrong Okay, good to know. I've removed the section on RecordIO. We should state more clearly on the website/docs that it's not recommended, since this could otherwise confuse new users. We could even add a deprecation warning to the im2rec.py file.

I also realized that I had already used num_workers in the tutorial, just on the second usage of DataLoader. I've moved the discussion to the first time DataLoader is used, so there's no chance people can miss it!

And lastly, I've removed the wrapper that converts from DataLoader to DataIter, based on your email feedback. I think keeping the DataIter to DataLoader wrapper would still be useful for the case where people have already implemented a lot of augmentation, etc., using DataIters.
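
For anyone reading along, the DataIter to DataLoader direction can be sketched roughly as follows. This is a simplified illustration rather than the exact code in the tutorial: `FakeBatch` and `FakeDataIter` are hypothetical stand-ins that mimic only the parts of the DataIter interface used here (`reset()`, `next()`, and batches whose `data` and `label` attributes are lists), so the snippet runs without MXNet installed. The wrapper name `DataIterLoader` is likewise just a label for this sketch.

```python
class FakeBatch:
    """Hypothetical stand-in for a DataIter batch: data and label are lists."""

    def __init__(self, data, label):
        self.data = [data]
        self.label = [label]


class FakeDataIter:
    """Hypothetical stand-in for a DataIter exposing reset() and next()."""

    def __init__(self, batches):
        self._batches = batches
        self._cursor = 0

    def reset(self):
        self._cursor = 0

    def next(self):
        if self._cursor >= len(self._batches):
            raise StopIteration
        data, label = self._batches[self._cursor]
        self._cursor += 1
        return FakeBatch(data, label)


class DataIterLoader:
    """Expose a DataIter-style object as an iterable of (data, label) pairs."""

    def __init__(self, data_iter):
        self.data_iter = data_iter

    def __iter__(self):
        # Rewind so that every `for` loop starts a fresh epoch.
        self.data_iter.reset()
        return self

    def __next__(self):
        # StopIteration from the wrapped iterator propagates and ends the loop.
        batch = self.data_iter.next()
        # Assumption: exactly one data array and one label array per batch.
        assert len(batch.data) == len(batch.label) == 1
        return batch.data[0], batch.label[0]


loader = DataIterLoader(FakeDataIter([([1, 2], [0, 1]), ([3, 4], [1, 0])]))
first_epoch = list(loader)
second_epoch = list(loader)
```

Resetting inside `__iter__` is the key design choice: a DataIter has to be rewound explicitly between epochs, whereas a training loop written against a DataLoader simply iterates again.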

@indhub indhub merged commit 9a0d002 into apache:master Apr 2, 2018
lanking520 pushed a commit to lanking520/incubator-mxnet that referenced this pull request Apr 2, 2018
* Added tutorial for Gluon datasets and data loaders.

* Changes as per code review.

* Added link to tutorial in index.md.

* Cut section on RecordIO. Moved num_workers discussion higher up. Removed Gluon DataLoader to Module DataIter wrapper.
haojin2 pushed a commit to haojin2/incubator-mxnet that referenced this pull request Apr 2, 2018
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
4 participants