
Gluon data 2.0: c++ dataloader and built-in image/bbox transforms #17841

Merged: zhreshold merged 17 commits into apache:master from the gluon-data-2 branch on May 7, 2020

Conversation

zhreshold
Member

Description

This is the implementation of proposal #17269.

Changes

The major components of this PR are:

  • C++ threaded dataloader with native support for the following datasets and batchify functions (see the usage sketch after this list)
  • C++ native support for datasets, covering gluon.data.datasets and gluon.data.vision.datasets
  • C++ batchify functions, including Stack, Pad, and Groups
  • Gluon transforms that can apply data augmentations to images and bounding boxes
  • Gluon data iterators that substitute for mx.image.ImageIter and mx.image.ImageDetIter, providing the simplest interface for loading image classification and object detection datasets
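As a usage sketch (not taken from this PR's tests): the new loading path is meant to sit behind the existing Gluon DataLoader interface, so code like the following should pick up the C++ backend transparently. The commented batchify line assumes the module and class names listed above (batchify.Group, batchify.Stack); the exact import path and signatures in the merged code may differ.

```python
# A minimal sketch, assuming only existing Gluon APIs plus the C++ backend
# described above; nothing here is copied from this PR's test suite.
from mxnet.gluon.data import DataLoader
from mxnet.gluon.data.vision import MNIST, transforms

# Per-sample transform applied to the image element of each (image, label) pair.
dataset = MNIST(train=True).transform_first(transforms.ToTensor())

# The C++ threaded dataloader is intended to back this same interface.
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=2)

# Hypothetical: with the batchify module added by this PR one could presumably pass
#   batchify_fn=batchify.Group(batchify.Stack(), batchify.Stack())
# (names taken from the description above; exact API may differ).

for data, label in loader:
    print(data.shape, label.shape)  # e.g. (64, 1, 28, 28) (64,)
    break
```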

Code Review

Since this PR involves many backend and frontend changes and cannot easily be divided into multiple PRs, I would like to suggest modular code reviews:

| Modules | Notes | Suggested reviewers |
| --- | --- | --- |
| include/mxnet/c_api.h/cc, include/mxnet/io.h, python/mxnet/gluon/data/_internal.py, python/mxnet/io/io.py | C API changes for io: datasets and batchify functions, with corresponding Python frontends | @szha, @leezu |
| python/mxnet/gluon/data/batchify.py (dataloader.py, dataset.py) | Gluon data loader and batchify functions | @sxjscience |
| python/mxnet/gluon/contrib/data/vision/dataloader.py, python/mxnet/gluon/contrib/data/vision/transforms/* | Substitutes for the deprecated mx.image.ImageIter and mx.image.ImageDetIter and the old augmenters | all committers |
| python/mxnet/gluon/nn/basic_layers.py | Changes to nn.Sequential and nn.HybridSequential to allow a Sequential block to take more than one argument | all Gluon contributors |
| python/mxnet/image/image.py | NumPy compatibility change | @haojin2 |
| src/imperative/* | Minor modification to cached op to allow operators to skip the engine in imperative executions | @eric-haibin-lin |
| src/io/dataloader.cc (dataset.cc, batchify.cc), src/io/iter_sampler.cc | C++ implementations of gluon.data (gluon.data.dataset, gluon.data.batchify) | @leezu, @sxjscience |
| src/operator/image/* | Operators to support many random image transformations | committers familiar with vision operators |

Thank you to everyone who participates in the code review, and sorry about the size of this PR.

@zhreshold
Member Author

Ping for reviewers' attention. I will fix lint along with review comments to reduce CI triggers.



namespace mxnet {
/*! \brief NaiveCachedOp which does not involve engine which is useful when executed in parallel.
Contributor

@eric-haibin-lin how many CachedOps will we have after the unification?

@szha (Member) left a comment

Awesome job! Would you mind also looking into how we adopt the new FFI?

@zhreshold
Member Author

Awesome job! Would you mind also looking into how we adopt the new FFI?

I am thinking of a separate PR that addresses the FFI improvements for all data-related APIs. For now, maintaining consistency with the other data APIs is preferable.

@leezu (Contributor) left a comment

Let's use the constructor to initialize the classes instead of requiring users to call Init after construction?

@zhreshold
Member Author

Let's use the constructor to initialize the classes instead of requiring users to call Init after construction?

@leezu Sorry I missed this comment. If you look at the creator in c_api used to list and create the existing MXNet iterators: https://github.com/apache/incubator-mxnet/blob/master/src/c_api/c_api.cc#L1859, you will find that the registry is used to hold all uninitialized iterators that have been registered to the io module: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/io/io.py#L984

So if I follow the current registration pattern in the Python module, I have to use the empty constructor + Init function approach.

@leezu
Contributor

leezu commented Apr 16, 2020

I'm not convinced. You can change the call signature of the registry and provide const std::vector<std::pair<std::string, std::string> >& kwargs to the constructor instead of the Init function. There should be no problem.

@leezu (Contributor) left a comment

Added two comments inline to provide more information on how to use the constructor.

@RuRo
Contributor

RuRo commented Apr 20, 2020

@zhreshold I see you increased the timeout for the GPU tests. If you are trying to avoid aborts of the Python 3: GPU and Python 3: GPU (TVM_OP OFF) tests on unix-gpu after 3 hours, this won't help. This is a known bug where those tests sometimes get completely stuck. These tests should normally complete in about an hour. See #18090 for discussion.

@zhreshold force-pushed the gluon-data-2 branch 2 times, most recently from 870bf59 to 5d0ac0f on April 23, 2020 23:48
@zhreshold
Member Author

@leezu Outstanding comments fixed, can you take another look?
@szha @eric-haibin-lin can you please either request change or approve?

@leezu (Contributor) left a comment

Thanks @zhreshold!

I have two general questions and also left some comments / requested changes inline:

  • What's the plan for python/mxnet/gluon/contrib/data/vision/transform? You mentioned we'd like to remove the vision API and keep it in GluonCV only? Does that not apply to the data API? Will the data API be removed from GluonCV, or what is its relation to the one in MXNet?
  • How can users run the pipeline, or part of it, on the GPU? Some of the operators added in this PR do have GPU implementations, but I'm not sure how they could be used. Is there any plan for extending the current design to integrate with external libraries such as NVIDIA DALI?

For the inline comments, the "requested changes" concern the CI Windows build script and the random state in the samplers. The others are more open-ended questions.


return augmenter

class ImageDataLoader(object):
Contributor

This seems to be mostly equivalent to using the normal DataLoader and calling dataset.transform_first(augmenter)? Would it be easier for users and more maintainable to just use DataLoader?

Member Author

It's a thin wrapper on top of DataLoader; the approach you mentioned is exactly what I was thinking of. In fact, anyone can directly use DataLoader with transform or transform_first by manually composing the transforms.

However, I guess users of the legacy ImageDetIter won't be happy with the deprecation of the old iterator, so I am trying to keep a contrib version with the augmenters wrapped in for a convenient transition.

Contributor

Why wouldn't they be happy, given that the composition of transforms appears straightforward?
Having a lot of wrappers around DataLoader will make it harder for us to maintain the codebase in the future, and may also confuse users, as it's not clear to them that the manual composition approach is equivalent to using the wrapper. So I wonder if it's better to remove the wrapper and try to keep our codebase shorter?

Member Author

Two hundred lines of code can save each user at least five minutes, especially beginners. It isn't duplicating any code block and can handle most use cases for images.

So I think it's worth the effort to put it in gluon.contrib and maintain it.
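For reference, the manual composition discussed in this thread would look roughly like the following with existing Gluon APIs; the dataset path and the particular transforms are illustrative choices, not taken from the PR or from the contrib wrapper.

```python
# Sketch of DataLoader + dataset.transform_first(augmenter), i.e. the manual
# composition the reviewer describes; paths and transform choices are illustrative.
from mxnet.gluon.data import DataLoader
from mxnet.gluon.data.vision import ImageFolderDataset, transforms

augmenter = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomFlipLeftRight(),
    transforms.ToTensor(),
])

# transform_first applies the augmenter to the image element of each (image, label) pair.
dataset = ImageFolderDataset('./train').transform_first(augmenter)
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
```

The contrib ImageDataLoader under discussion is described above as a thin wrapper that packages roughly this composition behind a single call.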

@zhreshold
Member Author

Thanks @zhreshold!

I have two general questions and also left some comments / requested changes inline:

  • What's the plan for python/mxnet/gluon/contrib/data/vision/transform? You mentioned we'd like to remove the vision API and keep it in GluonCV only? Does that not apply to the data API? Will the data API be removed from GluonCV, or what is its relation to the one in MXNet?
  • How can users run the pipeline, or part of it, on the GPU? Some of the operators added in this PR do have GPU implementations, but I'm not sure how they could be used. Is there any plan for extending the current design to integrate with external libraries such as NVIDIA DALI?

For the inline comments, the "requested changes" concern the CI Windows build script and the random state in the samplers. The others are more open-ended questions.

@leezu

  • We are indeed removing CV-specific implementations, especially network definitions, in favor of GluonCV. In the meantime, some fundamental image processing transforms will still be added to MXNet. For the definition of fundamental, I refer to the usage in D2L for guidance.

  • For GPU, there are a lot of topics we can discuss, including end-to-end GPU training, DALI integration, a hybrid CPU/GPU data pipeline, and (de)serialization of the data pipeline. I am open to all these discussions, but I do not have the bandwidth to work on those areas yet.

@leezu
Contributor

leezu commented Apr 29, 2020

So the fundamental APIs would only be maintained on the MXNet side and removed from GluonCV?

@zhreshold
Member Author

So the fundamental APIs would only be maintained on the MXNet side and removed from GluonCV?

@leezu Yes

@zhreshold force-pushed the gluon-data-2 branch 2 times, most recently from 0a83a6f to 86b7c3f on May 1, 2020 20:12
@zhreshold merged commit a0e6735 into apache:master on May 7, 2020
rondogency pushed a commit to rondogency/incubator-mxnet that referenced this pull request Jul 2, 2020
…ache#17841)

* c++ dataloader and built-in image/bbox

* update

* fix error

* fix import error

* fix ci build

* fix vs openmp loop type

* fix warning as error with sign/unsign comp

* sign/unsign comp

* update to pytest

* remove nose

* fix tear_down

* address comments

* thread safe dataset

* address comments

* address comments

* fix

* serial pytest for data download

(cherry picked from commit a0e6735)
rondogency pushed a commit to rondogency/incubator-mxnet that referenced this pull request Jul 2, 2020
AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request Jul 6, 2020