Add image-pipe ops for zero-copy image generation #23481

Closed
wants to merge 1 commit into from


ghostplant
Contributor

This PR adds ops intended to replace tf.keras.preprocessing.ImageDataGenerator with a high-performance input pipeline that loads images from on-disk image directories directly to the GPU with zero copy. It can achieve ~96% of the performance of synthetic-dataset training for modern models such as Resnet50 and Inception3. Features:

  1. Deterministic multi-worker image input, configured by a seed, which tf.keras.preprocessing.ImageDataGenerator does not support;
  2. Direct image generation in either NCHW or NHWC format;
  3. In-place resizing to the target image size and interleaved generation;
  4. The expected on-disk image directory layout, for reference:
/train/
    /class-monkey/
        aug_1.jpg
        aug_2.jpg
    /class-bird/
        aug_1.jpg
        aug_2.jpg
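
For illustration only, here is a minimal Python sketch of how a directory laid out as above could be turned into a labeled file list and split deterministically across workers by a seed. It is not the PR's implementation (the ops in the PR are not shown in this thread); the helper names `list_labeled_images` and `shard_for_worker`, the alphabetical class ordering, and the strided sharding scheme are all assumptions made for the example.

```python
import os
import random
import tempfile


def list_labeled_images(root):
    """Return ((path, class_index) pairs, class names) for a class-per-folder tree.

    Assumption: class indices follow the alphabetical order of folder names.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    samples = []
    for index, name in enumerate(classes):
        class_dir = os.path.join(root, name)
        # Sort file names so the listing does not depend on filesystem order.
        for filename in sorted(os.listdir(class_dir)):
            if filename.lower().endswith((".jpg", ".jpeg")):
                samples.append((os.path.join(class_dir, filename), index))
    return samples, classes


def shard_for_worker(samples, seed, num_workers, worker_id):
    """Shuffle with a fixed seed, then give each worker a disjoint strided slice."""
    shuffled = list(samples)
    # A private Random instance keeps the order reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    return shuffled[worker_id::num_workers]


if __name__ == "__main__":
    # Build a throwaway tree that mirrors the layout above.
    root = tempfile.mkdtemp()
    for cls in ("class-monkey", "class-bird"):
        os.makedirs(os.path.join(root, cls))
        for name in ("aug_1.jpg", "aug_2.jpg"):
            open(os.path.join(root, cls, name), "wb").close()

    samples, classes = list_labeled_images(root)
    print(classes)
    for worker_id in range(2):
        print(worker_id, shard_for_worker(samples, seed=42, num_workers=2, worker_id=worker_id))
```

Because every worker shuffles the same sorted list with the same seed, the shards are disjoint and reproducible across runs without any coordination between workers.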

@mrry
Contributor

mrry commented Nov 5, 2018

Thanks for the contribution! Since we are in the process of sunsetting tf.contrib, we are not currently accepting new submodules.

However, this contribution might make sense as part of the Special Interest Group on IO, which is planned to host a repository of community-maintained I/O-related code (primarily Dataset and FileSystem implementations).

I'm going to close this PR for now, but please consider joining the SIG IO mailing list, and contributing to that repository when it is set up.

/cc @ewilderj

@mrry mrry closed this Nov 5, 2018
@ghostplant
Contributor Author

OK, Thanks.

@fangjyshanghai

Thanks

@yongtang
Member

@ghostplant SIG IO and its repo are in place now: https://github.com/tensorflow/io

You can consider opening a PR in the repo and joining the discussion in the sig-io group: https://groups.google.com/a/tensorflow.org/forum/#!forum/io

@ghostplant
Contributor Author

@yongtang Yeah, that's great!

@ghostplant
Contributor Author

@yongtang Do you know whether there is a C-based header library included in TensorFlow that can iterate over a filesystem's directories and files on any generic OS platform (e.g. Linux, Windows, macOS), like os.walk does in Python?
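
For readers unfamiliar with it, the os.walk behaviour referred to here can be sketched as follows. This only illustrates the Python side of the comparison; it says nothing about what TensorFlow's C++ code offers, and the helper name `walk_files` is made up for the example.

```python
import os
import tempfile


def walk_files(root):
    """Yield every file path under root, visiting directories top-down."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place makes os.walk descend in a stable order.
        dirnames.sort()
        for filename in sorted(filenames):
            yield os.path.join(dirpath, filename)


if __name__ == "__main__":
    # Create a small temporary tree and list it relative to its root.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "class-bird"))
    open(os.path.join(root, "class-bird", "aug_1.jpg"), "wb").close()
    for path in walk_files(root):
        print(os.path.relpath(path, root))
```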

@ghostplant
Contributor Author

OK, I found a good reference here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/posix/posix_file_system.cc

@feihugis
Member

@ghostplant You can also check MatchingFilesDatasetOp, which can likewise iterate over a filesystem's directories and files on Linux, Windows, and Mac. Its Python API is here.
