This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Real time data augmentation and online batch generation. #449

Closed
jrabary opened this issue Nov 1, 2015 · 7 comments

Comments

@jrabary

jrabary commented Nov 1, 2015

MXNet looks very promising and I would like to give it a try. I want to train a triplet network like Google FaceNet. One of the key issues in getting a successful training run is the selection of good triplet data to feed to the network. Since generating all possible triplets is not tractable, selecting them during training is a good compromise. So, is it currently possible to add additional steps in the data iterator and in the SGD step, such that one can compute some statistics on the current batch using the current model in order to select good triplets? The idea is to transform the source dataset of plain labeled images into triplets during training. Currently, I use the fuel and blocks frameworks and chain different transformers to achieve this. Maybe MXNet already has this kind of data pipeline transformation, or can I plug fuel into MXNet?

@tqchen
Member

tqchen commented Nov 1, 2015

Can you give a bit more detail on what kind of triplet information you would like to have?

  • Say, get multiple mini-batches of data, run scoring over the current network, and generate the pairs for the training set?

It is possible, in the sense that everything after data loading happens on the Python side, so you can always insert such an augmentation step (pairing) on the Python side to generate the triplets. It requires a bit of tweaking of the current code.
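The "insert a step on the Python side" idea can be sketched as a plain generator that wraps any batch iterator. This is only an illustration, not MXNet API: the `(data, labels)` batch layout and the `make_triplets` callback are assumptions.

```python
import numpy as np

def with_augmentation(batch_iter, make_triplets):
    """Wrap any iterator of (data, labels) batches and insert a
    Python-side augmentation step (e.g. pairing/triplet mining)
    before the batches reach the SGD loop."""
    for data, labels in batch_iter:
        yield make_triplets(data, labels)

# Toy usage: an identity "augmentation" that passes batches through.
batches = iter([(np.zeros((4, 8)), np.array([0, 0, 1, 1]))])
out = list(with_augmentation(batches, lambda d, l: (d, l)))
```

Because the wrapper is itself an iterator, it can be chained with further transformers, which is essentially the fuel-style pipeline mentioned above.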

@jrabary
Author

jrabary commented Nov 1, 2015

Precisely. I generate the batch of triplets as follows:

  • start with a parent dataset that contains labeled images (image + class)
  • at each step:
    • create a batch with evenly distributed images per class (say n classes with m images each)
    • from this mini-batch, create a batch of triplets, each formed from an anchor image, a positive image, and a negative image. The negative images are selected w.r.t. the current network score.
    • update the parameters using this batch of triplets.

With fuel, all of this is done by chaining different data transformers; the SGD optimizer just fetches batches from the pipeline. Do we have a similar feature in MXNet?
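The steps above can be sketched in plain numpy, independent of any framework. Everything here is a hypothetical illustration: `make_even_batch`, `select_triplets`, and the `embed` scoring function are made-up names, and "network score" is approximated by distance in an embedding space (the hardest-negative strategy FaceNet-style training commonly uses).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_even_batch(images_by_class, n_classes, m):
    """Sample m images from each of n_classes randomly chosen classes."""
    classes = rng.choice(len(images_by_class), size=n_classes, replace=False)
    batch, labels = [], []
    for c in classes:
        idx = rng.choice(len(images_by_class[c]), size=m, replace=False)
        batch.extend(images_by_class[c][i] for i in idx)
        labels.extend([c] * m)
    return np.stack(batch), np.array(labels)

def select_triplets(batch, labels, embed):
    """For each (anchor, positive) pair within a class, pick the
    hardest negative: the closest embedding from a different class,
    as scored by the current model's embed function."""
    emb = embed(batch)                                   # (N, D) embeddings
    dist = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
    triplets = []
    for a in range(len(batch)):
        for p in range(len(batch)):
            if a == p or labels[a] != labels[p]:
                continue
            negs = np.where(labels != labels[a])[0]      # other-class indices
            n = negs[np.argmin(dist[a, negs])]           # hardest negative
            triplets.append((a, p, n))
    return triplets
```

A parameter update on the resulting triplet batch would then follow as the final step of the loop; only the mining logic is shown here.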

@tqchen
Member

tqchen commented Nov 1, 2015

The data-transforming pipeline on the IO side is not as rich as what you describe. However, the pipeline you mention should be implementable using the current Python API.

@jrabary
Author

jrabary commented Nov 1, 2015

How should I get started on that:

  • customizing the data iterator ?
  • changing the model.fit method ?

@tqchen
Member

tqchen commented Nov 1, 2015

Starting from the data-fetching part of model.fit would be the easiest way. Customizing the data iterator with a pipeline would make the approach more modular and suitable for future use cases.

@jrabary
Author

jrabary commented Nov 6, 2015

After taking a closer look at the model.fit method, I noticed that it just needs a DataIterator object. So, what are the requirements for an iterator to work with model.fit? Will a fuel data stream (http://fuel.readthedocs.org/en/latest/overview.html) work out of the box?

@tqchen
Member

tqchen commented Nov 6, 2015

I think if you write an adapter that implements the DataIter interface, things might work out of the box.
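The adapter idea can be sketched without importing MXNet at all. The sketch below assumes only that the consumer drives the iterator through a `reset()`/`next()` protocol, which is roughly what MXNet-era `mx.io.DataIter` expects; the exact batch type a real `DataIter` subclass must yield (a `DataBatch` with `data`/`label` lists) is not reproduced here.

```python
class StreamAdapter:
    """Minimal adapter sketch: expose any restartable Python stream
    (e.g. a fuel data stream) behind a reset()/next() protocol.
    A real MXNet adapter would subclass mx.io.DataIter and yield
    DataBatch objects; this only shows the wrapping pattern."""

    def __init__(self, make_stream):
        # A factory (not a bare iterator), so reset() can restart the epoch.
        self._make_stream = make_stream
        self.reset()

    def reset(self):
        self._it = iter(self._make_stream())

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration signals the end of an epoch to the training loop.
        return next(self._it)

# Toy usage: wrap a list-backed "stream" and run two epochs.
it = StreamAdapter(lambda: [1, 2, 3])
epoch1 = list(it)
it.reset()
epoch2 = list(it)
```

Taking a stream *factory* rather than a stream is the key design choice: a plain Python iterator is exhausted after one pass, while training loops need `reset()` between epochs.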

jens-mueller-sociomantic added a commit to jens-mueller-sociomantic/mxnet that referenced this issue Apr 10, 2019
https://github.com/apache/incubator-mxnet/releases/tag/1.3.1

This updates to MXNet 1.3.1. We are skipping the 1.2.x releases because they
indirectly reference missing commits. The update to 1.3.0 is skipped
because the patched version 1.3.1 is already available.

The conflicts in `.gitmodules` are manually resolved to keep our
beaver and makd dependencies. For all other conflicts, their version was
picked every time.

To see our changes with respect to upstream MXNet, diff against the MXNet
1.3.1 tag, i.e., `git diff 1.3.1..HEAD`.

* 3rdparty/dmlc-core e9446f5(e9446f5)...0a0e8ad(0a0e8ad) (41 commits)
  > Add OMPException class and use it for Text Parser (apache#445)
  > Fix build problem on windows (apache#450)
  > switch to safe_load for kubernetes config load (apache#449)
  > Add S3_IS_AWS env and fixed non-AWS behavior (apache#444)
  > add error message for s3 list (apache#439)
  (...)

* 3rdparty/mkldnn 0e7ca738(0e7ca738)...0e7ca738(0e7ca738) (99 commits)
  > build: bumped version to v0.14 in readme
  > build: bumped version to v0.14
  > cpu: reorder: start using jit uni for 8x8 transposition
  > cpu: reorder: jit uni: add 8x8 kernel
  > cpu: reorder: enable jit uni reorder
  (...)

* 3rdparty/mshadow a8c650c(a8c650c)...8a9e337(8a9e337) (9 commits)
  > Merge pull request apache#358 from eric-haibin-lin/revert
  > Merge pull request apache#357 from azai91/revert/d68d3
  > Merge pull request apache#356 from szha/omp
  > Add half_t support for batch_dot.  (apache#353)
  > Allow large array operation in MXNet (apache#348)
  (...)

* 3rdparty/onnx-tensorrt ()...3d8ee04(3d8ee04) (1 commits)
  > Refactor onnxGetBackendInfo (apache#39)

* 3rdparty/ps-lite v1+144(a6dda54)...v1+146(8a76389) (1 commits)
  > Merge pull request apache#133 from CodingCat/turn_up_down

* 3rdparty/tvm v0.3+434(90db723)d...v0.3+434(90db723)d (1 commits)
  > [FRONTEND] A Python hybrid frontend (apache#1251)
jens-mueller-sociomantic added a commit to jens-mueller-sociomantic/mxnet that referenced this issue Apr 12, 2019