This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Real time data augmentation and online batch generation. #449

Closed
jrabary opened this issue Nov 1, 2015 · 7 comments

Comments

@jrabary

jrabary commented Nov 1, 2015

MXNet looks very promising and I would like to give it a try. I want to train a triplet network like Google FaceNet. One of the key issues in getting a successful training run is the selection of good triplet data to feed to the network. Since generating all possible triplets is not tractable, selecting them during training is a good compromise. So, is it currently possible to add additional steps in the data iterator and in the SGD step, such that one can compute some statistics on the current batch using the current model in order to select good triplets? The idea is to transform the source dataset of plain labeled images into triplets during training. Currently, I use the fuel and blocks frameworks and chain different transformers to achieve this. Maybe MXNet already has this kind of data pipeline transformation, or can I plug fuel into MXNet?

@tqchen
Member

tqchen commented Nov 1, 2015

Can you give a bit more detail on what kind of triplet information you would like to have?

  • Say, get multiple mini-batches of data, run scoring over the current network, and generate the pairs for the training set?

It is possible, in the sense that everything after data loading happens on the Python side, so you can always insert such an augmentation step (pairing) on the Python side to generate the triplets. It requires a bit of tweaking of the current code.
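The "insert a step on the Python side" idea can be sketched as a plain generator that wraps any batch iterator. This is only an illustration, not MXNet API: the `(data, labels)` batch layout and the `make_triplets` callback are assumptions.

```python
import numpy as np

def with_augmentation(batch_iter, make_triplets):
    """Wrap any iterator of (data, labels) batches and insert a
    Python-side augmentation step (e.g. pairing/triplet mining)
    before the batches reach the SGD loop."""
    for data, labels in batch_iter:
        yield make_triplets(data, labels)

# Toy usage: an identity "augmentation" that passes batches through.
batches = iter([(np.zeros((4, 8)), np.array([0, 0, 1, 1]))])
out = list(with_augmentation(batches, lambda d, l: (d, l)))
```

Because the wrapper is itself an iterator, it can be chained with further transformers, which is essentially the fuel-style pipeline mentioned above.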

@jrabary
Author

jrabary commented Nov 1, 2015

Precisely. I generate the batch of triplets as follows:

  • start with a parent dataset that contains labeled images (image + class)
  • at each step:
    • create a batch with evenly distributed images per class (say n classes with m images each)
    • from this mini-batch, create a batch of triplets, each formed from an anchor image, a positive image, and a negative image. The negative images are selected w.r.t. the current network score.
    • update the parameters using this batch of triplets.

With fuel, all of this is done by chaining different data transformers; the SGD optimizer just fetches batches from the pipeline. Do we have a similar feature in MXNet?
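The steps above can be sketched in plain numpy, independent of any framework. Everything here is a hypothetical illustration: `make_even_batch`, `select_triplets`, and the `embed` scoring function are made-up names, and "network score" is approximated by distance in an embedding space (the hardest-negative strategy FaceNet-style training commonly uses).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_even_batch(images_by_class, n_classes, m):
    """Sample m images from each of n_classes randomly chosen classes."""
    classes = rng.choice(len(images_by_class), size=n_classes, replace=False)
    batch, labels = [], []
    for c in classes:
        idx = rng.choice(len(images_by_class[c]), size=m, replace=False)
        batch.extend(images_by_class[c][i] for i in idx)
        labels.extend([c] * m)
    return np.stack(batch), np.array(labels)

def select_triplets(batch, labels, embed):
    """For each (anchor, positive) pair within a class, pick the
    hardest negative: the closest embedding from a different class,
    as scored by the current model's embed function."""
    emb = embed(batch)                                   # (N, D) embeddings
    dist = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
    triplets = []
    for a in range(len(batch)):
        for p in range(len(batch)):
            if a == p or labels[a] != labels[p]:
                continue
            negs = np.where(labels != labels[a])[0]      # other-class indices
            n = negs[np.argmin(dist[a, negs])]           # hardest negative
            triplets.append((a, p, n))
    return triplets
```

A parameter update on the resulting triplet batch would then follow as the final step of the loop; only the mining logic is shown here.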

@tqchen
Member

tqchen commented Nov 1, 2015

The data-transforming pipeline on the IO side is not as rich as what you describe. However, the pipeline you mention should be implementable using the current Python API.

@jrabary
Author

jrabary commented Nov 1, 2015

How should I get started on that:

  • customizing the data iterator ?
  • changing the model.fit method ?

@tqchen
Member

tqchen commented Nov 1, 2015

Starting from the data-fetching part of model.fit would be the easiest way. Customizing the data iterator with a pipeline would make the approach more modular and suitable for future use cases.

@jrabary
Author

jrabary commented Nov 6, 2015

After taking a closer look at the model.fit method, I noticed that it just needs a DataIterator object. So, what are the requirements for an iterator to work with model.fit? Will a fuel data stream (http://fuel.readthedocs.org/en/latest/overview.html) work out of the box?

@tqchen
Member

tqchen commented Nov 6, 2015

I think if you write an adapter that implements the DataIter interface, things might work out of the box.
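The adapter idea can be sketched without importing MXNet at all. The sketch below assumes only that the consumer drives the iterator through a `reset()`/`next()` protocol, which is roughly what MXNet-era `mx.io.DataIter` expects; the exact batch type a real `DataIter` subclass must yield (a `DataBatch` with `data`/`label` lists) is not reproduced here.

```python
class StreamAdapter:
    """Minimal adapter sketch: expose any restartable Python stream
    (e.g. a fuel data stream) behind a reset()/next() protocol.
    A real MXNet adapter would subclass mx.io.DataIter and yield
    DataBatch objects; this only shows the wrapping pattern."""

    def __init__(self, make_stream):
        # A factory (not a bare iterator), so reset() can restart the epoch.
        self._make_stream = make_stream
        self.reset()

    def reset(self):
        self._it = iter(self._make_stream())

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration signals the end of an epoch to the training loop.
        return next(self._it)

# Toy usage: wrap a list-backed "stream" and run two epochs.
it = StreamAdapter(lambda: [1, 2, 3])
epoch1 = list(it)
it.reset()
epoch2 = list(it)
```

Taking a stream *factory* rather than a stream is the key design choice: a plain Python iterator is exhausted after one pass, while training loops need `reset()` between epochs.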

jens-mueller-sociomantic added a commit to jens-mueller-sociomantic/mxnet that referenced this issue Apr 10, 2019
https://github.com/apache/incubator-mxnet/releases/tag/1.3.1

This updates to MXNet 1.3.1. We are skipping the 1.2.x releases because they
indirectly reference missing commits. The update to 1.3.0 is skipped
because the patched version 1.3.1 is already available.

The conflicts in `.gitmodules` are manually resolved to keep our
beaver and makd dependencies. For all other conflicts, their version was
picked every time.

To see our changes with respect to upstream MXNet, diff against the MXNet
1.3.1 tag, i.e., `git diff 1.3.1..HEAD`.

* 3rdparty/dmlc-core e9446f5(e9446f5)...0a0e8ad(0a0e8ad) (41 commits)
  > Add OMPException class and use it for Text Parser (apache#445)
  > Fix build problem on windows (apache#450)
  > switch to safe_load for kubernetes config load (apache#449)
  > Add S3_IS_AWS env and fixed non-AWS behavior (apache#444)
  > add error message for s3 list (apache#439)
  (...)

* 3rdparty/mkldnn 0e7ca738(0e7ca738)...0e7ca738(0e7ca738) (99 commits)
  > build: bumped version to v0.14 in readme
  > build: bumped version to v0.14
  > cpu: reorder: start using jit uni for 8x8 transposition
  > cpu: reorder: jit uni: add 8x8 kernel
  > cpu: reorder: enable jit uni reorder
  (...)

* 3rdparty/mshadow a8c650c(a8c650c)...8a9e337(8a9e337) (9 commits)
  > Merge pull request apache#358 from eric-haibin-lin/revert
  > Merge pull request apache#357 from azai91/revert/d68d3
  > Merge pull request apache#356 from szha/omp
  > Add half_t support for batch_dot.  (apache#353)
  > Allow large array operation in MXNet (apache#348)
  (...)

* 3rdparty/onnx-tensorrt ()...3d8ee04(3d8ee04) (1 commits)
  > Refactor onnxGetBackendInfo (apache#39)

* 3rdparty/ps-lite v1+144(a6dda54)...v1+146(8a76389) (1 commits)
  > Merge pull request apache#133 from CodingCat/turn_up_down

* 3rdparty/tvm v0.3+434(90db723)d...v0.3+434(90db723)d (1 commits)
  > [FRONTEND] A Python hybrid frontend (apache#1251)
jens-mueller-sociomantic added a commit to jens-mueller-sociomantic/mxnet that referenced this issue Apr 12, 2019