Real-time data augmentation and online batch generation. #449
Can you give a bit more detail on what kind of triplet information you would like to have?
It is possible, in the sense that everything after data loading happens on the Python side, so you can always insert such an augmentation step (pairing) there to generate the triplets. It requires a bit of tweaking of the current code.
Precisely, I generate a batch of triplets as follows:
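(The poster's snippet was not preserved in this thread. Purely as an illustration of the pairing step mentioned above, here is a minimal sketch of assembling a triplet batch from labeled data by random pairing; the function name and signature are hypothetical, not the original code.)

```python
import numpy as np

def make_triplet_batch(images, labels, batch_size, rng=np.random):
    """Hypothetical sketch: draw (anchor, positive, negative) triplets
    from a labeled dataset, assuming integer labels and at least two
    classes, one of which has two or more examples."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= 2]

    anchors, positives, negatives = [], [], []
    for _ in range(batch_size):
        pos_class = rng.choice(eligible)
        neg_class = rng.choice([c for c in by_class if c != pos_class])
        a, p = rng.choice(by_class[pos_class], size=2, replace=False)
        n = rng.choice(by_class[neg_class])
        anchors.append(images[a])
        positives.append(images[p])
        negatives.append(images[n])
    return np.stack(anchors), np.stack(positives), np.stack(negatives)
```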
The data-transforming pipeline on the IO side is not as rich as what you describe. However, the pipeline you mention should be possible to implement using the current Python API.
How should I get started with that?
Starting from the data-feeding part of model.fit should be the easiest way. Customizing the data iterator with a pipeline would make the approach more modular and suitable for future use cases.
After taking a closer look at the model.fit method, I noticed that it just needs a DataIter object. What, then, are the requirements for an iterator to work with model.fit? Will a fuel data stream (http://fuel.readthedocs.org/en/latest/overview.html) work out of the box?
I think if you write an adapter that implements the DataIter interface, things might work out of the box.
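A minimal sketch of such an adapter, assuming the mx.io.DataIter interface (provide_data/provide_label properties, reset(), and next() returning an mx.io.DataBatch). The wrapper class and its constructor arguments are hypothetical; a fuel DataStream could supply the underlying iterator, e.g. through its get_epoch_iterator() method:

```python
import mxnet as mx

class GeneratorIter(mx.io.DataIter):
    """Hypothetical adapter exposing any Python iterator of
    (data, label) numpy batches as an mx.io.DataIter."""

    def __init__(self, make_epoch_iter, data_shape, batch_size,
                 label_name='softmax_label'):
        super(GeneratorIter, self).__init__()
        self.make_epoch_iter = make_epoch_iter  # callable returning a fresh epoch iterator
        self.batch_size = batch_size
        # These names must match the input/label names of the symbol being fit.
        self._provide_data = [('data', (batch_size,) + data_shape)]
        self._provide_label = [(label_name, (batch_size,))]
        self.cur_iter = make_epoch_iter()

    @property
    def provide_data(self):
        return self._provide_data

    @property
    def provide_label(self):
        return self._provide_label

    def reset(self):
        # Start a fresh pass over the data at each epoch boundary.
        self.cur_iter = self.make_epoch_iter()

    def next(self):
        data, label = next(self.cur_iter)  # raises StopIteration at epoch end
        return mx.io.DataBatch(data=[mx.nd.array(data)],
                               label=[mx.nd.array(label)],
                               pad=0, index=None)
```

An instance of such a wrapper can then be passed to model.fit wherever a built-in iterator would go.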
MXNet looks very promising and I would like to give it a try. I want to train a triplet network like Google's FaceNet. One of the key issues in getting a successful training run is the selection of good triplets to feed to the network. Since generating all possible triplets is not tractable, selecting them during training is a good compromise. So, is it currently possible to add extra steps in the data iterator and in the SGD step, so that one can compute statistics on the current batch using the current model in order to select good triplets? The idea is to transform a source dataset of plain labeled images into triplets during training. Currently, I use the fuel and blocks frameworks and chain different transformers to achieve this. Maybe MXNet already has this kind of data-pipeline transformation, or can I plug fuel into MXNet?
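For concreteness, here is a hedged sketch of what "statistics on the current batch using the current model" could look like: embed a candidate batch with the current network and keep, for each anchor-positive pair, the hardest negative that still violates the margin (mining in the spirit of FaceNet). embed_fn is an assumed callable (e.g. a wrapper around a forward pass), not an existing MXNet API:

```python
import numpy as np

def select_hard_triplets(embed_fn, images, labels, margin=0.2):
    """Hypothetical online triplet mining: score a candidate batch with
    the current model and return (anchor, positive, negative) index
    triplets whose negatives violate the margin. `labels` is assumed
    to be a numpy array of integer class ids."""
    emb = embed_fn(images)                                       # (N, D) embeddings
    dist = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)  # pairwise distances
    same = labels[:, None] == labels[None, :]

    triplets = []
    for a in range(len(labels)):
        for p in np.where(same[a])[0]:
            if p == a:
                continue
            # Negatives with d(a, n) < d(a, p) + margin are still "hard".
            viol = np.where(~same[a] & (dist[a] < dist[a, p] + margin))[0]
            if viol.size:
                triplets.append((a, p, viol[np.argmin(dist[a, viol])]))
    return triplets
```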