[MXNET-1438] Adding SDML loss function #17298
Conversation
@anjishnu Please address the sanity errors: http://jenkins.mxnet-ci.amazon-ml.com/job/mxnet-validation/job/sanity/job/PR-17298/1/display/redirect . Also, can you randomize your unit test to ensure that we're covering more numerically different cases?
@haojin2 Sure, will address the sanity cases. Can you give an example of a unit test that is appropriately randomized, so I can base mine on that?
It looks a little tricky to port this into the 'fit' and 'score' paradigm, since this is a retrieval-specific loss function that uses the other elements in a batch as implicit negative samples, and I'm not sure how cleanly it fits into the Module API for this kind of test. Especially since the loss computation needs to know the shape of the minibatch, which doesn't seem to be possible in the symbol API. The loss only guarantees that associated pairs will be closer in the chosen metric space after learning than non-associated pairs. Maybe I can write something equivalent using the Gluon API, to train a small network and ensure it learns the right associations. I'll come up with a proposal shortly.
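For readers of this thread, here is a minimal imperative sketch of the batchwise idea being described (hypothetical standalone code, not the PR's exact implementation; the function name `sdml_loss_sketch` and the `smoothing_parameter` argument are illustrative). Row i of `x1` is paired with row i of `x2`, and the remaining rows act as implicit negatives, which is why a concrete batch size is needed to build the smoothed label matrix:

```python
import mxnet as mx
from mxnet import nd

def sdml_loss_sketch(x1, x2, smoothing_parameter=0.3):
    """Row i of x1 is the positive partner of row i of x2;
    every other row in the batch serves as an implicit negative."""
    batch_size, dim = x1.shape
    # pairwise squared euclidean distances between the two batches
    a = x1.expand_dims(1).broadcast_to((batch_size, batch_size, dim))
    b = x2.expand_dims(0).broadcast_to((batch_size, batch_size, dim))
    distances = ((a - b) ** 2).sum(axis=2)
    # smoothed targets: mass (1 - s) on the true pair, s spread uniformly
    # over the (batch_size - 1) in-batch negatives -- building this matrix
    # is what requires knowing the batch size up front
    eye = nd.eye(batch_size)
    labels = eye * (1 - smoothing_parameter) \
        + (1 - eye) * smoothing_parameter / (batch_size - 1)
    # softmax over negative distances turns each row into a distribution
    log_probabilities = nd.log_softmax(-distances, axis=1)
    kl = mx.gluon.loss.KLDivLoss(from_logits=True)
    # KLDivLoss averages over the label axis, so scale back by batch_size
    return kl(log_probabilities, labels) * batch_size
```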
@anjishnu I was actually only asking for the input data to be randomized, with the original version of the test code untouched.
@haojin2 If I randomized the input data in the original test code, the losses would have different values on each run (SDML loss imposes a distribution over the relative distances of data points in a minibatch), so I would no longer be able to compare the output against precomputed loss values; the original unit test procedure cannot be reused. That's why I added a test that fits a toy model to some toy data instead. The current test runs in ~50 ms on my machine on CPU. Would love to hear your thoughts on how to improve on this.
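For concreteness, a rough sketch of the kind of loss-reduction test being described here (hypothetical code, assuming the loss ends up exposed as `gluon.loss.SDMLLoss`, which is what this PR adds): train a small network on randomly generated paired data and assert that the loss goes down, rather than comparing against precomputed values:

```python
import mxnet as mx
from mxnet import nd, gluon, autograd

def test_sdml_loss_reduction():
    N, dim, epochs = 32, 10, 20
    # random anchors plus slightly perturbed positives
    x1 = nd.random.uniform(-1, 1, shape=(N, dim))
    x2 = x1 + nd.random.uniform(-0.1, 0.1, shape=(N, dim))

    net = gluon.nn.Dense(dim)
    net.initialize(mx.init.Xavier())
    trainer = gluon.Trainer(net.collect_params(), 'adam',
                            {'learning_rate': 0.01})
    loss_fn = gluon.loss.SDMLLoss()

    initial = loss_fn(net(x1), net(x2)).mean().asscalar()
    for _ in range(epochs):
        with autograd.record():
            loss = loss_fn(net(x1), net(x2))
        loss.backward()
        trainer.step(N)
    # the trained embeddings should pull associated pairs closer
    assert loss_fn(net(x1), net(x2)).mean().asscalar() < initial
```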
@haojin2 Are there any serious concerns with the new unit test?
@anjishnu I'm only concerned about whether there's any flakiness in such a test. To verify, please try
@haojin2 Changing the hyperparameters (e.g. increasing the dimensionality, lowering N) does make it more robust (tried up to MXNET_TEST_COUNT=2000 runs), but there's always a chance of failure. Other options:
Which of these would be your preferred approach? I have tested them all up to MXNET_TEST_COUNT=2000.
@haojin2 Could you take a look at the latest version? The new unit test (which checks for loss reduction) passes MXNET_TEST_COUNT=2000.
@anjishnu It seems ok. I've re-triggered your failed website build; I think it's good for merge after that passes.
@haojin2 ok, thanks! sounds good! |
@anjishnu Merged, thx for the contribution. |
* Added loss function
* cleaning out loss function
* added loss fn
* added loss function and unit test
* fixed linter issues and updated unit test to train net and eval
* fixed documentation
* made unit tests more robust
distances = self._compute_distances(x1, x2)
log_probabilities = F.log_softmax(-distances, axis=1)
# multiply by the number of labels to obtain the correct loss (gluon kl_loss averages instead of summing)
return self.kl_loss(log_probabilities, labels.as_in_context(distances.context)) * batch_size
This doesn't work in the symbol API. 1.x will need to be fixed, while 2.0 will be switched to deferred compute mode.
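A hypothetical illustration of the reviewer's point: with NDArray inputs the batch size is a concrete Python integer, but a Symbol carries no concrete shape while the graph is being built, so shape-dependent constructions like the label matrix above cannot be expressed on the symbolic path:

```python
import mxnet as mx
from mxnet import nd, sym

# Imperative: shapes are concrete, so batch-size-dependent ops are fine.
x = nd.ones((4, 8))
labels = nd.eye(x.shape[0])

# Symbolic: a Variable has no concrete shape at graph-construction time;
# shapes are only resolved later, e.g. via infer_shape with bound input
# shapes, so nd.eye(batch_size)-style code can't run during graph building.
xs = sym.Variable('x')
arg_shapes, out_shapes, _ = xs.infer_shape(x=(4, 8))
print(out_shapes)  # [(4, 8)] -- known only after input shapes are supplied
```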
Description
Adds the SDML (Smoothed Deep Metric Learning) loss function to Gluon, along with a unit test that trains a small network and verifies that the loss decreases.