
Setting all the optimizers to have useLocking = True #310


Merged: 3 commits, May 4, 2021

Conversation

Craigacp (Collaborator)

I'm running into determinism issues when training models using TF-Java. This is one area which could be causing it, as in TF 2 all the optimizers have useLocking=true. We don't currently set this in TF-Java, and I'm worried the unlocked code path is degrading the results (the docs say the behaviour may be undefined but faster with useLocking=false).

This doesn't resolve my non-determinism issue completely, but it seems a little better.

I've added a test to GradientDescentTest which checks that the models produced are identical. This test fails randomly, for two reasons, both of which are confusing. First it seems like when we fetch the weights from a model, we get a pointer to the weights, not a copy of them. This means that when they are trained the "copies" I'm saving out as the initialized ones are being updated. Second, and this is the real issue, the gradient updates can be different on identical models for identical data, for no apparent reason. I think this is happening somewhere in the C API, as I can't see where it could be happening in our code and the models use identical GraphDefs.
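For context, TF-Java's optimizers build on the generated apply ops, and useLocking is exposed through each op's Options, the same pattern this PR uses for ApplyAdagrad in the diff below. A minimal sketch of the idea, not the actual optimizer code (tf, var and grad are placeholder names):

// Sketch only: emit a gradient-descent update that takes the locked code path.
// Assumes `tf` is an org.tensorflow.op.Ops bound to the training graph,
// `var` is the Operand<TFloat32> variable and `grad` its gradient.
// Uses org.tensorflow.op.train.ApplyGradientDescent.
ApplyGradientDescent<TFloat32> update =
    ApplyGradientDescent.create(
        tf.scope(),
        var,                          // variable updated in place
        tf.constant(0.01f),           // learning rate
        grad,                         // gradient for this step
        ApplyGradientDescent.useLocking(true));  // lock the variable during the update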

Craigacp (Collaborator, Author)

I opened an issue for the non-determinism on tensorflow/tensorflow - tensorflow/tensorflow#48855

karllessard (Collaborator)

@rnett, looks like the quick build is failing after merging #306, can you please take a look at it?

karllessard (Collaborator)

@Craigacp can you please rebase once more so we can give the quick-build another try?

Craigacp force-pushed the locking-optimizer-fixes branch from 03402b3 to c0fc351 on May 1, 2021 at 00:56
Craigacp (Collaborator, Author) commented May 1, 2021

Done.

Craigacp (Collaborator, Author) commented May 1, 2021

Looks like everything passed this time.

karllessard (Collaborator) previously approved these changes on May 1, 2021 and left a comment:


Thanks @Craigacp, I've approved it but left a few minor comments here and there, if you want to take a look.


// This test fails due to initialization and gradient issues. It should not, but it seems to be a
// problem
// in TF-core.
karllessard (Collaborator):

reformat this comment?

@@ -42,6 +43,9 @@
public static final float LEARNING_RATE_DEFAULT = 0.001f;
public static final float INITIAL_ACCUMULATOR_DEFAULT = 0.01f;

private static final ApplyAdagrad.Options[] opts = new ApplyAdagrad.Options[]{
ApplyAdagrad.updateSlots(true),ApplyAdagrad.useLocking(true)};
karllessard (Collaborator):

Super nit: any formatter will probably complain about the missing space after a comma.

}

for (int i = 1; i < numRuns; i++) {
assertEquals(initialLoss[0],initialLoss[i]);
karllessard (Collaborator):

Super nit: spaces after commas.

.fetch(outputWeightName)
.fetch(outputBiasName)
.run());
System.out.println("Initialized - " + ndArrToString((TFloat32)initialized.get(i).get(3)));
karllessard (Collaborator):

Can we avoid the verbosity in the unit test? Aren't the equality checks enough to validate it? I'm fine with just commenting out these printlns.

Craigacp (Collaborator, Author) commented May 2, 2021

I'll clean up the test on Monday, and split it into two. There should be one test for the outputs returning a reference rather than a copy of the weights (as this might allow you to mutate the weights directly, which seems bad), and the current test which should just check the gradient behaviour.

It's all conflated in that single test because I spent hours trying to figure out what was going on, so it still has all the print statements and other scaffolding I needed to track it down.

I'll clean up the formatting issues at the same time.
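For what it's worth, the reference-vs-copy half of that split could snapshot the fetched weights before training, roughly like this (session and weightName are placeholder names, and this assumes the fetched TFloat32 exposes the NdArray copy API):

// Sketch only: deep-copy the fetched weights so later training steps can't silently mutate them.
TFloat32 fetched = (TFloat32) session.runner().fetch(weightName).run().get(0);
FloatNdArray snapshot = NdArrays.ofFloats(fetched.shape());
fetched.copyTo(snapshot);  // copy into independent heap-backed storage
// ... run a training step on the same session ...
// If fetch hands back a live reference, `fetched` now differs from `snapshot`,
// and the snapshot is what the initialization check should compare against.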
