
Add Losses #129

Merged: 27 commits, Nov 17, 2020
Conversation

JimClarke5 (Contributor)

This PR adds losses to framework.

All the loss sub-classes inherit from Loss.

The Losses class has methods that can be called directly to get raw loss values. These are utilized by the Loss subclasses before applying a Reduction to the loss. The Losses class will also be used by some of the Metric classes when that feature is submitted.

The impl package has some helper methods and classes utilized by the loss classes; these are not expected to be exposed outside the framework module once we adopt Java modules.
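
For readers skimming the PR, here is a hedged usage sketch of the API described above; the constructor and method signatures are assumptions based on this discussion, not necessarily the merged code.

import org.tensorflow.Graph;
import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.types.TFloat32;

public class LossUsageSketch {
  public static void main(String[] args) {
    try (Graph graph = new Graph()) {
      Ops tf = Ops.create(graph);
      Operand<TFloat32> labels = tf.constant(new float[][] {{0f, 1f}, {1f, 0f}});
      Operand<TFloat32> predictions = tf.constant(new float[][] {{0.1f, 0.9f}, {0.8f, 0.2f}});

      // A Loss subclass applies its configured Reduction (here the default, AUTO).
      BinaryCrossentropy bce = new BinaryCrossentropy(tf);
      Operand<TFloat32> reduced = bce.call(labels, predictions);

      // The static Losses methods return the raw, unreduced values
      // (assumed trailing parameters: fromLogits = false, labelSmoothing = 0).
      Operand<TFloat32> raw = Losses.binaryCrossentropy(tf, labels, predictions, false, 0f);
    }
  }
}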

* @param tf The TensorFlow Ops
* @param labels the labels
* @param predictions the predictions
* @param <T> the data type of the result
Contributor

perhaps "of the predictions and result"?

Contributor Author

OK

*
* @param <T> the data type of the Tuple entries.
*/
public class Tuple<T extends TNumber> {
Contributor

The Tuple class name is uncomfortably vanilla for me. Perhaps LossTuple?

Contributor Author

This object will also be used in Metrics as many metrics are built using loss classes or Losses methods. I have changed it to LossTuple.
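
For context, a minimal sketch of the renamed class; the field and accessor names are assumptions based on this discussion, not necessarily the merged code.

import org.tensorflow.Operand;
import org.tensorflow.types.family.TNumber;

/** Holds the post-adjustment operands that loss (and later metric) methods pass around. */
public class LossTuple<T extends TNumber> {
  private final Operand<T> labels;
  private final Operand<T> target; // predictions, or intermediate loss values
  private final Operand<T> sampleWeights; // may be null

  public LossTuple(Operand<T> labels, Operand<T> target) {
    this(labels, target, null);
  }

  public LossTuple(Operand<T> labels, Operand<T> target, Operand<T> sampleWeights) {
    this.labels = labels;
    this.target = target;
    this.sampleWeights = sampleWeights;
  }

  public Operand<T> getLabels() { return labels; }
  public Operand<T> getTarget() { return target; }
  public Operand<T> getSampleWeights() { return sampleWeights; }
}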


public class LossesImpl {

/**
Contributor

The Javadocs in this file are still partly in markdown.

Contributor Author

OK, I thought I caught them all, I will fix.

* @param tf the TensorFlow Ops
* @param predictions Predicted values, a <code>Operand</code> of arbitrary dimensions.
* @param labels Optional label <code>Operand</code> whose dimensions match <code>prediction</code>.
* @return Tuple of <code>prediction</code>, <code>label</code> and <code>sampleWeight</code>. Each of them possibly has the last
Contributor

For this method, the returned sampleWeight is always null.

Contributor Author

That is not always the case when we do Metrics.

Contributor

I'm just thinking our documentation for this method might take into account that the returned sampleWeight is always null.

Contributor Author

Now I see what you are talking about. I added a comment in the @return that sampleWeight will be null for this particular method signature.

*
* @param tf the TensorFlow Ops
* @param predictions Predicted values, a <code>Operand</code> of arbitrary dimensions.
* @param labels Optional label <code>Operand</code> whose dimensions match <code>prediction
Contributor

Is "match" the right way to describe the precondition relationship between predictions and labels?

Contributor Author

It is definitely not the same Shape. I was thinking of "compatible", but that has a specific meaning in Shape.isCompatibleWith. The description is saying the ranks must be equal or differ by one. I am not sure of one word that describes that; "match" was the word used in the Python version of this method.

Contributor (@deansher, Oct 11, 2020)

Hmm, here's a suggestion:

  • We could decide what we want the convention to be, in terms of squeeze-or-expand plus maybe broadcasting.
  • Write this up carefully in the class javadoc for either Loss or Losses.
  • Mention that documentation in the class javadoc for every other loss class.
  • Also mention it in Loss#call.
  • And be silent about it in the individual methods of Losses and LossesImpl.

Perhaps?

Contributor

That said, it just occurred to me that we have another gap, and that filling that gap might help this issue.

We don't specify the behavior of these methods when labels and predictions don't have a permitted shape relationship. Nor do we make sure our behavior is consistent in that case.

Perhaps we should

  • spell out that there's an IllegalArgumentException for that in the statically-known-dimensions case,
  • rename squeezeOrExpandDimensions into something like validateAndAdjustLossDimensions,
  • have that method throw IllegalArgumentException when appropriate,
  • and then link to a fuller explanation in the documentation of the IllegalArgumentException?

Although I have never been in the habit of subclassing IllegalArgumentException, I see Oracle does that sometimes. That could be an alternative way of pointing people to the fuller explanation.

Contributor Author

"Match" must mean that the shapes of the input operands are capable of being molded into the relationships defined for the result of this method. Again, LossesImpl is intended to be marked as module private (JDK 11) and should only be accessible from the losses or metrics package. It is not intended to be a general-use API.

Collaborator

We should probably note in the javadoc for the class that this is an internal implementation class and subject to change (and being locked off under the module system).

Contributor Author

Added this comment for the LossesImpl class

/**
 * These are helper methods for Losses and will be module private when
 * Java modularity is applied to TensorFlow Java.
 * These methods should not be used outside of the Loss package.
 */


if (labels != null) {
Shape labelsShape = labels.asOutput().shape();
long labelRank = labelsShape.numDimensions();
Contributor

For consistency, labelsRank.

Contributor Author

OK

/**
* Creates a Binary Crossentropy Loss using {@link Class#getSimpleName()} as the loss name, {@link
* #FROM_LOGITS_DEFAULT} for fromLogits, {@link #LABEL_SMOOTHING_DEFAULT} for labelSmoothing and a
* Loss Reduction of {@link * Reduction#AUTO}
Contributor

Extraneous *

Contributor Author

Deleted

/**
* Creates a categorical cross entropy Loss using {@link Class#getSimpleName()} as the loss name,
* {@link #FROM_LOGITS_DEFAULT} for fromLogits, {@link #LABEL_SMOOTHING_DEFAULT} for
* labelSmoothing, a Loss Reduction of {@link * Reduction#AUTO}, and an axis of {@link
Contributor

Extraneous *

Contributor Author

Removed all extraneous *s from the {@link} tags.

* Creates a Loss using a Loss Reduction of {@link Reduction#AUTO}
*
* @param tf the TensorFlow Ops
* @param name the name of this Loss
Contributor

. . . , or null to use {@link Class#getSimpleName()}

Contributor Author

Why would someone want to pass null, when there are other CTORs that handle that condition?

Contributor

For APIs that will get enough use to be worth some polish, I tend toward carefully documenting edge cases. I don't know whether we want to invest in that now.

Collaborator

I think it's worth documenting it in case users build their own losses.

Contributor Author

OK, added this to the name param: if null, the name will be {@link Class#getSimpleName()}.

* Creates a Loss
*
* @param tf the TensorFlow Ops
* @param name the name of this loss
Contributor

. . . , or null to use {@link Class#getSimpleName()}

Contributor Author

OK, added this to all name params: if null, the name will be {@link Class#getSimpleName()}.
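
A hedged sketch of what that constructor behavior might look like (the field names are assumptions):

// If name is null, fall back to the runtime class's simple name, so
// subclasses get a sensible default without needing a dedicated constructor.
protected Loss(Ops tf, String name, Reduction reduction) {
  this.tf = tf;
  this.name = name != null ? name : getClass().getSimpleName();
  this.reduction = reduction;
}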

*
* @param labels the truth values or labels
* @param predictions the predictions
* @param <T> The data type of the labels, predictions and loss.
Contributor

Actually, there's a separate <U> for the labels.

Contributor Author

Fixed

// Compute cross entropy from probabilities.
Operand<T> cce =
tf.reduceSum(
tf.math.mul(tLabels, tf.math.log(predictions)),
Contributor

Although, in the internal case of this method, we do broadcast. I'll stop commenting on this issue.

Contributor

We can return to the "squeezeOrExpandDimensions followed by broadcasting" topic when I work on #130 .

Resolved.

*/
public static <T extends TNumber, U extends TNumber> Operand<T> meanAbsoluteError(
Ops tf, Operand<U> labels, Operand<T> predictions) {
Operand<T> tLabels = tf.dtypes.cast(labels, predictions.asOutput().dataType());
Contributor

Do we want to avoid this cast in the case where labels already has the same data type?

Contributor Author

I guess the question is what is the overhead of casting onto oneself vs the overhead of checking? I would hope that tf.dtypes.cast already handles this, but I could be mistaken.

Contributor

The code for checking could be something like this:

@SuppressWarnings("unchecked")
private static <T extends TNumber, U extends TNumber> Operand<T> castIfNecessary(
    Ops tf, Operand<U> value, DataType<T> requiredType) {
  return (value.asOutput().dataType() == requiredType)
      ? (Operand<T>) value
      : tf.dtypes.cast(value, requiredType);
}

So the overhead of checking would be the function call plus value.asOutput().dataType() == requiredType.

Looking at the code for tf.dtypes.cast, unless we think a cast is almost always needed, it would be cheaper to do the check to sometimes avoid it.

  public <U extends TType, T extends TType> Cast<U> cast(Operand<T> x, DataType<U> DstT,
      Cast.Options... options) {
    return Cast.create(scope, x, DstT, options);
  }

  @Endpoint(describeByClass = true)
  public static <U extends TType, T extends TType> Cast<U> create(Scope scope, Operand<T> x, DataType<U> DstT, Options... options) {
    OperationBuilder opBuilder = scope.env().opBuilder("Cast", scope.makeOpName("Cast"));
    opBuilder.addInput(x.asOutput());
    opBuilder = scope.applyControlDependencies(opBuilder);
    opBuilder.setAttr("DstT", DstT);
    if (options != null) {
      for (Options opts : options) {
        if (opts.Truncate != null) {
          opBuilder.setAttr("Truncate", opts.Truncate);
        }
      }
    }
    return new Cast<U>(opBuilder.build());
  }

Collaborator

In graph construction mode the overhead is probably irrelevant because it's only called once during construction. In eager mode it could be faster as it could sidestep a JNI call in each step, but I suspect we've got other issues to get speed in eager mode.

Contributor Author

I like castIfNecessary as a general util method. It would be used almost everywhere, so it would be a huge change.
Perhaps create a new PR for castIfNecessary, then once that is merged we can start retrofitting all packages under framework.

Contributor

In graph construction mode, an unnecessary call to cast creates an unnecessary graph operation.

Collaborator

(shrug) It'll be a no-op most of the time, and compiled away if we get XLA working. Given the relative size of the computation around it, I suspect it won't be an issue.

Collaborator

I also vote for an explicit check in the code to avoid adding an extra operation to the graph when it is not required.

Contributor Author

OK, I will add a helper class in org.tensorflow.framework.utils, then retrofit the Loss classes.

Contributor Author

Just a comment on @deansher's proposed method here: the data types for <U> and <T> should not be restricted to TNumber, because it is valid to cast to/from TNumber and TBool.
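
In that spirit, a sketch of the widened helper with TType bounds; this anticipates the cast utility added later in this PR, but the name and placement here are assumptions.

// TType bounds permit casts to/from TBool as well as the numeric families.
@SuppressWarnings("unchecked")
public static <T extends TType, U extends TType> Operand<T> cast(
    Ops tf, Operand<U> value, DataType<T> requiredType) {
  return value.asOutput().dataType() == requiredType
      ? (Operand<T>) value
      : tf.dtypes.cast(value, requiredType);
}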

public static <T extends TNumber, U extends TNumber> Operand<T> meanAbsolutePercentageError(
Ops tf, Operand<U> labels, Operand<T> predictions) {
DataType<T> dataType = predictions.asOutput().dataType();
Operand<T> tLabels = tf.dtypes.cast(labels, predictions.asOutput().dataType());
Contributor

Can just use dataType.

Contributor Author

Fixed

* @param <T> the data type of the Operands
* @return the binary crossentropy loss.
*/
private static <T extends TNumber> Operand<T> binaryCrossentropy(
Contributor (@deansher, Oct 10, 2020)

I tripped over this private method having the usual naming of a loss method, since I didn't notice that it was private and so expected it to follow the conventions of public loss methods, such as invoking squeezeOrExpandDimensions. Also (if I'm navigating accurately through unfamiliar territory), this method doesn't compute a binaryCrossentropy since it depends on its caller to compute the mean at the end.

Contributor Author

This method does the grunt work for the binaryCrossentropy after the operands have had their shapes and types manipulated and after smoothing the labels. Perhaps a new name would remove some of the confusion.

Contributor

Yes, I wonder if we want to call it something like binaryCrossentropyHelper?

Contributor Author

OK, changed.

import org.tensorflow.types.family.TNumber;

/**
* Computes the categorical hinge loss between labels and predictions.
Contributor

Should we follow the Python in documenting that labels are expected to be 0 or 1?

Contributor Author (@JimClarke5, Oct 10, 2020)

Yes. The Python CategoricalHinge class does not mention that at all, but it is mentioned in the categorical_hinge method.

I have added an entry to the class JavaDoc and to the Losses.categoricalHinge method.

Contributor Author

Actually the values can be [-1, 0, 1]; [0, 1] is converted to [-1, 1]. I have added a value check to make sure the values are wholly contained in the allowed value set [-1, 0, 1]. This will throw TFInvalidArgumentException if run in Graph mode (via a control dependency), or IllegalArgumentException if created in Eager mode with the call method.
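
A hedged sketch of how such a value check can be wired up in graph mode; the method and parameter names are assumptions, not the PR's exact code. In eager mode the check instead throws IllegalArgumentException directly, since eager execution does not support control dependencies.

// Assert that every element of values is one of the allowed values, and
// gate the returned operand on that assertion via a control dependency.
public static <T extends TNumber> Operand<T> valueCheck(
    Ops tf, String prefix, Operand<T> values, List<Operand<T>> allowedValues) {
  // Element-wise membership: equal to at least one allowed value?
  Operand<TBool> isAllowed = null;
  for (Operand<T> allowed : allowedValues) {
    Operand<TBool> eq = tf.math.equal(values, allowed);
    isAllowed = isAllowed == null ? eq : tf.math.logicalOr(isAllowed, eq);
  }
  // Reduce over every axis to a single scalar predicate.
  Operand<TInt32> allAxes = tf.range(tf.constant(0), tf.rank(values), tf.constant(1));
  Operand<TBool> allOk = tf.reduceAll(isAllowed, allAxes);
  Op assertion = tf.assertThat(allOk, Collections.singletonList((Operand<?>) tf.constant(prefix)));
  // The values pass through unchanged, but only after the assertion runs.
  return tf.withControlDependencies(Collections.singletonList(assertion)).identity(values);
}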

Contributor

Cool -- Resolved.

Collaborator

What does it do if all three of [-1, 0, 1] are present? That's probably an invalid input; does it throw?

public static <T extends TNumber, U extends TNumber> Operand<T> meanSquaredLogarithmicError(
Ops tf, Operand<U> labels, Operand<T> predictions) {
DataType<T> dataType = predictions.asOutput().dataType();
Operand<T> tLabels = tf.dtypes.cast(labels, predictions.asOutput().dataType());
Contributor

Could just use dataType. I'll stop mentioning this.

Contributor Author

Fixed; hopefully I have caught them all.

}

/**
* Calculates the mean squared logarithmic percentage error between labels and predictions.
Contributor

I think "percentage" is extraneous here.

Contributor Author

Fixed


// Use dynamic rank.

// TODO Operand<TInt32> rankDiff = tf.math.sub(tf.rank(predictions), tf.rank(labels));
if (predictionsRank == Shape.UNKNOWN_SIZE && Shape.isCompatible(predictionsShape.size(-1), 1)) {
Contributor

If the rank is unknown, then the size of the last dimension is guaranteed to be unknown, so isCompatible is guaranteed true. (But there may be some idiomatic reason for writing it this way, of which I am blissfully unaware.)

Contributor Author

Correct, it should have been "or", not "and".

if (labels != null) {
Shape labelsShape = labels.asOutput().shape();
long labelRank = labelsShape.numDimensions();
if (labelRank != Shape.UNKNOWN_SIZE && predictionsRank != Shape.UNKNOWN_SIZE) {
Contributor

I'm pretty sure this logic is wrong. Perhaps either

  • document preconditions of removeSqueezableDimensions and check exactly those,
  • or (my leaning) just invoke removeSqueezableDimensions and make it however smart it needs to be.

Contributor Author

This logic is checking whether both operands' ranks are known (not Shape.unknown()). If both ranks are known, it then checks whether the shapes are already in the right relationship; if they are not, it calls removeSqueezableDimensions. It is basically an optimization to avoid doing the work in removeSqueezableDimensions when it does not need to be done.

*/
private static <T extends TNumber> int[] allAxis(Operand<T> op) {
int rank = op.asOutput().shape().numDimensions();
int[] axes = new int[rank];
Contributor

rank could be -1 at this point.

Contributor Author

Fixed

* @param <T> the type of Operand
* @return the integer array representing all the axes of the operand.
*/
private static <T extends TNumber> int[] allAxis(Operand<T> op) {
Contributor

allAxes?

Contributor Author

Changed name to allAxes
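
A hedged sketch of the renamed helper with the unknown-rank guard folded in; the choice to throw here is an assumption, not necessarily what the PR does.

private static <T extends TNumber> int[] allAxes(Operand<T> op) {
  int rank = op.asOutput().shape().numDimensions();
  if (rank == Shape.UNKNOWN_SIZE) { // numDimensions() is -1 when the rank is not statically known
    throw new IllegalArgumentException("the rank of the operand must be statically known");
  }
  int[] axes = new int[rank];
  for (int i = 0; i < rank; i++) {
    axes[i] = i;
  }
  return axes;
}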

}
Shape weightsShape = sampleWeight.asOutput().shape();
long weightsRank = weightsShape.numDimensions();
if (weightsRank == 0) { // scalar
Contributor

What should happen if weightsRank is UNKNOWN?

Contributor Author

It falls through and executes the last part of the method after the // Use dynamic rank. comment.

Contributor

:-) Oh yeah, that.


if (predictionsRank != Shape.UNKNOWN_SIZE && weightsRank != Shape.UNKNOWN_SIZE) {

if (weightsRank - predictionsRank == 1) {
Contributor

Here we are working with the original predictionsRank, when we wanted to be working with the new rank.

Contributor Author

This matches the original Python code, but when you think about it, the predictions rank would never change from UNKNOWN to KNOWN and vice versa in a static context.

Contributor

I was thinking perhaps predictions changed rank through our squeezing it to match labels earlier in the method. But I think there's a more pernicious problem. Here's an elided version of some of the method's code. Notice that we may squeeze predictions, but we only store the result in tuple. If we then also work with sampleWeight, we neither reference the squeezed version of predictions nor return it.

    if (labels != null) {
        . . . 
        if (predictionsRank - labelRank != 1 || predictionsShape.size(-1) == 1) {
          tuple = removeSqueezableDimensions(tf, labels, predictions);
        }
      } else { // use dynamic rank
        tuple = removeSqueezableDimensions(tf, labels, predictions);
      }
    }
    . . .
    if (predictionsRank != Shape.UNKNOWN_SIZE && weightsRank != Shape.UNKNOWN_SIZE) {

      if (weightsRank - predictionsRank == 1) {
        sampleWeight = tf.squeeze(sampleWeight);
        . . .
      }
      return new Tuple<>(labels, predictions, sampleWeight);
    }

Contributor Author

Yes, we should probably fetch the labels and predictions from tuple first. I'll fix it.
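
A hedged sketch of the fix being described (the accessor names are assumptions): read the possibly squeezed operands back out of the tuple before the sampleWeight handling uses them.

if (labels != null) {
  // ... rank checks as above ...
  tuple = removeSqueezableDimensions(tf, labels, predictions);
  // Re-fetch so the later sampleWeight logic sees the squeezed shapes:
  labels = tuple.getLabels();
  predictions = tuple.getTarget();
}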

* Each of them possibly has the last dimension squeezed, <code>sampleWeight</code> could be
* extended by one dimension. If <code>sampleWeight</code> is null, (prediction, label) is
* returned.
*/
Contributor

This method has a myriad of complex cases, so I think it deserves its own direct unit test.

Contributor Author

I could not find direct test cases for this method in Python. It's defined in tensorflow/tensorflow/python/ops/losses/utils.py. You want to take a stab at it?

Contributor

:-) Totally. I want to do some work on #92 first, so I'll open an issue for myself.
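
For the record, a hedged sketch of what such a direct test might look like; the helper and accessor names are assumptions, and a real test would also cover dynamic shapes and sampleWeight.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;
import org.tensorflow.Graph;
import org.tensorflow.Operand;
import org.tensorflow.ndarray.Shape;
import org.tensorflow.op.Ops;
import org.tensorflow.op.core.Placeholder;
import org.tensorflow.types.TFloat32;

public class SqueezeOrExpandDimensionsTest {
  @Test
  public void squeezesTrailingUnitDimensionOfPredictions() {
    try (Graph graph = new Graph()) {
      Ops tf = Ops.create(graph);
      Operand<TFloat32> labels =
          tf.placeholder(TFloat32.DTYPE, Placeholder.shape(Shape.of(2, 3)));
      Operand<TFloat32> predictions =
          tf.placeholder(TFloat32.DTYPE, Placeholder.shape(Shape.of(2, 3, 1)));
      LossTuple<TFloat32> tuple =
          LossesHelper.squeezeOrExpandDimensions(tf, labels, predictions, null);
      // The trailing 1-sized dimension of predictions should be squeezed away.
      assertEquals(Shape.of(2, 3), tuple.getTarget().asOutput().shape());
    }
  }
}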

@karllessard (Collaborator)

Hi @JimClarke5 , what would be the best order for reviewing your PRs? You have this one, #123 and #106 that are still opened.

@JimClarke5 (Contributor Author)

I would do activations #123 first.

@JimClarke5 (Contributor Author)

@karllessard I have started working on Metrics, which depends on Loss, and thus this PR. I plan to do Metrics in two PRs: the first PR will focus on Metrics that depend on Losses functions. The second PR will focus on Metrics that don't depend on Losses functions, as these are more complicated to implement. My goal is to set up the scaffolding in the first Metrics PR, then focus on the functionality required in the second PR. As to which existing PR to work on after activations #123, may I suggest that this PR be done before I create the first Metrics PR.

@Craigacp (Collaborator) left a comment

In addition to the specific comments, I think it might be a good idea to add checks that the values are in the expected range (i.e. if it's expecting probabilities then it should check that they are in the range 0-1). Otherwise it's a right pain to track that down. Not sure if it will add too much overhead, but the loss computation tends to be much cheaper than the forward or backward passes, so hopefully it'll be fine.

*
* @param tf the TensorFlow Ops
* @param fromLogits Whether to interpret predictions as a tensor of logit values
* @param labelSmoothing Float in [0, 1]. When 0, no smoothing occurs. When > 0, we compute the
Collaborator

Does labelSmoothing = 1.0 mean the true label distribution is set to 1/n? I'm not sure what "squeezing the values towards 0.5" means, because it would only be 0.5 in a binary problem.

Contributor Author (@JimClarke5, Oct 25, 2020)

Actually this is the comment for BinaryCrossentropy. It should be:

Float in <code>[0, 1]</code>. When <code>&gt; 0</code>, label values are smoothed, meaning the
confidence on label values is relaxed, e.g. <code>label_smoothing=0.2</code> means that we will
use a value of <code>0.1</code> for label <code>0</code> and <code>0.9</code> for label <code>1</code>.

I'll fix it.
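
To make the corrected doc concrete, a hedged sketch of binary label smoothing (the helper shape is assumed, not the PR's exact code):

private static <T extends TNumber> Operand<T> smoothBinaryLabels(
    Ops tf, Operand<T> labels, float labelSmoothing) {
  DataType<T> dataType = labels.asOutput().dataType();
  Operand<T> scale = tf.dtypes.cast(tf.constant(1.f - labelSmoothing), dataType);
  Operand<T> offset = tf.dtypes.cast(tf.constant(0.5f * labelSmoothing), dataType);
  // labelSmoothing = 0.2 maps label 0 to 0.1 and label 1 to 0.9.
  return tf.math.add(tf.math.mul(labels, scale), offset);
}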


@JimClarke5 (Contributor Author)

@Craigacp we could set a range check, but it would have to be a control dependency, e.g. using tf.assertThat.

@JimClarke5 (Contributor Author)

I tested the range check, but found out that Eager mode does not support control dependencies. I have modified my function to check whether it's Graph or Eager. Graph mode produces a control dependency, while Eager mode throws IllegalArgumentException. The new method is in LossesImpl.rangeCheck. I have to fix the rest of the test cases and see what other classes besides xxxCrossentropy need this feature.

@JimClarke5 (Contributor Author) commented Oct 27, 2020

I have added 2 methods to LossesImpl, rangeCheck and valueCheck.
rangeCheck checks that all the values are within the provided min and max inclusive, e.g. [0., 1.].
valueCheck checks that all the values are contained in a set of values (e.g. [-1, 0, 1]).

First, control dependencies do not work in Eager mode. To handle this, I throw an IllegalArgumentException from the call method when in Eager mode if the check fails. In Graph mode, I set up a control dependency for the check, and it will throw TFInvalidArgumentException if the check fails when the loss Operand is run.

One question: should these utilities be stored in a common location like framework.utils? I suspect that metrics will have the same checks, but metrics could easily still call the LossesImpl methods.
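
A hedged sketch of the rangeCheck half, mirroring the graph-mode pattern of the valueCheck sketch earlier in this thread (names assumed; in eager mode the failing check throws IllegalArgumentException instead):

public static <T extends TNumber> Operand<T> rangeCheck(
    Ops tf, String prefix, Operand<T> values, Operand<T> minValue, Operand<T> maxValue) {
  Operand<TInt32> allAxes = tf.range(tf.constant(0), tf.rank(values), tf.constant(1));
  // Scalar predicate: every element lies in [minValue, maxValue] inclusive.
  Operand<TBool> allOk = tf.reduceAll(
      tf.math.logicalAnd(
          tf.math.greaterEqual(values, minValue),
          tf.math.lessEqual(values, maxValue)),
      allAxes);
  Op assertion = tf.assertThat(allOk, Collections.singletonList((Operand<?>) tf.constant(prefix)));
  return tf.withControlDependencies(Collections.singletonList(assertion)).identity(values);
}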

* These are helper methods for Losses and will be module private when Java modularity is applied to
* TensorFlow Java. These methods should not be used outside of the Loss package.
*/
public class LossesImpl {
Collaborator

For me, a *Impl should be the implementation of an interface; this one looks more like a LossesHelper with all its static methods (and the class should probably be final).

I did not go through the whole thing, but it looks like these helpers could also be moved directly to Loss as protected methods?

Contributor Author

The split really comes down to module visibility. Losses should be publicly accessible, while LossesImpl should be module private. Some LossesImpl methods may be used by metrics. Whether we call it LossesImpl or LossesHelper is a matter of preference. The current methods in LossesImpl should not be restricted to Loss classes, as metrics classes may also make use of them; therefore protected is not the right semantic.

Contributor

It feels uncomfortable to me that we plan to use the LossesImpl methods from other parts of our framework while restricting them from public use. When a system's built-ins rely on privileged capabilities that aren't available to 3rd-party code, I think it is commonly a big problem for the system's extensibility. In this case, I do see room to argue that these methods aren't "capabilities", but are just "implementation" which can safely be hidden. But given that it is important to us to reuse them for our own metrics, I lean toward thinking of them as capabilities that we should expose.

Contributor Author

There is a tight symmetry between Losses and Metrics, as many (but not all) metrics rely on the methods in Losses. I don't think other packages will have this close a relationship.


Is there a potential use case justifying exposing these to the public? Seeing as they are utilities needed to implement Losses/Metrics.

Agree with a rename to LossesHelper or LossesUtility to differentiate from interface implementation, however.

* predictions is scaled by the corresponding value of sample_weight. (Note on dN-1: all loss
* functions reduce by 1 dimension, usually axis=-1.)
* @param <T> The data type of the predictions, sampleWeights and loss.
* @param <U> The data type of the labels.


Personally, I'd lean toward using some of our own single-letter conventions for situations that are common in our own code, including L as the labels type.

This may be hard to follow consistently once several letters have been used, e.g. 'L' might be needed for something other than the label type. It seems a tad more confusing than the standard type names.


import org.tensorflow.types.family.TNumber;

import java.util.Collections;


Class should have Javadoc description, no?

Contributor Author

I have changed the class name to LossesHelper.

I don't understand your comment on Class JavaDoc. This is what I have in my copy.

/**
 * These are helper methods for Losses and Metrics and will be module private when Java modularity is applied to
 * TensorFlow Java. These methods should not be used outside of the losses and metrics packages.
 */

The basic comment was put in a while ago, and I just updated it to mention metrics.


Rendering issue I think. Looks good, thanks.

// * @param <T> The data type of the predictions, sampleWeights and loss.
// * @param <U> The data type of the labels.
// * @return the loss
// *
Collaborator

Can we remove this commented-out documentation?

Contributor Author

OK

KartikChugh previously approved these changes Nov 12, 2020
@saudet (Contributor) commented Nov 13, 2020

@karllessard Build failing after push.

That's just a transient network error:

2020-11-12T15:19:41.5511149Z [WARNING] Could not transfer metadata org.tensorflow:tensorflow-core-api:0.3.0-SNAPSHOT/maven-metadata.xml from/to ossrh-snapshots (https://oss.sonatype.org/content/repositories/snapshots): Transfer failed for https://oss.sonatype.org/content/repositories/snapshots/org/tensorflow/tensorflow-core-api/0.3.0-SNAPSHOT/maven-metadata.xml 502 Bad Gateway

@karllessard (Collaborator)

@saudet, @JimClarke5, it might also be related to the trouble we are having these days building our artifacts: as you can see here, the Linux GPU build runs out of space, and that prevents the last step from occurring (i.e. the bulk deploy that normalizes all snapshot timestamps), which might explain why a few artifacts disappeared lately: #142.

I’ve retried many times but without success. Samuel, any idea how to solve this again?

@saudet (Contributor) commented Nov 13, 2020

I’ve retried many times but without success. Samuel, any idea how to solve this again?

It just looks like GitHub Actions is below its guaranteed 14 GB of disk space:

2020-11-12T04:02:15.8038646Z Filesystem      Size  Used Avail Use% Mounted on
2020-11-12T04:02:15.8039270Z overlay          84G   75G  8.9G  90% /
2020-11-12T04:02:15.8039572Z tmpfs            64M     0   64M   0% /dev
2020-11-12T04:02:15.8039893Z tmpfs           3.4G     0  3.4G   0% /sys/fs/cgroup
2020-11-12T04:02:15.8040222Z shm              64M     0   64M   0% /dev/shm
2020-11-12T04:02:15.8040517Z /dev/sdb1        84G   75G  8.9G  90% /__w
2020-11-12T04:02:15.8040859Z tmpfs           696M  768K  695M   1% /run/docker.sock
2020-11-12T04:02:15.8041228Z tmpfs           3.4G     0  3.4G   0% /proc/acpi
2020-11-12T04:02:15.8041550Z tmpfs           3.4G     0  3.4G   0% /proc/scsi
2020-11-12T04:02:15.8041886Z tmpfs           3.4G     0  3.4G   0% /sys/firmware

We'll have to wait until they fix that, again, I guess?

* @return the value cast to the required data type.
*/
@SuppressWarnings("unchecked")
public static <T extends TType, U extends TType> Operand<T> cast(
Collaborator (@Craigacp, Nov 13, 2020)

We should open an issue to track inserting these cast checks into the optimizers for uniformity.

Contributor Author

I could do it in the #106 Learning Rate PR if that works.

Collaborator

No, let's not hold anything up for it, it's just something to clean up later.

@Craigacp (Collaborator) left a comment

A few more documentation things.

* @param fromLogits Whether to interpret predictions as a tensor of logit values
* @param labelSmoothing A number in the range, [0, 1]. When 0, no smoothing occurs. When &gt; 0,
* compute the loss between the predicted labels and a smoothed version of the true labels,
* where the smoothing squeezes the labels towards 0.5. Larger values of label_smoothing
Collaborator

label_smoothing -> labelSmoothing, here and elsewhere in this file.

Contributor Author

OK

* @param fromLogits Whether to interpret predictions as a tensor of logit values
* @param labelSmoothing Float in [0, 1]. When 0, no smoothing occurs. When > 0, we compute the
* loss between the predicted labels and a smoothed version of the true labels, where the
* smoothing squeezes the labels towards 0.5. Larger values of label_smoothing correspond to
Collaborator

This one's still got the doc from BinaryCrossEntropy wrt label_smoothing. And it's snake_case.

Contributor Author

OK

* @param tf the TensorFlow Ops
* @param fromLogits Whether to interpret predictions as a tensor of logit values
* @param labelSmoothing Float in [0, 1]. When 0, no smoothing occurs. When > 0, we compute the
* loss between the predicted labels and a smoothed version of the true labels, where the
Collaborator

Incorrect doc.

Contributor Author

OK

* @param fromLogits Whether to interpret predictions as a tensor of logit values
* @param labelSmoothing Float in [0, 1]. When 0, no smoothing occurs. When > 0, we compute the
* loss between the predicted labels and a smoothed version of the true labels, where the
* smoothing squeezes the labels towards 0.5. Larger values of label_smoothing correspond to
Collaborator

Incorrect doc.

Contributor Author

OK

*
* <p>Note that it is a number between -1 and 1. When it is a negative number between -1 and 0, 0
* indicates orthogonality and values closer to -1 indicate greater similarity. The values closer
* to 1 indicate greater dissimilarity. This makes it usable as a loss function in a setting where
Collaborator

This javadoc is better, but I think it should mention that this function is inverted from the regular cosine similarity, as that's 1 when the values are most similar and -1 when they point in opposite directions. It makes sense that it is inverted because then you can minimise it sensibly, but it is confusing if you're just browsing through.

* @param <T> the data type of the labels
* @return the smoothed binary labels
*/
private static <T extends TNumber> Operand<T> smoothLabelsBinaryX(
Collaborator

I think this would be better called smoothBinaryLabels as it's not specific to the binary cross entropy as far as I can tell. But it's a private method so it's not too much of an issue.

Contributor Author

OK

* @param <T> the data type of the labels
* @return the smoothed categorical labels
*/
private static <T extends TNumber> Operand<T> smoothLabelsCatX(
Collaborator

Similar comment to above, but smoothCategoricalLabels. Also I think the doc should explicitly state that it's smoothing the labels towards 1/n where n is the number of classes.

Contributor Author

OK
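
To pin down the 1/n behavior, a hedged sketch that assumes the number of classes is statically known on the last axis (names assumed):

private static <T extends TNumber> Operand<T> smoothCategoricalLabels(
    Ops tf, Operand<T> labels, float labelSmoothing) {
  DataType<T> dataType = labels.asOutput().dataType();
  float numClasses = labels.asOutput().shape().size(-1);
  Operand<T> scale = tf.dtypes.cast(tf.constant(1.f - labelSmoothing), dataType);
  // Each one-hot label is pulled toward the uniform distribution 1/n.
  Operand<T> offset = tf.dtypes.cast(tf.constant(labelSmoothing / numClasses), dataType);
  return tf.math.add(tf.math.mul(labels, scale), offset);
}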

@JimClarke5 (Contributor Author)

@Craigacp I have modified the JavaDoc for Losses.cosineSimilarity to insert this section in front to clarify the differences between the mathematical definition of cosine similarity and this loss function.

   * <p>Note that it is a number between <code>-1</code> and <code>1</code>, which is different from the mathematical definition of cosine
   * similarity where <code>1</code> represents similar vectors, and <code>0</code> represents dissimilar vectors.
   * In this function, the numbers are inverted in a range of <code>-1</code> to <code>1</code>.
....

Renamed smoothLabelsCatX to smoothCategoricalLabels.

Added clarification in the JavaDoc for cosineSimilarity to describe the difference between the mathematical definition of cosine similarity and the loss definition.
@JimClarke5 dismissed stale reviews from KartikChugh and deansher via 7eefbb7 November 13, 2020 15:08
fix typo error in JavaDoc comment
@karllessard (Collaborator)

@Craigacp, it looks like @JimClarke5 pushed all the commits for the last changes you've requested; I'll let you validate and dismiss your review if that's the case, thanks.

@JimClarke5, some of your unit test files don't have a valid source header, can you please add one? Also, I've noticed that you are now attributing your work to Oracle; that is OK but I just want to validate with you that it was intentional, thanks.

@JimClarke5 (Contributor Author)

I have fixed the copyright issues. The attribution to Oracle was a mistake, as I copied the copyright from Optimizers to the Loss classes. I have replaced them with the TensorFlow Authors copyright.
@Craigacp did you intend this for Optimizers?

@Craigacp (Collaborator)

@JimClarke5 yes, I'm required to put the Oracle copyright header on substantive external open source contributions that are part of my job. There's an internal review process for all the things that I write that are longer than a line or two.

@karllessard I've resolved my two comments, I think this is good to be merged now. I can approve it if you want.

@Craigacp self-requested a review November 17, 2020 03:51
@karllessard merged commit 0a1a868 into tensorflow:master Nov 17, 2020
@karllessard (Collaborator)

All right, this one is merged now, thanks for great contribution again @JimClarke5 !

@JimClarke5 deleted the Losses branch November 17, 2020 13:57
@deansher (Contributor)

deansher commented Nov 17, 2020 via email

@karllessard mentioned this pull request Nov 18, 2020