
TensorFlow Tools #5


Merged · 8 commits merged from the tensor-nio branch on Dec 22, 2019

Conversation

@karllessard (Collaborator) commented Nov 30, 2019

This PR merges the first draft of Tensor NIO into our master repository so that it can be tried out and tested by a larger audience. Here is a minimal example of usage (more examples can also be found in the unit tests):

    IntNdArray heapData = NdArrays.vector(0, 1, 2, 3);

    // Creates a 2x2 matrix
    try (Tensor<TInt32> tensor = TInt32.tensorOfShape(2, 2)) {
      IntNdArray tensorData = tensor.data();

      // Copy first 2 values of the vector to the first row of the matrix
      tensorData.set(heapData.slice(Indices.range(0, 2)), 0);

      // Copy values at an odd position in the vector as the second row of the matrix
      tensorData.set(heapData.slice(Indices.odd()), 1);

      assertEquals(0, tensorData.getInt(0, 0));
      assertEquals(1, tensorData.getInt(0, 1));
      assertEquals(1, tensorData.getInt(1, 0));
      assertEquals(3, tensorData.getInt(1, 1));

      // Read rows of the tensor in reverse order
      IntNdArray reversedTensorData = tensorData.slice(Indices.all(), Indices.flip());

      assertEquals(1, reversedTensorData.getInt(0, 0));
      assertEquals(0, reversedTensorData.getInt(0, 1));
      assertEquals(3, reversedTensorData.getInt(1, 0));
      assertEquals(1, reversedTensorData.getInt(1, 1));

      try (EagerSession session = EagerSession.create()) {
        Ops tf = Ops.create(session);

        // Compute the power of the tensor by itself
        Constant<TInt32> x = tf.constant(tensor);
        IntNdArray result = tf.math.pow(x, x).tensorData();

        // Validate result by computing the same operation in Java
        tensorData.scalars().forEachIndexed((coords, s) ->
          assertEquals(Math.pow(s.getInt(), s.getInt()), result.getInt(coords), 1e-7f)
        );
      }
    }

Some additional notes:

  • The major breaking change is that the Tensor class no longer carries a Java boxed type, like Integer, but a custom tensor type, like TInt32. This allows us to support more types in Java while still guaranteeing compile-time checking of operand type compatibility. A lot of the changes seen in this PR are related to this.
  • The c_api package has been moved to org.tensorflow.internal, next to the new package buffer that holds new Tensor buffer mapping utilities.
  • Right now, it looks like we do not control the order in which the @Operator annotations are processed when generating the *Ops classes. As a result, methods move around within those classes on each compilation, as you can observe in this PR.

@karllessard karllessard force-pushed the tensor-nio branch 2 times, most recently from d4d1adf to 037a2ca on December 2, 2019 04:18
@saudet (Contributor) left a comment

Does that mean we can now rename getShort(), getInt(), setByte(), etc to just get() and set()? I think that would be a lot nicer.

<groupId>org.tensorflow</groupId>
<artifactId>nio-utils</artifactId>
<version>${project.version}</version>
</dependency>
Contributor:

That feels strange. Why do we need this dependency here? Is it just because that we can't have a separate module until we move away from the code generator in C++?

Collaborator Author:

Tensors are now based on NdArray for mapping their memory and allowing direct access to it, so the core must depend on nio-utils. I'm not sure I understand what is strange; do you mean that memory mapping should occur outside the core API?

Contributor:

I think it's just the names that are confusing me. When I read something like tensorflow-core, I think this means the actual "core" without dependencies on other modules. Maybe tensorflow-nio should really be named something else, without "TensorFlow" in it? That might also help with adoption by other projects. That's probably something we should continue discussing on the mailing list you created, and one of the first things to clear up with the folks from MXNet, DL4J, etc.

Collaborator Author:

Exactly, that's why I renamed the artifact tensorflow-nio-utils to nio-utils (though I'm not a huge fan of that name either, so if anyone comes up with a better one...). So the tensorflow you've noticed there just names our organization in the group ID and is not part of the artifact name itself.

Collaborator:

ndarray-core? I think nio-utils is likely to make people think of java.nio.* and as the language moves away from those classes (and they are 15+ years old at this point), calling them "new" is something of a misnomer.

Collaborator Author:

I agree, we will group artifacts if there is a need but right now, I cannot think of any other utility library than this one.

Contributor:

I think there should be a functional need to group things together. If it's just to put them in abstract categories, it will make it hard to decide which category each module belongs to, possibly without any actual benefits. Maven doesn't even put the parent name in artifact coordinates, so unless we do it explicitly like with tensorflow-core -> tensorflow-core-api, etc then the benefits are even more marginal.

Collaborator Author:

Ok, here is what I ended up doing: I preserved the tensorflow-utils name, but now it contains the code of the library itself, including the DataBuffer and NdArray APIs.

So if in the future it happens that we have more small utility classes like these that we would like to share with the world and that are independent from TF runtime, they will end up in this library as well (kind of a Guava for ML).

There are a lot of assumptions and open questions that make it hard to pick the right name for this library right away, so let's simply rename it later if we need to.

Contributor:

Sounds good, but let's name it tensorflow-util just to be consistent with the package name? Or inversely, let's name the package org.tensorflow.utils? I don't have preference for either, I just think that consistency is a good thing to have whenever it makes sense.

Collaborator Author:

I think it is fine to have them mismatch here: tensorflow-utils sounds more natural as the name of the library (as there is more than one utility class in it), while package names are often singular by convention (e.g. java.util.*).

In addition, we don't have an org.tensorflow.core package in our core artifacts either.

@@ -150,6 +151,7 @@
<compilerOption>${project.basedir}/src/main/native/server_jni.cc</compilerOption>
<compilerOption>${project.basedir}/src/main/native/session_jni.cc</compilerOption>
<compilerOption>${project.basedir}/src/main/native/tensorflow_jni.cc</compilerOption>
<compilerOption>${project.basedir}/src/main/native/tensor_buffers_jni.cc</compilerOption>
Contributor:

I thought everyone was OK with moving away from writing JNI manually. Why do we need this?

Collaborator Author:

Because I need to create an instance of ByteBuffer out of a tensor native address and size in case Unsafe is not available, and the only way I know of is to do it from JNI. Do you have another way to do it?

Contributor:

This is going to do the same as the JNI code in that file:

TF_TensorData(nativeTensor).capacity(TF_TensorByteSize(nativeTensor)).asByteBuffer()

Along with a couple of null checks there too if needed.

Collaborator Author:

Ok I'll give it a try, thanks

Collaborator Author:

Great, that works, bye bye JNI...

@@ -0,0 +1,3 @@
package org.tensorflow.types.family;

public interface TType {}
Contributor:

Why do we need empty interfaces like this? That's always a code smell to me... They probably have some functionality in common. BTW, we have default methods too starting with Java SE 8. Can we use those?

Collaborator Author:

This is just to make sure that not all Java types can be passed as a generic type of a Tensor, like it is enforced here:

 public static <T extends TType> Tensor<T> create(Object obj, DataType<T> dtype) {...}

It's very soft compile-time type safety, but better than nothing, I think.
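To illustrate the mechanism being discussed, here is a minimal, self-contained sketch of a tag interface used as a generic bound. All names here are illustrative stand-ins, not the actual TensorFlow Java API:

```java
// Hypothetical sketch: an empty tag interface restricting which types
// may parameterize Tensor. Names are illustrative only.
interface TType {}

final class TInt32 implements TType {}

final class Tensor<T extends TType> {
    private final Class<T> type;

    private Tensor(Class<T> type) {
        this.type = type;
    }

    // Only types implementing TType are accepted at compile time.
    static <T extends TType> Tensor<T> create(Class<T> type) {
        return new Tensor<>(type);
    }

    Class<T> type() {
        return type;
    }
}

public class TagInterfaceDemo {
    public static void main(String[] args) {
        Tensor<TInt32> t = Tensor.create(TInt32.class);
        System.out.println(t.type().getSimpleName());
        // Tensor<String> s = Tensor.create(String.class); // does not compile:
        // String does not implement TType
    }
}
```

The interface carries no behavior; its only job is to make `Tensor<String>` a compile error rather than a runtime surprise.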

Collaborator:

In a future version of Java we will be able to seal this interface so it only has the implementations that we specify, which will make it more useful & reliable (as we'll be able to switch on the subclasses with exhaustiveness checking, and users won't be able to do their own implementations that our code can't work with).

Contributor:

I see, that sounds useful.

Collaborator Author:

Question for you @Craigacp: will sealed interfaces work in Java the same way as in Kotlin, meaning that only interfaces from the same package (or the same file, in Kotlin) are allowed to extend them, or will there be a different safety mechanism?

Collaborator:

This is the current JEP for the sealed interfaces feature - https://openjdk.java.net/jeps/360. It requires them to be in the same package or module, and listed in the permits clause of the interface/class. If the subclasses are all in the same java file, then they don't have to be listed in the permits clause (but that's probably not how we'll do it).
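For readers following along: sealed interfaces later shipped in Java 17 (JEP 409). A minimal sketch of the mechanism, with hypothetical type names standing in for the real ones:

```java
// Hypothetical sketch of sealed interfaces (Java 17+).
// Only the types listed in the permits clause may implement TType.
public class SealedDemo {

    sealed interface TType permits TInt32, TFloat32 {}

    record TInt32() implements TType {}
    record TFloat32() implements TType {}

    static String describe(TType t) {
        // With a sealed hierarchy the compiler knows these are the only cases.
        if (t instanceof TInt32) return "32-bit integer";
        if (t instanceof TFloat32) return "32-bit float";
        throw new AssertionError("unreachable: hierarchy is sealed");
    }

    public static void main(String[] args) {
        System.out.println(describe(new TInt32()));
        System.out.println(describe(new TFloat32()));
    }
}
```

Any attempt to declare `class TBogus implements TType` outside the permits clause is rejected at compile time, which is exactly the "users can't do their own implementations" guarantee mentioned above.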

Class.forName("sun.misc.Unsafe");
unsafeAvailable = true;
} catch (ClassNotFoundException e) {
unsafeAvailable = false;
Contributor:

This is misleading. sun.misc.Unsafe is often available in other implementations of Java SE, but it doesn't always contain all the methods from OpenJDK. We need to make sure it has all the methods we want like this:
https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/indexer/UnsafeRaw.java
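A sketch of the kind of capability probe being suggested: verify that the specific Unsafe methods you rely on exist, rather than only checking that the class loads. The method list below is illustrative, not necessarily the set the library needs:

```java
// Hypothetical sketch: probe sun.misc.Unsafe for the exact methods we
// depend on, rather than only checking that the class exists.
public class UnsafeProbe {

    static boolean unsafeSupported() {
        try {
            Class<?> unsafe = Class.forName("sun.misc.Unsafe");
            // Throws NoSuchMethodException on JVMs whose Unsafe
            // lacks these OpenJDK methods.
            unsafe.getDeclaredMethod("getByte", long.class);
            unsafe.getDeclaredMethod("putByte", long.class, byte.class);
            unsafe.getDeclaredMethod("copyMemory",
                Object.class, long.class, Object.class, long.class, long.class);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Unsafe usable: " + unsafeSupported());
    }
}
```

On a JVM where the class exists but a required method is missing, the `Class.forName` check alone would report a false positive; the per-method probe does not.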

Collaborator Author:

Thanks, I'll revise this

@saudet (Contributor) commented Dec 4, 2019

What's the copyright on the 1500x916.jpg file? I didn't see it mentioned anywhere, but that needs to be clear to put anything in a repository from Google.

@karllessard (Collaborator, Author):

Does that mean we can now rename getShort(), getInt(), setByte(), etc to just get() and set()? I think that would be a lot nicer.

On the DataBuffer, they could but on the NdArray, they are already being used for getting/setting subelements of an array.

I was inspired by this: if you think that get()/set() are replaceable by brackets (like Kotlin does), for a given 2D matrix, it might be natural to retrieve its first row simply with matrix[0] instead of something like matrix.getElement(0). On the other hand, if get()/set() were used to return a value type, then only the right number of brackets that leads to a scalar would be allowed (matrix[0][0] in this case), which is a bit awkward.
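A toy sketch of the rank-reducing access described above, where `get(indices)` returns a sub-array view and `getInt(indices)` reads a scalar. This is a hypothetical, much-simplified stand-in for the real NdArray, not the actual implementation:

```java
import java.util.Arrays;

// Hypothetical sketch: a minimal dense nd-array where get() drops one
// leading dimension per index and returns a view, while getInt() reads
// a scalar value once all dimensions are consumed.
public class MiniNdArray {
    private final int[] data;   // flat storage
    private final int[] shape;  // remaining dimension sizes
    private final int offset;   // start position in the flat storage

    MiniNdArray(int[] data, int[] shape, int offset) {
        this.data = data;
        this.shape = shape;
        this.offset = offset;
    }

    static MiniNdArray of(int[] data, int... shape) {
        return new MiniNdArray(data, shape, 0);
    }

    /** Drops one leading dimension per index, returning a view. */
    MiniNdArray get(int... indices) {
        int off = offset;
        int[] s = shape;
        for (int i : indices) {
            int stride = 1;
            for (int d = 1; d < s.length; d++) stride *= s[d];
            off += i * stride;
            s = Arrays.copyOfRange(s, 1, s.length);
        }
        return new MiniNdArray(data, s, off);
    }

    /** Reads a scalar; the indices must reach rank 0. */
    int getInt(int... indices) {
        MiniNdArray scalar = get(indices);
        if (scalar.shape.length != 0)
            throw new IllegalArgumentException("indices do not reach a scalar");
        return data[scalar.offset];
    }

    public static void main(String[] args) {
        MiniNdArray m = of(new int[]{0, 1, 2, 3}, 2, 2);
        System.out.println(m.get(0).getInt(1)); // element [0][1] -> 1
        System.out.println(m.getInt(1, 0));     // element [1][0] -> 2
    }
}
```

This mirrors the `matrix[0]` analogy: `get(0)` is the first row (still an array), while `getInt(0, 1)` is a value.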

@karllessard (Collaborator, Author):

What's the copyright on the 1500x916.jpg file? I didn't see it mentioned anywhere, but that needs to be clear to put anything in a repository from Google.

Oops! I had a reminder to replace this one with one of my own pictures but clearly I forgot, thanks for pointing that out!

@saudet (Contributor) commented Dec 4, 2019

Does that mean we can now rename getShort(), getInt(), setByte(), etc to just get() and set()? I think that would be a lot nicer.

On the DataBuffer, they could but on the NdArray, they are already being used for getting/setting subelements of an array.

I was inspired by this: if you think that get()/set() are replaceable by brackets (like Kotlin does), for a given 2D matrix, it might be natural to retrieve its first row simply with matrix[0] instead of something like matrix.getElement(0). On the other hand, if get()/set() were used to return a value type, then only the right number of brackets that leads to a scalar would be allowed (matrix[0][0] in this case), which is a bit awkward.

Ah, so the get() methods basically return another array? That's interesting. That certainly makes it feel like NumPy, and that's pretty much how ND4J went on to do it too. It hasn't proven to be too confusing, so that sounds good. I think we should keep it like that then.

SidneyLann previously approved these changes Dec 4, 2019
@karllessard (Collaborator, Author):

Ah, so the get() methods basically return another array? That's interesting. That certainly makes it feel like NumPy, and that's pretty much how ND4J went on to do it too. It hasn't proven to be too confusing, so that sounds good. I think we should keep it like that then.

Ok. Like I said, we could rename those in *DataBuffer, but I'm tempted to leave them as they are just to stay consistent with *NdArray.

@karllessard (Collaborator, Author):

BTW group, I was thinking of pushing a new version that renames the following methods (and their siblings):

  • TInt32.scalar(10) to TInt32.scalarOf(10)
  • TInt32.vector(1, 2, 3, 4) to TInt32.vectorOf(1, 2, 3, 4)
  • TInt32.tensorOfShape(2, 2) to TInt32.ofShape(2, 2)
  • Operand.tensorData() to Operand.data()
  • shape(2, 2) to shapeOf(2, 2)

Do you have any preference?

@saudet (Contributor) commented Dec 5, 2019

@karllessard No preferences, but we should try to go with unwritten conventions people are used to:
https://blog.joda.org/2011/08/common-java-method-names.html

@Craigacp (Collaborator) commented Dec 5, 2019

I'm fine with those renamings, they seem fairly idiomatic.

@karllessard (Collaborator, Author):

Just to let you know that I've just pushed a new version of the library with all your recommendations.

@karllessard karllessard force-pushed the tensor-nio branch 2 times, most recently from a9ff547 to a8b7996 on December 8, 2019 02:05
@saudet (Contributor) commented Dec 8, 2019 via email

@Craigacp (Collaborator) left a comment

Other general points:

  • Should a scalar ndarray have a tag interface?
  • Might want to use the code gen pattern from Java Vector API (https://github.com/openjdk/panama/blob/vectorIntrinsics/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/gen-src.sh) to emit the buffers/ndarrays of a particular kind (raw, jdk, etc) from a single source file. Reduces the amount of things that can go wrong in a refactor.
  • Can we remove UnsafeReference from the public API? I'm uncomfortable with public methods that need it as an argument.
  • Is there a relationship between buffer.Validator and ndarray.Validator? They feel like they have a lot of duplicated code.
  • Needs more javadoc (in the impl classes mainly). I understand how irritating it is to say that too, trust me, but bringing in new people to work in this codebase will be easier with it.

import org.tensorflow.types.TInt64;
import org.tensorflow.types.TString;
import org.tensorflow.types.TUInt8;
import org.tensorflow.types.family.TType;

/**
* A statically typed multi-dimensional array whose elements are of a type described by T.
Collaborator:

Can we more strongly type Tensor to Tensor<T extends TType>?

Collaborator Author:

That is what I tried at first, only to discover that this constraint quickly became viral: it needs to appear everywhere in our methods, classes, wrappers, frameworks and, ultimately, user code.

But since TType does not expose any method, and since we enforce it at the beginning of the chain (i.e. at the creation of the Tensor), I realized that there is no real gain in carrying it afterward. We just want to make sure that tensors are created using one of our supported data types.

I think our users will appreciate that we relax this rule a bit.

Collaborator:

The advantage of doing it is that the erased type bound of Tensor becomes Tensor<TType>, which makes it trickier for people to skirt the generics to do things that they shouldn't be doing. I agree it seems like it's everywhere in the codebase, and it does ripple a little into user code, but there is some benefit to it.

Collaborator Author:

I'm aware of the benefits of enforcing these safety checks, I've always been a defender of them, but I think we can make an exception here. I don't care that much about the verbosity in our code base, but I am concerned about the user code.

I guess writing down a quick example may help us make the right decision. Meanwhile, I'm still not sure I understand what can go wrong if we leave the generic type parameter unbounded after creating the tensor, other than it not being good practice. Do you have a specific scenario in mind?

t.buffer().put(data);
return t;
}

public static <T extends TType> Tensor<T> allocate(DataType<T> dtype, Shape shape) {
Collaborator:

Needs some javadoc. Looks like this makes an empty buffer, but what does that mean for types like String? Are they populated by the null reference or the empty string etc?

Collaborator Author:

I have to admit that adding proper Javadoc to all endpoints affected by this PR is on my TODO list. I was waiting for some people to try it out first, but then the SIG decided we'd better merge it and fix it afterward. Are you still OK with this?

Collaborator:

Yeah, I'm fine with you merging it without the extra javadoc. We need to make sure it's there before there is a release with this code in though, so maybe file an issue for it?

Collaborator Author:

good idea, I'll create an issue right after merging

@@ -287,21 +287,20 @@
// Helper function to allocate a Tensor for the create() methods that create a Tensor from
// a java.nio.Buffer.
// Requires: dataType matches T
private static <T> Tensor<T> allocateForBuffer(DataType dataType, long[] shape, int nBuffered) {
final int nflattened = numElements(shape);
private static <T extends TType> Tensor<T> allocateForBuffer(DataType dataType, long[] dimSizes, int nBuffered) {
Collaborator:

Can we type DataType with <T> here? And in a bunch of other places. Or alternatively <? extends TType>.

Collaborator Author:

It should definitely extend <T>; those are leftovers from the previous implementation. I'm surprised my IDE is not giving me any warning about it... I'll fix it, thanks!

DataType(int value, int byteSize) {
this.value = value;
this.byteSize = byteSize;
public static <T> DataType<T> create(String name, int value, int byteSize, TensorMapper<T> tensorMapper) {
Collaborator:

Stronger type? <T extends TType>.

Collaborator Author:

Mmh, I don't remember if that one was as hemorrhagic as with Tensor. I'll give it a try and let you know.

* @param shape the shape of the tensor
* @return data structure of elements of this type
*/
T apply(TF_Tensor nativeTensor, Shape shape);
Collaborator:

This exposes the JavaCPP TF_Tensor type into the public API. Can we fix that?

Collaborator Author:

It is not "really" exposed. This interface is only used internally, but unfortunately it has to be public because it is shared between packages.

What I can do is move it under org.tensorflow.internal.buffer so at least it is known to be internal. And since it does not return a TF_Tensor but only accepts one, there is no major "leak" either.

Collaborator:

It's times like this that I understand why C++ has friend classes.

this.start = start;
}

private long start;
@Craigacp (Collaborator) commented Dec 8, 2019:

These fields should be final; it signals to the JIT that it can make more optimisations (though I don't know whether it actually will).

* that select which elements on a given dimension should be included/excluded
* from that view.
*/
public interface Index {
@Craigacp (Collaborator) commented Dec 8, 2019:

Is there a way to make this interface functional or more constant? I'm a little worried that the JVM won't be able to optimise all these small objects properly (until they become value types). Maybe we could make them ValueBasedClasses? The indices provide conceptually similar functionality to VarHandle from Java 9, which Paul wrote, so maybe his input would be useful on these classes.
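One possible shape for a functional Index, sketched with hypothetical names (this is not the PR's API; it only illustrates how stateless indices could become lambdas):

```java
// Hypothetical sketch: Index as a functional interface so that
// stateless indices can be lambdas (small, constant objects that
// the JIT has a better chance of optimising away).
public class IndexDemo {

    @FunctionalInterface
    interface Index {
        /** Maps a coordinate in the view to a coordinate in the source. */
        long map(long coord, long dimensionSize);
    }

    static final Index ALL  = (i, n) -> i;          // identity: keep order
    static final Index FLIP = (i, n) -> n - 1 - i;  // reverse order

    static Index at(long fixed) {                   // pin one coordinate
        return (i, n) -> fixed;
    }

    public static void main(String[] args) {
        long n = 4; // dimension size
        System.out.println(ALL.map(1, n));   // 1
        System.out.println(FLIP.map(0, n));  // 3
        System.out.println(at(2).map(0, n)); // 2
    }
}
```

Indices like `range(a, b)` that also change the dimension size would need a second method (e.g. a view-size query), so the real interface likely cannot stay strictly functional for every case.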

Collaborator Author:

Ok, let's do this as a follow up.


@Override
public boolean equals(Object obj) {
return false; // All unknown dimensions are distinct
Collaborator:

Shouldn't they have reference equality? The hashcode does.

Collaborator Author:

This class is a remnant of a previous implementation and is not used anymore, I'll remove it

*/
@Override
public int hashCode() {
return (int) numElements();
Collaborator:

If we care about hashcode for this we should apply some mixing function first. Integer's hashcode is unfortunately terrible but I think it's dictated by the spec.
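For illustration, here is a mixing finalizer in the style of MurmurHash3's fmix32, applied to an element count. This is a sketch of the general technique, not the code in the PR:

```java
// Hypothetical sketch: spread the bits of a long element count across
// all 32 bits of the hash, instead of truncating it with (int) cast.
// Constants are MurmurHash3's fmix32 finalizer.
public class MixDemo {

    static int mix(long numElements) {
        int h = (int) (numElements ^ (numElements >>> 32)); // fold to 32 bits
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Unlike the identity hash of small integers, nearby inputs
        // now differ in many bits.
        System.out.println(Integer.toHexString(mix(1)));
        System.out.println(Integer.toHexString(mix(2)));
    }
}
```

With plain `(int) numElements`, shapes with the same element count (e.g. 2x2 and 4x1) and small counts in general cluster in the low hash buckets; the finalizer avoids that clustering without changing equality semantics.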

Collaborator Author:

Totally agree, I'll fix it


@Override
public double getDouble(long... indices) {
return buffer().getDouble(positionOf(indices, true));
Collaborator:

Some of these methods go through buffer(), some access the buffer field directly. It should pick one way, or document that they behave differently (as DoubleDenseNdArray isn't final, so could be subclassed).

Collaborator Author:

buffer() is only there to let the superclass access the buffer stored in the subclass. Again, this has changed so many times... I'll sanitize it, thanks!

@karllessard (Collaborator, Author):

Yes, so what about org.tensorflow.ndarray? That would make it more consistent in a way.

@saudet, I still think that the current naming is better. Right now, the package tree looks like this:

- org.tensorflow.util
  |_ ndarray
  |_ buffer

It makes sense because the buffer API can actually be used without ndarrays. Also, I have moved the Shape class directly under org.tensorflow.util since it is also found in contexts other than ndarrays (unmapped tensors, ops operands, etc.).

Also, just to clarify my previous post, I brought up the counterexample of tensorflow-core-api to point out that we already don't enforce too much strictness between the name of an artifact and its packages. But I do personally prefer that each artifact has at least its own discriminative base package, hence org.tensorflow.util in this case.

@karllessard (Collaborator, Author):

Other general points:

  • Should a scalar ndarray have a tag interface?

@Craigacp: might be hard to enforce in all cases. For example, for a 2x2 matrix, both matrix.get(0) and matrix.get(0, 0) return an NdArray, but only the second will be a scalar. Do you see an actual benefit to having it?

  • Might want to use the code gen pattern from Java Vector API to emit the buffers/ndarrays of a particular kind (raw, jdk, etc) from a single source file.

Sounds interesting, I'll take a look, thanks

  • Can we remove UnsafeReference from the public API? I'm uncomfortable with public methods that need it as an argument.

UnsafeReference is not Unsafe but just a validator/wrapper around it. We are not exposing free access to Unsafe, as it needs to be provided by the caller itself. In addition, those methods are not exposed by the "public API" (composed of the public interfaces and their static helpers) but found in the internal *.impl packages, for what it's worth.

  • Is there a relationship between buffer.Validator and ndarray.Validator? They feel like they have a lot of duplicated code.

I guess the difference can be subtle, I will check

  • Needs more javadoc (in the impl classes mainly). I understand how irritating it is to say that too, trust me, but bringing in new people to work in this codebase will be easier with it.

Yes, like I said in a previous comment, I postponed that task because this work is still known to be "in progress", but I agree it should become a priority.

@Craigacp (Collaborator) commented Dec 9, 2019

Other general points:

  • Should a scalar ndarray have a tag interface?

@Craigacp: might be hard to enforce in all cases. For example, for a 2x2 matrix, both matrix.get(0) and matrix.get(0, 0) return an NdArray, but only the second will be a scalar. Do you see an actual benefit to having it?

Ah, yeah, I'd forgotten that the accessor takes varargs, so you couldn't have a sharper return type. Not that the type system is quite powerful enough to do the best thing there anyway. As scalars don't have dimensions, I feel like that should be represented in the type system so you can specialise when being passed a scalar (i.e. immediately unbox it), but it's hard to get methods that emit the right types given the way the library is set up.

  • Can we remove UnsafeReference from the public API? I'm uncomfortable with public methods that need it as an argument.

UnsafeReference is not Unsafe but just a validator/wrapper around it. We are not exposing free access to Unsafe, as it needs to be provided by the caller itself. In addition, those methods are not exposed by the "public API" (composed of the public interfaces and their static helpers) but found in the internal *.impl packages, for what it's worth.

I understand, but anything that's public is part of the public API as far as users are concerned. I've written impl classes with big warnings on them, but unless you can lock them off with a module system or another access control people will still see them in the javadoc and use them for things you weren't expecting.

@saudet (Contributor) commented Dec 9, 2019

Yes, so what about org.tensorflow.ndarray? That would make it more consistent in a way.

@saudet, I still think that the current naming is better. Right now, the package tree looks like this:

- org.tensorflow.util
  |_ ndarray
  |_ buffer

It makes sense because the buffer API can actually be used without ndarrays. Also, I have moved the Shape class directly under org.tensorflow.util since it is also found in contexts other than ndarrays (unmapped tensors, ops operands, etc.).

Also, just to clarify my previous post, I brought up the counterexample of tensorflow-core-api to point out that we already don't enforce too much strictness between the name of an artifact and its packages. But I do personally prefer that each artifact has at least its own discriminative base package, hence org.tensorflow.util in this case.

I think the extra package is fine (it would allow splitting the module into tensorflow-util-ndarray, tensorflow-util-buffer, etc. more easily if that ever makes sense, which we could do right away, FWIW); it's just the naming inconsistency that bothers me. If we search for "util" and "utils" on https://search.maven.org, we actually get more hits for "util"! We do get a lot more hits for "tools" than "tool", though, so to follow conventions, maybe that's the term you are looking for. How does org.tensorflow.tools sound to you as a package name?

@karllessard
Collaborator Author

How does org.tensorflow.tools sound to you as a package name?

Yes, I like tools as an artifact and package name, better than utils/util. I'll make the change.

...and we would like to thank all our supporters, friends and family :)

@karllessard
Collaborator Author

but it's hard to get methods that emit the right types given the way the library is set up.

It would be possible to add a different method (something like ndarray.scalar(long...)) that returns a Scalar type and also makes sure that the number of indices provided is enough to reach one. This won't break the contract of the current API. I suggest doing this in a second phase, though, if we identify a real need for it.
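To illustrate the idea (a hypothetical sketch only; DenseNdArray, Scalar, and scalar() are stand-in names, not the actual TensorFlow Tools API), such an accessor could validate the index count against the rank before returning a scalar view:

```java
// Hypothetical sketch of an ndarray.scalar(long...) accessor: it only resolves
// to a Scalar when the provided indices fully identify one element.
interface Scalar<T> {
  T value();
}

class DenseNdArray<T> {
  private final Object[] flat;  // flattened storage, row-major order
  private final long[] shape;

  DenseNdArray(Object[] flat, long... shape) {
    this.flat = flat;
    this.shape = shape;
  }

  int rank() {
    return shape.length;
  }

  @SuppressWarnings("unchecked")
  Scalar<T> scalar(long... indices) {
    if (indices.length != rank()) {
      throw new IllegalArgumentException(
          "scalar() requires " + rank() + " indices but got " + indices.length);
    }
    long offset = 0;
    for (int i = 0; i < shape.length; i++) {
      offset = offset * shape[i] + indices[i];  // row-major flattening
    }
    T value = (T) flat[Math.toIntExact(offset)];
    return () -> value;
  }
}
```

The run-time check on the index count is the part a richer type system could move to compile time, as discussed above.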

I understand, but anything that's public is part of the public API as far as users are concerned. I've written impl classes with big warnings on them, but unless you can lock them off with a module system or some other access control, people will still see them in the javadoc and use them for things you weren't expecting.

I can shuffle things a bit to mark these methods as protected instead; would that work for you? With this, and the existing protection that forces the caller to do the dirty reflective job to retrieve the instance of Unsafe, I think we can consider that our library has no major security hole. What do you think?

@Craigacp
Collaborator

but it's hard to get methods that emit the right types given the way the library is set up.

It would be possible to add a different method (something like ndarray.scalar(long...)) that returns a Scalar type and also makes sure that the number of indices provided is enough to reach one. This won't break the contract of the current API. I suggest doing this in a second phase, though, if we identify a real need for it.

Sure

I understand, but anything that's public is part of the public API as far as users are concerned. I've written impl classes with big warnings on them, but unless you can lock them off with a module system or some other access control, people will still see them in the javadoc and use them for things you weren't expecting.

I can shuffle things a bit to mark these methods as protected instead; would that work for you? With this, and the existing protection that forces the caller to do the dirty reflective job to retrieve the instance of Unsafe, I think we can consider that our library has no major security hole. What do you think?

Protected would work. I don't consider this a security thing, more an API cleanliness one. We don't want people to start incorporating UnsafeReference as a field or argument in their classes, as then we can't remove it without it being a breaking change, and it's really an internal implementation detail.

@Craigacp
Collaborator

Protected would work. I don't consider this a security thing, more an API cleanliness one. We don't want people to start incorporating UnsafeReference as a field or argument in their classes, as then we can't remove it without it being a breaking change, and it's really an internal implementation detail.

We need something to let users implement those interfaces for libraries other than TensorFlow. It's either that, JavaCPP's Pointer class, or a couple of long values for pointer addresses, which are just as unsafe. Which one do you prefer that we "leak"? I think the latter is the least bad option, but Karl doesn't like it.

If you're in a subclassing relationship then you could see it if it's protected. Since you can't have protected interface members, it might need to become an abstract class (which might prove tricky), but at that point you could hide it effectively in the Raw classes.

@karllessard
Collaborator Author

but at that point you could hide it effectively in the Raw classes.

I'm wondering: since I accept the Unsafe instance as an Object first (which I then validate), maybe having a public endpoint accepting the Object could be enough, i.e. something like RawMemoryMapper.create(Object), where the object passed must be an instance of sun.misc.Unsafe. The RawMemoryMapper would then have endpoints to map memory segments from their address. So there would be no UnsafeReference anymore, as the Unsafe instance would be kept in the RawMemoryMapper.
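A rough sketch of what that could look like (the class and method names here are illustrative, not the merged code): the mapper accepts a plain Object and validates by class name that it really is sun.misc.Unsafe, while the reflective retrieval stays on the caller's side:

```java
// Illustrative sketch of the proposed RawMemoryMapper.create(Object).
final class RawMemoryMapper {
  private final Object unsafe;  // held as Object to avoid a compile-time dependency on sun.misc

  private RawMemoryMapper(Object unsafe) {
    this.unsafe = unsafe;
  }

  static RawMemoryMapper create(Object unsafe) {
    if (unsafe == null || !"sun.misc.Unsafe".equals(unsafe.getClass().getName())) {
      throw new IllegalArgumentException("An instance of sun.misc.Unsafe is required");
    }
    return new RawMemoryMapper(unsafe);
  }
}

// Caller side: the "dirty reflective job" stays with the entitled library.
class UnsafeRetriever {
  static Object retrieveUnsafe() {
    try {
      java.lang.reflect.Field field =
          Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe");
      field.setAccessible(true);
      return field.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("sun.misc.Unsafe is not available on this JVM", e);
    }
  }
}
```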

@saudet
Contributor

saudet commented Dec 12, 2019

but at that point you could hide it effectively in the Raw classes.

I'm wondering: since I accept the Unsafe instance as an Object first (which I then validate), maybe having a public endpoint accepting the Object could be enough, i.e. something like RawMemoryMapper.create(Object), where the object passed must be an instance of sun.misc.Unsafe. The RawMemoryMapper would then have endpoints to map memory segments from their address. So there would be no UnsafeReference anymore, as the Unsafe instance would be kept in the RawMemoryMapper.

Actually, you're already passing the pointer address to your objects there, so I don't understand why we need to expose Unsafe at all. If it's available, we use it; if it's not available, we just use something else like ByteBuffer or whatever OpenJDK comes up with eventually. Users don't need to know what we're doing to access their data, do they? We could expose an enum or something to give them a choice of which implementation gets used, and I think that's enough. What other use cases are you thinking about?

@saudet
Contributor

saudet commented Dec 12, 2019

Ah, I remember what you said. You're basically considering the Unsafe object the way we consider ByteBuffer to be "safe", so we'd be adding a new backend for others like MemorySegment and whatnot. Fair enough. It's more work for people to implement, so I suspect not everyone is going to be all right with that, but we can always add an "unsafe" backend at some point if this library becomes popular and everyone gets tired of copy/pasting the same thing around.

@Craigacp
Collaborator

Well, it's only the DataBuffer classes that need to be implemented separately for things like MemorySegment; the NdArray classes on top of them should be agnostic. So it's not that much of an implementation burden, right?

The MemorySegment API was merged into JDK 14's tree today, before the feature freeze, so it should be in the 14 release in March.

@karllessard
Collaborator Author

karllessard commented Dec 13, 2019

The idea of passing the Unsafe object as a parameter is also to avoid having a public endpoint that is simply DataBuffer mapMe(long address, long length). While that is tempting at the API and usability level, it is completely unsafe, since it allows passing any arbitrary number as an address, which can crash the whole JVM. The problem, though, is that we do need this endpoint for mapping tensor buffers from tensorflow-core-api. So by forcing the caller to pass Unsafe itself at the initial memory mapping, we prevent users who are just wandering around from playing with this functionality too easily, while still allowing entitled libraries to use it by adding the required piping, which is still minimal.

We all know that there is no perfect solution here either. I talked with Maurizio from Panama, and on their side they focus on enforcing the safety of native memory mapping from the MemoryAddress itself. That will be interesting when it arrives. We could try to do the same with JavaCPP pointers, but that would require us to create another artifact like tensorflow-tools-javacpp. Or tensor memory could also be mapped directly in tensorflow-core-api, as it was initially. This is the safest way to achieve this, as we can guarantee that the address being mapped is that of a tensor, but then we completely lose reusability and we end up reimplementing all raw buffers in other libraries. As @Craigacp pointed out, it might not be such a big deal, but it is not necessarily what we want either.

Now I'm experimenting with something new: having a non-final RawDataBufferFactory, located in the raw buffers implementation package, that exposes public endpoints for creating raw (unsafe) buffers from arrays but keeps the endpoints for mapping native memory protected. A library can then simply extend this factory and expose/use the "hidden" endpoints in the safest way possible, according to its needs. Since we won't have public mapMe(address, length) endpoints anymore, it is now OK to retrieve the Unsafe instance directly from this factory and relieve other libraries of this responsibility/burden. Of course, anyone can again simply extend this class and start mapping native memory with arbitrary addresses, but by locating the factory under an impl package and by hiding the native memory mapping endpoints from the Javadoc (since they are protected), we might obfuscate it enough to deter wandering users.

@saudet
Contributor

saudet commented Dec 13, 2019

That sounds really nice, but you guys have to realize that all this running around in circles prevents end users from actually using those tools with any libraries not explicitly supported, including all those distributed with the JavaCPP Presets:
https://github.com/bytedeco/javacpp-presets/

One of the reasons why NumPy works so well is that it facilitates data sharing between libraries, but this implementation of ndarray here doesn't... If I'm missing something, though, please enlighten me.

@karllessard
Collaborator Author

karllessard commented Dec 13, 2019

That sounds really nice, but you guys have to realize that all this running around in circles prevents end users from actually using those tools with any libraries not explicitly supported, including all those distributed with the JavaCPP Presets:
https://github.com/bytedeco/javacpp-presets/

One of the reasons why NumPy works so well is that it facilitates data sharing between libraries, but this implementation of ndarray here doesn't... If I'm missing something, though, please enlighten me.

It does not prevent any of this; libraries just need to extend this factory class and make use of the mapNative endpoints the way they want. Here is how TF does it:

class TensorRawDataBufferFactory extends RawDataBufferFactory {

  // Each mapper resolves the tensor's native address and capacity from the
  // JavaCPP Pointer, then delegates to the protected mapNative* endpoints
  // inherited from RawDataBufferFactory.
  static ByteDataBuffer mapTensorToBytes(Pointer tensorMemory) {
    return mapNativeBytes(tensorMemory.address(), tensorMemory.capacity(), false);
  }

  static IntDataBuffer mapTensorToInts(Pointer tensorMemory) {
    return mapNativeInts(tensorMemory.address(), tensorMemory.capacity(), false);
  }

  static LongDataBuffer mapTensorToLongs(Pointer tensorMemory) {
    return mapNativeLongs(tensorMemory.address(), tensorMemory.capacity(), false);
  }

  static FloatDataBuffer mapTensorToFloats(Pointer tensorMemory) {
    return mapNativeFloats(tensorMemory.address(), tensorMemory.capacity(), false);
  }

  static DoubleDataBuffer mapTensorToDoubles(Pointer tensorMemory) {
    return mapNativeDoubles(tensorMemory.address(), tensorMemory.capacity(), false);
  }

  // A string tensor starts with a table of 64-bit offsets (one per element),
  // followed by the variable-length byte data the offsets point into.
  static StringTensorBuffer mapTensorToStrings(Pointer tensorMemory, long numElements) {
    long offsetByteSize = numElements * Long.BYTES;
    LongDataBuffer offsets = mapNativeLongs(tensorMemory.address(), numElements, false);
    ByteDataBuffer data = mapNativeBytes(
        tensorMemory.address() + offsetByteSize,
        tensorMemory.capacity() - offsetByteSize,
        false);
    return new StringTensorBuffer(offsets, data);
  }
}

BTW, I'm not sure why you are referring to NumPy in this case; the current discussion is about how native memory should be mapped initially from Java and has no impact on how data can be shared thereafter or how we can manipulate it.

@Craigacp
Collaborator

That sounds really nice, but you guys have to realize that all this running around in circles prevents end users from actually using those tools with any libraries not explicitly supported, including all those distributed with the JavaCPP Presets:
https://github.com/bytedeco/javacpp-presets/

The aim is to make a Java type for ndarrays, not to wrap every C library's version of an ndarray in the same interface. Supporting every C library's view of memory is likely to cause issues (e.g. at the moment it doesn't support different storage orders).

One of the reasons why NumPy works so well is that it facilitates data sharing between libraries, but this implementation of ndarray here doesn't... If I'm missing something, though, please enlighten me.

All the Python libraries use NumPy arrays or implement the same interface (via duck typing). So the libraries are built on NumPy and opted into its representation. This API will also require opting into its representation by implementing its interfaces if you want to store memory in a different way than the current implementation does (which should at most be the buffer interface), so I'm not sure what the difference is.

@karllessard
Collaborator Author

karllessard commented Dec 13, 2019

All the Python libraries use NumPy arrays or implement the same interface (via duck typing). So the libraries are built on NumPy and opted into its representation. This API will also require opting into its representation by implementing its interfaces if you want to store memory in a different way than the current implementation does (which should at most be the buffer interface), so I'm not sure what the difference is.

Ok, I might have misunderstood the original question then. FYI, it is also possible to change the layout of your data in memory just by providing a DataAdapter, without needing to reimplement the whole DataBuffer interface (and, by the way, I should probably rename that interface to DataLayout).
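As a hedged illustration of that adapter idea (the interface and method names below are stand-ins, not the real DataAdapter signature), such a layout could, for instance, store booleans as single bytes in a raw buffer:

```java
// Illustration only: an adapter that controls how values are laid out in the
// underlying storage, without reimplementing the buffer interface itself.
interface DataAdapter<T> {
  void writeValue(byte[] buffer, T value, int index);
  T readValue(byte[] buffer, int index);
}

// Example layout: one boolean stored per byte.
class BooleanAsByteAdapter implements DataAdapter<Boolean> {
  @Override
  public void writeValue(byte[] buffer, Boolean value, int index) {
    buffer[index] = (byte) (value ? 1 : 0);
  }

  @Override
  public Boolean readValue(byte[] buffer, int index) {
    return buffer[index] != 0;
  }
}
```

Swapping in a different adapter changes the in-memory layout while the buffer and ndarray layers above it stay untouched.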

@Craigacp
Collaborator

All the Python libraries use NumPy arrays or implement the same interface (via duck typing). So the libraries are built on NumPy and opted into its representation. This API will also require opting into its representation by implementing its interfaces if you want to store memory in a different way than the current implementation does (which should at most be the buffer interface), so I'm not sure what the difference is.

Ok, I might have misunderstood the original question then. FYI, it is also possible to change the layout of your data in memory just by providing a DataAdapter, without needing to reimplement the whole DataBuffer interface (and, by the way, I should probably rename that interface to DataLayout).

But for things like GPU memory vs. CPU memory (where you need to apply a native hook to get the pointer into the right address space), it's probably best that they implement a new DataBuffer.

@saudet
Contributor

saudet commented Dec 15, 2019

That sounds really nice, but you guys have to realize that all this running around in circles prevents end users from actually using those tools with any libraries not explicitly supported, including all those distributed with the JavaCPP Presets:
https://github.com/bytedeco/javacpp-presets/

The aim is to make a Java type for ndarrays, not to wrap every C library's version of an ndarray in the same interface. Supporting every C library's view of memory is likely to cause issues (e.g. at the moment it doesn't support different storage orders).

One of the reasons why NumPy works so well is that it facilitates data sharing between libraries, but this implementation of ndarray here doesn't... If I'm missing something, though, please enlighten me.

All the Python libraries use NumPy arrays or implement the same interface (via duck typing). So the libraries are built on NumPy and opted into its representation. This API will also require opting into its representation by implementing its interfaces if you want to store memory in a different way than the current implementation does (which should at most be the buffer interface), so I'm not sure what the difference is.

No, that's not true. It's very easy to create a NumPy ndarray on top of any buffer:
https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/
I just wanted to point out the inconsistency in your goals, not to criticize. You won't be able to eat your cake and have it too!

@Craigacp
Collaborator

That sounds really nice, but you guys have to realize that all this running around in circles prevents end users from actually using those tools with any libraries not explicitly supported, including all those distributed with the JavaCPP Presets:
https://github.com/bytedeco/javacpp-presets/

The aim is to make a Java type for ndarrays, not to wrap every C library's version of an ndarray in the same interface. Supporting every C library's view of memory is likely to cause issues (e.g. at the moment it doesn't support different storage orders).

One of the reasons why NumPy works so well is that it facilitates data sharing between libraries, but this implementation of ndarray here doesn't... If I'm missing something, though, please enlighten me.

All the Python libraries use NumPy arrays or implement the same interface (via duck typing). So the libraries are built on NumPy and opted into its representation. This API will also require opting into its representation by implementing its interfaces if you want to store memory in a different way than the current implementation does (which should at most be the buffer interface), so I'm not sure what the difference is.

No, that's not true. It's very easy to create a NumPy ndarray on top of any buffer:
https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/
I just wanted to point out the inconsistency in your goals, not to criticize. You won't be able to eat your cake and have it too!

But the equivalent of the buffer protocol is what this PR provides. Java doesn't have a low-level buffer with multidimensional indexing; that's why Karl wrote one. It's as much effort to implement the Python buffer protocol as it is to implement this one, so I'm not sure how it's easier in Python (besides the fact that everyone has already written the code in Python, while we're just starting to write it here).

@karllessard
Collaborator Author

FYI, I just pushed a new version with all the requested changes except the bounded types in the Tensor and DataType generics, as suggested by @Craigacp. I want to write down a few examples first to compare the complexity/verbosity impact on user code. But everything else that we agreed on should be in there.

As for your concerns, @saudet, I'm still not sure what TF Tools cannot handle in what you are proposing; with the buffer and data layout interfaces, you can customize the backing store of an ndarray pretty much any way you like. But again, it might be my misunderstanding, so I suggest that once the code is merged, we try to map some buffers from the C libraries you were referring to; it should then become clearer whether something is missing and what it is.

@karllessard karllessard changed the title Tensor NIO TensorFlow Tools Dec 18, 2019
@karllessard
Collaborator Author

FYI, I just pushed a new version with all the requested changes except the bounded types in the Tensor and DataType generics, as suggested by @Craigacp. I want to write down a few examples first to compare the complexity/verbosity impact on user code.

OK, I've finally bounded the generic types to TType; the verbosity impact is less than I remembered, and by doing it I've detected some illegal casts that remained in the unit tests, which proves its usefulness.

So if everyone agrees with those changes, I'll merge the branch by the end of this week. Thanks!
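To sketch why the bound helps (TType, TInt32, TFloat32, and Tensor below are minimal stand-ins, not the real classes): with Tensor&lt;T extends TType&gt;, a tensor parameterized by an arbitrary unrelated class is rejected by the compiler, which is the kind of illegal cast such a bound catches at compile time:

```java
// Minimal stand-ins illustrating a bounded generic parameter.
interface TType {}
final class TInt32 implements TType {}
final class TFloat32 implements TType {}

class Tensor<T extends TType> {
  private final Class<T> type;

  Tensor(Class<T> type) {
    this.type = type;
  }

  Class<T> type() {
    return type;
  }
}

// new Tensor<>(TInt32.class) compiles, but a Tensor<String> declaration would
// not, because String does not implement TType.
```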

@saudet
Contributor

saudet commented Dec 19, 2019

But the equivalent of the buffer protocol is what this PR provides. Java doesn't have a low-level buffer with multidimensional indexing; that's why Karl wrote one. It's as much effort to implement the Python buffer protocol as it is to implement this one, so I'm not sure how it's easier in Python (besides the fact that everyone has already written the code in Python, while we're just starting to write it here).

The closest equivalent to the buffer protocol in Java is ByteBuffer. Sure, it doesn't support multiple dimensions, but that's just some additional information that can be provided separately. It's also limited to 32-bit indices, so we should allow raw addresses and things like that... Making users jump through hoops to use this API isn't going to help with adoption!
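For instance (a sketch of the "additional information provided separately" point, not code from either library), shape information can be carried next to a plain ByteBuffer and used to flatten n-dimensional indices into offsets:

```java
import java.nio.ByteBuffer;

// Sketch: multidimensional int access over a plain ByteBuffer, with the shape
// carried alongside the buffer and indices flattened in row-major order.
class ShapedIntBuffer {
  private final ByteBuffer buffer;
  private final long[] shape;

  ShapedIntBuffer(ByteBuffer buffer, long... shape) {
    this.buffer = buffer;
    this.shape = shape;
  }

  private int offsetOf(long... indices) {
    long offset = 0;
    for (int i = 0; i < shape.length; i++) {
      offset = offset * shape[i] + indices[i];
    }
    // The 32-bit index limitation mentioned above surfaces right here.
    return Math.toIntExact(offset * Integer.BYTES);
  }

  int getInt(long... indices) {
    return buffer.getInt(offsetOf(indices));
  }

  void setInt(int value, long... indices) {
    buffer.putInt(offsetOf(indices), value);
  }
}
```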

As for your concerns, @saudet, I'm still not sure what TF Tools cannot handle in what you are proposing; with the buffer and data layout interfaces, you can customize the backing store of an ndarray pretty much any way you like. But again, it might be my misunderstanding, so I suggest that once the code is merged, we try to map some buffers from the C libraries you were referring to; it should then become clearer whether something is missing and what it is.

It's not important for TensorFlow for Java itself, because we have access to all the internal APIs. It just makes the tools hard to use for anything outside this library, since all the glue code is not made available publicly, that's all. But that's something we can update later on. If no one ends up using this API outside TensorFlow because it's hard to use, then that's fine too. Something else will eventually come around.

@karllessard karllessard merged commit 4175a4e into tensorflow:master Dec 22, 2019
@karllessard
Collaborator Author

The PR has been merged based on the other members' reviews.

deansher pushed a commit to deansher/java that referenced this pull request Mar 3, 2021