
TensorFlow Tools #5


Merged · 8 commits merged from the tensor-nio branch on Dec 22, 2019

Conversation

@karllessard (Collaborator) commented Nov 30, 2019

This PR merges the first draft of Tensor NIO into our master repository so that it can be tried out and tested by a larger audience. Here is a minimal example of usage (more examples can also be found in the unit tests):

    IntNdArray heapData = NdArrays.vector(0, 1, 2, 3);

    // Creates a 2x2 matrix
    try (Tensor<TInt32> tensor = TInt32.tensorOfShape(2, 2)) {
      IntNdArray tensorData = tensor.data();

      // Copy first 2 values of the vector to the first row of the matrix
      tensorData.set(heapData.slice(Indices.range(0, 2)), 0);

      // Copy values at an odd position in the vector as the second row of the matrix
      tensorData.set(heapData.slice(Indices.odd()), 1);

      assertEquals(0, tensorData.getInt(0, 0));
      assertEquals(1, tensorData.getInt(0, 1));
      assertEquals(1, tensorData.getInt(1, 0));
      assertEquals(3, tensorData.getInt(1, 1));

      // Read rows of the tensor in reverse order
      IntNdArray reversedTensorData = tensorData.slice(Indices.all(), Indices.flip());

      assertEquals(1, reversedTensorData.getInt(0, 0));
      assertEquals(0, reversedTensorData.getInt(0, 1));
      assertEquals(3, reversedTensorData.getInt(1, 0));
      assertEquals(1, reversedTensorData.getInt(1, 1));

      try (EagerSession session = EagerSession.create()) {
        Ops tf = Ops.create(session);

        // Compute the power of the tensor by itself
        Constant<TInt32> x = tf.constant(tensor);
        IntNdArray result = tf.math.pow(x, x).tensorData();

        // Validate result by computing the same operation in Java
        tensorData.scalars().forEachIndexed((coords, s) ->
          assertEquals(Math.pow(s.getInt(), s.getInt()), result.getInt(coords), 1e-7f)
        );
      }
    }

Some additional notes:

  • The major breaking change is that the Tensor class no longer carries a Java boxed type, like Integer, but a custom tensor type, like TInt32. This allows us to support more types in Java while still guaranteeing compile-time checking of operand type compatibility. A lot of the changes seen in this PR are related to this.
  • The c_api package has been moved to org.tensorflow.internal, next to the new package buffer that holds new Tensor buffer mapping utilities.
  • Right now, it looks like we do not control the order in which the @Operator annotations are processed when generating the *Ops classes. As a result, methods move around within those classes on each compilation, as you can observe in this PR.

@karllessard karllessard force-pushed the tensor-nio branch 2 times, most recently from d4d1adf to 037a2ca on December 2, 2019 04:18
@saudet (Contributor) left a comment

Does that mean we can now rename getShort(), getInt(), setByte(), etc to just get() and set()? I think that would be a lot nicer.

<groupId>org.tensorflow</groupId>
<artifactId>nio-utils</artifactId>
<version>${project.version}</version>
</dependency>
Contributor:

That feels strange. Why do we need this dependency here? Is it just because that we can't have a separate module until we move away from the code generator in C++?

Collaborator Author:

Tensors are now based on NdArray for mapping their memory and allowing direct access to it, so the core must depend on nio-utils. I'm not sure I understand what is strange; do you mean that memory mapping should occur outside the core API?

Contributor:

I think it's just the names that are confusing me. When I read something like tensorflow-core, I think this means the actual "core" without dependencies on other modules. Maybe tensorflow-nio should really be named something else, without "TensorFlow" in it? That might also help with adoption by other projects. That's probably something we should continue discussing on the mailing list you created, and one of the first things to clear up with the folks from MXNet, DL4J, etc.

Collaborator Author:

Exactly, that's why I renamed the artifact tensorflow-nio-utils to nio-utils (though I'm not a huge fan of that name either, so if anyone comes up with a better one...). So the tensorflow you've noticed there just names our organization in the group ID and is not part of the artifact name itself.

Collaborator:

ndarray-core? I think nio-utils is likely to make people think of java.nio.* and as the language moves away from those classes (and they are 15+ years old at this point), calling them "new" is something of a misnomer.

Collaborator Author:

I agree, we will group artifacts if there is a need but right now, I cannot think of any other utility library than this one.

Contributor:

I think there should be a functional need to group things together. If it's just to put them in abstract categories, it will make it hard to decide which category each module belongs to, possibly without any actual benefits. Maven doesn't even put the parent name in artifact coordinates, so unless we do it explicitly like with tensorflow-core -> tensorflow-core-api, etc then the benefits are even more marginal.

Collaborator Author:

Ok, here is what I ended up doing: I preserved the tensorflow-utils name, but now it contains the code of the library itself, including the DataBuffer and NdArray APIs.

So if in the future it happens that we have more small utility classes like these that we would like to share with the world and that are independent from TF runtime, they will end up in this library as well (kind of a Guava for ML).

There are a lot of assumptions and open questions that make it hard to pick the right name for this library right away, so let's simply rename it later if we need to.

Contributor:

Sounds good, but let's name it tensorflow-util just to be consistent with the package name? Or inversely, let's name the package org.tensorflow.utils? I don't have preference for either, I just think that consistency is a good thing to have whenever it makes sense.

Collaborator Author:

I think it is fine to have them mismatch here: tensorflow-utils sounds more natural as the name of the library (as there is more than one utility class in it), while package names are often singular by convention (e.g. java.util.*).

In addition, we don't have an org.tensorflow.core package in our core artifacts either.

@@ -150,6 +151,7 @@
<compilerOption>${project.basedir}/src/main/native/server_jni.cc</compilerOption>
<compilerOption>${project.basedir}/src/main/native/session_jni.cc</compilerOption>
<compilerOption>${project.basedir}/src/main/native/tensorflow_jni.cc</compilerOption>
<compilerOption>${project.basedir}/src/main/native/tensor_buffers_jni.cc</compilerOption>
Contributor:

I thought everyone was OK with moving away from writing JNI manually. Why do we need this?

Collaborator Author:

Because I need to create an instance of ByteBuffer out of a tensor native address and size in case Unsafe is not available, and the only way I know of is to do it from JNI. Do you have another way to do it?

Contributor:

This is going to do the same as the JNI code in that file:

TF_TensorData(nativeTensor).capacity(TF_TensorByteSize(nativeTensor)).asByteBuffer()

Along with a couple of null checks there too if needed.

Collaborator Author:

Ok I'll give it a try, thanks

Collaborator Author:

Great, that works, bye bye JNI...

@@ -0,0 +1,3 @@
package org.tensorflow.types.family;

public interface TType {}
Contributor:

Why do we need empty interfaces like this? That's always a code smell to me... They probably have some functionality in common. BTW, we have default methods too starting with Java SE 8. Can we use those?

Collaborator Author:

This is just to make sure that not all Java types can be passed as a generic type of a Tensor, like it is enforced here:

 public static <T extends TType> Tensor<T> create(Object obj, DataType<T> dtype) {...}

It's very soft compile-time type safety, but better than nothing, I think.
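To illustrate the mechanism being discussed, here is a minimal, self-contained sketch of a tag interface used as a generic bound. All names here are illustrative stand-ins, not the actual TensorFlow Java API:

```java
// Hypothetical sketch: an empty tag interface restricting which types
// may parameterize Tensor. Names are illustrative only.
interface TType {}

final class TInt32 implements TType {}

final class Tensor<T extends TType> {
    private final Class<T> type;

    private Tensor(Class<T> type) {
        this.type = type;
    }

    // Only types implementing TType are accepted at compile time.
    static <T extends TType> Tensor<T> create(Class<T> type) {
        return new Tensor<>(type);
    }

    Class<T> type() {
        return type;
    }
}

public class TagInterfaceDemo {
    public static void main(String[] args) {
        Tensor<TInt32> t = Tensor.create(TInt32.class);
        System.out.println(t.type().getSimpleName());
        // Tensor<String> s = Tensor.create(String.class); // does not compile:
        // String does not implement TType
    }
}
```

The interface carries no behavior; its only job is to make `Tensor<String>` a compile error rather than a runtime surprise.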

Collaborator:

In a future version of Java we will be able to seal this interface so it only has the implementations that we specify, which will make it more useful & reliable (as we'll be able to switch on the subclasses with exhaustiveness checking, and users won't be able to do their own implementations that our code can't work with).

Contributor:

I see, that sounds useful.

Collaborator Author:

Question for you @Craigacp: will sealed interfaces work in Java the same way as in Kotlin, meaning that only interfaces from the same package (or the same file, in Kotlin) are allowed to extend them, or will there be a different safety mechanism?

Collaborator:

This is the current JEP for the sealed interfaces feature - https://openjdk.java.net/jeps/360. It requires them to be in the same package or module, and listed in the permits clause of the interface/class. If the subclasses are all in the same java file, then they don't have to be listed in the permits clause (but that's probably not how we'll do it).
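For readers following along: sealed interfaces later shipped in Java 17 (JEP 409). A minimal sketch of the mechanism, with hypothetical type names standing in for the real ones:

```java
// Hypothetical sketch of sealed interfaces (Java 17+).
// Only the types listed in the permits clause may implement TType.
public class SealedDemo {

    sealed interface TType permits TInt32, TFloat32 {}

    record TInt32() implements TType {}
    record TFloat32() implements TType {}

    static String describe(TType t) {
        // With a sealed hierarchy the compiler knows these are the only cases.
        if (t instanceof TInt32) return "32-bit integer";
        if (t instanceof TFloat32) return "32-bit float";
        throw new AssertionError("unreachable: hierarchy is sealed");
    }

    public static void main(String[] args) {
        System.out.println(describe(new TInt32()));
        System.out.println(describe(new TFloat32()));
    }
}
```

Any attempt to declare `class TBogus implements TType` outside the permits clause is rejected at compile time, which is exactly the "users can't do their own implementations" guarantee mentioned above.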

Class.forName("sun.misc.Unsafe");
unsafeAvailable = true;
} catch (ClassNotFoundException e) {
unsafeAvailable = false;
Contributor:

This is misleading. sun.misc.Unsafe is often available in other implementations of Java SE, but it doesn't always contain all the methods from OpenJDK. We need to make sure it has all the methods we want like this:
https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/indexer/UnsafeRaw.java
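A sketch of the kind of capability probe being suggested: verify that the specific Unsafe methods you rely on exist, rather than only checking that the class loads. The method list below is illustrative, not necessarily the set the library needs:

```java
// Hypothetical sketch: probe sun.misc.Unsafe for the exact methods we
// depend on, rather than only checking that the class exists.
public class UnsafeProbe {

    static boolean unsafeSupported() {
        try {
            Class<?> unsafe = Class.forName("sun.misc.Unsafe");
            // Throws NoSuchMethodException on JVMs whose Unsafe
            // lacks these OpenJDK methods.
            unsafe.getDeclaredMethod("getByte", long.class);
            unsafe.getDeclaredMethod("putByte", long.class, byte.class);
            unsafe.getDeclaredMethod("copyMemory",
                Object.class, long.class, Object.class, long.class, long.class);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Unsafe usable: " + unsafeSupported());
    }
}
```

On a JVM where the class exists but a required method is missing, the `Class.forName` check alone would report a false positive; the per-method probe does not.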

Collaborator Author:

Thanks, I'll revise this

@saudet (Contributor) commented Dec 4, 2019

What's the copyright on the 1500x916.jpg file? I didn't see it mentioned anywhere, but that needs to be clear to put anything in a repository from Google.

@karllessard (Collaborator, Author):

Does that mean we can now rename getShort(), getInt(), setByte(), etc to just get() and set()? I think that would be a lot nicer.

On the DataBuffer, they could but on the NdArray, they are already being used for getting/setting subelements of an array.

I was inspired by this: if you think that get()/set() are replaceable by brackets (like Kotlin does), for a given 2D matrix, it might be natural to retrieve its first row simply with matrix[0] instead of something like matrix.getElement(0). On the other hand, if get()/set() were used to return a value type, then only the right number of brackets that leads to a scalar would be allowed (matrix[0][0] in this case), which is a bit awkward.
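A toy sketch of the rank-reducing access described above, where `get(indices)` returns a sub-array view and `getInt(indices)` reads a scalar. This is a hypothetical, much-simplified stand-in for the real NdArray, not the actual implementation:

```java
import java.util.Arrays;

// Hypothetical sketch: a minimal dense nd-array where get() drops one
// leading dimension per index and returns a view, while getInt() reads
// a scalar value once all dimensions are consumed.
public class MiniNdArray {
    private final int[] data;   // flat storage
    private final int[] shape;  // remaining dimension sizes
    private final int offset;   // start position in the flat storage

    MiniNdArray(int[] data, int[] shape, int offset) {
        this.data = data;
        this.shape = shape;
        this.offset = offset;
    }

    static MiniNdArray of(int[] data, int... shape) {
        return new MiniNdArray(data, shape, 0);
    }

    /** Drops one leading dimension per index, returning a view. */
    MiniNdArray get(int... indices) {
        int off = offset;
        int[] s = shape;
        for (int i : indices) {
            int stride = 1;
            for (int d = 1; d < s.length; d++) stride *= s[d];
            off += i * stride;
            s = Arrays.copyOfRange(s, 1, s.length);
        }
        return new MiniNdArray(data, s, off);
    }

    /** Reads a scalar; the indices must reach rank 0. */
    int getInt(int... indices) {
        MiniNdArray scalar = get(indices);
        if (scalar.shape.length != 0)
            throw new IllegalArgumentException("indices do not reach a scalar");
        return data[scalar.offset];
    }

    public static void main(String[] args) {
        MiniNdArray m = of(new int[]{0, 1, 2, 3}, 2, 2);
        System.out.println(m.get(0).getInt(1)); // element [0][1] -> 1
        System.out.println(m.getInt(1, 0));     // element [1][0] -> 2
    }
}
```

This mirrors the `matrix[0]` analogy: `get(0)` is the first row (still an array), while `getInt(0, 1)` is a value.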

@karllessard (Collaborator, Author):

What's the copyright on the 1500x916.jpg file? I didn't see it mentioned anywhere, but that needs to be clear to put anything in a repository from Google.

Oops! I had a reminder to replace this one with one of my own pictures but clearly I forgot, thanks for pointing that out!

@saudet (Contributor) commented Dec 4, 2019

Does that mean we can now rename getShort(), getInt(), setByte(), etc to just get() and set()? I think that would be a lot nicer.

On the DataBuffer, they could but on the NdArray, they are already being used for getting/setting subelements of an array.

I was inspired by this: if you think that get()/set() are replaceable by brackets (like Kotlin does), for a given 2D matrix, it might be natural to retrieve its first row simply with matrix[0] instead of something like matrix.getElement(0). On the other hand, if get()/set() were used to return a value type, then only the right number of brackets that leads to a scalar would be allowed (matrix[0][0] in this case), which is a bit awkward.

Ah, so the get() methods basically return another array? That's interesting. That certainly makes it feel like NumPy, and that's pretty much how ND4J went on to do it too. It hasn't proven to be too confusing, so that sounds good. I think we should keep it like that then.

SidneyLann previously approved these changes Dec 4, 2019
@karllessard (Collaborator, Author):

Ah, so the get() methods basically return another array? That's interesting. That certainly makes it feel like NumPy, and that's pretty much how ND4J went on to do it too. It hasn't proven to be too confusing, so that sounds good. I think we should keep it like that then.

Ok. Like I said, we could rename those in *DataBuffer, but I'm tempted to leave them as they are just to stay consistent with *NdArray.

@karllessard (Collaborator, Author):

BTW group, I was thinking of pushing a new version that renames the following methods (and their siblings):

  • TInt32.scalar(10) to TInt32.scalarOf(10)
  • TInt32.vector(1, 2, 3, 4) to TInt32.vectorOf(1, 2, 3, 4)
  • TInt32.tensorOfShape(2, 2) to TInt32.ofShape(2, 2)
  • Operand.tensorData() to Operand.data()
  • shape(2, 2) to shapeOf(2, 2)

Do you have any preference?

@saudet (Contributor) commented Dec 5, 2019

@karllessard No preferences, but we should try to go with unwritten conventions people are used to:
https://blog.joda.org/2011/08/common-java-method-names.html

@Craigacp (Collaborator) commented Dec 5, 2019

I'm fine with those renamings, they seem fairly idiomatic.

@karllessard (Collaborator, Author):

Just to let you know that I've just pushed a new version of the library with all your recommendations.

@karllessard karllessard force-pushed the tensor-nio branch 2 times, most recently from a9ff547 to a8b7996 on December 8, 2019 02:05
@saudet (Contributor) commented Dec 8, 2019 via email

@Craigacp (Collaborator) left a comment

Other general points:

  • Should a scalar ndarray have a tag interface?
  • Might want to use the code gen pattern from Java Vector API (https://github.com/openjdk/panama/blob/vectorIntrinsics/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/gen-src.sh) to emit the buffers/ndarrays of a particular kind (raw, jdk, etc) from a single source file. Reduces the amount of things that can go wrong in a refactor.
  • Can we remove UnsafeReference from the public API? I'm uncomfortable with public methods that need it as an argument.
  • Is there a relationship between buffer.Validator and ndarray.Validator? They feel like they have a lot of duplicated code.
  • Needs more javadoc (in the impl classes mainly). I understand how irritating it is to say that too, trust me, but bringing in new people to work in this codebase will be easier with it.

import org.tensorflow.types.TInt64;
import org.tensorflow.types.TString;
import org.tensorflow.types.TUInt8;
import org.tensorflow.types.family.TType;

/**
* A statically typed multi-dimensional array whose elements are of a type described by T.
Collaborator:

Can we more strongly type Tensor to Tensor<T extends TType>?

Collaborator Author:

That is what I tried at first, only to discover that this constraint quickly became viral: it needs to appear everywhere in our methods, classes, wrappers, frameworks and, ultimately, user code.

But since TType does not expose any method, and since we enforce it at the beginning of the chain (i.e. at the creation of the Tensor), I realized that there is no real gain in carrying it afterward. We just want to make sure that tensors are created using one of our supported data types.

I think our users will appreciate that we relax this rule a bit.

Collaborator:

The advantage of doing it is that the erased type bound of Tensor becomes Tensor<TType>, which makes it trickier for people to skirt the generics to do things that they shouldn't be doing. I agree it seems like it's everywhere in the codebase, and it does ripple a little into user code, but there is some benefit to it.

Collaborator Author:

I'm aware of the benefits of enforcing these safety checks, I've always been a defender of them, but I think we can make an exception here. I don't care that much about the verbosity in our code base, but I am concerned about the user code.

I guess writing down a quick example may help us make the right decision. Meanwhile, I'm still not sure I understand what can go wrong if we leave the generic type parameter unbounded after creating the tensor, other than it not being good practice. Do you have a specific scenario in mind?

t.buffer().put(data);
return t;
}

public static <T extends TType> Tensor<T> allocate(DataType<T> dtype, Shape shape) {
Collaborator:

Needs some javadoc. Looks like this makes an empty buffer, but what does that mean for types like String? Are they populated by the null reference or the empty string etc?

Collaborator Author:

I have to admit that adding proper Javadoc to all endpoints affected by this PR is on my TODO list. I was waiting for some people to try it out first, but then the SIG decided we'd better merge it and fix it afterward. Are you still OK with this?

Collaborator:

Yeah, I'm fine with you merging it without the extra javadoc. We need to make sure it's there before there is a release with this code in though, so maybe file an issue for it?

Collaborator Author:

good idea, I'll create an issue right after merging

@@ -287,21 +287,20 @@
// Helper function to allocate a Tensor for the create() methods that create a Tensor from
// a java.nio.Buffer.
// Requires: dataType matches T
private static <T> Tensor<T> allocateForBuffer(DataType dataType, long[] shape, int nBuffered) {
final int nflattened = numElements(shape);
private static <T extends TType> Tensor<T> allocateForBuffer(DataType dataType, long[] dimSizes, int nBuffered) {
Collaborator:

Can we type DataType with <T> here? And in a bunch of other places. Or alternatively <? extends TType>.

Collaborator Author:

It should definitely extend <T>; those are leftovers from the previous implementation. I'm surprised my IDE is not giving me any warning about it... I'll fix it, thanks!

DataType(int value, int byteSize) {
this.value = value;
this.byteSize = byteSize;
public static <T> DataType<T> create(String name, int value, int byteSize, TensorMapper<T> tensorMapper) {
Collaborator:

Stronger type? <T extends TType>.

Collaborator Author:

Mmh, I don't remember if that one was as hemorrhagic as with Tensor. I'll give it a try and let you know.

* @param shape the shape of the tensor
* @return data structure of elements of this type
*/
T apply(TF_Tensor nativeTensor, Shape shape);
Collaborator:

This exposes the JavaCPP TF_Tensor type into the public API. Can we fix that?

Collaborator Author:

It is not "really" exposed. This interface is only used internally, but unfortunately it has to be public because it is shared between packages.

What I can do is move it under org.tensorflow.internal.buffer so at least it is known to be internal. And since it does not return a TF_Tensor but only accepts one, there is no major "leak" either.

Collaborator:

It's times like this that I understand why C++ has friend classes.

this.start = start;
}

private long start;
@Craigacp (Collaborator) commented Dec 8, 2019:

These fields should be final; it signals to the JIT that it can make more optimisations (though I don't know whether it actually will).

* that select which elements on a given dimension should be included/excluded
* from that view.
*/
public interface Index {
@Craigacp (Collaborator) commented Dec 8, 2019:

Is there a way to make this interface functional or more constant? I'm a little worried that the JVM won't be able to optimise all these small objects properly (until they become value types). Maybe we could make them ValueBasedClasses? The indices provide conceptually similar functionality to VarHandle from Java 9, which Paul wrote, so maybe his input would be useful on these classes.
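One possible shape for a functional Index, sketched with hypothetical names (this is not the PR's API; it only illustrates how stateless indices could become lambdas):

```java
// Hypothetical sketch: Index as a functional interface so that
// stateless indices can be lambdas (small, constant objects that
// the JIT has a better chance of optimising away).
public class IndexDemo {

    @FunctionalInterface
    interface Index {
        /** Maps a coordinate in the view to a coordinate in the source. */
        long map(long coord, long dimensionSize);
    }

    static final Index ALL  = (i, n) -> i;          // identity: keep order
    static final Index FLIP = (i, n) -> n - 1 - i;  // reverse order

    static Index at(long fixed) {                   // pin one coordinate
        return (i, n) -> fixed;
    }

    public static void main(String[] args) {
        long n = 4; // dimension size
        System.out.println(ALL.map(1, n));   // 1
        System.out.println(FLIP.map(0, n));  // 3
        System.out.println(at(2).map(0, n)); // 2
    }
}
```

Indices like `range(a, b)` that also change the dimension size would need a second method (e.g. a view-size query), so the real interface likely cannot stay strictly functional for every case.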

Collaborator Author:

Ok, let's do this as a follow up.


@Override
public boolean equals(Object obj) {
return false; // All unknown dimensions are distinct
Collaborator:

Shouldn't they have reference equality? The hashcode does.

Collaborator Author:

This class is a remnant of a previous implementation and is not used anymore, I'll remove it

*/
@Override
public int hashCode() {
return (int) numElements();
Collaborator:

If we care about hashcode for this we should apply some mixing function first. Integer's hashcode is unfortunately terrible but I think it's dictated by the spec.
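For illustration, here is a mixing finalizer in the style of MurmurHash3's fmix32, applied to an element count. This is a sketch of the general technique, not the code in the PR:

```java
// Hypothetical sketch: spread the bits of a long element count across
// all 32 bits of the hash, instead of truncating it with (int) cast.
// Constants are MurmurHash3's fmix32 finalizer.
public class MixDemo {

    static int mix(long numElements) {
        int h = (int) (numElements ^ (numElements >>> 32)); // fold to 32 bits
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Unlike the identity hash of small integers, nearby inputs
        // now differ in many bits.
        System.out.println(Integer.toHexString(mix(1)));
        System.out.println(Integer.toHexString(mix(2)));
    }
}
```

With plain `(int) numElements`, shapes with the same element count (e.g. 2x2 and 4x1) and small counts in general cluster in the low hash buckets; the finalizer avoids that clustering without changing equality semantics.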

Collaborator Author:

Totally agree, I'll fix it


@Override
public double getDouble(long... indices) {
return buffer().getDouble(positionOf(indices, true));
Collaborator:

Some of these methods go through buffer(), some access the buffer field directly. It should pick one way, or document that they behave differently (as DoubleDenseNdArray isn't final, so could be subclassed).

Collaborator Author:

buffer() is only there to let the superclass access the buffer stored in the subclass. Again, this has changed so many times... I'll sanitize it, thanks!

@karllessard (Collaborator, Author):

Yes, so what about org.tensorflow.ndarray? That would make it more consistent in a way.

@saudet, I still think that the current naming is better. Right now, the package tree looks like this:

- org.tensorflow.util
  |_ ndarray
  |_ buffer

It makes sense because the buffer API can actually be used without ndarrays. Also, I have moved the Shape class directly under org.tensorflow.util since it is also found in contexts other than ndarrays (unmapped tensors, ops operands, etc.).

Also, just to clarify my previous post, I brought up the counterexample of tensorflow-core-api to point out that we already don't enforce too much strictness between the name of an artifact and its packages. But I do personally prefer that each artifact has at least its own discriminative base package, hence org.tensorflow.util in this case.

@karllessard (Collaborator, Author):

Other general points:

  • Should a scalar ndarray have a tag interface?

@Craigacp: might be hard to enforce in all cases. For example, for a 2x2 matrix, both matrix.get(0) and matrix.get(0, 0) return an NdArray, but only the second will be a scalar. Do you see an actual benefit to having it?

  • Might want to use the code gen pattern from Java Vector API to emit the buffers/ndarrays of a particular kind (raw, jdk, etc) from a single source file.

Sounds interesting, I'll take a look, thanks

  • Can we remove UnsafeReference from the public API? I'm uncomfortable with public methods that need it as an argument.

UnsafeReference is not Unsafe but just a validator/wrapper around it. We are not exposing free access to Unsafe, as it needs to be provided by the caller itself. In addition, those methods are not exposed by the "public API" (composed of the public interfaces and their static helpers) but found in the internal *.impl packages, for what it's worth.

  • Is there a relationship between buffer.Validator and ndarray.Validator? They feel like they have a lot of duplicated code.

I guess the difference can be subtle, I will check

  • Needs more javadoc (in the impl classes mainly). I understand how irritating it is to say that too, trust me, but bringing in new people to work in this codebase will be easier with it.

Yes, like I said in a previous comment, I postponed that task because this work is still known to be "in progress", but I agree it should become a priority.

@Craigacp (Collaborator) commented Dec 9, 2019

Other general points:

  • Should a scalar ndarray have a tag interface?

@Craigacp: might be hard to enforce in all cases. For example, for a 2x2 matrix, both matrix.get(0) and matrix.get(0, 0) return an NdArray, but only the second will be a scalar. Do you see an actual benefit to having it?

Ah, yeah, I'd forgotten that the accessor takes varargs, so you couldn't have a sharper return type. Not that the type system is quite powerful enough to do the best thing there anyway. As scalars don't have dimensions, I feel like that should be represented in the type system so you can specialise when being passed a scalar (i.e. immediately unbox it), but it's hard to get methods that emit the right types given the way the library is set up.

  • Can we remove UnsafeReference from the public API? I'm uncomfortable with public methods that need it as an argument.

UnsafeReference is not Unsafe but just a validator/wrapper around it. We are not exposing free access to Unsafe, as it needs to be provided by the caller itself. In addition, those methods are not exposed by the "public API" (composed of the public interfaces and their static helpers) but found in the internal *.impl packages, for what it's worth.

I understand, but anything that's public is part of the public API as far as users are concerned. I've written impl classes with big warnings on them, but unless you can lock them off with a module system or another access control people will still see them in the javadoc and use them for things you weren't expecting.

@saudet (Contributor) commented Dec 9, 2019

Yes, so what about org.tensorflow.ndarray? That would make it more consistent in a way.

@saudet, I still think that the current naming is better. Right now, the package tree looks like this:

- org.tensorflow.util
  |_ ndarray
  |_ buffer

It makes sense because the buffer API can actually be used without ndarrays. Also, I have moved the Shape class directly under org.tensorflow.util since it is also found in contexts other than ndarrays (unmapped tensors, ops operands, etc.).

Also, just to clarify my previous post, I brought up the counterexample of tensorflow-core-api to point out that we already don't enforce too much strictness between the name of an artifact and its packages. But I do personally prefer that each artifact has at least its own discriminative base package, hence org.tensorflow.util in this case.

I think the extra package is fine (it would allow splitting the module into tensorflow-util-ndarray, tensorflow-util-buffer, etc. more easily if that ever makes sense, which we could do right away, FWIW); it's just the naming inconsistency that bothers me. If we search for "util" and "utils" on https://search.maven.org, we actually get more hits for "util"! We do get a lot more hits for "tools" than "tool", though, so to follow conventions, maybe that's the term you are looking for. How does org.tensorflow.tools sound to you as a package name?

@karllessard
Collaborator Author

How does org.tensorflow.tools sound to you as a package name?

Yes, I like tools as an artifact and package name, better than utils/util. I'll make the change.

...and we would like to thank all our supporters, friends and family :)

@karllessard
Collaborator Author

but it's hard to get methods that emit the right types given the way the library is set up.

It would be possible to add a different method (something like ndarray.scalar(long...)) that returns a Scalar type and also makes sure that the number of indices provided is enough to reach one. This won't break the contract of the current API. I suggest doing this in a second phase, though, if we identify a real need for it.
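To illustrate the idea (a hypothetical sketch only; DenseNdArray, Scalar, and scalar() are stand-in names, not the actual TensorFlow Tools API), such an accessor could validate the index count against the rank before returning a scalar view:

```java
// Hypothetical sketch of an ndarray.scalar(long...) accessor: it only resolves
// to a Scalar when the provided indices fully identify one element.
interface Scalar<T> {
  T value();
}

class DenseNdArray<T> {
  private final Object[] flat;  // flattened storage, row-major order
  private final long[] shape;

  DenseNdArray(Object[] flat, long... shape) {
    this.flat = flat;
    this.shape = shape;
  }

  int rank() {
    return shape.length;
  }

  @SuppressWarnings("unchecked")
  Scalar<T> scalar(long... indices) {
    if (indices.length != rank()) {
      throw new IllegalArgumentException(
          "scalar() requires " + rank() + " indices but got " + indices.length);
    }
    long offset = 0;
    for (int i = 0; i < shape.length; i++) {
      offset = offset * shape[i] + indices[i];  // row-major flattening
    }
    T value = (T) flat[Math.toIntExact(offset)];
    return () -> value;
  }
}
```

The run-time check on the index count is the part a richer type system could move to compile time, as discussed above.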

I understand, but anything that's public is part of the public API as far as users are concerned. I've written impl classes with big warnings on them, but unless you can lock them off with a module system or some other access control, people will still see them in the javadoc and use them for things you weren't expecting.

I can shuffle things a bit to mark these methods as protected instead; would that work for you? With this, and the existing protection that forces the caller to do the dirty reflective job to retrieve the instance of Unsafe, I think we can consider that our library has no major security hole. What do you think?

@Craigacp
Collaborator

but it's hard to get methods that emit the right types given the way the library is set up.

It would be possible to add a different method (something like ndarray.scalar(long...)) that returns a Scalar type and also makes sure that the number of indices provided is enough to reach one. This won't break the contract of the current API. I suggest doing this in a second phase, though, if we identify a real need for it.

Sure

I understand, but anything that's public is part of the public API as far as users are concerned. I've written impl classes with big warnings on them, but unless you can lock them off with a module system or some other access control, people will still see them in the javadoc and use them for things you weren't expecting.

I can shuffle things a bit to mark these methods as protected instead; would that work for you? With this, and the existing protection that forces the caller to do the dirty reflective job to retrieve the instance of Unsafe, I think we can consider that our library has no major security hole. What do you think?

Protected would work. I don't consider this a security thing, more an API cleanliness one. We don't want people to start incorporating UnsafeReference as a field or argument in their classes, as then we can't remove it without it being a breaking change, and it's really an internal implementation detail.

@Craigacp
Collaborator

Protected would work. I don't consider this a security thing, more an API cleanliness one. We don't want people to start incorporating UnsafeReference as a field or argument in their classes, as then we can't remove it without it being a breaking change, and it's really an internal implementation detail.

We need something to let users implement those interfaces for libraries other than TensorFlow. It's either that, JavaCPP's Pointer class, or a couple of long values for pointer addresses, which are just as unsafe. Which one do you prefer that we "leak"? I think the latter is the least bad option, but Karl doesn't like it.

If you're in a subclassing relationship then you could see it if it's protected. Since you can't have protected interface members, it might need to become an abstract class (which might prove tricky), but at that point you could hide it effectively in the Raw classes.

@karllessard
Collaborator Author

but at that point you could hide it effectively in the Raw classes.

I'm wondering: since I accept the Unsafe instance as an Object first (which I then validate), maybe having a public endpoint accepting the Object could be enough, i.e. something like RawMemoryMapper.create(Object), where the object passed must be an instance of sun.misc.Unsafe. The RawMemoryMapper would then have endpoints to map memory segments from their address. So there would be no UnsafeReference anymore, as the Unsafe instance would be kept in the RawMemoryMapper.
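A rough sketch of what that could look like (the class and method names here are illustrative, not the merged code): the mapper accepts a plain Object and validates by class name that it really is sun.misc.Unsafe, while the reflective retrieval stays on the caller's side:

```java
// Illustrative sketch of the proposed RawMemoryMapper.create(Object).
final class RawMemoryMapper {
  private final Object unsafe;  // held as Object to avoid a compile-time dependency on sun.misc

  private RawMemoryMapper(Object unsafe) {
    this.unsafe = unsafe;
  }

  static RawMemoryMapper create(Object unsafe) {
    if (unsafe == null || !"sun.misc.Unsafe".equals(unsafe.getClass().getName())) {
      throw new IllegalArgumentException("An instance of sun.misc.Unsafe is required");
    }
    return new RawMemoryMapper(unsafe);
  }
}

// Caller side: the "dirty reflective job" stays with the entitled library.
class UnsafeRetriever {
  static Object retrieveUnsafe() {
    try {
      java.lang.reflect.Field field =
          Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe");
      field.setAccessible(true);
      return field.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("sun.misc.Unsafe is not available on this JVM", e);
    }
  }
}
```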

@saudet
Contributor

saudet commented Dec 12, 2019

but at that point you could hide it effectively in the Raw classes.

I'm wondering: since I accept the Unsafe instance as an Object first (which I then validate), maybe having a public endpoint accepting the Object could be enough, i.e. something like RawMemoryMapper.create(Object), where the object passed must be an instance of sun.misc.Unsafe. The RawMemoryMapper would then have endpoints to map memory segments from their address. So there would be no UnsafeReference anymore, as the Unsafe instance would be kept in the RawMemoryMapper.

Actually, you're already passing the pointer address to your objects there, so I don't understand why we need to expose Unsafe at all. If it's available, we use it; if it's not available, we just use something else like ByteBuffer or whatever OpenJDK comes up with eventually. Users don't need to know what we're doing to access their data, do they? We could expose an enum or something to give them a choice of which implementation gets used, and I think that's enough. What other use cases are you thinking about?

@saudet
Contributor

saudet commented Dec 12, 2019

Ah, I remember what you said. You're basically considering the Unsafe object the way we consider ByteBuffer to be "safe", so we'd be adding a new backend for others like MemorySegment and whatnot. Fair enough. It's more work for people to implement, so I suspect not everyone is going to be all right with that, but we can always add an "unsafe" backend at some point if this library becomes popular and everyone gets tired of copy/pasting the same thing around.

@Craigacp
Collaborator

Well, it's only the DataBuffer classes that need to be implemented separately for things like MemorySegment; the NdArray classes on top of them should be agnostic. So it's not that much of an implementation burden, right?

The MemorySegment API was merged into JDK 14's tree today, before the feature freeze, so it should be in the 14 release in March.

@karllessard
Collaborator Author

karllessard commented Dec 13, 2019

The idea of passing the Unsafe object as a parameter is also to avoid having a public endpoint that is simply DataBuffer mapMe(long address, long length). While that is tempting at the API and usability level, it is completely unsafe, since it allows passing any arbitrary number as an address, which can crash the whole JVM. The problem, though, is that we do need this endpoint for mapping tensor buffers from tensorflow-core-api. So by forcing the caller to pass Unsafe itself at the initial memory mapping, we prevent users who are just wandering around from playing with this functionality too easily, while still allowing entitled libraries to use it by adding the required piping, which is still minimal.

We all know that there is no perfect solution here either. I talked with Maurizio from Panama, and on their side they focus on enforcing the safety of native memory mapping from the MemoryAddress itself. That will be interesting when it arrives. We could try to do the same with JavaCPP pointers, but that would require us to create another artifact like tensorflow-tools-javacpp. Or tensor memory could also be mapped directly in tensorflow-core-api, as it was initially. This is the safest way to achieve this, as we can guarantee that the address being mapped is that of a tensor, but then we completely lose reusability and we end up reimplementing all raw buffers in other libraries. As @Craigacp pointed out, it might not be such a big deal, but it is not necessarily what we want either.

Now I'm experimenting with something new: having a non-final RawDataBufferFactory, located in the raw buffers implementation package, that exposes public endpoints for creating raw (unsafe) buffers from arrays but keeps the endpoints for mapping native memory protected. A library can then simply extend this factory and expose/use the "hidden" endpoints in the safest way possible, according to its needs. Since we won't have public mapMe(address, length) endpoints anymore, it is now OK to retrieve the Unsafe instance directly from this factory and relieve other libraries of this responsibility/burden. Of course, anyone can again simply extend this class and start mapping native memory with arbitrary addresses, but by locating the factory under an impl package and by hiding the native memory mapping endpoints from the Javadoc (since they are protected), we might obfuscate it enough to deter wandering users.

@saudet
Contributor

saudet commented Dec 13, 2019

That sounds really nice, but you guys have to realize that all this running around in circles prevents end users from actually using those tools with any libraries not explicitly supported, including all those distributed with the JavaCPP Presets:
https://github.com/bytedeco/javacpp-presets/

One of the reasons why NumPy works so well is that it facilitates data sharing between libraries, but this implementation of ndarray here doesn't... If I'm missing something, though, please enlighten me.

@karllessard
Collaborator Author

karllessard commented Dec 13, 2019

That sounds really nice, but you guys have to realize that all this running around in circles prevents end users from actually using those tools with any libraries not explicitly supported, including all those distributed with the JavaCPP Presets:
https://github.com/bytedeco/javacpp-presets/

One of the reasons why NumPy works so well is that it facilitates data sharing between libraries, but this implementation of ndarray here doesn't... If I'm missing something, though, please enlighten me.

It does not prevent any of this; libraries just need to extend this factory class and make use of the mapNative endpoints the way they want. Here is how TF does it:

class TensorRawDataBufferFactory extends RawDataBufferFactory {

  // Each mapper resolves the tensor's native address and capacity from the
  // JavaCPP Pointer, then delegates to the protected mapNative* endpoints
  // inherited from RawDataBufferFactory.
  static ByteDataBuffer mapTensorToBytes(Pointer tensorMemory) {
    return mapNativeBytes(tensorMemory.address(), tensorMemory.capacity(), false);
  }

  static IntDataBuffer mapTensorToInts(Pointer tensorMemory) {
    return mapNativeInts(tensorMemory.address(), tensorMemory.capacity(), false);
  }

  static LongDataBuffer mapTensorToLongs(Pointer tensorMemory) {
    return mapNativeLongs(tensorMemory.address(), tensorMemory.capacity(), false);
  }

  static FloatDataBuffer mapTensorToFloats(Pointer tensorMemory) {
    return mapNativeFloats(tensorMemory.address(), tensorMemory.capacity(), false);
  }

  static DoubleDataBuffer mapTensorToDoubles(Pointer tensorMemory) {
    return mapNativeDoubles(tensorMemory.address(), tensorMemory.capacity(), false);
  }

  // A string tensor starts with a table of 64-bit offsets (one per element),
  // followed by the variable-length byte data the offsets point into.
  static StringTensorBuffer mapTensorToStrings(Pointer tensorMemory, long numElements) {
    long offsetByteSize = numElements * Long.BYTES;
    LongDataBuffer offsets = mapNativeLongs(tensorMemory.address(), numElements, false);
    ByteDataBuffer data = mapNativeBytes(
        tensorMemory.address() + offsetByteSize,
        tensorMemory.capacity() - offsetByteSize,
        false);
    return new StringTensorBuffer(offsets, data);
  }
}

BTW, I'm not sure why you are referring to NumPy in this case; the current discussion is about how native memory should be mapped initially from Java and has no impact on how data can be shared thereafter or how we can manipulate it.

@Craigacp
Collaborator

That sounds really nice, but you guys have to realize that all this running around in circles prevents end users from actually using those tools with any libraries not explicitly supported, including all those distributed with the JavaCPP Presets:
https://github.com/bytedeco/javacpp-presets/

The aim is to make a Java type for ndarrays, not to wrap every C library's version of an ndarray in the same interface. Supporting every C library's view of memory is likely to cause issues (e.g. at the moment it doesn't support different storage orders).

One of the reasons why NumPy works so well is that it facilitates data sharing between libraries, but this implementation of ndarray here doesn't... If I'm missing something, though, please enlighten me.

All the Python libraries use NumPy arrays or implement the same interface (via duck typing). So the libraries are built on NumPy and opted into its representation. This API will also require opting into its representation by implementing its interfaces if you want to store memory in a different way than the current implementation does (which should at most be the buffer interface), so I'm not sure what the difference is.

@karllessard
Collaborator Author

karllessard commented Dec 13, 2019

All the Python libraries use NumPy arrays or implement the same interface (via duck typing). So the libraries are built on NumPy and opted into its representation. This API will also require opting into its representation by implementing its interfaces if you want to store memory in a different way than the current implementation does (which should at most be the buffer interface), so I'm not sure what the difference is.

Ok, I might have misunderstood the original question then. FYI, it is also possible to change the layout of your data in memory just by providing a DataAdapter, without needing to reimplement the whole DataBuffer interface (and, by the way, I should probably rename that interface to DataLayout).
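As a hedged illustration of that adapter idea (the interface and method names below are stand-ins, not the real DataAdapter signature), such a layout could, for instance, store booleans as single bytes in a raw buffer:

```java
// Illustration only: an adapter that controls how values are laid out in the
// underlying storage, without reimplementing the buffer interface itself.
interface DataAdapter<T> {
  void writeValue(byte[] buffer, T value, int index);
  T readValue(byte[] buffer, int index);
}

// Example layout: one boolean stored per byte.
class BooleanAsByteAdapter implements DataAdapter<Boolean> {
  @Override
  public void writeValue(byte[] buffer, Boolean value, int index) {
    buffer[index] = (byte) (value ? 1 : 0);
  }

  @Override
  public Boolean readValue(byte[] buffer, int index) {
    return buffer[index] != 0;
  }
}
```

Swapping in a different adapter changes the in-memory layout while the buffer and ndarray layers above it stay untouched.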

@Craigacp
Collaborator

All the Python libraries use NumPy arrays or implement the same interface (via duck typing). So the libraries are built on NumPy and opted into its representation. This API will also require opting into its representation by implementing its interfaces if you want to store memory in a different way than the current implementation does (which should at most be the buffer interface), so I'm not sure what the difference is.

Ok, I might have misunderstood the original question then. FYI, it is also possible to change the layout of your data in memory just by providing a DataAdapter, without needing to reimplement the whole DataBuffer interface (and, by the way, I should probably rename that interface to DataLayout).

But for things like GPU memory vs. CPU memory (where you need to apply a native hook to get the pointer into the right address space), it's probably best that they implement a new DataBuffer.

@saudet
Contributor

saudet commented Dec 15, 2019

That sounds really nice, but you guys have to realize that all this running around in circles prevents end users from actually using those tools with any libraries not explicitly supported, including all those distributed with the JavaCPP Presets:
https://github.com/bytedeco/javacpp-presets/

The aim is to make a Java type for ndarrays, not to wrap every C library's version of an ndarray in the same interface. Supporting every C library's view of memory is likely to cause issues (e.g. at the moment it doesn't support different storage orders).

One of the reasons why NumPy works so well is that it facilitates data sharing between libraries, but this implementation of ndarray here doesn't... If I'm missing something, though, please enlighten me.

All the Python libraries use NumPy arrays or implement the same interface (via duck typing). So the libraries are built on NumPy and opted into its representation. This API will also require opting into its representation by implementing its interfaces if you want to store memory in a different way than the current implementation does (which should at most be the buffer interface), so I'm not sure what the difference is.

No, that's not true. It's very easy to create a NumPy ndarray on top of any buffer:
https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/
I just wanted to point out the inconsistency in your goals, not to criticize. You won't be able to eat your cake and have it too!

@Craigacp
Collaborator

That sounds really nice, but you guys have to realize that all this running around in circles prevents end users from actually using those tools with any libraries not explicitly supported, including all those distributed with the JavaCPP Presets:
https://github.com/bytedeco/javacpp-presets/

The aim is to make a Java type for ndarrays, not to wrap every C library's version of an ndarray in the same interface. Supporting every C library's view of memory is likely to cause issues (e.g. at the moment it doesn't support different storage orders).

One of the reasons why NumPy works so well is that it facilitates data sharing between libraries, but this implementation of ndarray here doesn't... If I'm missing something, though, please enlighten me.

All the Python libraries use NumPy arrays or implement the same interface (via duck typing). So the libraries are built on NumPy and opted into its representation. This API will also require opting into its representation by implementing its interfaces if you want to store memory in a different way than the current implementation does (which should at most be the buffer interface), so I'm not sure what the difference is.

No, that's not true. It's very easy to create a NumPy ndarray on top of any buffer:
https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/
I just wanted to point out the inconsistency in your goals, not to criticize. You won't be able to eat your cake and have it too!

But the equivalent of the buffer protocol is what this PR provides. Java doesn't have a low-level buffer with multidimensional indexing; that's why Karl wrote one. It's as much effort to implement the Python buffer protocol as it is to implement this one, so I'm not sure how it's easier in Python (besides the fact that everyone has already written the code in Python, while we're just starting to write it here).

@karllessard
Collaborator Author

FYI, I just pushed a new version with all the requested changes except the bounded types in the Tensor and DataType generics, as suggested by @Craigacp. I want to write down a few examples first to compare the complexity/verbosity impact on user code. But everything else that we agreed on should be in there.

As for your concerns, @saudet, I'm still not sure what TF Tools cannot handle in what you are proposing; with the buffer and data layout interfaces, you can customize the backing store of an ndarray pretty much any way you like. But again, it might be my misunderstanding, so I suggest that once the code is merged, we try to map some buffers from the C libraries you were referring to; it should then become clearer whether something is missing and what it is.

@karllessard karllessard changed the title Tensor NIO TensorFlow Tools Dec 18, 2019
@karllessard
Collaborator Author

FYI, I just pushed a new version with all the requested changes except the bounded types in the Tensor and DataType generics, as suggested by @Craigacp. I want to write down a few examples first to compare the complexity/verbosity impact on user code.

OK, I've finally bounded the generic types to TType; the verbosity impact is less than I remembered, and by doing it I've detected some illegal casts that remained in the unit tests, which proves its usefulness.

So if everyone agrees with those changes, I'll merge the branch by the end of this week. Thanks!
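To sketch why the bound helps (TType, TInt32, TFloat32, and Tensor below are minimal stand-ins, not the real classes): with Tensor&lt;T extends TType&gt;, a tensor parameterized by an arbitrary unrelated class is rejected by the compiler, which is the kind of illegal cast such a bound catches at compile time:

```java
// Minimal stand-ins illustrating a bounded generic parameter.
interface TType {}
final class TInt32 implements TType {}
final class TFloat32 implements TType {}

class Tensor<T extends TType> {
  private final Class<T> type;

  Tensor(Class<T> type) {
    this.type = type;
  }

  Class<T> type() {
    return type;
  }
}

// new Tensor<>(TInt32.class) compiles, but a Tensor<String> declaration would
// not, because String does not implement TType.
```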

@saudet
Contributor

saudet commented Dec 19, 2019

But the equivalent of the buffer protocol is what this PR provides. Java doesn't have a low-level buffer with multidimensional indexing; that's why Karl wrote one. It's as much effort to implement the Python buffer protocol as it is to implement this one, so I'm not sure how it's easier in Python (besides the fact that everyone has already written the code in Python, while we're just starting to write it here).

The closest equivalent to the buffer protocol in Java is ByteBuffer. Sure, it doesn't support multiple dimensions, but that's just some additional information that can be provided separately. It's also limited to 32-bit indices, so we should allow raw addresses and things like that... Making users jump through hoops to use this API isn't going to help with adoption!
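For instance (a sketch of the "additional information provided separately" point, not code from either library), shape information can be carried next to a plain ByteBuffer and used to flatten n-dimensional indices into offsets:

```java
import java.nio.ByteBuffer;

// Sketch: multidimensional int access over a plain ByteBuffer, with the shape
// carried alongside the buffer and indices flattened in row-major order.
class ShapedIntBuffer {
  private final ByteBuffer buffer;
  private final long[] shape;

  ShapedIntBuffer(ByteBuffer buffer, long... shape) {
    this.buffer = buffer;
    this.shape = shape;
  }

  private int offsetOf(long... indices) {
    long offset = 0;
    for (int i = 0; i < shape.length; i++) {
      offset = offset * shape[i] + indices[i];
    }
    // The 32-bit index limitation mentioned above surfaces right here.
    return Math.toIntExact(offset * Integer.BYTES);
  }

  int getInt(long... indices) {
    return buffer.getInt(offsetOf(indices));
  }

  void setInt(int value, long... indices) {
    buffer.putInt(offsetOf(indices), value);
  }
}
```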

As for your concerns, @saudet, I'm still not sure what TF Tools cannot handle in what you are proposing; with the buffer and data layout interfaces, you can customize the backing store of an ndarray pretty much any way you like. But again, it might be my misunderstanding, so I suggest that once the code is merged, we try to map some buffers from the C libraries you were referring to; it should then become clearer whether something is missing and what it is.

It's not important for TensorFlow for Java itself, because we have access to all the internal APIs. It just makes the tools hard to use for anything outside this library, since all the glue code is not made available publicly, that's all. But that's something we can update later on. If no one ends up using this API outside TensorFlow because it's hard to use, then that's fine too. Something else will eventually come around.

@karllessard karllessard merged commit 4175a4e into tensorflow:master Dec 22, 2019
@karllessard
Collaborator Author

The PR has been merged based on the other members' reviews.

deansher pushed a commit to deansher/java that referenced this pull request Mar 3, 2021