Improve Indexer to allow a hyper-rectangular selection #391
I'm giving this a shot myself, as it would be great to create such a selection from an existing `Indexer`.
That's obviously something we need, but this requires bringing in a lot of stuff. This is probably going to be more challenging than you imagine. Please also consider other solutions for this, such as ND4J, among others.
Wow, thanks! I checked those options, but they don't quite fit our needs.
I added a WIP PR #392 about this issue.
Thanks! @karllessard So, it looks like it would be a good time to start talking about this again. What do you think?
@matteodg, for safety reasons, TF Tools does not expose publicly the endpoints to map a buffer from a native address. But you can still access them by extending a class from the library. Would that work for you?
Thanks for the hint: I was worried about how the memory area would still be accessible if the underlying native memory gets deallocated.
I think it depends on each case. In TensorFlow, the life of the data buffer is bound to the life of the tensor that carries it. Now, if a user decides to keep a reference to that data outside the tensor scope and the tensor is freed, then yes, it might result in errors, e.g.

```java
IntNdArray data;
try (Tensor t = TInt32.vectorOf(1, 2, 3, 4)) {
    data = t.data();
}
data.getInt(0); // unexpected result
```

So unfortunately, we must rely on the "good behaviour" of our users. To preserve the data beyond the lifecycle of a tensor, they need to copy it.
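To illustrate, the copy could look like this (a minimal sketch assuming TF Tools' `NdArrays` factory methods and `NdArray.copyTo`; the exact package and method names may differ between versions):

```java
IntNdArray safeCopy;
try (Tensor t = TInt32.vectorOf(1, 2, 3, 4)) {
    IntNdArray data = t.data();
    safeCopy = NdArrays.ofInts(data.shape()); // allocate an independent buffer
    data.copyTo(safeCopy);                    // copy while the tensor is still alive
}
safeCopy.getInt(0); // safe: not backed by the freed tensor memory anymore
```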
@karllessard We understand that it's possible to do all that manually, but the point is, we need something that is easier to use (and also safer to use). Asking users to implement that interface isn't going to make anything safer. It's just going to prevent people from getting work done. Some of them may give it a try and implement it, and do it incorrectly, making it less safe than a correct implementation that we could provide them with. If you still stand by your opinion that it should be hard to use, then there's nothing more to discuss. I just wanted to remind you that more and more users are going to request this.
Our use case is mainly interoperability with the HDF5 preset from JavaCPP. Since it is the only HDF5 library that allows 64-bit indexing, we would like to use that backend and be able to create views of the datasets that the library returns.
@saudet: I hear your point and I'm not closed to the idea of relaxing the safety surrounding these endpoints. I also think there is a bit of redundancy between the two libraries in this area.

@matteodg: Just to understand your use case a bit more: you are working with HDF5 datasets using JavaCPP, and you are simply looking for tools to read or write the buffer data obtained from that library?
@karllessard Yes, that's exactly the use case I'm dealing with lately, so all these offered solutions are great. Moreover, I need a layer that is also detachable from the specific library that reads the HDF5 (as we are using the old NCSA HDF 1.8 library as a fallback, because sometimes the JavaCPP HDF5 preset crashes unpredictably) or any other storage.

What about including these tools directly in JavaCPP? Another good thing would be to have every endpoint publicly accessible. I have to say I like the API very much. On a side note, I agree on renaming the library.

OK, that's enough: too many points ;-)
It wouldn't make sense as part of the main artifact of JavaCPP, but it could be made available as a separate module, sure. But if the rest of TensorFlow isn't going to use that module, it won't help in terms of usability anyway. We would then have 2 incompatible implementations that couldn't be used interchangeably.
Thanks @matteodg, those are all very valuable comments.

I'll think of a different name and propose it during the TF Java team meeting (SIG JVM) at the end of the month, which you are invited to join if you want. The library can still be distributed under the TensorFlow organization (to gain some visibility) or not, we'll see what we come up with.

One interesting thing about this library is that it is lightweight and does not have any external dependency. So if it cannot be done the other way around (i.e. JavaCPP depending on this library), then it could be done as a separate module like @saudet was suggesting, either on the JavaCPP side or on the TF Tools side. Another solution would be to make the dependency on JavaCPP in TF Tools optional and detect its presence in the classpath dynamically, but these kinds of solutions are always a bit cumbersome.

For TensorFlow, in my previous example, the tensor would still be closed and the memory released even if the `NdArray` obtained from it were still referenced.

@saudet: the idea was that TensorFlow would be using it as well. What would happen to the indexers currently in JavaCPP, would they be part of this module as well? In TensorFlow we currently don't use them, but that could change if they were mapped to this library.
No, that's fine. The idea is just to keep a reference to it.
Ok, it's good to hear that we're converging in intent.

About performance, here is a quick benchmark. First, accessing elements through a `long[]` of indices:

```java
long time = System.nanoTime();
FloatPointer largePointer = new FloatPointer(1024 * 1024 * 1024);
FloatIndexer largeIndexer = FloatIndexer.create(largePointer, 1024, 1024, 1024);
for (int i = 0; i < 1024; i++) {
    for (int j = 0; j < 1024; j++) {
        for (int k = 0; k < 1024; k++) {
            largeIndexer.put(new long[] {i, j, k}, 2 * largeIndexer.get(new long[] {i, j, k}));
        }
    }
}
System.out.println("Took " + (System.nanoTime() - time) / 1000000 + " ms");
```

Took 20474 ms

And the same loop using the overloaded get/put methods with separate index arguments:

```java
long time = System.nanoTime();
FloatPointer largePointer = new FloatPointer(1024 * 1024 * 1024);
FloatIndexer largeIndexer = FloatIndexer.create(largePointer, 1024, 1024, 1024);
for (int i = 0; i < 1024; i++) {
    for (int j = 0; j < 1024; j++) {
        for (int k = 0; k < 1024; k++) {
            largeIndexer.put(i, j, k, 2 * largeIndexer.get(i, j, k));
        }
    }
}
System.out.println("Took " + (System.nanoTime() - time) / 1000000 + " ms");
```

Took 3215 ms

Which is pretty close to an equivalent C/C++ program that doesn't check bounds:

```cpp
int main() {
    float *large = new float[1024 * 1024 * 1024];
    for (int i = 0; i < 1024; i++) {
        for (int j = 0; j < 1024; j++) {
            for (int k = 0; k < 1024; k++) {
                large[i * 1024 * 1024 + j * 1024 + k] = 2 * large[i * 1024 * 1024 + j * 1024 + k];
            }
        }
    }
    return large[1024 * 1024 * 1024 - 1];
}
```

Compiled with `g++ -O3 test.cpp`.

The use cases and the level of abstraction of the two libraries are different, though.
@mcimadamore This sounds great for what we're trying to do here.
@karllessard Just to keep you up-to-date: I ended up using the JavaCPP indexers for this.
I saw, guys, that you added a lot of stuff to the JavaCPP indexers to get this working. While it's totally fine, I'm just a bit concerned about the nature of JavaCPP, which seems to do way more than just provide "a missing bridge between Java and native C++" and starts to become a framework. Would it make sense to provide the indexers as a separate artifact, for example?
Yes, I actually agree and I was going to suggest that too.
It's still pretty slim; we didn't need to add that much.

There's also the issue of what we should do to support GPUs, among other things. That should be our main concern here. On the CPU with Java, we don't need something like NumPy: we can do loops and such manually and get speed that is pretty close to C++. The whole point of creating a NumPy-like interface for Java would be to allow computation on the GPU, and that means compiling native code with CUDA and whatnot. PyTorch actually markets itself as a replacement for NumPy that works on GPUs:
https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html

We need to be able to do the same with something similar in Java. Maybe the default mode would be 100% pure Java or whatever, but it should be able to use TensorFlow as a backend. We would need an implementation of the ND array interfaces backed by GPU memory.
In general you can attach whatever index pre-processing capability you want with a var handle adapter. In your example the filtering function could be something like this (taken from your example):

```java
@Override
public long index(long i, long j, long k) {
    return (offsets[0] + hyperslabStrides[0] * (i / blocks[0]) + (i % blocks[0])) * strides[0]
         + (offsets[1] + hyperslabStrides[1] * (j / blocks[1]) + (j % blocks[1])) * strides[1]
         + (offsets[2] + hyperslabStrides[2] * (k / blocks[2]) + (k % blocks[2])) * strides[2];
}
```

So, assuming you have a plain indexed var handle whose only coordinate is a flat long index, you can adapt it from that shape to one whose coordinates are the (i, j, k) indices, where, on each access, the above function will be computed, yielding a long index value which can then be used to access the underlying memory region.

In other words - offset and stride access are only two ready-made combinators that developers can use - which are meant to support common use cases; but in the next iteration of the API (which we are close to finalizing) there's a rich VarHandle combinator API which lets you express pretty much whatever you want (as long as the translation can be reasonably expressed in a functional way, same deal as with MethodHandle combinators).
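A concrete sketch of that adaptation, assuming the JDK 15-era `jdk.incubator.foreign.MemoryHandles` combinators (these later moved to `java.lang.invoke.MethodHandles`); the class name and selection values are illustrative, not from the thread:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import jdk.incubator.foreign.MemoryHandles;

public class HyperslabVarHandles {
    // Illustrative hyperslab parameters for a 1024x1024x1024 dataset.
    static final long[] offsets = {0, 0, 0};
    static final long[] hyperslabStrides = {2, 2, 2};
    static final long[] blocks = {1, 1, 1};
    static final long[] strides = {1024 * 1024, 1024, 1};

    // The filtering function from the comment above, made static so that
    // it can be looked up with findStatic().
    static long index(long i, long j, long k) {
        return (offsets[0] + hyperslabStrides[0] * (i / blocks[0]) + (i % blocks[0])) * strides[0]
             + (offsets[1] + hyperslabStrides[1] * (j / blocks[1]) + (j % blocks[1])) * strides[1]
             + (offsets[2] + hyperslabStrides[2] * (k / blocks[2]) + (k % blocks[2])) * strides[2];
    }

    public static void main(String[] args) throws Throwable {
        // Plain indexed var handle: coordinates are (MemoryAddress base, long flatIndex).
        VarHandle floatHandle = MemoryHandles.withStride(
                MemoryHandles.varHandle(float.class, ByteOrder.nativeOrder()), 4);

        MethodHandle index = MethodHandles.lookup().findStatic(
                HyperslabVarHandles.class, "index",
                MethodType.methodType(long.class, long.class, long.class, long.class));

        // Replace the flat index coordinate with (i, j, k): on every access,
        // index(i, j, k) is computed and forwarded to the original handle.
        VarHandle hyperslabHandle = MemoryHandles.collectCoordinates(floatHandle, 1, index);
    }
}
```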
@mcimadamore Ok, but where is this documented? I can't find how to do that with the current API.
The question was the one just above, to which I've replied. Implementing custom addressing schemes as method handle adapters gives C2 a chance to inline the entire access expression, which is something that I doubt would happen with the overload strategy in the JavaCPP case (benchmark needed, of course). The information you seek is already available out there (all the JEPs, CSRs, and code reviews were public all along, really). As with all things, sometimes you also have to get your hands dirty and try things out if you want to get a better sense of how things are used in practice.
Yes, of course, overloaded methods do get fully inlined and optimized. That's what I showed with the benchmark above.
I understand, but still, I'm pretty sure the information I would like to find isn't available. If you feel otherwise, please help me find it. What I'm looking for should allow us to do something like this:

```java
static long index(long i, long offset, long hyperslabStride, long block, long stride) {
    return (offset + hyperslabStride * (i / block) + (i % block)) * stride;
}

// ...
VarHandle floatHandle = MemoryHandles.varHandle(float.class, ByteOrder.nativeOrder());
MethodHandle index = MethodHandles.lookup().findStatic(thatClass, "index",
        MethodType.methodType(long.class, long.class, long.class, long.class, long.class, long.class));
VarHandle indexedElementHandle = MemoryHandles.withIndex(floatHandle, 4, index, offsets, hyperslabStrides, blocks, strides);
```

How can we do that?
Note that TF Tools only focuses on I/O operations (by design) and not linear algebra like NumPy or the other implementations of ND arrays that I saw (ND4J, DJL, ...). So basically that would just mean reading/writing data from/to GPU memory. I never looked at this though; do we really need TF ops, or is DMA also available?
@karllessard Well, we do need some way to shuffle data in GPU memory for sure, see bytedeco/javacpp-presets#863 /cc @okdzhimiev
The documentation for the combinator API is fully available in the CSR that was submitted a few weeks ago: https://bugs.openjdk.java.net/browse/JDK-8243496 Inside you will find the code changes, the spec changes, and a link to the foreign package javadoc. If you look inside that javadoc, you will find the combinators I mentioned. As for your specific case, I think the code would look something like this:
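A best-guess sketch using the `collectCoordinates` combinator from that javadoc, reusing `thatClass` and the 5-argument `index` method from the previous snippet (the coordinate layout is an assumption based on the JDK 15 incubator API):

```java
// Plain float handle with coordinates (MemoryAddress base, long flatIndex).
VarHandle floatHandle = MemoryHandles.withStride(
        MemoryHandles.varHandle(float.class, ByteOrder.nativeOrder()), 4);

MethodHandle index = MethodHandles.lookup().findStatic(thatClass, "index",
        MethodType.methodType(long.class, long.class, long.class, long.class, long.class, long.class));

// Replace the flat index coordinate with the 5 parameters of index(); on each
// access the function runs and its result is forwarded to the original handle:
// coordinates become (base, i, offset, hyperslabStride, block, stride).
VarHandle indexedElementHandle = MemoryHandles.collectCoordinates(floatHandle, 1, index);
```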
So, the resulting indexed handle will take 5 coordinates and will produce a long which will be forwarded to the original var handle. If you want to inject some of the coordinates with known constant values that will not change across accesses, you can use the combinator made for that:
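Presumably with `insertCoordinates` from the same javadoc (the constant values below are only illustrative):

```java
// Bind offset, hyperslabStride, block and stride (coordinates 2..5) to constants,
// leaving (base, i) as the only free coordinates.
VarHandle boundHandle = MemoryHandles.insertCoordinates(indexedElementHandle, 2, 0L, 2L, 1L, 1L);
```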
And so forth.
Great, thanks! Next time though, please just copy/paste the URL instead of saying something vague like "rich VarHandle combinator API". I don't consider these changes to be "rich", so such subjective statements lead to confusion. And on my side, I will make sure to keep in mind that everything you mention is available online, somewhere, and that I just need to ask, politely. :)
I see, that's rather user-unfriendly, but I guess it should be able to get the job done. Thanks! @matteodg Any comments regarding that API?
This is of course subjective - if you look at the number of combinators added to http://cr.openjdk.java.net/~mcimadamore/8243491_v3/javadoc/jdk/incubator/foreign/MemoryHandles.html you will see quite a few combinators there. While these are not the full set of combinators available on MethodHandles - they're pretty close, and they have been hand-picked precisely to make sure that use cases such as the one you describe (along with many others) were expressible. Which is what I meant by "rich". Now, you might not like the API idiom forced on you by MethodHandle/VarHandle, but that's a different (and also equally subjective!) problem. I'm not an expert of the JavaCPP API. So, is there an equivalent set of combinators on that side?
I actually checked out the new memory access API, and it looks more like a foundation for building higher-level abstractions than something to use directly.
@matteodg - I think you hit the nail on the head; the goal for the new memory access API is to serve as a foundation for building expressive and efficient access abstractions. Currently the only foundation frameworks can build on top of is `ByteBuffer`. What would be very interesting would be to see if the JavaCPP indexers could be retrofitted on top of the memory access API.
The changes including HyperslabIndex have been released with JavaCPP 1.5.4! |
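For anyone landing here, usage looks roughly like this (a sketch only: I'm assuming the `HyperslabIndex` constructor takes the sizes followed by the HDF5-style offset/stride/count/block selection arrays, and that the indexer factories accept an `Index`; check the 1.5.4 javadoc for the exact signatures):

```java
import org.bytedeco.javacpp.FloatPointer;
import org.bytedeco.javacpp.indexer.FloatIndexer;
import org.bytedeco.javacpp.indexer.HyperslabIndex;
import org.bytedeco.javacpp.indexer.Index;

public class HyperslabExample {
    public static void main(String[] args) {
        FloatPointer data = new FloatPointer(8 * 12); // flat storage for an 8x12 dataset
        Index hyperslab = new HyperslabIndex(
                new long[] {8, 12}, // sizes of the full dataset
                new long[] {0, 1},  // selection offsets
                new long[] {4, 3},  // selection strides
                new long[] {2, 2},  // selection counts
                new long[] {2, 2}); // selection blocks
        FloatIndexer indexer = FloatIndexer.create(data, hyperslab);
        indexer.put(0, 0, 1.0f); // reads and writes go through the hyperslab mapping
    }
}
```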
What about improving the `Indexer` classes to use the same concept of "hyper-rectangular" selection that the HDF5 library is using? See https://portal.hdfgroup.org/display/HDF5/Reading+From+or+Writing+To+a+Subset+of+a+Dataset