Skip to content

Add Session Result class #167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 19 commits into from
Closed

Conversation

rnett
Copy link
Contributor

@rnett rnett commented Dec 8, 2020

Adds Session.Result class, which is a wrapper around the output tensors and the fetched outputs. This allows querying results in the same style as fetch (from name, name and index, Output, and Operand). The Output and Operand methods are also typesafe. It's Iterable and AutoClosable (close closes the result tensors). The run().get(0) usage works just fine still, so even though it's a breaking change, it should be pretty minor.

I also moved the getOutput (from name and from name and index) methods from Session to Graph and made them public.

I'd like to make feed typesafe as well, but was not able to find a good way to do it.

Copy link
Collaborator

@Craigacp Craigacp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should remove the AutoCloseableList class from the tests too, as this basically replaces it. You might also want to look at the very similar class I wrote in the ONNX Runtime Java API - https://github.com/microsoft/onnxruntime/blob/master/java/src/main/java/ai/onnxruntime/OrtSession.java#L1091.

* @return operation in the graph with this name
* @see #operation(String)
*/
public GraphOperation operationOrError(String name) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't this return null or Optional<GraphOperation> rather than throwing? Is it a Kotlin thing?

Copy link
Contributor Author

@rnett rnett Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of, it definitely works better from it, eventually I'd want to mark it @NonNull. I moved the method here from session, where it did throw like that rather than returning null. Since we already had the operation in Graph (which returns null if not found) I added this as operationOrError rather than moving the check to any uses.

int index = Integer.parseInt(output.substring(colon + 1));
return new Output(operationOrError(op), index);
} catch (NumberFormatException e) {
return new Output(operationOrError(output), 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fallback case seems odd, as it doesn't log anything when it's likely to be programmer error if it's triggered right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved that from Session, I don't think it's too odd? It's less about the actual number and more about if you have an op name like scope:myOp you want to get scope:myOp, not try to interpret myOp as a number.

Logging it would be a good idea, but it doesn't look like there's any logging currently set up. Is there one I can use?

}

/**
* Get the result for {@code output} or throw an {@code IllegalArgumentException} if it wasn't fetched.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given these get methods are basically Map lookups it's a bit unfriendly to have them throw rather than return null or Optional<> as would be idiomatic for Java.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm of two minds here. On one hand, it is a map lookup, so that makes sense. On the other hand, since everything in this map is stuff you passed to fetch, you will most likely only be calling get with things you know exist, and any times you don't will most likely be errors for which throwing makes more sense. I could always add getOrError but that seems like overkill especially w/ the added contains methods. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, now that I have a getter for the map, if you want null returns you can just use that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Craigacp thoughts on renaming it to fetch (w/ the same behavior as now)?

inputs.add(op.output(index));
inputTensors.add(t);
}
Operation op = graph.operationOrError(operation);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a behaviour change as before it used to silently not add anything if the operation didn't exist, now it throws an exception. This should be documented.

Copy link
Contributor Author

@rnett rnett Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't actually change, since operationByName (which is now Graph#operationOrError) would throw if it was going to return null.

if (op != null) {
outputs.add(op.output(index));
}
Operation op = graph.operationOrError(operation);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc update for the behaviour change.

targets.add(op);
}
GraphOperation op = graph.operationOrError(operation);
targets.add(op);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc update for the behaviour change.

@karllessard
Copy link
Collaborator

Hi @rnett , sorry for the delay, I've been a bit busy these days, but before I go through this one, I wanted to let you know that I was also planning to add something similar in the tensor type refactoring that is currently in progress, you can take a look at the TensorList and TensorMap utilities in the original draft here: https://github.com/karllessard/tensorflow-java/tree/tensor-ttype/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/util

The main purpose in my case was to support implicit casting of the tensor types in the list itself, hence why Session don't return a List anymore but an object. So we can do:

try (TFloat32 t = s.runner().fetch(x).run().single()) {
    // ...
}

instead of

try (TFloat32 t = (TFloat32)s.runner().fetch(x).run().get(0)) {
    // ...
}

So before I start reviewing, I don't know how critical having this PR merged is for you but we need to take into accounts our future needs if there is an agreement on what they are, either we can start that discussion now or either we wait until the type refactoring to be over, wdyt?

@rnett
Copy link
Contributor Author

rnett commented Dec 9, 2020

Looking at TensorMap/List, I think we'll still want a Result class, for get(Operand/Output) (with type safety + implicit casting) and for immutability. I'd probably use TensorList and TensorMap instead of the List and Map I have currently. What do you think about adding ImmutableTensorMap/List classes? I'm not sure how much it would be used, but for this it would certainly be nice. Generic key types (like for Output) for TensorMap would also be nice.

As for the timing, I think we'd be fine to merge this now and then refactor it a bit with the type system, but it doesn't matter to much either way. I mostly wanted this so I could add proper extensions to the Kotlin API, which is waiting on the type system refactor anyways.

@rnett rnett force-pushed the rn_session_run_result branch from d7bd18a to a7903cd Compare December 28, 2020 02:59
@google-cla
Copy link

google-cla bot commented Dec 28, 2020

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

1 similar comment
@google-cla
Copy link

google-cla bot commented Dec 28, 2020

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@karllessard karllessard changed the base branch from master to type-refactor December 28, 2020 16:34
@karllessard karllessard changed the base branch from type-refactor to master December 28, 2020 16:34
@karllessard
Copy link
Collaborator

@rnett , I've pushed my unmerged-branch to a new branch in this repository: https://github.com/tensorflow/java/tree/type-refactor

Can you please rebase your work on it and target it in your PR so that we only the changes coming from your branch?

@karllessard karllessard mentioned this pull request Dec 28, 2020
@rnett rnett changed the base branch from master to type-refactor December 28, 2020 21:27
@rnett
Copy link
Contributor Author

rnett commented Dec 28, 2020

@karllessard your branch is a few commits behind master. Are you planning on rebasing it? If so I'll wait for you to do that before doing mine.

@karllessard karllessard deleted the branch tensorflow:master December 29, 2020 17:17
@karllessard karllessard reopened this Dec 29, 2020
@karllessard
Copy link
Collaborator

Oh sorry @rnett , looks like I've pushed the wrong version of my branch, you should be good now to rebase on it. Refreshing it had also the vicious effect of closing the PR that were targeting that branch, I've reopened this one but if you see other PRs that have been closed by mistake, please reopen them, thanks

@google-cla
Copy link

google-cla bot commented Dec 30, 2020

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@Craigacp
Copy link
Collaborator

@rnett the order swaps because we haven't run down all the non-determinism in the ops generator, so sometimes it emits things in a different order causing spurious changes in PRs.

@rnett rnett changed the base branch from type-refactor to master December 30, 2020 22:58
@rnett rnett force-pushed the rn_session_run_result branch from c51e45a to 5a2f602 Compare December 30, 2020 22:59
@rnett rnett requested a review from Craigacp December 30, 2020 23:02
@karllessard
Copy link
Collaborator

Thanks for looking at this @rnett , I was eager to see such utility to simplify the manipulation of a group of tensors.

Before going to deep into the analysis of the current proposal, I would like to point out something important to consider. While TF Java has always been graph/session centric, it will gradually move towards to a more functional approach, like @tf.function does in Python. This is what the new ConcreteFunction is partially achieving and the core will continue to build up around it to improve the support of functions as the main API for building and executing graphs.

That being said, I think it is important for the solution we are choosing to manipulate a "bundle" of tensors to be fully compatible with the concept of functions. For example, in functions, tensors are not only bundled as a result of an execution but also to feed the underlying session. Right now everything is handled using HashMap but I'm sure that whatever utility we come up for returning the result of a session run should apply to the feeding as well.

If we do that, then the Result class should live outside Session and be renamed to something else. Again I did not went through the whole proposal yet but before doing it I would like to hear your thoughts about it, thanks!

@Craigacp
Copy link
Collaborator

Craigacp commented Jan 1, 2021

@karllessard, I understand the desire for a bundle as inputs, when using ONNXRuntime which has a result object as the output recently I've been missing an equivalent object for inputs to clean those up easily. However for TF I think it might not be ideal. The session result object is helpful because it's an easy way to control the lifetime of a block of tensors which are expected to have the same lifetime, however the input tensors may have different lifetimes (e.g. I frequently have a long lived boolean tensor that sets the network into training mode, and often a long lived one for the current training epoch or learning rate). When the lifetimes of the tensors differ then an AutoCloseable container seems less relevant.

If we wanted to lean into dynamic code generation (which we probably don't due to having to update ASM to track new JDKs every 6 months) then we could make a type safe autocloseable container for the inputs and outputs on load of the ConcreteFunction and then it would expose getters and setters with the right names.

@rnett
Copy link
Contributor Author

rnett commented Jan 1, 2021

@karllessard The Result class is a bit different that a generic tensor container in that it provides keyed and indexed access, as well as Operand/String -> Output mapping, which depends on the Graph instance (it also has the run metadata). If we do end up having tensor containers, I think they would be best used as the backing List/Map in Result, rather than replacing it.

The concrete function API looks neat. Correct me if I'm wrong (it's been a while since I worked with @tf.function), but the goal is essentially to create what looks like a eager function but is actually backed by a graph? I've been playing around with ideas for something similar in Kotlin using compiler plugins (you could use annotations on functions or lambdas), but I'm not sure how you could do the same in Java without ASM generation.

This is probably more of a Kotlin thing than Java, but have you given any thought to having some mechanism for tensor lifetime scopes (like PointerScope)? It seems like most tensors should be lexically scoped (i.e. try with resources), and having some kind of scoping mechanism would make this a lot easier to manage, while still allowing non/globally scoped tensors when needed.

@karllessard
Copy link
Collaborator

karllessard commented Jan 2, 2021

Thanks @Craigacp and @rnett for your good feedbacks. My belief is that if we are about to improve the usability of our API, we should focus more on the functions than on the graph and sessions. @rnett to answer your question, yes, the goal of a ConcreteFunction is to mimic a little bit what Python does, i.e. convert easily a function that can be called eagerly or backed by a graph. Right now, only graph mode is supported by ConcreteFunction but nothing prevents a user to call directly the same method passed as the functionBuilder with an eager session to execute it eagerly. Though I would prefer to make the eager support more explicitly integrated with the function concept. Now should we use an annotation or not, like Python does, I guess it could work but I didn't tried to think how this would fit in the actual design.

Now for the differences in resource management between the inputs and outputs, I was also aware of this detail. For the sake of brainstorming, maybe reference count could be useful here. For example, when we pass a tensor to a bundle, we could just increase the reference count so the tensor gets only released once all references are released. Also, ConcreteFunction has already its way to release or not its resources (the graph and the session) depending on how it has been allocated... I don't have the complete paradigm in mind but we can continue to think about it if we all think that could be something useful, wdyt?

Another point if favor to focus on the functional API is that it worked both with training and inference (after loading a saved model bundle), while using directly the graph and sessions works better only for training since TF2.0, if you remember the issues @Shajan was facing before.

@rnett
Copy link
Contributor Author

rnett commented Jan 2, 2021

I made #181 for the functional API discussion. I don't think any of what we discussed here is blocking this PR?

rnett added 4 commits January 2, 2021 16:42
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
rnett added 15 commits January 2, 2021 16:42
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
Signed-off-by: Ryan Nett <rnett@calpoly.edu>
@rnett rnett force-pushed the rn_session_run_result branch from 5a2f602 to 3dc6def Compare January 3, 2021 00:43
@karllessard
Copy link
Collaborator

I made #181 for the functional API discussion. I don't think any of what we discussed here is blocking this PR?

I don't want to block the progress of the good work being done here, but I admit I would feel more comfortable adding all these new utilities if I could clearly understand how they would fit in an architecture that promotes the functional API. What about we continue the discussion in #181 a bit further before merging this one?

@rnett
Copy link
Contributor Author

rnett commented Jan 3, 2021

Fine by me, although I think we'll want something similar (and nice to use) for Session regardless of what the functional API looks like. There's always going to be someone who has to run things manually.

@rnett
Copy link
Contributor Author

rnett commented Jan 14, 2021

@karllessard after the discussion and looking further at ConcreteFunction, I don't think we need to wait on this. I do think ConcreteFunction could use a Result class that is essentially TensorMap which throws on missing keys, but that will differ from this enough that there's not much to be shared (the inner map for that will be String -> Tensor, for this it's Output -> Tensor, and we have indices).

If you plan to add TensorMap and TensorList soon I could see waiting for them, however.

@karllessard karllessard mentioned this pull request Jan 17, 2021
@karllessard
Copy link
Collaborator

Hi @rnett , I prefer to continue keeping that PR on hold for now, as #188 seems to slowly converges towards it but with some redesign. I'm sorry if the PRs are not merged as fast as you would hope but these are addressing important core aspects that requires greater attention and reflection from our end.

@rnett
Copy link
Contributor Author

rnett commented Jan 17, 2021

No worries, I just wanted to make sure that something was indeed blocking it.

@Craigacp Craigacp closed this Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants