
Add TensorScope #188


Closed
wants to merge 35 commits into from

Conversation

rnett
Contributor

@rnett rnett commented Jan 14, 2021

Adds a TensorScope class for managing native tensor resources. Behavior is as described in the class javadoc:

Any tensors created between a scope's creation and calling close(), that haven't been detached, are guaranteed to be closed with the scope (even if they are created in a sub-scope).

It manages RawTensor instances instead of managing TF_Tensor directly to allow for multiple RawTensors that reference the same native handle. While this is an edge case, there's no real cost to handling it.

Tensors created from eager operations will be closed when the eager session closes like they are now, but can be detached.

It does not manage eager operations or NDArrays in any way.

I also added HasTensors to allow for easy resource management of tensor groups/collections (i.e. detach the whole list or object at once). TensorList, TensorMap and similar classes should implement this.
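
For reference, basic usage would look roughly like this (a sketch based on the behavior described above, not code from this PR; TFloat32.scalarOf is just a convenient way to make a small tensor):

try (TensorScope scope = new TensorScope()) {
  TFloat32 temp = TFloat32.scalarOf(1f);  // created inside the scope, so attached automatically
  TFloat32 kept = TFloat32.scalarOf(2f);
  kept.detach();                          // the caller now owns this tensor and must close it
  // ... use temp and kept ...
}                                         // temp is closed here, kept is still alive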

@rnett
Contributor Author

rnett commented Jan 14, 2021

Re eager operations: I could include them in TensorScope, but I think doing that in an EagerScope makes it clearer what is being managed. TensorScope closing Operands in addition to tensors seems like it would lead to confusion, and if we did use the same scope, I'd want flags to do just one or the other anyways.

@rnett
Contributor Author

rnett commented Jan 14, 2021

Also, the CI seems broken.

@saudet
Contributor

saudet commented Jan 14, 2021

How would you go about supporting other resources than "tensors"?

@rnett
Contributor Author

rnett commented Jan 14, 2021

How would you go about supporting other resources than "tensors"?

I'm intentionally not handling anything other than tensors in TensorScope. I think the only other resource we would need to handle is TFE_TensorHandle, which would be done via the aforementioned EagerScope or something similar. I haven't looked into the implementation of that too much but it seems like there's no way to have multiple Operand instances referencing the same handle so they could be managed directly.

@saudet
Contributor

saudet commented Jan 14, 2021

I understand that you're intentionally not doing anything for other resources than "tensors", but there are other resources than tensors. What would you do for those?

@rnett
Contributor Author

rnett commented Jan 14, 2021

I understand that you're intentionally not doing anything for other resources than "tensors", but there are other resources than tensors. What would you do for those?

Which other resources? They're being managed just fine now, I wouldn't do anything to change how they are handled, except potentially the eager handles.

@karllessard
Collaborator

Which other resources? They're being managed just fine now, I wouldn't do anything to change how they are handled, except potentially the eager handles.

Tensors are the most annoying one, for sure, but there are many more resources that are managed natively by TF, including graphs, sessions, builders... and soon function graphs.

It is probably OK to keep the current try-with-resources technique for these, but we need to make sure that we are not missing another interesting use case that could benefit from retaining a resource beyond the life of its initial scope.

@rnett
Contributor Author

rnett commented Jan 14, 2021

Which other resources? They're being managed just fine now, I wouldn't do anything to change how they are handled, except potentially the eager handles.

Tensors are the most annoying one, for sure, but there are many more resources that are managed natively by TF, including graphs, sessions, builders... and soon function graphs.

It is probably OK to keep the current try-with-resources technique for these, but we need to make sure that we are not missing another interesting use case that could benefit from retaining a resource beyond the life of its initial scope.

Yeah, they are all "big" enough and singular enough that I don't think manually handling them is a huge issue. Plus, as far as I know, the memory use of each is much smaller than that of tensors (or comes mostly from its tensors), so staying on top of closing them isn't such a big issue.


if (!localScopeStack.contains(this)) {
  throw new IllegalStateException("This scope is not on the scope stack, but was not closed."
      + " This should not be possible.");
Contributor

Why are you allowing scopes only on the stack??

Contributor Author

Because it closes child scopes by closing any scopes in front of it in the queue. If it somehow got removed from the queue, that would silently close all scopes. This shouldn't happen, but it did before I started storing the stack as a local variable (when a scope was closed in a different thread than it was opened in), so this is just for safety.

Contributor

No, I mean, if localScopeStack doesn't contain this, then this test fails. That makes it impossible to use a given instance as a field and/or from some other thread! Doesn't it?

Contributor Author

Kind of. localScopeStack is a local variable, set from the thread local when the scope is created. It's not thread local itself, so this should always be in it.

It does mean that, semantically, the scope has to be closed on the same thread it was opened on. This makes some sense, because it's only going to collect tensors from that thread anyway.

Collaborator

@Craigacp Craigacp left a comment

I'm pretty confused by what this is trying to do. Is there a clear example of the behaviours it's supposed to enable?

The interaction between the threadlocal stack and the closing seems like it will need careful documentation to ensure correct usage, as it looks like there are plenty of ways to get it wrong.

* they are created in a sub-scope). Tensors may be manually closed earlier without issue.
* <p>
* When auto-attach is true, tensors are automatically tracked on creation. A tensor can be manually added to a scope
* with {@link TensorScope#attach(Tensor)} or {@link Tensor#attachToCurrentScope()}. The tensor will then be closed when the first of its managing scopes closes.
Collaborator

Why would we want Tensors to be closed by the first scope closure rather than the last? If it's attached to multiple scopes then I'd expect it to still be alive if any of the scopes were alive.

Collaborator

Ah, is it because of the stacking of scopes? Even though they could be stored in fields and thus mess up the logical stacking?

Contributor

@saudet saudet Jan 15, 2021

Right, this situation shouldn't happen anyway. It's not clear to me that this was designed to be used in fields, so let's assume we only have references on the stack. If we assume that's true, then any "inner" scopes are supposed to be closed before "outer" scopes. If this doesn't happen, there's probably some logic gone pretty wrong somewhere that this should be the least of our worries...

Contributor Author

Yeah, exactly, it's not intended to be used in fields, unless it's as a way to ensure that any created tensors are closed when the creator is closed, like in EagerSession. I'm thinking about removing that anyways.

* {@link Tensor#detach()} detaches the tensor from all scopes, requiring the user to close it manually or attach it to
* another scope.
* <p>
* Note that scope management is thread local, except for detach, which will detach even from scopes on other threads.
Collaborator

Why do we want the thread local scoping? How is it going to interact with Loom?

Contributor

Well, Loom is supposed to give us an alternative to TLS:
http://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part2.html#scope-variables
But that's not going to happen anytime soon, so in the meantime we need something... It sounds to me though that if we were to replace TLS with something like that, we would need to juggle 3 types of objects: Tensor, TensorScope, and something like TensorScopeScope... Am I reading this correctly?

Contributor Author

Scoped would be perfect for this, I think. Probably rather than a TensorScopeScope I'd have it add itself to the current scope as a child on creation, or something like that.

Collaborator

Well I assume it's thread local to avoid a bunch of synchronisation, but it's not clear why we'd want the scopes to be thread local. What's the argument for that behaviour?

Contributor Author

Because it's a lexical scope (like {}), representing a block of code or certain segment of execution. It doesn't make any sense to do something like

try(TensorScope scope = new TensorScope()){
  myTensorFunction();
}

and have it pick up tensors created by something else going on in another thread.

My latest changes should alleviate the issue somewhat: I'm using InheritableThreadLocal for the current scope, but the hierarchy can cross threads, so if you create new threads inside a scope like the one above, they will add tensors to that scope, but previously existing threads won't.
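
Roughly, the idea is something like this (a sketch of the mechanism only, not the PR's actual code):

final class CurrentScope {
  // Child threads created inside a scope inherit that scope; pre-existing threads keep their own.
  private static final InheritableThreadLocal<TensorScope> CURRENT = new InheritableThreadLocal<>();

  static TensorScope get() {
    return CURRENT.get();   // null if no scope is open on (or inherited by) this thread
  }

  static void set(TensorScope scope) {
    CURRENT.set(scope);
  }
}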

}

localScopeStack = scopeStack.get();
localScopeStack.push(this);
Collaborator

So if this gets stored in a field then any other scopes allocated by that thread get stuck inside it? What happens if someone uses a FJP or other executor service and has a scope in there?

Contributor Author

See recent changes and comment above.

@Craigacp
Collaborator

I guess I'd thought of a scope as something I could use to pass around a logical grouping of tensors and ensure they are all closed at the same time. For example, the output of session.run. This stacking behaviour seems to contradict that: if I store the scope somewhere to keep that collection of tensors for later, then all the other scopes created afterwards get stuck to it, and if I close one of the scopes then any created afterwards would die.

That seems extremely counter-intuitive to me. Did I misunderstand how this works?

@rnett
Contributor Author

rnett commented Jan 15, 2021

The semantics I'm going for here are more of a bound/guarantee that resources won't leak beyond the scope unless explicitly flagged (via detach()). It's a bit confusing because of the similar name to PointerScope, but it doesn't "own" resources in the same way PointerScope does. It's not something you should hold onto in the same way; it's meant to be wrapped around a region of code (or time) and guarantee everything is cleaned up once that region is done.

The biggest reason for this, other than the "flag to keep" semantics, is that it leaves resource management entirely up to the end user at the highest level. The user doesn't have to know or care about any tensor creation done by the functions they call. Instead of having to use special collections, implement some scope class, or use TensorScope at each level, the user just uses TensorScope themselves. You could use ordinary collections, custom data classes, or whatever.

For example, with result:

Session session = new Session(graph);
TFloat32 answer;
try(TensorScope scope = new TensorScope()){
  Result result = session.runner().run();
  answer = (TFloat32) result.get(1);
  answer.detach();
}

Regardless of what outputs are returned by the runner, all of them except answer will be closed.

The thread interactions definitely need some notes in the docs. I went with thread local since I'm trying to wrap regions of code, and there could be multiple running in parallel on different threads, whose scopes shouldn't interact. I don't know how that would interact with loom, but I don't think it would cause issues? From what I've seen, they are trying to make code using the current thread API port seamlessly. It definitely doesn't work well with Kotlin's coroutines, since they can suspend and resume on different threads, but I can add stuff for that in the Kotlin API.

@saudet
Contributor

saudet commented Jan 15, 2021

For example, with result:

Session session = new Session(graph);
TFloat32 answer;
try(TensorScope scope = new TensorScope()){
  Result result = session.runner().run();
  answer = (TFloat32) result.get(1);
  answer.detach();
}

Regardless of what outputs are returned by the runner, all of them except answer will be closed.

How is the answer.detach() method supposed to know from which scope to detach itself? It looks like it causes itself to be detached from all scopes, but what if the user wishes to have it detached from only one of them and not all of them? Also, if I'm reading this correctly, when we close() a scope, it closes all its tensors, regardless of whether they are owned by other scopes or not. That doesn't sound too useful. If the idea is to make things the least complicated possible without introducing "features" like this that are not going to be used, we could make this a lot simpler by just ensuring that any given tensor belongs to, at most, a single scope.

@rnett
Contributor Author

rnett commented Jan 16, 2021

The only time you get a tensor attached to more than one scope is if you manually add it to a new one, which is mostly when it was detached in the past, or when you want to manually give it a shorter lifetime. Remember this is a lexical scope, not a persistent resource owner. The only times you have multiple scopes in play, they will be nested, and an inner scope naturally expires when its containing scope does. There shouldn't be "overlapping" lifespans.

detach() is intended as an "I'm going to manually manage this tensor now" method. I do think a "detach and attach to parent" method would be useful, but it would be hard to implement with the current queue based setup. I'll look at alternatives. At minimum I can add moveTo methods that detach and then attach to this scope.

@saudet
Contributor

saudet commented Jan 16, 2021

Ok, let me try to ask you this a different way: If we're not supposed to have the same tensor attached to multiple scopes, why try to support that in the first place?

@rnett
Contributor Author

rnett commented Jan 16, 2021

Ok, let me try to ask you this a different way: If we're not supposed to have the same tensor attached to multiple scopes, why try to support that in the first place?

I can't think of a use case, I'm pretty sure it's allowed because I didn't see a reason to explicitly forbid it. I'll do that now, though, it does make things clearer.

@saudet
Contributor

saudet commented Jan 16, 2021

The reason for guaranteeing that we have only a single known reference to an object is that deallocating it won't have any unintended consequences.

@rnett
Contributor Author

rnett commented Jan 16, 2021

Ok, I pushed a refactor with a few things:

  • Tensors can only have one scope. When attached to a new scope, they are detached from their old scope if they have one.
  • releaseToParent to release all child scopes and tensors to the parent scope, and close this scope. There's attachToParent in Tensor which does the same thing for a single tensor, and release(boolean requireParent) which can release resources even without a parent.
  • Hierarchy is done via storing parent and children, which works much better with threads. The parent will be detected at creation, but children can be on any thread, and the current scope is inherited on thread creation.
  • No more tensor management in EagerSession. If you're converting operands to tensors, you should manage them with a TensorScope just like anywhere else.
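
For illustration, usage after this refactor would presumably look something like this (a sketch using the method names listed above):

try (TensorScope outer = new TensorScope()) {
  TFloat32 kept;
  try (TensorScope inner = new TensorScope()) {
    TFloat32 temp = TFloat32.scalarOf(1f);  // attached to inner
    kept = TFloat32.scalarOf(2f);           // attached to inner
    kept.attachToParent();                  // now owned by outer instead
  }                                         // temp is closed here
  // kept is still usable here
}                                           // kept is closed here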

@Craigacp
Collaborator

Craigacp commented Jan 16, 2021

Ok, so I think I understand what you're aiming for a little better. However, I don't understand why introducing the notion of scoping, with its implicit magic that checks the valid scopes, is better than having an AutoCloseable collection of Tensor which has factory methods for creating tensors, and methods for transferring ownership between collections and from an unowned Tensor to a collection. That would be more straightforwardly Java, not require any of the thread local tricks, and should be easier to reason about because all the operations are obvious given the API.

What's the positive case for TensorScope as opposed to something simpler like a closeable collection?
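
For comparison, a minimal sketch of that alternative (the class and method names here are hypothetical, not an existing API):

import java.util.ArrayList;
import java.util.List;
import org.tensorflow.Tensor;

// An AutoCloseable collection that explicitly owns tensors and closes them together.
final class TensorCollection implements AutoCloseable {
  private final List<Tensor> owned = new ArrayList<>();

  // Takes ownership of a tensor; it will be closed when this collection is closed.
  <T extends Tensor> T add(T tensor) {
    owned.add(tensor);
    return tensor;
  }

  // Explicitly transfers ownership of a tensor to another collection.
  <T extends Tensor> T transferTo(T tensor, TensorCollection target) {
    owned.remove(tensor);
    return target.add(tensor);
  }

  @Override
  public void close() {
    owned.forEach(Tensor::close);
    owned.clear();
  }
}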

@rnett
Contributor Author

rnett commented Jan 16, 2021

You mean like TensorMap/TensorList? They aren't mutually exclusive, and indeed tensor collections will work very well with TensorScope if they implement HasTensors, and decently well otherwise. TensorScope has the advantage of not requiring knowledge of all the tensors or tensor collections that are created. It's "explicitly keep stuff" vs "explicitly close stuff".

I've gotten rid of most of the thread local tricks, too, it's a good bit easier to reason about.

If you mean something more factory like, the biggest reason I can see off the top of my head is to keep resource control in the hands of the end user. If we use a factory, we have to pass it through everywhere, and if you don't somewhere, there's no way to capture or manipulate those tensors after the fact (assuming you can't edit the API). In theory it's fine, but in practice it's easy to get burnt by someone else's poorly designed APIs. Especially given that a lot of deep learning code is written as part of research, and not necessarily the best quality by software engineering metrics.

The implicitness is there anyway from EagerSession; this just extracts the tensor management part out into a single dedicated scope.

cc @karllessard, we talked about this during the call and you have a better idea of the overall API direction than I do.

@saudet
Contributor

saudet commented Jan 16, 2021 via email

@Craigacp
Collaborator

You mean like TensorMap/TensorList? They aren't mutually exclusive, and indeed tensor collections will work very well with TensorScope if they implement HasTensors, and decently well otherwise. TensorScope has the advantage of not requiring knowledge of all the tensors or tensor collections that are created. It's "explicitly keep stuff" vs "explicitly close stuff".

I've gotten rid of most of the thread local tricks, too, it's a good bit easier to reason about.

I'd prefer to have as little implicit behaviour as possible. With this PR any method that returns a Tensor could have the state of that Tensor changed depending on the calling context, and that change could come from arbitrarily deep in the call stack. This is true across library boundaries, packages, modules. Explicitly closing a Tensor collection using try-with-resources will get the same bounded lifetimes as a scope, with the downside that the user has to write a small amount of extra code (e.g. they need to have any tensors added to the collection). I don't think that this downside is too big an issue given the benefits of no implicit behaviour. Most IDEs and linters will remind users to close objects which implement AutoCloseable anyway so there shouldn't be too many free floating tensors that aren't closed. Alternatively it could be made explicit that tensors need to be added to TensorScope rather than implicitly added. That's pretty much what I mean by the tensor collection anyway, as the collection accessors are probably of limited use unless they line up with the inputs of a function.

I'm not sure if implementing List would be the best idea, as it has a large surface (e.g. ListIterator, sublists). Map is a little more straightforward, but this is probably a separate design conversation.

If you mean something more factory like, the biggest reason I can see off the top of my head is to keep resource control in the hands of the end user. If we use a factory, we have to pass it through everywhere, and if you don't somewhere, there's no way to capture or manipulate those tensors after the fact (assuming you can't edit the API). In theory it's fine, but in practice it's easy to get burnt by someone else's poorly designed APIs. Especially given that a lot of deep learning code is written as part of research, and not necessarily the best quality by software engineering metrics.

The fact that lots of deep learning code is written by researchers is probably less relevant to TF-Java as it's not as likely to be used in academia, but I agree that it's important to have well designed APIs that are easy to use. My point about factories was that a Tensor collection could have methods which create Tensor objects on it which are automatically added to it. I'm not sure if that's actually worthwhile given the ndarray & TType work as it might be tricky to make type-safe without an explosion in the number of methods or classes, but it's a way of making the ownership automatic and explicit.

@rnett
Contributor Author

rnett commented Jan 17, 2021

Given that each Tensor should belong to only one scope, what's the point of keeping track of "parents" and "children"? What's the use case?

It's just an implementation detail; it replaces the queue. It's necessary for the "parents close their children" behavior and for reverting the current scope when a scope is closed.

@rnett
Contributor Author

rnett commented Jan 17, 2021

I'd prefer to have as little implicit behavior as possible. With this PR any method that returns a Tensor could have the state of that Tensor changed depending on the calling context, and that change could come from arbitrarily deep in the call stack. This is true across library boundaries, packages, modules. Explicitly closing a Tensor collection using try-with-resources will get the same bounded lifetimes as a scope, with the downside that the user has to write a small amount of extra code (e.g. they need to have any tensors added to the collection). I don't think that this downside is too big an issue given the benefits of no implicit behaviour. Most IDEs and linters will remind users to close objects which implement AutoCloseable anyway so there shouldn't be too many free floating tensors that aren't closed. Alternatively it could be made explicit that tensors need to be added to TensorScope rather than implicitly added. That's pretty much what I mean by the tensor collection anyway, as the collection accessors are probably of limited use unless they line up with the inputs of a function.

I agree with you broadly about explicitness; I just think it makes more sense to require the rare and dangerous case (i.e. a detached tensor) to be made explicit, rather than the default/safe case. The callee can opt out of the automatic management using detach; this just requires them to be explicit about it. And while the scoping is a bit implicit, lexical scoping is well defined and common enough that I don't think it's a huge burden. It's one of the reasons I went with a try-with-resources based API over storing scopes as fields, because the second is very implicit and hard to reason about. And note that the only implicit state changes are almost entirely hidden from the user, and can be checked with isAttached(). 99% of the time the writer of a function doesn't need to care about whether they are in a scope or not (and I can't think of examples for the 1%, or I'd have fixed them).

I'm not too worried about the extra boilerplate of manual lifetime management, although it is annoying. What I don't like is that it requires the user to be aware of every created tensor (to add them to the collection), or forces them to assume that 3rd party code is correct. It's a lot of mental overhead when reasoning about the code, and it's easy to have one slip through the cracks, so to speak (there's a reason memory bugs are so common in non-GC languages). The automatic attachment is more of a safety net than anything, to prevent the end user from needing to keep track of every tensor. In high memory use cases, they are going to want to manage tensors manually anyways, rather than waiting for the end of the scope.

I'm not sure if implementing List would be the best idea, as it has a large surface (e.g. ListIterator, sublists). Map is a little more straightforward, but this is probably a separate design conversation.

Yeah, I'm just drawing from the examples I saw in the type refactor code at one point.

@Craigacp
Collaborator

I'd prefer to have as little implicit behavior as possible. With this PR any method that returns a Tensor could have the state of that Tensor changed depending on the calling context, and that change could come from arbitrarily deep in the call stack. This is true across library boundaries, packages, modules. Explicitly closing a Tensor collection using try-with-resources will get the same bounded lifetimes as a scope, with the downside that the user has to write a small amount of extra code (e.g. they need to have any tensors added to the collection). I don't think that this downside is too big an issue given the benefits of no implicit behaviour. Most IDEs and linters will remind users to close objects which implement AutoCloseable anyway so there shouldn't be too many free floating tensors that aren't closed. Alternatively it could be made explicit that tensors need to be added to TensorScope rather than implicitly added. That's pretty much what I mean by the tensor collection anyway, as the collection accessors are probably of limited use unless they line up with the inputs of a function.

I agree with you broadly about explicitness; I just think it makes more sense to require the rare and dangerous case (i.e. a detached tensor) to be made explicit, rather than the default/safe case. The callee can opt out of the automatic management using detach; this just requires them to be explicit about it. And while the scoping is a bit implicit, lexical scoping is well defined and common enough that I don't think it's a huge burden. It's one of the reasons I went with a try-with-resources based API over storing scopes as fields, because the second is very implicit and hard to reason about. And note that the only implicit state changes are almost entirely hidden from the user, and can be checked with isAttached(). 99% of the time the writer of a function doesn't need to care about whether they are in a scope or not (and I can't think of examples for the 1%, or I'd have fixed them).

I think the appeal to lexical scoping is missing that the scope tracks through method calls. If I open a scope, then call into a library to convert my input data into tensors, which then calls into another library, then those tensors get attached to the scope. The library developer doesn't know how to manage the tensors they create, as their methods could be called within a scope or outside a scope. Library code shouldn't have its state changed implicitly by things in the calling environment, it breaks the encapsulation.

I'm not too worried about the extra boilerplate of manual lifetime management, although it is annoying. What I don't like is that it requires the user to be aware of every created tensor (to add them to the collection), or forces them to assume that 3rd party code is correct. It's a lot of mental overhead when reasoning about the code, and it's easy to have one slip through the cracks, so to speak (there's a reason memory bugs are so common in non-GC languages). The automatic attachment is more of a safety net than anything, to prevent the end user from needing to keep track of every tensor. In high memory use cases, they are going to want to manage tensors manually anyways, rather than waiting for the end of the scope.

The automatic attachment is only a safety net if people use it. We can't enforce it, and users can't see how it works in their code because it's implicit, meaning it might be refactored around because it uses mechanisms not present elsewhere in Java. Java doesn't have scope variables that implicitly alter the state of enclosed code. As I said, most IDEs and linters will flag AutoCloseable objects that haven't been closed anyway. With the scopes (and probably with collections) it's likely that they'll flag all the tensors that have been created, because they can't see that the scope will close them, creating a bunch of extra noise.

@karllessard
Collaborator

@rnett :

What I don't like is that it requires the user to be aware of every created tensor (to add them to the collection), or forces them to assume that 3rd party code is correct. It's a lot of mental overhead when reasoning about the code, and it's easy to have one slip through the cracks

This is a great summary of the issue we want to tackle here. For sure, the usage of a thread-local stack is very tempting; I also had in mind other cases that would benefit from it, for example carrying around the last instance of Ops created by a user. This would avoid passing that variable around like it is done here.

On the other hand, I agree with @Craigacp that being too implicit in some cases might eventually lead to more issues and confusion for developers, for all the reasons he pointed out.

It seems though that we all agree that we need some sort of autocloseable tensor container, like the one proposed in #167. I continue to think that the solution presented in that PR is too session-centric and should be redesigned to consider all possible use cases for grouping tensors together, i.e.:

  • outputs of a session run
  • outputs of a function
  • inputs to a function
  • collection of tensors resolved from an eager execution
  • any arbitrary group of tensors the user might want to create

Doing this might not remove the need for having some sort of TensorScope later, but that could probably be done as a second step, depending on how the tensor containers will behave and how good/bad the experience of resource management will be when using them.

@saudet
Contributor

saudet commented Jan 18, 2021

Given that each Tensor should belong to only one scope, what's the point of keeping track of "parents" and "children"? What's the use case?

It's just implementation details, it replaces the queue. It's necessary for the "parents close their children" behavior and for reverting the current scope when a scope is closed.

I mean, I was asking for a use case: what is the use case you have in mind? The only thing I can see is something like this:

try (TensorScope scope1 = new TensorScope()) {
    try (TensorScope scope2 = new TensorScope()) {
          Result result = session.runner().run();
          answer = result.get(1);
          answer.releaseToParent();
    }
}

But we don't need to have TensorScope babysit this for us by tracking everything. The following works just fine by doing it manually, and it's the same number of lines:

try (TensorScope scope1 = new TensorScope()) {
    try (TensorScope scope2 = new TensorScope()) {
          Result result = session.runner().run();
          answer = result.get(1);
          answer.attach(scope1);
    }
}

What do you mean by "reverting"? In any case, parents don't need to close their children, because they get closed before anyway. So, I'm asking you again, what's the use case? I don't see it!

I'm coming at this from another angle, but allowing only a single scope to own any given tensor makes the use of thread-local storage a bit moot. The code above would work exactly the same, correct? We don't get an implicit attach to scope2, but nothing gets implicitly attached anyway when no scope is found on the stack. To solve both cases, users need to be aware of TensorScope anyway!

BTW, the reason I'm using thread-local storage for PointerScope is because JavaCPP needs to map C++ APIs that were designed with a stack in mind, but here we are in control of the API, so we can design something that makes more sense for the Java language.

@rnett
Contributor Author

rnett commented Jan 18, 2021

@Craigacp

I think the appeal to lexical scoping is missing that the scope tracks through method calls. If I open a scope, then call into a library to convert my input data into tensors, which then calls into another library, then those tensors get attached to the scope. The library developer doesn't know how to manage the tensors they create, as their methods could be called within a scope or outside a scope. Library code shouldn't have its state changed implicitly by things in the calling environment, it breaks the encapsulation.

Agreed. I need to change the API a little to do this, but I want TensorScope to be entirely transparent. You shouldn't need to know whether your method is getting called in a scope or outside of it, because yeah, that would be bad.

The way I envision it, there are three possible resource management states for tensors you're working with when writing a method: managed by me with a scope, explicitly detached from all scopes, or the caller's responsibility. They roughly map to attach, detach, and release (which doesn't exist yet, but is basically attachToParent(false)). So a method could look like:

public TFloat32 runModel(TFloat32 inputA){
  try(TensorScope scope = new TensorScope()){
    TFloat32 inputB = getInputB();
    TFloat32 result = (TFloat32) session.runner().feed("a", inputA).feed("b", inputB).fetch("result").run().get(0);
    result.release();
    return result;
  }
}

This works the same regardless of whether it's called in a scope or not. inputB will be managed by the method's scope and closed on return (assuming getInputB() follows this, which it should); result is the caller's responsibility (it will be part of the caller's scope if they're using one, or detached if not).

The "default" state of the tensors, to to speak, is released, i.e. resource management is left up to the parent. The callee can take control of the management either by using a TensorScope (attach) or detaching their tensors. But since that changes the behavior of the method, it should be explicit. Regardless of which method is used, the behavior inside the method is the same regardless of what the caller is doing wrt. tensor management.

The automatic attachment is only a safety net if people use it. We can't enforce it, and users can't see how it works in their code because it's implicit meaning it might be refactored around because it uses mechanisms not present elsewhere in Java. Java doesn't have scope variables that implicitly alter the state of enclosed code. As I said, most IDEs and linters will flag AutoCloseable objects that haven't been closed anyway. With the scopes (and probably with collections) it's likely that they'll flag all the tensors that have been created because they can't see that the scope will close them creating a bunch of extra noise.

I haven't actually seen any AutoCloseable warnings in IntelliJ, which now that I think about it is weird, because I would expect to. We could have Tensor and HasTensors not implement AutoCloseable, but given that Eclipse at least seems like it would only flag these issues as potential resource leaks (see here), I don't think it's a huge issue, although it's definitely not ideal.

The only refactoring I can think of that would cause issues is moving statements outside of the try block, which I would think would be avoided anyway, because that would change the semantics of any code. And yeah, we can't make people use it, but we shouldn't. They are free to manually manage lifetimes if they wish to.

@karllessard

This is a great summary of the issue we want to tackle here. For sure, the usage of a thread-local stack is very tempting, I also had in mind other cases which would also benefit from it, for example to carry the last instance of Ops created by a user. This would avoid passing that variable around like it is done here.

That would be interesting, although we'd have to be careful about mixing execution environments (like with tf.function-style graphs in eager code). One thing I don't like about that use case is that methods that depend on an Ops instance will fail without it, whereas with TensorScope they will still work fine regardless of whether they're being called from within a scope or not (which is the issue Adam brought up).

On the other hand, I agree with @Craigacp that being too implicit in some cases might eventually lead to more issues and confusion for developers, for all the reasons he pointed out.

See above, which mitigates some (most?) of it. There's still a bit of implicitness, but it should be transparent to callers and callees.

It seems though that we all agree that we need some sort of autocloseable tensor container, like the one proposed in #167. I continue to think that the solution presented in that PR is too session-centric and should be redesigned to consider all possible use cases for grouping tensors together, i.e.:

  • outputs of a session run
  • outputs of a function
  • inputs to a function
  • collection of tensors resolved from an eager execution
  • any arbitrary group of tensors the user might want to create

Doing this might not remove the need for having some sort of TensorScope later, but that could probably be done as a second step, depending on how the tensor containers will behave and how good/bad the experience of resource management will be when using them.

My issue with the idea of using a "container", if I understand what you're thinking of correctly, is that you still have to know about all the tensors being used; some of them are just grouped. Sure, it makes things easier, but you still have to juggle them mentally; there are just fewer of them.

Re: Session.Result, that class is really doing two entirely separate things: providing an Output -> Tensor mapping based on the run's fetches, and managing tensor lifetimes. Part of what I like about TensorScope, and would ideally want out of any solution for tensor management, is that it separates these. You can make collections of tensors, or even just data classes, without having to care about lifetimes. Currently the lifetime management tends to infect any code that uses tensors, which imo should be avoided. Ideally, Session.Result should just be the Output -> Tensor mapping and friendly accessors.

@saudet

The issue with your second example is that it assumes the parent scope is always in (the language) scope. If you wrapped your inner scope in a method, what you did wouldn't work, which is why I'm saving the parent.

What I mean by resetting the current scope is that since I'm using a single variable instead of a stack, when an inner scope closes, the outer scope needs to go back to being the current scope. It's like

Scope old = current;
block();
current = old;

Plus, with the issues Adam brought up regarding encapsulation and transparency, the API shouldn't change depending on if you're in a scope or not, so the "do I have a parent" question needs to be answered internally and hidden from the user.

For thread local, imagine something like:

ExecutorService e = ...;
try(TensorScope scope1 = new TensorScope()){
  e.submit(() -> {
   someTensorStuff1();
  });
  e.submit(() -> {
    try(TensorScope scope2 = new TensorScope()){
     someTensorStuff2();
    }
  });
}

The launched tasks should execute their code "in" scope1, and scope2 should be a child/inner scope of scope1. Also, tensors created in someTensorStuff1 shouldn't be captured by scope2, so the current scope at least has to be thread local.

@saudet
Contributor

saudet commented Jan 19, 2021

The issue with your second example is that it assumes the parent scope is always in (the language) scope. If you wrapped your inner scope in a method, what you did wouldn't work, which is why I'm saving the parent.

That's not actually a feature, that's a bug. We can work around limitations of the Java language using thread-local storage, but altering the state like that is a bug. That's kind of what old-style APIs like OpenGL used to do with a state machine, which they got rid of with Vulkan. There are at least 2 sane ways to go about it:

  1. Passing the "output scope" as an argument
  2. Detaching any objects returned, letting the caller do what they want with them

What I mean by resetting the current scope is that since I'm using a single variable instead of a stack, when an inner scope closes, the outer scope needs to go back to being the current scope. It's like

Scope old = current;
block();
current = old;

Ok, but what does that get us, concretely? Why does it matter? If you're worried about having to call attach() manually 1000 times, then let's see what we can do about that. Maybe pass the scope to sessions and let them do the attaching for us? Maybe have the session return a scope/group, in which case maybe we can name it TensorGroup?

Plus, with the issues Adam brought up regarding encapsulation and transparency, the API shouldn't change depending on if you're in a scope or not, so the "do I have a parent" question needs to be answered internally and hidden from the user.

For thread local, imagine something like:

ExecutorService e = ...;
try(TensorScope scope1 = new TensorScope()){
  e.submit(() -> {
   someTensorStuff1();
  });
  e.submit(() -> {
    try(TensorScope scope2 = new TensorScope()){
     someTensorStuff2();
    }
  });
}

The launched tasks should execute their code "in" scope1, and scope2 should be a child/inner scope of scope1. Also, tensors created in someTensorStuff1 shouldn't be captured by scope2, so the current scope at least has to be thread local.

I think this works fine without thread-local storage:

ExecutorService e = ...;
try(TensorScope scope1 = new TensorScope()){
  e.submit(() -> {
   someTensorStuff1(scope1);
  });
  e.submit(() -> {
    try(TensorScope scope2 = new TensorScope()){
     someTensorStuff2(scope2);
    }
  });
}

What am I missing?

rnett added 17 commits March 4, 2021 14:25
@rnett
Contributor Author

rnett commented Mar 4, 2021

So @rnett , it looks like we are good to merge this PR, or were you planning to add/change something from the previous feedbacks? You'll need to rebase it though.

I keep telling you guys that this doesn't work when noPointerGC is enabled... Please fix this.

You mean because the weak refs will be dropped but the deallocation won't be done on the JavaCPP side? My thinking is to make the weakness optional, with it being strong by default if noPointerGC is true. Is that ever exposed anywhere?

@rnett rnett force-pushed the rn_tensor_scope branch from a66ca55 to 46645ee on March 4, 2021, 22:41
@saudet
Contributor

saudet commented Mar 5, 2021

I agree that TensorScope could be made more generic and renamed to something like ResourceScope. It looks like the only dependency on Tensor inside it is to detach/attach a tensor to its own internal scope when attaching/detaching it to another scope.

This is doable, but the Resource interface would have to expose its current scope for detaching, which is something we may not want as a public API. It would need to be settable as well, which I really don't like exposing. We could do better with an abstract class, but that has all sorts of other issues.

That's the kind of thing I've been arguing about. If we don't make attach()/detach() a part of the API, we don't need to worry about these things!

If we really need access to that internal pointer/scope, we could define a new interface type (NativeResource?) that extends from AutoCloseable and exposes some generic methods like Pointer pointer(), PointerScope pointerScope(), etc. to become completely independent from Tensor inside ResourceScope. Normally, that could be reused as is for other native resources we would like to manage with scopes in the future, like eager operands, even if this is encapsulated by the Ops.

I don't actually need the PointerScope, but I do need the current ResourceScope for detaching (which needs to be settable) and a close method. See my above comment.

If we don't allow detaching, we don't need to worry about that!

@rnett wdyt? I know this could slow down the merge of this PR a bit again, but memory management is really on the critical path of TF Java and we'd better take our time to do it well; let me know if you need help.

It's a pretty easy change if we're ok with exposing things, and I agree it makes the code better, but the only other thing I foresee being tracked this way is eager operands; I don't think anything else is memory-heavy enough. Given that there are only two uses, imo it's not worth having to expose the Resource's scope setters, unless someone can think of a way around that.

Would you still foresee the need to expose anything if we don't allow users to attach/detach stuff?

You mean because the weak refs will be dropped but the deallocation won't be done on the JavaCPP side? My thinking is to make the weakness optional, with it being strong by default if noPointerGC is true. Is that ever exposed anywhere?

There's that, but simply keeping weak references to pointers also doesn't work with GC anyway. The JVM needs at least 2 passes to clear weak and phantom references. We can (and will) lose access to weakly referenced pointers before we can get the chance to free them, deterministically, and then free the context they belong to, so memory will leak. We need to free the ops before the context gets freed. There is no way to enforce this ordering of events using GC. I think the original approach that was done for TF 1.x does work correctly though, see #229 (comment):

Right, delete() and deleteAll() appear synchronized enough that calling them concurrently should be fine. Looking more closely, I think the only issue is when close() doesn't get called, the context itself never gets deleted. Anyway, if we wanted to do something similar to that with JavaCPP, I'd synchronize the deallocate() method here:
https://github.com/tensorflow/java/blob/master/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/internal/c_api/AbstractTFE_Op.java#L30
That gets assigned to Pointer.deallocator, which we can retrieve with the protected deallocator() method from those subclasses, keep strong references to them, and call deallocate() manually on them when desired. That should work well regardless of whether noPointerGC is enabled or not.
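
A rough sketch of that approach, assuming JavaCPP's public Pointer.deallocate() method (the tracker class itself is illustrative):

import java.util.ArrayList;
import java.util.List;
import org.bytedeco.javacpp.Pointer;

// Keep strong references to native op handles and free them deterministically,
// before the context is deleted, instead of relying on GC and weak references.
final class OpTracker {
  private final List<Pointer> liveOps = new ArrayList<>();

  synchronized void track(Pointer op) {
    liveOps.add(op);
  }

  // Call this before freeing the context so every op handle is deleted first.
  synchronized void deleteAll() {
    liveOps.forEach(Pointer::deallocate);
    liveOps.clear();
  }
}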

@rnett
Contributor Author

rnett commented Mar 8, 2021

Would you still foresee the need to expose anything if we don't allow users to attach/detach stuff?

If we only ever assigned/attached to scopes on creation, we could get rid of these. We'd need a copyToNewScope like you suggested. That would essentially be reference counting though, so we might as well just go all the way (although that may require exposing the reference count methods, with the same issues). Another point for reference counting over copying is that I'd like the tensor scope to match the operand scope and copying operands is not easily doable. I'm gonna make another reply about reference counting in general.

You mean because the weak refs will be dropped but the deallocation won't be done on the JavaCPP side? My thinking is to make the weakness optional, with it being strong by default if noPointerGC is true. Is that ever exposed anywhere?

There's that, but simply keeping weak references to pointers also doesn't work with GC anyway. The JVM needs at least 2 passes to clear weak and phantom references. We can (and will) lose access to weakly referenced pointers before we can get the chance to free them, deterministically, and then free the context they belong to, so memory will leak. We need to free the ops before the context gets freed. There is no way to enforce this ordering of events using GC. I think the original approach that was done for TF 1.x does work correctly though, see #229 (comment):

Right, delete() and deleteAll() appear synchronized enough that calling them concurrently should be fine. Looking more closely, I think the only issue is when close() doesn't get called, the context itself never gets deleted. Anyway, if we wanted to do something similar to that with JavaCPP, I'd synchronize the deallocate() method here:
https://github.com/tensorflow/java/blob/master/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/internal/c_api/AbstractTFE_Op.java#L30
That gets assigned to Pointer.deallocator, which we can retrieve with the protected deallocator() method from those subclasses, keep strong references to them, and call deallocate() manually on them when desired. That should work well regardless of whether noPointerGC is enabled or not.

I'm still not entirely understanding this. You're saying that we may have a tensor get GC'd and then have the scope closed, where the pointer deallocate and the scope closing deallocate go off at the same time because the weak reference isn't removed in time? What are you referring to with "context"? There's nothing that needs to be freed other than tensors.

@rnett
Contributor Author

rnett commented Mar 9, 2021

cc @karllessard @Craigacp

For reference counting more generally, the motivating example for me is something like:

class KeepTensor(var tensor: Tensor?)

fun main(){
  val keep = KeepTensor(null)
  TensorScope().use{
    keep.tensor = makeTensor(it)
  }
  // keep.tensor is now closed
}

This is very contrived, but the example came from me wondering if ConstantInitializer could use tensors instead of the value type enum. Variable and anything holding onto Operands would have this issue too. In cases where you want a tensor to match the object lifetime there's no good solution. Consider if KeepTensor was implemented like:

class KeepTensor(tensor: Tensor?){
  private val scope = TensorScope()
  var tensor: Tensor? = tensor
    set(value) {
      field = value?.let(scope::attach)
    }
}

While this works in most cases, if we change our example to something like:

fun main(){
  val keep = KeepTensor(null)
  val keep2 = KeepTensor(null)
  TensorScope().use{
    val temp = makeTensor(it)
    keep.tensor = temp
    keep2.tensor = temp
  }
}

Since we only allow attaching to one scope, the tensor will only be attached to keep2's scope, and may be closed while keep is still active. This happens whenever a tensor is attached/detached by the caller, after being stored somewhere in a called function. Now, if we had a copyInScope(TensorScope) method, it would nicely solve this, at least for pure tensors. However it doesn't work well with collections or containers, or with Operands, and I would like the Operand scope to match the tensor one (at least, ideally they would be the same). Reference counting is a better solution here, and is what copying would be doing anyways (via PointerScope).


Implementation-wise, if we want to generify it to Resource (which I think is a good idea), Resource would have to track and expose reference counters, which I don't like as it has all the same problems as exposing the scope. However, we could instead have scopes track their members and track all active scopes, using that to get the reference count instead. If we make the scope tracking weak, this also nicely removes references from GC'd scopes, although it won't necessarily clean up their now 0-reference resources (that would require another Cleaner-like object and thread, but it's a backstop anyways so imo it's not worth that; although if/when we add a Cleaner for something else we could use it here). Given that we shouldn't have a huge number of scopes open, this should be fairly cheap (it's linear wrt the number of open scopes). We could also provide a nice CompositeResource implements Resource base class for resources that have resources (i.e. tensor collections), which has a scope and protected attach methods, and implements close via that scope.
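
A sketch of that scope-derived reference count (ResourceScope and contains are illustrative names, not part of the PR):

import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

interface ResourceScope {
  boolean contains(Object resource);
}

// Derive a resource's reference count from the currently open scopes instead of
// storing a counter on the resource; weak tracking lets GC'd scopes drop out.
final class OpenScopes {
  private static final Set<ResourceScope> OPEN =
      Collections.newSetFromMap(new WeakHashMap<>());

  static synchronized void register(ResourceScope scope) {
    OPEN.add(scope);
  }

  static synchronized int referenceCount(Object resource) {
    int count = 0;
    for (ResourceScope scope : OPEN) {   // linear in the number of open scopes
      if (scope.contains(resource)) {
        count++;
      }
    }
    return count;
  }
}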

@rnett
Contributor Author

rnett commented Mar 11, 2021

I'd also like to add a "weak" version of the resource counter scope that doesn't increment the count, just tries to free things (so if another scope uses and then discards them, they will be discarded then). This would be useful in places like initScope or EagerSession where we don't want to keep things alive unnecessarily.

@saudet
Contributor

saudet commented Mar 11, 2021

Would you still foresee the need to expose anything if we don't allow users to attach/detach stuff?

If we only ever assigned/attached to scopes on creation, we could get rid of these. We'd need a copyToNewScope like you suggested. That would essentially be reference counting though, so we might as well just go all the way (although that may require exposing the reference count methods, with the same issues). Another point for reference counting over copying is that I'd like the tensor scope to match the operand scope and copying operands is not easily doable. I'm gonna make another reply about reference counting in general.

If copyToNewScope() refers to the creation of references and doing reference counting, yes, I think that's all we would need, although I'd prefer to call it something more like Tensor.newReference(newScope). I don't see the need to make anything else public or expose anything about reference counting either.
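
Usage of that hypothetical method might look like this (newReference does not exist yet; shown only to illustrate the reference-counting semantics):

try (TensorScope outer = new TensorScope()) {
  TFloat32 shared;
  try (TensorScope inner = new TensorScope()) {
    TFloat32 t = TFloat32.scalarOf(3f);  // first reference, owned by inner
    shared = t.newReference(outer);      // second reference, owned by outer
  }                                      // inner releases its reference; the data stays alive
  // shared remains usable here
}                                        // last reference released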

You mean because the weak refs will be dropped but the deallocation won't be done on the JavaCPP side? My thinking is to make the weakness optional, with it being strong by default if noPointerGC is true. Is that ever exposed anywhere?

There's that, but simply keeping weak references to pointers also doesn't work with GC anyway. The JVM needs at least 2 passes to clear weak and phantom references. We can (and will) lose access to weakly referenced pointers before we can get the chance to free them, deterministically, and then free the context they belong to, so memory will leak. We need to free the ops before the context gets freed. There is no way to enforce this ordering of events using GC. I think the original approach that was done for TF 1.x does work correctly though, see #229 (comment):

Right, delete() and deleteAll() appear synchronized enough that calling them concurrently should be fine. Looking more closely, I think the only issue is that when close() doesn't get called, the context itself never gets deleted. Anyway, if we wanted to do something similar to that with JavaCPP, I'd synchronize the deallocate() method here:
https://github.com/tensorflow/java/blob/master/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/internal/c_api/AbstractTFE_Op.java#L30
That gets assigned to Pointer.deallocator, which we can retrieve with the protected deallocator() method from those subclasses, keep strong references to them, and call deallocate() manually on them when desired. That should work well regardless of whether noPointerGC is enabled or not.
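
The general shape of that would be something like this (just an illustration of the pattern, with Runnable standing in for the deallocators retrieved from the pointers; none of this is actual JavaCPP or TF Java code):

import java.util.*;

// Keep strong references to idempotent cleanup actions, and run whatever the GC
// hasn't already triggered before the context itself is freed.
class DeallocatorRegistry {

  private final List<Runnable> deallocators = new ArrayList<>();

  synchronized void register(Runnable deallocator) {
    deallocators.add(deallocator);
  }

  // called on session/scope close, before TFE_DeleteContext
  synchronized void deallocateAll() {
    for (Runnable d : deallocators) {
      d.run(); // each action must be idempotent, since GC may already have fired it
    }
    deallocators.clear();
  }
}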

I'm still not entirely understanding this. You're saying that we may have a tensor get GC'd and then have the scope closed, where the pointer deallocate and the scope closing deallocate go off at the same time because the weak reference isn't removed in time? What are you referring to with "context"? There's nothing that needs to be freed other than tensors.

No, I'm saying that an object may get deallocated after the scope is closed. So, you will have live objects even after the scope is closed. You could argue that this isn't a big issue for tensors that have no context, but it renders the use of "scope" moot. If we can't even guarantee when everything is going to get deallocated we might as well have no scope at all! Moreover, in the case of tensors and ops that do have a context, it is a big deal:

// "Context" under which operations/functions are executed. It encapsulates
// things like the available devices, resource manager etc.
// TFE_Context must outlive all tensor handles created using it. In other
// words, TFE_DeleteContext() must be called after all tensor handles have
// been deleted (with TFE_DeleteTensorHandle).

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/eager/c_api.h#L92

Implementation-wise, if we want to generify it to Resource (which I think is a good idea), Resource would have to track and expose reference counters, which I don't like as it has all the same problems as exposing the scope. However, we could instead have scopes track their members and track all active scopes, using that to get the reference count instead. If we make the scope tracking weak, this also nicely removes references from GC'd scopes, although it won't necessarily clean up their now zero-reference resources (that would require another Cleaner-like object and thread, but it's a backstop anyway so imo it's not worth that; although if/when we add a Cleaner for something else we could use it here). Given that we shouldn't have a huge number of scopes open, this should be fairly cheap (it's linear w.r.t. the number of open scopes). We could also provide a nice CompositeResource implements Resource base class for resources that own other resources (i.e. tensor collections), which has a scope and protected attach methods and implements close via that scope.

The only thing that needs reference counting in there are the underlying Pointer objects, and we can easily call retainReference() and releaseReference() on those for reference counting, without using PointerScope if that proves easier to manage. I don't see the need to expose any of that to the API. It could look like this and (probably) work correctly (not sure about all the details though):

try (Scope outerScope = Scope.create()) {
    Tensor t;
    try (EagerSession s = EagerSession.create()) {
      Ops tf = Ops.create(s);
      EagerOperation add =
          opBuilder(s, "Add", "ReturnResult")
              .addInput(tf.constant(2).asOutput())
              .addInput(tf.constant(4).asOutput())
              .build();
      t = add.tensor(0).newReference(outerScope);
    }
}

Precisely what do you feel would be missing here?

I'd also like to add a "weak" version of the resource counter scope that doesn't increment the count, just tries to free things (so if another scope uses and then discards them, they will be discarded then). This would be useful in places like initScope or EagerSession where we don't want to keep things alive unnecessarily.

You mean forced deallocation like a scope.deallocate()? Sure, why not, although it should be clearly documented as unsafe...

@rnett
Copy link
Contributor Author

rnett commented Mar 11, 2021

If the copyToNewScope() refers to the creation of references and doing reference counting, yes, I think that's all we would need, although I think I'd prefer to call it something more like Tensor.newReference(newScope). I don't see the need to make anything else public or expose anything about reference counting either.

It would work fine for just tensors, but it doesn't work well for more generic Resources, for the reasons I outlined in one of the last few comments (either creation would need to be handled in the scope, or attaching in the object, and it leaves resource counting to the user, which isn't an issue for Pointers but could be with other things). Requiring all resources to have a copy/clone-style method is non-trivial.

No, I'm saying that an object may get deallocated after the scope is closed. So, you will have live objects even after the scope is closed. You could argue that this isn't a big issue for tensors that have no context, but it renders the use of "scope" moot. If we can't even guarantee when everything is going to get deallocated we might as well have no scope at all! Moreover, in the case of tensors and ops that do have a context, it is a big deal:

// "Context" under which operations/functions are executed. It encapsulates
// things like the available devices, resource manager etc.
// TFE_Context must outlive all tensor handles created using it. In other
// words, TFE_DeleteContext() must be called after all tensor handles have
// been deleted (with TFE_DeleteTensorHandle).

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/eager/c_api.h#L92

You're saying that it's possible for a tensor to be marked weak (and removed from the set), but added to the deallocator reference queue after the context, even though the context is still alive when the scope is closed? So you're suggesting we synchronize the deallocator and make it do nothing if the object is already deallocated, then store the deallocators and call them before session/scope close (so the objects can still be GC'd, but we are guaranteed all deallocators fire)?

That said, this is a non-issue because the docs lie. I went and tested it, and this:

val s = EagerSession.create()
val tf = Ops.create(s)
val a: Operand<TInt32> = tf.math.add(tf.constant(1), tf.constant(2))
TFE_DeleteContext(s.nativeHandle)
println(a.asTensor().getInt())

works fine, it even prints correctly. Trying to create new ops after deletion does fail, but that's as intended.

In general, I think scopes are fine to not be strict about this. You may have a resource close slightly after the scope, and we can note that in the docs, but requiring every resource to define a non-self-referencing deallocator is a lot of overhead when there aren't any situations that require it (eager context would be one if the above test didn't work). Resources won't leak indefinitely, there would just be a slight delay (if I'm understanding things right). If there is a need for strict resource -> context deallocation ordering it should be handled by the context. @karllessard @Craigacp are you ok with this (specifically not being strict on ordering)?

The only thing that needs reference counting in there are the underlying Pointer objects, and we can easily call retainReference() and releaseReference() on those for reference counting, without using PointerScope if that proves easier to manage. I don't see the need to expose any of that to the API. It could look like this and (probably) work correctly (not sure about all the details though):

try (Scope outerScope = Scope.create()) {
    Tensor t;
    try (EagerSession s = EagerSession.create()) {
      Ops tf = Ops.create(s);
      EagerOperation add =
          opBuilder(s, "Add", "ReturnResult")
              .addInput(tf.constant(2).asOutput())
              .addInput(tf.constant(4).asOutput())
              .build();
      t = add.tensor(0).newReference(outerScope);
    }
}

Precisely what do you feel would be missing here?

This works fine if all resources use Pointer internally, but that's not necessarily always the case. In particular, variables would be composite resources that hold onto a bunch of operands, and would have an internal scope to keep them alive, but would also themselves be Resources. Functions would be similar, holding on to multiple ConcreteFunctions.
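
Something like this is what I have in mind for composites (a rough sketch, all names hypothetical, reusing the ResourceScope sketch from my earlier comment):

// A composite resource owns its members through an internal scope, but is itself
// a Resource that can be attached to an outer scope like anything else.
public abstract class CompositeResource implements Resource {

  private final ResourceScope members = new ResourceScope();

  // subclasses attach the tensors/operands they create to the internal scope
  protected <T extends Resource> T attach(T resource) {
    members.attach(resource);
    return resource;
  }

  @Override
  public void close() {
    members.close(); // closing the composite closes everything it owns
  }
}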

You mean forced deallocation like a scope.deallocate()? Sure, why not, although it should be clearly documented as unsafe...

Nah, I mean something that tries to close everything attached to it if a) it has no references and b) isn't already closed, but doesn't increment the count itself. For cases like:

try (ResourceScope scope = new WeakResourceScope()) {
  Tensor a = newTensor(scope);
  Tensor b = newTensor(scope);
  Composite c = new Composite(a, b);
  // stuff
  c.close();
}

where Composite does its own scope management internally. In cases like this we would want a and b to be closed when c is, and not outlive it because the outer scope has a reference. And sure, in this example you could just close scope after c is created, but for longer-lived scopes like init_scope that's not possible.
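
Continuing the hypothetical ResourceScope sketch from my earlier comment, the weak variant could look roughly like this (glossing over already-closed detection and thread safety):

// Tracks resources without counting itself as a reference to them.
public class WeakResourceScope extends ResourceScope {

  private final Set<Resource> tracked = new HashSet<>();

  @Override
  public void attach(Resource r) {
    tracked.add(r); // deliberately not added to the counted set, so no reference is held
  }

  @Override
  public void close() {
    super.close(); // removes this scope from the open-scope tracking
    for (Resource r : tracked) {
      // backstop: only free things that no other scope is still keeping alive
      if (ResourceScope.referenceCount(r) == 0) {
        r.close();
      }
    }
  }
}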

@saudet
Copy link
Contributor

saudet commented Mar 12, 2021

If the copyToNewScope() refers to the creation of references and doing reference counting, yes, I think that's all we would need, although I think I'd prefer to call it something more like Tensor.newReference(newScope). I don't see the need to make anything else public or expose anything about reference counting either.

It would work fine for just tensors, but it doesn't work well for more generic Resources, for the reasons I outlined in one of the last few comments (either creation would need to be handled in the scope, or attaching in the object, and it leaves resource counting to the user, which isn't an issue for Pointers but could be with other things). Requiring all resources to have a copy/clone-style method is non-trivial.

I'm not sure I see the issue; it feels to me like a pretty general way of doing things. It's just reference counting! It's what they do in the Python API. It works fine there, and there's no reason we can't do everything the same way in Java. Maybe some things wouldn't be pretty, I don't know, but there's no reason it just wouldn't work. Would you have a concrete example of something you would like to do that wouldn't work?

No, I'm saying that an object may get deallocated after the scope is closed. So, you will have live objects even after the scope is closed. You could argue that this isn't a big issue for tensors that have no context, but it renders the use of "scope" moot. If we can't even guarantee when everything is going to get deallocated we might as well have no scope at all! Moreover, in the case of tensors and ops that do have a context, it is a big deal:

// "Context" under which operations/functions are executed. It encapsulates
// things like the available devices, resource manager etc.
// TFE_Context must outlive all tensor handles created using it. In other
// words, TFE_DeleteContext() must be called after all tensor handles have
// been deleted (with TFE_DeleteTensorHandle).

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/eager/c_api.h#L92

You're saying that it's possible for a tensor to be marked weak (and removed from the set), but added to the deallocator reference queue after the context, even though the context is still alive when the scope is closed? So you're suggesting we synchronize the deallocator and make it do nothing if the object is already deallocated, then store the deallocators and call them before session/scope close (so the objects can still be GC'd, but we are guaranteed all deallocators fire)?

Yes, that's what the code in TF 1.x was doing. Why do you guys keep pushing back against what was being done in TF 1.x? I mean, I know it's convoluted, but at least it kind of sort of worked and did what you want to do. (I still think it's a bad idea in general, so I'm not going to care personally about this.)

That said, this is a non-issue because the docs lie. I went and tested it, and this:

val s = EagerSession.create()
val tf = Ops.create(s)
val a: Operand<TInt32> = tf.math.add(tf.constant(1), tf.constant(2))
TFE_DeleteContext(s.nativeHandle)
println(a.asTensor().getInt())

works fine, it even prints correctly. Trying to create new ops after deletion does fail, but that's as intended.

In general, I think scopes are fine to not be strict about this. You may have a resource close slightly after the scope, and we can note that in the docs, but requiring every resource to define a non-self-referencing deallocator is a lot of overhead when there aren't any situations that require it (eager context would be one if the above test didn't work). Resources won't leak indefinitely, there would just be a slight delay (if I'm understanding things right). If there is a need for strict resource -> context deallocation ordering it should be handled by the context. @karllessard @Craigacp are you ok with this (specifically not being strict on ordering)?

Why do you want to play with fire when there's a perfectly valid way to work around this? And yes, resources can leak indefinitely; see, for example, https://blogs.oracle.com/javamagazine/epsilon-the-jdks-do-nothing-garbage-collector. That's the kind of thing that people using DJL are talking about. Like I already told @Craigacp, if that's not the kind of application that interests people here, they'll probably have to move away from TF Java and I'll help them get whatever they need with JavaCPP directly.

The only thing that needs reference counting in there are the underlying Pointer objects, and we can easily call retainReference() and releaseReference() on those for reference counting, without using PointerScope if that proves easier to manage. I don't see the need to expose any of that to the API. It could look like this and (probably) work correctly (not sure about all the details though):

try (Scope outerScope = Scope.create()) {
    Tensor t;
    try (EagerSession s = EagerSession.create()) {
      Ops tf = Ops.create(s);
      EagerOperation add =
          opBuilder(s, "Add", "ReturnResult")
              .addInput(tf.constant(2).asOutput())
              .addInput(tf.constant(4).asOutput())
              .build();
      t = add.tensor(0).newReference(outerScope);
    }
}

Precisely what do you feel would be missing here?

This works fine if all resources use Pointer internally, but that's not necessarily always the case. In particular, variables would be composite resources that hold onto a bunch of operands, and would have an internal scope to keep them alive, but would also themselves be Resources. Functions would be similar, holding on to multiple ConcreteFunctions.

Ok, so we should probably flesh this out a bit to figure out exactly what we want the thing to do. I don't see why having a detach() method alone would help with any of what you mention here.

You mean forced deallocation like a scope.deallocate()? Sure, why not, although it should be clearly documented as unsafe...

Nah, I mean something that tries to close everything attached to it if a) it has no references and b) isn't already closed, but doesn't increment the count itself. For cases like:

try (ResourceScope scope = new WeakResourceScope()) {
  Tensor a = newTensor(scope);
  Tensor b = newTensor(scope);
  Composite c = new Composite(a, b);
  // stuff
  c.close();
}

where Composite does its own scope management internally. In cases like this we would want a and b to be closed when c is, and not outlive it because the outer scope has a reference. And sure, in this example you could just close scope after c is created, but for longer-lived scopes like init_scope that's not possible.

That sounds just like an implicit detach(). Panama isn't going in this direction (see openjdk/panama-foreign#466), so I'm not sure this is the kind of thing we should explore, but sure, like I said, we're only doing alpha releases for now, so we can experiment. I'd be fine with that.

@rnett
Copy link
Contributor Author

rnett commented Mar 13, 2021

I'm not sure I see the issue; it feels to me like a pretty general way of doing things. It's just reference counting! It's what they do in the Python API. It works fine there, and there's no reason we can't do everything the same way in Java. Maybe some things wouldn't be pretty, I don't know, but there's no reason it just wouldn't work. Would you have a concrete example of something you would like to do that wouldn't work?

It would work, it's not impossible to do, but it requires a) every resource to implement a copy method, and b) every constructor and copy method to accept a scope and attach to it. That's not something I want to force Resource classes to deal with when it can be handled as part of the scope. To be clear, it's the implementation I have issues with; the general concept is fine.

Why do you want to play with fire when there's a perfectly valid way to work around this? And yes, resources can leak indefinitely, see, for example, https://blogs.oracle.com/javamagazine/epsilon-the-jdks-do-nothing-garbage-collector. That's the kind of thing that people using DJL are talking about. Like I already told @Craigacp, if that's not the kind of application that interests people here, they'll probably have to move away from TF Java and I'll help them get whatever they need with JavaCPP directly.

If you're using a "Do nothing GC" but still relying on JavaCPP's GC cleanup, I don't think that's our issue. This would work fine with Epsilon as long as JavaCPP's noPointerGC is true (since they would be strong references). We can separate the strong references from noPointerGC a bit and provide an option to always use strong references in scopes that defaults to noPointerGC, which covers even the edge cases.

We could also make registering a deallocator optional and use it for Pointer-based classes; I just don't want to require it for every resource when it's almost always a non-issue and is a decent amount of overhead to implement (and prone to confusing, hard-to-debug bugs if you keep the resource reference around).

Ok, so we should probably flesh this out a bit to figure out exactly what we want the thing to do. I don't see why having a detach() method alone would help with any of what you mention here.

I'm not proposing a detach method; this has nothing to do with detach. I'm proposing a fully reference-counting scope instead of the tracking-only version that exists currently, one that hides the reference counting in internal PointerScopes. It wouldn't have a detach method (although it may want something like releaseReference, we'll see).
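
To make that concrete, the counting could be hidden inside the scope roughly like this (another rough sketch with hypothetical names; for Pointer-backed resources, retain/release would just delegate to JavaCPP's retainReference()/releaseReference()):

import java.util.*;

// The scope holds a reference on each attached resource and releases it on close;
// the native memory is freed when the last holder releases its reference.
public class RefCountingScope implements AutoCloseable {

  private final List<Resource> held = new ArrayList<>();

  public <T extends Resource> T attach(T resource) {
    resource.retain();  // e.g. Pointer.retainReference() for Pointer-backed resources
    held.add(resource);
    return resource;
  }

  @Override
  public void close() {
    for (Resource r : held) {
      r.release();      // e.g. Pointer.releaseReference(); frees when the count hits zero
    }
    held.clear();
  }
}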

That sounds just like an implicit detach(). Panama isn't going in this direction (see openjdk/panama-foreign#466), so I'm not sure this is the kind of thing we should explore, but sure, like I said, we're only doing alpha releases for now, so we can experiment. I'd be fine with that.

Again, detach doesn't have anything to do with this. This is analogous to the default scope mentioned in that PR, but instead of a single default, it's one that can be used by our existing top-level scopes. It still "owns" resources in the sense that it will close them when it closes, but it doesn't prevent them from being closed earlier if they are part of some other earlier-closing scope.

@rnett
Copy link
Contributor Author

rnett commented May 26, 2021

As per the above comment, I'm closing this PR; we need a non-tensor-specific resource scope. I don't have time to do it at the moment, but may eventually.

@rnett rnett closed this May 26, 2021