Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterators and resource management #22466

Open
davidanthoff opened this issue Jun 21, 2017 · 10 comments
Open

Iterators and resource management #22466

davidanthoff opened this issue Jun 21, 2017 · 10 comments
Labels
collections Data structures holding multiple items, e.g. sets

Comments

@davidanthoff
Copy link
Contributor

This is a problem I have in Query.jl: I sometimes want to iterate things where I am acquiring some resource in the start method that needs to be explicitly freed later on. For example I might open a file in the start method, and at some point I'd like to close it again. I could close it in the last call to the done function, but I can't be sure that it will ever be called, so that is really kind of sloppy.

One solution might be that a normal for loop gets lowered to something like this:

s = start(source)
try
    while !done(source, s)
        v, s = next(source, s)
        # Loop body
    end
finally
    dispose(source, s)
end

With a default implementation of dispose:

@inline dispose(a,b) = nothing

And then I could add a method to dispose for my type that closes say a file that I opened in start.

That design would probably only work if the compiler was able to detect cases where the whole try...finalize block did nothing (i.e. where the finally block didn't do anything) and then was able to remove the whole try...finally construct. I assume otherwise it would cause too much of a performance problem.

In any case, even if my idea here doesn't work, it would be good to have some resource management story for iterators.

Probably relates to #18823 and #11207.

@kshyatt kshyatt added the collections Data structures holding multiple items, e.g. sets label Jun 21, 2017
@yuyichao
Copy link
Contributor

What's wrong with just manually doing the clean up? We should have better support for function local resource cleanup but I don't think that should be in the iterator protocol at all.

@vtjnash
Copy link
Member

vtjnash commented Jun 22, 2017

It's too high overhead to add try/finally around every try/catch

But it is occasionally a useful pattern that I've used a couple times (for example, https://github.com/JuliaLang/julia/blob/master/base/weakkeydict.jl#L121)

@davidanthoff
Copy link
Contributor Author

What's wrong with just manually doing the clean up? We should have better support for function local resource cleanup but I don't think that should be in the iterator protocol at all.

@yuyichao How? The resource is allocated in start, is needed in next and done, but should be freed when the iterator state is no longer used.

It's too high overhead to add try/finally around every try/catch

@vtjnash I don't understand, there is no try/catch in this at all, right? And yes, obviously we can't wrap every loop in a try/finally, but couldn't the compiler remove try/finally blocks where there is actually nothing happening in the finally block?

But it is occasionally a useful pattern that I've used a couple times (for example, https://github.com/JuliaLang/julia/blob/master/base/weakkeydict.jl#L121)

Yes, that kind of works, but not really. Good example is when I open a file in the start method. Yes, I can close it in the finalizer, but that means that the file might be open until the finalizer actually runs, whereas it would be much nicer if it was guaranteed to be closed after the code block that iterates is done.

@yuyichao
Copy link
Contributor

The resource is allocated in start, is needed in next and done, but should be freed when the iterator state is no longer used.

Change the API.

@davidanthoff
Copy link
Contributor Author

Change the API.

There are cases in base that would benefit from a solution to this problem (e.g. here), and it seems generally a useful pattern. Note that e.g. in .Net IEnumerator inherits from IDisposable and for loops have the kind of try/finally block that I suggested above, so there is certainly precedence for this kind of construction.

@yuyichao
Copy link
Contributor

For that case, eachline should create the resource and then that should be managed manually.

In general, I don't think the resource management should be bound to iterator since the start call is implicit to the user. It can certainly be bound to the creation of the iterator though.

And as I said in the first comment, other than this, I don't think there's anything different from any other resource management so we should fix just that.

@davidanthoff
Copy link
Contributor Author

In general, I don't think the resource management should be bound to iterator since the start call is implicit to the user. It can certainly be bound to the creation of the iterator though.

That would mean that you can't have any lazy iterator story if any of your iterators depends on a non-managed resources. That would exclude countless really useful scenarios (especially in the file IO and DB access scenario) and a pattern that is used successfully a lot on other platforms.

@yuyichao
Copy link
Contributor

That would mean that you can't have any lazy iterator story if any of your iterators depends on a non-managed resources.

Why is that? Just do for i in get_iterator(obj); end ...

FWIW, the proposed API is not better than just closing the resource when the iteration is done. If it is a resource only needed for iteration, you don't need it after done returns true. If it is a resource referenced by iterator output, the loop isn't the correct scope to close it (the correct scope is only known by the user).

@davidanthoff
Copy link
Contributor Author

FWIW, the proposed API is not better than just closing the resource when the iteration is done.

What if the user of the iterator never iterates everything? Then the final done will never be called and the resource will stay open forever. I think resource management a la eachline in done is just a bug. @vtjnash's suggestion with the finalizer at least cleans the resource at some point if someone for example breaks from a for loop.

@tpapp
Copy link
Contributor

tpapp commented Nov 12, 2018

Is a finalizer still the best solution/workaround for this issue? BTW, I looked at the io.jl source linked above (which moved on since) and EachLine does not seem to register a finalizer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

No branches or pull requests

5 participants