Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

each instance of a comprehension variable should be different #1571

Closed
StefanKarpinski opened this issue Nov 19, 2012 · 24 comments
Closed

each instance of a comprehension variable should be different #1571

StefanKarpinski opened this issue Nov 19, 2012 · 24 comments
Assignees
Labels
breaking This change will break code parallelism Parallel or distributed computation
Milestone

Comments

@StefanKarpinski
Copy link
Member

Currently we have this:

julia> map(f->f(), [ ()->i for i=1:10 ])
10-element Int64 Array:
 10
 10
 10
 10
 10
 10
 10
 10
 10
 10

I think this should definitely evaluate to [1:10]. I also think that's what for loops should do, but in the case of comprehensions, I think the case is much more powerful.

I've tagged this with "parallel" because comprehensions are a very natural place for implicit parallelism in the language, but that essentially would require the semantics I'm proposing in order to work correctly. Note that I'm talking about evaluating a normal comprehension using multiple threads, rather than producing a DArray distributed across multiple machines.

@toivoh
Copy link
Contributor

toivoh commented Nov 19, 2012

+1

@StefanKarpinski
Copy link
Member Author

Note that I'm talking about evaluating a normal comprehension using multiple threads, rather than producing a DArray distributed across multiple machines.

This distinction suggests to me that the @parallel macro should maybe be renamed to @distributed to allow for possible single-process parallelism, e.g. automatic threaded comprehensions and for loops.

@JeffBezanson
Copy link
Member

Ok then let's change this for for loops according to the existing issue,
and let comprehensions inherit it.
Having both distributed and parallel starts to get messy and we should have
a simple overall model instead.
On Nov 19, 2012 3:16 PM, "Stefan Karpinski" notifications@github.com
wrote:

Note that I'm talking about evaluating a normal comprehension using
multiple threads, rather than producing a DArray distributed across
multiple machines.

This distinction suggests to me that the @parallel macro should maybe be
renamed to @distributed to allow for possible single-process parallelism,
e.g. automatic threaded comprehensions and for loops.


Reply to this email directly or view it on GitHubhttps://github.com//issues/1571#issuecomment-10529117.

@staticfloat
Copy link
Member

This is tangentially related to the discussion on Parellel Callbacks in C on the list back in August, specifically Stefan's post regarding having threads with completely independent global state. This pretty much eliminates the (user-facing) differences between multithreading and multiprocessing, and is pretty much the model I reference in my link to "Writing Perfect Multithreaded Programs with ZMQ" further down the discussion.

But this is discussing more the difference between distributed and parallel (or lack thereof), not the instance at hand.

@toivoh
Copy link
Contributor

toivoh commented Dec 3, 2012

I think it would make sense to extend this change to all variables that are local to a for loop/comprehension.
I.e., the change should apply to

map(f->f(), [ (j=i; ()->i) for i=1:10 ])

just as well as map(f->f(), [ ()->i for i=1:10 ]).

@StefanKarpinski
Copy link
Member Author

That seems like the best semantics to me as well. I'm not sure how hard it will be to implement though.

@JeffBezanson
Copy link
Member

If all loop-local variables are fresh on each iteration, then this loop won't work:

for i = 1:n
    if i == 1
        x = 1
    else
        x = x+1  # x is undefined
    end
end

In parallel loops and comprehensions, it's reasonable to disallow dependence on previous iterations, but this seems really strict for ordinary for loops. And might break a lot of code. Of course, you can still do reductions if the reduction variable is initialized outside the loop.

@StefanKarpinski
Copy link
Member Author

This example seems pretty weird to me – I would never write a loop like that, but maybe other people do. I was thinking that only the loop variable i would be fresh on each iteration. Does that wreak havoc with our scoping rules?

@StefanKarpinski
Copy link
Member Author

Also, this example currently works but is useless since the scope of x is limited to the for loop – you can't get its value outside of the loop. The only way you can get its value outside of the loop is if it was declared or assigned outside of the loop, in which case this issue doesn't exist since it wouldn't be loop local and therefore wouldn't be fresh on each iteration.

@JeffBezanson
Copy link
Member

I agree my example is weird. It's not totally useless though since I could imagine using the value of x inside the loop.

It's easy to make only i fresh. The current behavior of let is that in let x=0,y=1; ...; end only x and y are "new" variables, and the rest is an ordinary scope block. The idea of this was to give precise control over which variables are new. But I could imagine making new bindings for all local variables in a let, or perhaps in every scope block.

@StefanKarpinski
Copy link
Member Author

I prefer the precise approach. Otherwise, if I'm understanding correctly, there would be no difference between

let x = 1
  # ...
end

and

let
  x = 1
  # ...
end

Which seems weird. Why even bother with the first one in that case?

@JeffBezanson
Copy link
Member

There is one obscure difference: in let x = y, y is evaluated in the enclosing scope, not in the new scope that contains x. let x=x is meaningful.

@StefanKarpinski
Copy link
Member Author

Right. I think I'd still prefer the precise approach where the local variables in the let block are normal locals.

@JeffBezanson
Copy link
Member

That might be ok, but we would have

julia> map(f->f(), [ ()->i for i=1:3 ])
3-element Int64 Array:
 1
 2
 3

julia> map(f->f(), [ (m=i; ()->m) for i=1:3 ])
3-element Int64 Array:
 3
 3
 3

Or, if not in comprehensions, then in this case:

i = 1
a = cell(3)
while i <= 3
    let j = i
        m = j
        a[i] = ()->m
    end
    i += 1
end
a[1]()  # => 3

@StefanKarpinski
Copy link
Member Author

Yeah, it's a tough call. For comprehensions it is pretty clearly more intuitive for m to be "new" in each iteration. What if we had new bindings for every scope block? That would certainly simplify things because there would only be one behavior to learn. I don't think it's unreasonable to require that a variable be scoped outside of a loop in order to be used across loop iterations. In most use cases you need that anyway.

@JeffBezanson
Copy link
Member

Yes, I'm really starting to think every scope block should allocate new bindings for its variables every time it is entered. Jameson agrees too.

@StefanKarpinski
Copy link
Member Author

I think that's the least gotcha behavior by far and if we can do it without performance issues, we should. Simply avoiding the entire if issue of when there are new bindings and when there aren't is a big win.

JeffBezanson added a commit that referenced this issue Aug 1, 2013
the most problematic change was forcing for loop variables to be loop-local,
instead of reusing existing local vars. for example

    local i
    for i = 1:2
    end
    return i

currently works and returns 2.

for now, the behavior is to introduce new variables for each iteration unless
we are overwriting an existing local. comprehensions, however, always
introduce fresh variables and never overwrite existing ones.
@JeffBezanson
Copy link
Member

To clarify that commit message, for now I'm keeping the behavior where for i can overwrite an existing i. This is technically a different issue from whether any new variables are per-loop or per-iteration. I think this is a good compromise for minimizing code breakage.

@StefanKarpinski
Copy link
Member Author

Yes, I like that approach. It seems the least surprising.

@JeffBezanson
Copy link
Member

jb/newvars branch merged.

@StefanKarpinski
Copy link
Member Author

I'm unreasonably excited about this.

@StefanKarpinski
Copy link
Member Author

So, I immediately set out trying to break this and mostly everything is cool, but this seems inconsistent:

julia> for i = 1:5
           if i == 1
               x = 1
           else
               x = x+1  # x is undefined
           end
           println(x)
       end
1
2
3
4
5

julia> a = {}
0-element Any Array

julia> for i = 1:5
           if i == 1
               x = 1
           else
               x = x+1  # x is undefined
           end
           println(x)
           push!(a,()->x)
       end
1
ERROR: x not defined
 in anonymous at no file:5

Of course, I never would have thought to even write an evil loop like this except that you used it as an example above, but it seems like these should either both succeed or both fail.

@JeffBezanson
Copy link
Member

There is a small change I can make to fix this. So far I've left it this way (1) to ease the transition, since less code will break, and (2) because fixing it requires adding an enormous number of (newvar x) nodes to catch every possible place where a variable might need to be re-nulled (system image +300K). These are only needed in cases where a variable might be accessed undefined, which is quite rare, but we don't know that until much later. Some work is required to cut down on these.

@StefanKarpinski
Copy link
Member Author

Honestly, it is probably fine to leave it this way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code parallelism Parallel or distributed computation
Projects
None yet
Development

No branches or pull requests

4 participants