Misleading tip #76
Comments
Similarly, the first tip:
Also seems a bit misleading, as I believe the answer here is no.
The inner loop uses kk, meaning each inner iteration adds to a different element of the array. Although it's not immediately obvious, if you switch the loops around as he mentions in the third point (I think this is possible due to the lack of dependencies), then the inner loop sums over ii, so it adds to a single scalar for the duration of the inner loop.
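A minimal sketch of the interchange described above. The actual loop body from the coursework is not shown in this thread, so `weight`, `src`, and `n_out` are hypothetical stand-ins; only the loop variables `ii` and `kk` come from the discussion.

```python
def weight(ii, kk):
    # Hypothetical per-iteration term; stands in for the real loop body.
    return (ii + 1) * (kk + 2)


def array_accumulate(src, n_out):
    # Outer loop over ii, inner loop over kk:
    # each inner iteration adds to a different element out[kk].
    out = [0] * n_out
    for ii in range(len(src)):
        for kk in range(n_out):
            out[kk] += src[ii] * weight(ii, kk)
    return out


def scalar_accumulate(src, n_out):
    # Loops interchanged: the inner loop sums over ii into one scalar,
    # and out[kk] is written once per outer iteration.
    out = [0] * n_out
    for kk in range(n_out):
        acc = 0
        for ii in range(len(src)):
            acc += src[ii] * weight(ii, kk)
        out[kk] = acc
    return out
```

Both functions visit the same (ii, kk) pairs and add the same terms to each output, so with integer inputs they return identical lists. The interchange is only valid here because no iteration depends on the result of another.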
+1, this confused me for a good hour or two as well. I believe the intention of the text is to explain how to think about these problems in general, rather than about this problem specifically, which I think is a good goal, but the way it's worded left me spending ages trying to figure out how to use scalars instead of arrays too. I'm not sure what the best way to change it is, but if the above is correct, then maybe emphasising that it's a more general method for approaching these problems, and that not all of the points are relevant to this example, would alleviate a lot of the trouble.
@ashleydavies is thinking along the right lines. I am nudging you towards thinking in particular ways now so that, once you get to CW5 and 6, you can (hopefully) "see" the kinds of optimisation you can apply to squeeze out performance.
The instructions in Part 4 give us this tip:
I would say that this is a pretty misleading tip. If I'm not mistaken, there are no dependencies between the iteration nodes at this point. The problem of correctness when you naively parallelise the outer loop is not a problem resulting from anything to do with the iteration space, but a more inherent issue with parallel programming (race conditions). Am I correct in saying this?
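One way to picture the race being described, as a sketch under assumptions: the loop body is hypothetical (the same stand-in term `(ii + 1) * (kk + 2)` is used only for illustration), and the interleaving in `lost_update_demo` is written out by hand rather than produced by real threads, so that it is deterministic.

```python
from concurrent.futures import ThreadPoolExecutor


def lost_update_demo():
    # Two outer iterations (A and B) both execute out[0] += 1.
    # The += is a read followed by a write; here both reads happen
    # before either write, as can happen when the iterations run in parallel.
    out = [0]
    read_a = out[0]      # iteration A reads 0
    read_b = out[0]      # iteration B reads 0
    out[0] = read_a + 1  # A writes 1
    out[0] = read_b + 1  # B also writes 1, so A's update is lost
    return out[0]


def parallel_over_kk(src, n_out):
    # Parallelise over kk instead: each task computes exactly one output,
    # so no two tasks ever write to the same location.
    def one_output(kk):
        # Hypothetical loop body, summed over ii into a task-local value.
        return sum(src[ii] * (ii + 1) * (kk + 2) for ii in range(len(src)))

    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the order of the inputs.
        return list(pool.map(one_output, range(n_out)))
```

The iterations are independent in the sense that none needs another's result, yet the naive version still goes wrong because several iterations update the same memory location. Giving each task its own output avoids the shared write entirely.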