Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distance2boundary: sometimes returns the wrong distance #517

Closed
yeganer opened this issue Mar 10, 2016 · 17 comments
Closed

distance2boundary: sometimes returns the wrong distance #517

yeganer opened this issue Mar 10, 2016 · 17 comments

Comments

@yeganer
Copy link
Contributor

yeganer commented Mar 10, 2016

While looking at the code I came across a weird behaviour in distance2boundary.
I thought the recently_crossed_boundary flag was necessary because the old implementation couldn't handle packets close to the boundary. I was wrong. This flag was implemented to correct numerical errors in the case that moving a packet resulted in a radius outside it's shell boundary. This happens quite a lot actually, around 10% of the invocations to distance2boundary have either r<r_inner or r>r_outer.Most of those cases are r<r_inner and this is checked for with

if (recently_crossed_boundary ==1):
    return d_outer

But in the other case where r>r_outer nothing is checked and the distances are calculated the normal way. For d_outer the formula used is
d = sqrt(r_outer ** 2 + (mu**2 - 1) r**2 ) - mu * r.
For r>r_outer this gives the distance to the nearest crossing instead of the farthest.
I think this is not a huge issue but since I haven't found anything contradicting me this can result in a packet crossing the inner boundary multiple times in a row and being absorbed in that process or that the r and shell_id of the packet are messed up and are not synchronised anymore.

In my opinion there are two ways to fix this:

  1. Implement a check similiar to the one for r<r_inner and return the correct distance. This will leave these numerical errors in the code.
  2. Remove the numerical errors by explicitly setting r=r_inner|r_outer when crossing the shell boundary. I'm not sure of the consequences of this approach. @unoebauer might have some experience with this particular problem.
@yeganer
Copy link
Contributor Author

yeganer commented Mar 11, 2016

Sorry for the confusion. The calculated distance seems to always be correct.

So the only thing that's strange now is why removing recently_crossed_boundary changes the result.
I'll do some more tests and come back later.

@wkerzendorf
Copy link
Member

@yeganer there is no need to apologize. It's very important to question the code! @tardis-sn/tardis-core to make sure that everyone has a look.

@wkerzendorf
Copy link
Member

@ssim did we discuss setting r=r_inner at some stage?

@unoebauer
Copy link
Contributor

@yeganer; what about the possibility we talked about yesterday (see PR #520). I think this is technically still possible in the current master

@unoebauer
Copy link
Contributor

@yeganer - is this still relevant? Which your modifications, I thought this issue is resolved. Or am I wrong?

@yeganer
Copy link
Contributor Author

yeganer commented Mar 30, 2016

@unoebauer This is kind of still relevant. There is still a probability > 0 that a packet is outside the outer boundary and misses it. This would result in the package interacting only with lines and continuum. After such an interaction the packet gets a new mu. If it then hits any boundary we are fine but there is again a small possibility that the packet doesn't hit anything and slowly drifts away. The farer away the packet, the bigger the probability that it will miss it's shell again.

So in the current code and in my improved version there is the possibility for an endless loop. But this probability is extremely small since every interaction after the first miss of the outer boundary has to have a mu that will again miss the boundary.

A fix would be to explicitly set the radius instead of calculating it after moving the packet accross the shell boundary.

@ssim
Copy link
Contributor

ssim commented Mar 30, 2016

@yeganer I think that it would help me to understand if you can give a specific example of when there's a problem - i.e. can you, step by step, go through a (hypothetical) example where the code would currently do the wrong thing.

@yeganer
Copy link
Contributor Author

yeganer commented Mar 30, 2016

@ssim Sure.
Let's assume our packet just encountered the move_packet_across_boundary event but due to roundoff errors the new r is set to a value outside the outer boundary. Now we assume the packet was traveling very tangentially with a very small mu. This results in a negative sqrt(...) during the boundary calculation. Our d_boundary is now NaN. During determination of the next event every check against d_boundary returns False so no interaction further move_packet_across_boundary occurs. The packet will only interact with lines and thomson_scattering. If the packet happens to get a mu>0 after every interaction the packet will move away from the boundary and more angles will return a 'miss outer boundary'. Once the packet is far enough away that d_boundary is always bigger than the other distances it is highly unlikely that the packet will return. One could estimate that probability with a random walk but I'd estimate the total probability of that happening to ~10^-100

Of course it's very unlikely that a packet enters the state that it misses the outer boundary. I did a calculation once and the requirement was something like mu < sqrt(2*e_m + e_m **2) where e_m is the machine precision. Tests showed the maximum difference between r and r_outer was ~3 e_m. All in all we get ~1e-7. But this is only the probability for a dangerous angle. For the whole thing to happen the packet has to interact very closely to the boundary, too.

@ssim
Copy link
Contributor

ssim commented Mar 30, 2016

@yeganer I see - yes, that's helpful - I think I understand the point better now. So this is when we are getting (in terms of numerical precision) close to the case where a packet's trajectory is close to being a tangent line to the boundary: i.e. a packet has come down towards a shell boundary (from outside) in such a way that it just skims the boundary of that shell. The way the code deals with this case should be that (in the hypothetical perfect case of no rounding errors) the packet would reach the boundary, be temporarily assigned to the inner zone, but then calculate that the distance to move back into the outer zone would be zero ... so it would immediately transition back. However, if the "crossing point" is numerical calculated to be just outside the boundary (rather than exactly on, or just inside the boundary) the sqrt goes NaN and the distance is handled wrongly.

Clearly this is going to be a very rare thing, but I agree with you that we should catch it. My gut instinct (based on making as minimal change as possible) is just to have a check that the value of d_outer on line 160 of cmontecarlo.c (I'm looking at the current master) is finite. If it's not, print a warning out and set it to zero. This should happen very rarely but I think it follows the logic of the method current used and so is a very minimal change.

It's interesting that your test for this (in your PR) never trips the check on the determinant being negative. Perhaps that's just luck (it's too unlikely to be that close) or perhaps there's some cleverness in the way that mu is updated when a packets moves that we don't encounter this?

@yeganer
Copy link
Contributor Author

yeganer commented Mar 30, 2016

@ssim: @unoebauer did a quick check on the determinant being negative and he didn't find any. Considering the requirements for that case I come to the conclusion that we ran the test with far to few packets to encounter that.
As stated above only in 1 out of ~10^7 cases the packet has a mu which is small enough that the check would be able to trigger. But at the same time the packet has to be near the boundary. Thats a probability of ~10^-15 on top of that ( assuming the r is uniform distributed between r_inner and r_outer )

I don't think we will ever encounter that.

The questions that are relevant:

  1. Do we care that packets are sometimes outside of their shell (by up to 3 machine errors)?
  2. If we care, do we want to change that behavior? A test I did produced a relative difference of ~10^-12 between the virtual spectra. In that test I manually set rpacket.r to the corresponding radius.

@ssim
Copy link
Contributor

ssim commented Mar 30, 2016

@yeganer
We don't care that packets are sometimes outside the boundaries of their cells to a few machine precision: that's part of the philosophy we originally adopted and the reasoning behind the integer flag to say where the packet is...that flag rules supreme and vetoes the exact value of r. The way I think of it is that the packet IS inside the shell, it's just that the value of r recorded is not sufficiently precise to be able to judge from that.

So I would say that we don't care and we don't want to change it. It would still be good, however, to have the code throw a message / error if that determinant is negative, just on principle. After all, maybe people will still be using TARDIS a couple of Hubble times from now and it will save them the trouble of finding this thread when it does happen. Those "clever packets" will get there in the end! ;)

@yeganer
Copy link
Contributor Author

yeganer commented Mar 30, 2016

@ssim
Calculating the determinant every time would be possible but reduce performance of course. Additionally it would complicate the code further.
My suggestion would be to check somewhere whether d_boundary < 2 * r_outer and if that's false do some warning. This check will fail if something went wrong during calculation AND if d_boundary is NaN.

In the new version of that function in PR #514 this flag isn't needed anymore. So I think it is safe to merge those changes, including the removal of recently_crossed_boundary.

@ssim
Copy link
Contributor

ssim commented Mar 30, 2016

Checking on d_boundary as you suggest sounds fine, provided NaN < something is always false (you know that better than me).

Recently_crossed_boundary is still in PR #514 isn't it? But your plan is now to remove it?
Am I correct in thinking that the statement that it is safe to remove it is the statement that when a packet crosses a boundary in the outward direction it will never have a mu that is negative (numerically)?

By the way, I spent quite a long time just now being confused by the meaning of "next_shell_id" however, which appears to be different form what I'd expect. Can you easily change the comment on this quantity in rpacket.h to correctly explain that it is not, in fact, the ID of the next shell but is +1 if the packet is heading outwards and -1 if heading inwards. Or should make a separate issue to have this fixed? (I guess that this use of the variable got changed at some point but the comment wasn't altered.)

@yeganer
Copy link
Contributor Author

yeganer commented Mar 30, 2016

@ssim
The tests for the cpart are in the process of being rewritten. With the new framework it should be easy to check for special cases (provide input that forces NaN) and then verify the behavior.

You are right in your thinking about recently_crossed_boundary. Because we know what circle the packet will hit, we don't have to calculate both distances and compare them but only calculate the one we are hitting.
In the old implementation without recently_crossed_boundary the distance to the inner boundary would sometimes be falsely calculated and then be smaller than d_outer which would result in errors. This can't happen in the new implementation.

I found that to be quite confusing myself. I'm not sure whether we need that variable anyway.
Assuming move_packet always calculates a new mu it should be easy to determine whether a packet is crossing a boundary outward or inward. I'd have to think more about it but I'm quite positive that's a variable we don't need either.

@yeganer
Copy link
Contributor Author

yeganer commented Apr 29, 2016

@tardis-sn/tardis-core I think we should close this.
If we want to check for NaN during calculation of distance2boundary we should open a new issue.

@wkerzendorf
Copy link
Member

can you open a new issue and close this one?

@yeganer
Copy link
Contributor Author

yeganer commented Apr 29, 2016

Closing this. The followup is #552.

@yeganer yeganer closed this as completed Apr 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants