Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an issue with the relative difference prior #696

Merged
merged 5 commits into from
Oct 6, 2020

Conversation

robbietuk
Copy link
Collaborator

The inclusion of the epsilon term in the denominator square

@robbietuk
Copy link
Collaborator Author

From my notes, we have to consider the RDP with the addition of the epsilon term to prevent divide by zero.
image

@AnderBiguri
Copy link
Collaborator

Hum, you just changed the location of the epsilon right? if its only there to avoid division by zero, then its value must be much smaller than the expected values for the rest of the numerator and denominator (otherwise it would have a numerical impact, and we just want to avoid division by zero).

If that is the case, it does not really matter for it to be mathematically correct on its derivative. You just need to add epsilon to any division, and thats it. In theory, it will only have an impact when everything else is zero, so the error you are introducing by not being mathematically correct is neglectable (and if its not, then your epsilon is too big!)

Is that the case here? I am missing a but the context, so maybe I am talking out of place!

@robbietuk
Copy link
Collaborator Author

robbietuk commented Sep 24, 2020

Yes, the purpose is to prevent the divide by zeros and it also exists in the computation of the function

current = weights[dz][dy][dx] * 0.5 *
(pow(current_image_estimate[z][y][x]-current_image_estimate[z+dz][y+dy][x+dx],2)/
(current_image_estimate[z][y][x]+current_image_estimate[z+dz][y+dy][x+dx]
+ this->gamma * abs(current_image_estimate[z][y][x]-current_image_estimate[z+dz][y+dy][x+dx]) + this->epsilon ));

In general I would agree with you as generally the function and gradient are slightly decoupled. However, I am trying to be mathmatically correct as I am using line searches to find the MAP solution and therefore discrepancies between the function and its gradient can lead to problems (which is what I think I am finding). Numerical impact may infact be a problem but that just means it isnt the "true" RDP.

I could consider discarding the epsilon and instead use if x == 0 and y == 0: return 0. This would let it be both mathmatically correct and handle divide by zeros.

@AnderBiguri
Copy link
Collaborator

@robbietuk fair, fair. However, maybe you are just using an epsilon too large? As I said in the teams chat, if the value you are using as "epsilon" is large enough to have influence in your maths, either it is too large on value, or you have other serious issues on the math itself, as its too sensitive to numerical accuracy.

@robbietuk
Copy link
Collaborator Author

I agree, I feel that epsilon is causing a large issue right now. I want to try and replace it with the if statement.

@AnderBiguri
Copy link
Collaborator

AnderBiguri commented Sep 24, 2020

Yes, that works too! But I guess my point is that if those microscopic epsilons are causing a problem, I don't expect your problems to go away... If the epsilon you are imputing is indeed small enough to be on the "noise" fraction of your data, and that impacts the numerical method, then any other "noise" (i.e. floating point error) that may arise just naturally on your computations will cause the same problems in the method!

@robbietuk
Copy link
Collaborator Author

robbietuk commented Sep 24, 2020

Sorry, I dont think I phrased the above correctly. I think the epsilon I am using right now is too large for normalised data. That and the "decoupled" mathmatical function and gradient, and there is a problem.

Yes there is an computational problem of using very small epsilons and I think we are both in agreement that moving away from it is the correct thing?

@AnderBiguri
Copy link
Collaborator

AnderBiguri commented Sep 24, 2020

No! That is not what I meant.

I think there is absolutely no issue with using very small epsilons, if their purpose is only avoiding by zero. They just need to be marginally bigger than zero, but just that. so 1e-7 should also work. In fact, you want epsilon to be as small as possible, in such a way that it can be considered noise for your purposes. Yo only want it so when our "incomplete" numerical system of using floating point arithmetic fails because it does not know how to divide by zero (and give you Inf), it doesn't, and gives you a "practical" Inf, something very large. Or, if the numerator is also zero, then for your division to give something that is essentially zero, but not, and avoid NaN.

In some sense, it has nothing to do with math, epsilon is not there to do something mathematically smart, its just there to help us overcome a limitation of using computers. This is why I was surprised when you used it in the gradient, I don't think its necessary. You just want to replace your divisions of a/b by a/(b+eps), not change your equations. Its not a regularizer, or some other mathematical construct, its just a "hack".

So, in short, before changing anything and going crazy with equations, I suggest you just do this->epsilon = 0.00000001; and try your maths again.

@robbietuk
Copy link
Collaborator Author

Yes, as epsilon > 0 tends to 0, the function and gradient behave like their non-epsilon counterparts (except when voxels are 0). The trouble is, what is the correct value of epsilon for the general case?

Removing epsilon, adding a check on the two voxel values being 0 (then return 0) becomes equiviant to an optimal epsion implementation, and we dont have to answer that question.

@KrisThielemans
Copy link
Collaborator

I strongly recommend making the gradient consistent "formula-wise" with the objective function. I think that should be the case for whatever value of epsilon you use.

I don't think checking with if is going to work. The division will likely explode in other cases as well.

We could check with Johan Nuyts et al. if you like

@AnderBiguri
Copy link
Collaborator

AnderBiguri commented Sep 24, 2020

@robbietuk

The trouble is, what is the correct value of epsilon for the general case?

As close as zero without being zero. For any case. You just want a small enough perturbation to avoid division by zero.

@KrisThielemans

I strongly recommend making the gradient consistent "formula-wise" with the objective function.

While you are correct, and doing this will have no negative impact, in all cases in the past where I could do this but instead I just replaced divisions, there was no measurable difference with the simple "non-formula" way. In fact, one could argue that if there is a difference when using the formula, then your epsilon is too big in value. Conceptually you just want to introduce a value in the noise range of your computation, and by definition that should mean that it has no impact on your results. Its the same as adding doubles vs singles for all this. You are adding more numbers in the noise level of your method, thus it does not matter.

In fact, I would argue that if you do the "proper formula way" vs the "just adding it in the division", and there is a measurable impact in your algorithm, then your epsilon is wrong.

But of course, there should not be an issue with adding the proper way.

@robbietuk
Copy link
Collaborator Author

robbietuk commented Sep 25, 2020

Okay I did some numerical test on the RDP. The trouble with using an epsilon in the denominator is that the equation and its gradient are significantly sensitive to poorly selected values. Unlike a function, such as the QP = (x - y)^2 , the RDP is sensitive to differences between voxels relatvie to the voxel values. I guess the clue is in the name.

I wrote some python to investigate this. One set of functions (RDP_e and grad_e have an epsilon = 1e-10) and the formulee are equivilant to that of my first comment (#696 (comment)).

The second set of RDP tests of this use an epsilon = 0.0 but have the addition of an if statement if x == 0.0 and y == 0.0 and epsilon == 0.0: return 0.0 for both function and its gradient computation. This is only to prevent the evaluation of 0.0/0.0, which I have assumed to equal 0.

x and y are a set of values given by the following python code [0.0] + [math.e ** i for i in range(-30, 5)] (i.e. [0.0, 9e-14, 2e-13,..., 54]) so voxel values both above and below the epsilon values are compared for this experiment. On the below images, the vertical and horizontal axis values are equivilant. Each pixel of the 2D images are the function or gradient value for the comparison between the respective x,y values.

I will compare these two sets of functions and show that the if statement version is equivilant to an optimally selected epsilon value of infintesimally small but positive value and therefore I believe we should use the if statements and do away with epsilon.

Function

The RDP and RDP_e are visually comparable (RDP shown below).
image
However subtracting one from the other (RDP - RDP_e) we can see the discrepancies, even if they are very small.
image
As the epsilon value tend to 0 (e.g. epsilon=1e-40) then this difference image becomes uniform 0's.

Gradient

The gradient is another matter entirely. The gradient of the RDP (grad) is given.
image
Only once is the if statement True, at x=0 and y=0. As is clear, the RDP at low voxel values is sensitive.

Now compare to grad_e, with the inclusion of epsion=1e-10 and we see that the sensitivity at lower values (<epsilon) has been reduced as the gradient tends to 0.
image

Lowering the epsilon value to 1e-40, << the smallest non-zero x and y, leads to virtually equivilant behaviours between grad and grad_e.

Remarks

While I agree we should set epision to be as small as possible, which should be much less than the smallest value of x, y, I think I have shown that its usage is redundant. If epsilon is too large, then it does have significant impact on the gradient computation and the selection of its value may become dangerous.

P.S. While not rediculously slow, the RDP is a large formula to recompute many times. We can possibly speed it it up with a queery using the if statement.

@robbietuk
Copy link
Collaborator Author

robbietuk commented Sep 25, 2020

I guess then that the answer to my problem is that the current epsilon is too large for normalised data and that is why I am seeing weird behaviour in my line searches. I have have not seen this before with the RDP as I did not use normalised data, thus the voxel values were of the order of magnetude 1e3.

@AnderBiguri
Copy link
Collaborator

Very nice analysis!

@robbietuk
Copy link
Collaborator Author

robbietuk commented Sep 25, 2020

Drafted the removal of epsilon, if we dont choose to use it. We can just revert the commit. It build for me but I want to ensure travis is happy.

@robbietuk
Copy link
Collaborator Author

I can't see any more issues, I am happy to merge.

@robbietuk
Copy link
Collaborator Author

robbietuk commented Oct 2, 2020

After discussion with @KrisThielemans, we decided to go for a comprimise. We will keep the epsilon but by default set it to 0. The if statements will handle the case when epsilon=0, x=0, and y=0 by returning 0. This allows a user to set the value of epsilon.

Also corrected default gamma to the suggested value of 2.

@robbietuk
Copy link
Collaborator Author

As discussed via email, it is not obvious how to extend the RPD allow for negitive number and a reliable concave function. In the presence of a non-negitiviety constraint, the RDP will now work as expected, even with epsilon =0.

One idea is to incorporate an assert(x>=0) and assert(y>=0) onto every voxel evaluation. However, I don't know how well this would work in practice (it may be simplified by assert(min(volume)>=0)). Do we need to provide this hard boundary? Does STIR ever allow negitive voxel values in its reconstruction algoriothms. Should this assert really be a if statement that outputs a warning?

Copy link
Collaborator

@KrisThielemans KrisThielemans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. would be nice to add an example like \examples\samples\OSMAPOSL_QuadraticPrior.par. if you do, please do it with [ci skip] in the first commit line

@KrisThielemans KrisThielemans merged commit d45d7ed into UCL:release_4 Oct 6, 2020
@KrisThielemans KrisThielemans added this to the v4.1 milestone Oct 6, 2020
@robbietuk robbietuk deleted the RDP_grad_fix branch October 8, 2020 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants