Question about occupancy probability #10

Open
zzzxxxttt opened this issue Sep 27, 2023 · 5 comments

Comments


zzzxxxttt commented Sep 27, 2023

Hi @tarashakhurana,

In model.py the occupancy probability is calculated as pog = 1 - torch.exp(-sigma). What is the reason behind the function 1 - exp(-sigma)? I also found that in dvr.cu, the occupancy probability is computed as p[count] = 1 - exp(-sd), where sd = _sigma * _delta. Why is there a * _delta involved?
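(For reference, a minimal sketch of how the two expressions relate, assuming _delta is the distance the ray travels through each voxel; the values below are made up for illustration:)

    import torch

    sigma = torch.tensor([0.0, 0.5, 1.0, 2.0])  # hypothetical per-voxel densities
    delta = 0.5                                  # hypothetical step length per voxel

    # model.py-style: termination probability with an implicit unit step length
    pog_unit = 1 - torch.exp(-sigma)

    # dvr.cu-style: the ray accumulates density sigma * delta while crossing a
    # voxel of length delta, so the termination probability is 1 - exp(-sigma * delta);
    # the two expressions coincide when delta = 1
    pog_delta = 1 - torch.exp(-sigma * delta)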

@tarashakhurana (Owner)

Thanks for writing a detailed explanation! If you can convert it to LaTeX I will be very happy to include the derivation in the supplement. I had a version but I lost its LaTeX copy.

tarashakhurana reopened this Nov 5, 2023
@zzzxxxttt (Author)

Thank you for your reply! I withdrew my previous comment since I found it incomplete; two questions still remain:

The first question is: what does "option 2" mean here?
[screenshot of the "option 2" comment in the code]

For the second question, I created a simple test case in which the predicted sigma is (for brevity I omit the batch and time dimensions here):

[[[0, 0, 0, 0, 100],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0]]]

The origin is at [0, 0, 0] and the end point is at [4, 0, 0].
Now I pass the sigma and points to the dvr renderer, and the returned gradient is:

[[[-4, -3, -2, -1, 0],
  [ 0, 0, 0, 0, 0],
  [ 0, 0, 0, 0, 0],
  [ 0, 0, 0, 0, 0],
  [ 0, 0, 0, 0, 0]]]

This is confusing: the predicted occupancy is perfectly aligned with the ground-truth point, yet the gradient is still very large, especially near the origin. Why?


peiyunh commented Nov 22, 2023

Hi @zzzxxxttt, great question and thanks for the example. It may seem unintuitive, but the code is working as intended. I will try to unpack it below; let me know if any part is unclear.

First, the returned gradient is the derivative of d (predicted depth) w.r.t. sigma (predicted density). To simplify the example, let's assume it is a 1-D grid with 5 voxels to consider. We predict 5 densities (s for sigma): s[0], s[1], s[2], s[3], and s[4].

The probability of the ray terminating at voxel 0 can be written as p[0] = 1 - exp(-s[0]). Similarly, the probability of the same ray terminating at voxel 1 can be written as p[1] = exp(-s[0]) * (1 - exp(-s[1])), which is the probability of it not terminating at voxel 0 times the conditional probability that it terminates at voxel 1.

Following this logic, we can write out the probability that the ray terminates at voxel 4 as: p[4] = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) * (1 - exp(-s[4])).

It is also possible that the ray terminates outside the voxel grid, which we write as p[out] = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) * exp(-s[4]).

Note that p[0] + p[1] + p[2] + p[3] + p[4] + p[out] = 1.

Now we can write the predicted depth as: d = p[0] * 0 + p[1] * 1 + p[2] * 2 + p[3] * 3 + p[4] * 4 + p[out] * 4.
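A minimal NumPy sketch of this 1-D example (variable names are illustrative, not taken from the repository):

    import numpy as np

    s = np.array([0.0, 0.0, 0.0, 0.0, 100.0])  # per-voxel sigma along the ray

    T = np.exp(-np.cumsum(s))                   # prob. of surviving past voxel i
    reach = np.concatenate(([1.0], T[:-1]))     # prob. of reaching voxel i
    p = reach * (1 - np.exp(-s))                # prob. of terminating at voxel i
    p_out = T[-1]                               # prob. of exiting the grid

    d = np.sum(p * np.arange(5)) + p_out * 4    # option 2: p_out gets depth 4
    print(p.round(3), p_out.round(3), d)        # [0. 0. 0. 0. 1.] 0.0 4.0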

Now to your first question: option 2 refers to assigning to p[out] (the event where the ray terminates outside the voxel grid) the same depth we assign to p[4], i.e., 4.

To your second question: if we expand the formula for the predicted depth (using the fact that the probabilities sum to 1), we have d = 4 - p[0] * 4 - p[1] * 3 - p[2] * 2 - p[3] * 1. Notice there is no p[4] term (its coefficient cancels due to option 2), which explains why dd_dsigma[4] is equal to 0.
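Spelled out, the expansion uses the normalization p[0] + ... + p[4] + p[out] = 1 from above:

$$
d = \sum_{i=0}^{4} i \, p_i + 4 \, p_{\mathrm{out}}
  = 4\Big(\sum_{i=0}^{4} p_i + p_{\mathrm{out}}\Big) - 4p_0 - 3p_1 - 2p_2 - p_3
  = 4 - 4p_0 - 3p_1 - 2p_2 - p_3
$$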

Let's compute dd_dsigma[3]. Following the chain rule, d(d)/d(s[3]) = d(d)/d(p[3]) * d(p[3])/d(s[3]). We know that d(d)/d(p[3]) = -1 and d(p[3])/d(s[3]) = exp(-s[0]) * exp(-s[1]) * exp(-s[2]) * exp(-s[3]) = 1. Therefore, d(d)/d(s[3]) = -1.

Similarly, you can compute d(d)/d(s[2]) = d(d)/d(p[2]) * d(p[2])/d(s[2]) + d(d)/d(p[3]) * d(p[3])/d(s[2]) = (-2) * 1 + (-1) * 0 = -2. And you can do the same for d(d)/d(s[1]) and d(d)/d(s[0]) as well.
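These hand-computed values can be verified with autograd. A PyTorch sketch of the same simplified 1-D setup (illustrative only; the actual dvr.cu kernel additionally scales sigma by the per-voxel step length):

    import torch

    s = torch.tensor([0.0, 0.0, 0.0, 0.0, 100.0], requires_grad=True)

    reach = torch.exp(s - torch.cumsum(s, 0))      # prob. of reaching voxel i
    p = reach * (1 - torch.exp(-s))                # prob. of terminating at voxel i
    p_out = torch.exp(-s.sum())                    # prob. of exiting the grid

    d = (p * torch.arange(5.0)).sum() + p_out * 4  # expected depth, option 2
    d.backward()
    print(s.grad)                                  # tensor([-4., -3., -2., -1., 0.])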

Here, sigma is a non-negative quantity and is the output of a ReLU, which is non-differentiable at x = 0. When the input to the ReLU is less than or equal to 0, we define a zero sub-gradient, which means that during backprop all the weights before the ReLU get zero gradients and therefore won't get updated.
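A two-line check of that convention (PyTorch's ReLU backward uses a zero sub-gradient at the origin):

    import torch

    x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
    torch.relu(x).sum().backward()
    print(x.grad)  # tensor([0., 0., 1.]) -- zero gradient at and below x = 0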


peiyunh commented Nov 22, 2023

In case you are interested, here is a somewhat more complete derivation: raytrace.pdf

  • note a typo in equation (2): it should have been $$s_i = 1 - e^{-\sigma_i}$$

@zzzxxxttt (Author)

Very nice explanation, thanks @peiyunh!
As for the non-differentiable 0 in ReLU: I tried setting sigma to [0.001, 0.001, 0.001, 0.001, 100], and the returned gradient is [-3.9990, -2.9991, -1.9993, -0.9996, -0.0000], still very large near the origin. Maybe the non-differentiable 0 is not the key point?
