Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update truncated/censored notebook to v4 #271

Merged
merged 8 commits into from
Jan 28, 2022

Conversation

drbenvincent
Copy link
Contributor

Following #261, this pull request updates GLM-truncated-censored-regression.ipynb

Changes:

  • uses pm.Censored
  • I've hopefully updated it to conform to the new style guide

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Jan 12, 2022

View / edit / reply to this conversation on ReviewNB

ricardoV94 commented on 2022-01-12T06:07:13Z
----------------------------------------------------------------

I think in the limit of zero noise, estimates would still be biased for the censored case (but be fine in the truncated one)


drbenvincent commented on 2022-01-12T09:02:15Z
----------------------------------------------------------------

Good point. Amending this text

@review-notebook-app
Copy link

review-notebook-app bot commented Jan 12, 2022

View / edit / reply to this conversation on ReviewNB

ricardoV94 commented on 2022-01-12T06:07:13Z
----------------------------------------------------------------

This is outdated, since the code shows no potentials anymore. You could says its equivalent to what Censored does, but I would perhaps not refer to potentials anymore


@review-notebook-app
Copy link

review-notebook-app bot commented Jan 12, 2022

View / edit / reply to this conversation on ReviewNB

twiecki commented on 2022-01-12T06:26:39Z
----------------------------------------------------------------

where is that function defined?


drbenvincent commented on 2022-01-12T17:19:31Z
----------------------------------------------------------------

This reviewnb doesn't show all cells. But it is actually there and defined...

def truncated_regression(x, y, bounds):
  with pm.Model() as model:
    slope = pm.Normal("slope", mu=0, sigma=1)
    intercept = pm.Normal("intercept", mu=0, sigma=1)
    σ = pm.HalfNormal("σ", sigma=1)

    pm.TruncatedNormal(
      "obs",
      mu=slope * x + intercept,
      sigma=σ,
      observed=y,
      lower=bounds[0],
      upper=bounds[1],
    )
  return model

Copy link
Contributor Author

Good point. Amending this text


View entire conversation on ReviewNB

Copy link
Contributor Author

This reviewnb doesn't show all cells. But it is actually there and defined...

def truncated_regression(x, y, bounds):
  with pm.Model() as model:
    slope = pm.Normal("slope", mu=0, sigma=1)
    intercept = pm.Normal("intercept", mu=0, sigma=1)
    σ = pm.HalfNormal("σ", sigma=1)

    pm.TruncatedNormal(
      "obs",
      mu=slope * x + intercept,
      sigma=σ,
      observed=y,
      lower=bounds[0],
      upper=bounds[1],
    )
  return model


View entire conversation on ReviewNB

@drbenvincent
Copy link
Contributor Author

All comments so far have been dealt with. Any other major issues to address?

@review-notebook-app
Copy link

review-notebook-app bot commented Jan 12, 2022

View / edit / reply to this conversation on ReviewNB

ricardoV94 commented on 2022-01-12T21:08:20Z
----------------------------------------------------------------

This paragraph seems to contradict the next one: "is a little more involved" vs "it is really straightforward"

drbenvincent commented on 2022-01-12T22:45:42Z
----------------------------------------------------------------

Well spotted. I deleted the whole paragraph as it is no longer true. Not for the user anyway.

@review-notebook-app
Copy link

review-notebook-app bot commented Jan 12, 2022

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2022-01-12T21:27:07Z
----------------------------------------------------------------

The target should be (GLM-truncated-censored-regression)=. The id points to the title right below, so it can be interpreted as making the id "equal" to the section title in a referencing meaning.

I would also add the tag generalized linear model given it is in that folder.


drbenvincent commented on 2022-01-12T22:44:45Z
----------------------------------------------------------------

I did both of these things now.

@review-notebook-app
Copy link

review-notebook-app bot commented Jan 12, 2022

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2022-01-12T21:27:08Z
----------------------------------------------------------------

use {class}'pymc.Censored' (but using backticks instead of apostrophes, otherwise there are reviewnb rendering issues) for the first occurrence.


drbenvincent commented on 2022-01-12T22:46:08Z
----------------------------------------------------------------

I believe I've done this right. Let me know if not.

@review-notebook-app
Copy link

review-notebook-app bot commented Jan 12, 2022

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2022-01-12T21:27:09Z
----------------------------------------------------------------

The plot feels a bit strange now. I can try to have a go at in and add an arrow, like in the rectified gaussian from https://en.wikipedia.org/wiki/Rectified_Gaussian_distribution#/media/File:Truncated_Gaussian.jpg. Matplotlib is quite nice with arrows. I probably won't have time to do it until after a couple days though so if someone else can do it go ahead

as a first approximation I think the code would look like:

ax.arrow(-1, 0, -1, .35, color="C1")

drbenvincent commented on 2022-01-12T22:47:56Z
----------------------------------------------------------------

I know what you mean, but I'm not 100% convinced an arrowhead on top of the line there already would be a big improvement. But I can come back to this if you think it's a big deal

ricardoV94 commented on 2022-01-13T04:36:24Z
----------------------------------------------------------------

What about a scatter point with a label "truncation bound"?

OriolAbril commented on 2022-01-13T05:10:29Z
----------------------------------------------------------------

on top of the line there already

I meant replacing the spiking like by an arrow, like it's done in the example image (copied below). It's not a strong preference though.

drbenvincent commented on 2022-01-13T12:28:07Z
----------------------------------------------------------------

I wasn't sure how to interpret that Ricardo.

Oriol: I might be misunderstanding, but the height of the line/arrow matters, equating to the cumulative density up to the lower bound. So in practice it would look the same as it does now but with an arrow on the top.

Maybe I switch back to a histogram with narrow bins? Although I think it's reasonable as it is

@review-notebook-app
Copy link

review-notebook-app bot commented Jan 12, 2022

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2022-01-12T21:27:10Z
----------------------------------------------------------------

There is no need to remove the link to your profile. If other notebooks don't do it is because they didn't and I did not add it, but so far when updating notebooks from other people I have always kept their personal links if they where there.


drbenvincent commented on 2022-01-12T22:46:45Z
----------------------------------------------------------------

Thanks. Have added links to GitHub profile back in.

@review-notebook-app
Copy link

review-notebook-app bot commented Jan 12, 2022

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2022-01-12T21:27:10Z
----------------------------------------------------------------

very minor nit, can you add a watermark title right above and remove xarray from the -p flag? it is imported in this notebook so adding it there ends up duplicating it in the watermark


drbenvincent commented on 2022-01-12T22:46:58Z
----------------------------------------------------------------

Done

Copy link
Contributor Author

I did both of these things now.


View entire conversation on ReviewNB

Copy link
Contributor Author

Well spotted. I deleted the whole paragraph as it is no longer true. Not for the user anyway.


View entire conversation on ReviewNB

Copy link
Contributor Author

I believe I've done this right. Let me know if not.


View entire conversation on ReviewNB

Copy link
Contributor Author

Thanks. Have added links to GitHub profile back in.


View entire conversation on ReviewNB

Copy link
Contributor Author

Done


View entire conversation on ReviewNB

Copy link
Contributor Author

I know what you mean, but I'm not 100% convinced an arrowhead on top of the line there already would be a big improvement. But I can come back to this if you think it's a big deal


View entire conversation on ReviewNB

Copy link
Member

What about a scatter point with a label "truncation bound"?


View entire conversation on ReviewNB

Copy link
Member

OriolAbril commented Jan 13, 2022

on top of the line there already

I meant replacing the spiking like by an arrow, like it's done in the example image (now copied below). It's not a strong preference though.

imatge


View entire conversation on ReviewNB

Copy link
Contributor Author

I wasn't sure how to interpret that Ricardo.

Oriol: I might be misunderstanding, but the height of the line/arrow matters, equating to the cumulative density up to the lower bound. So in practice it would look the same as it does now but with an arrow on the top.

Maybe I switch back to a histogram with narrow bins? Although I think it's reasonable as it is


View entire conversation on ReviewNB

@drbenvincent
Copy link
Contributor Author

Just a quick judge to @OriolAbril if I may? Any strong objections to leaving this as it is?
Screenshot 2022-01-18 at 17 43 14

@OriolAbril
Copy link
Member

MIssed the previous response. No objections to leaving the plot as it is.

but the height of the line/arrow matters, equating to the cumulative density up to the lower bound

Again, no opposition to not updating the plot, we can go ahead and merge the notebook as is. It already looks like a spike and that is what matters to me. I do have one nit to add on that comment.

I don't think the quoted comment is true though. I think what matters is the area, not the height (which given the 0 width would be infinite) otherwise the pdf woudn't be normalized anymore right?

Setting the height to the value of the area (as a rule, not in this case where it already looks like a spike) could be confusing (even end up lower than the rest of the pdf!) if you take the (b) subfigure of the image I shared above for example, the area represented in the spike is 0.5 but its height is ~0.75 because otherwise it would probably look to close to the rest of the pdf. Moreover, if sigma of the normal were lower, then the mode would have higher pdf value:

imatge

the same censoring at 0 on the 0, .2 normal would end up with the "spike" way below the pdf, making the censoring not noticeable unless the spike is make higher than its area.


After writing that I realize we probably need to update some of the wording on the PR, maybe also the docs? The intuition is still the same: All the probability before/after the bound is concentrated at the bound. But the way to achieve this mathematically is not by setting pdf(lower)=cdf(lower) but by adding a delta with that area at the boundary.

@OriolAbril OriolAbril merged commit 81e6ffb into pymc-devs:main Jan 28, 2022
@OriolAbril
Copy link
Member

related to #90

@drbenvincent drbenvincent deleted the censored_v4 branch January 28, 2022 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants