Treatment of weights in Records object constructor, Part 1 #1439

hdoupe · 2017-06-20T19:19:24Z

I'm opening this issue in response to PR #1429.

In the case where a subsample of the PUF file is passed to a Records object, I'm a bit confused as to why we scale the weights by the full sample size divided by the subsample size. It would make more sense to me to scale the weights by the sum of the weights in the full sample divided by the sum of the weights in the subsample.

# we are doing this:
wt = wt * (N_fullsample / N_subsample)
# but why don't we do it this way
wt = wt * (sum(fullsample_weights) / sum(subsample_weights))

I feel like I'm missing something. I'd appreciate any thoughts that you all may have on this.

cc @martinholmer


martinholmer · 2017-06-20T22:06:28Z

Thanks for opening an issue on this matter. I'll be back from Canada next week and I'll study this.

feenberg · 2017-06-20T23:14:14Z

On Tue, 20 Jun 2017, Henry Doupe wrote: I'm opening this issue in response to PR #1429. In the case where a subsample of the PUF file is passed to a Records object, I'm a bit confused as to why we scale the weights by the full sample size divided by the subsample size. It would make more sense to me to scale the weights by the sum of the weights in the full sample divided by the sum of the weights in the subsample.

Same in expected value terms, but more accurate. dan


hdoupe · 2017-06-21T17:14:53Z

@feenberg Thanks for the reply.

Same in expected value terms, but more accurate.

That makes sense, but more accurate in what way?

martinholmer · 2017-06-27T11:56:14Z

@hdoupe said:

@feenberg Thanks for the reply.

Same in expected value terms, but more accurate.

That makes sense, but more accurate in what way?

I think the point @feenberg is making is that, when picking an unweighted sample, scaling up the sample weights in either of these two ways produces the same expected value of the sum of the scaled-up sample weights. But the scale-up factor you suggest (the ratio of the weight sums) produces a sum of scaled-up weights with a lower variance than a scale-up factor that is simply the ratio of the unweighted counts (as in the current code). By lower variance, I mean that when the random-number seed changes, the sum of the scaled-up weights varies less when the ratio of the weights is used as the scale-up factor.
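The variance point can be checked with a small simulation. This is only a sketch under stated assumptions: the weights are made-up lognormal values rather than PUF weights, and `scaled_sums` is a hypothetical helper, not code from the Records constructor. It draws unweighted subsamples under many seeds and compares the sum of scaled-up weights under each factor.

```python
import random
import statistics


def scaled_sums(weights, subfrac, seed):
    """Draw an unweighted subsample and return the sum of the scaled-up
    weights under (count-ratio factor, weight-ratio factor)."""
    rng = random.Random(seed)
    n_sub = max(1, round(len(weights) * subfrac))
    sub = rng.sample(weights, n_sub)           # unweighted draw, no replacement
    count_factor = len(weights) / n_sub        # N_fullsample / N_subsample
    weight_factor = sum(weights) / sum(sub)    # sum(full) / sum(sub)
    by_count = sum(w * count_factor for w in sub)
    by_weight = sum(w * weight_factor for w in sub)
    return by_count, by_weight


# Hypothetical skewed weights standing in for a real full sample.
gen = random.Random(0)
full_weights = [gen.lognormvariate(0.0, 1.0) for _ in range(5000)]
full_total = sum(full_weights)

# Repeat the 2% subsample with 200 different seeds.
results = [scaled_sums(full_weights, 0.02, seed) for seed in range(200)]
count_sums = [r[0] for r in results]
weight_sums = [r[1] for r in results]

print("full-sample total:        ", full_total)
print("count ratio  mean / stdev:", statistics.mean(count_sums),
      statistics.stdev(count_sums))
print("weight ratio mean / stdev:", statistics.mean(weight_sums),
      statistics.stdev(weight_sums))
```

With the weight-ratio factor the scaled sum reproduces the full-sample total for every seed (up to floating-point rounding), while with the count-ratio factor it is centered on that total but moves from seed to seed.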

Independently, scaling up the sample weights using the ratio of the weights (instead of ratio of the raw counts) is the correct approach when picking a weighted sample.

Does that make sense?

hdoupe · 2017-06-27T14:16:26Z

@martinholmer Let me make sure that I have this right. We are trying to choose an estimator for the factor that will be used to scale up the weights. For an unweighted sample, we expect this factor to be 1/subfrac. While both the ratio of the weights and the ratio of the sample sizes are the same in expected value, the ratio of the sample sizes has a lower variance. Thus, it is a better choice. That makes sense to me.

For the weighted sample, the ratio of the weights becomes a better estimator for the scaling factor. In my experience with PR #1429, the ratio of the weights was closer to 4% when subfrac was set to 2%. So I think this makes sense to me, too.
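The weighted case can be illustrated the same way. The sketch below assumes that "weighted sample" means records are drawn with probability proportional to their weight, and it draws with replacement for simplicity. Both are assumptions, as are the made-up lognormal weights; none of this is taken from PR #1429. It only compares totals under the two factors.

```python
import random

rng = random.Random(1)

# Hypothetical full sample of 10,000 records with skewed weights.
full_weights = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
full_total = sum(full_weights)

subfrac = 0.02
n_sub = round(len(full_weights) * subfrac)

# Assumption: selection probability is proportional to the record's weight.
sub = rng.choices(full_weights, weights=full_weights, k=n_sub)

# Share of the full-sample weight total captured by the subsample.
weight_share = sum(sub) / full_total

# Scaled-up totals under the two candidate factors.
total_by_count = sum(sub) * (len(full_weights) / n_sub)
total_by_weight = sum(sub) * (full_total / sum(sub))

print("subfrac:                ", subfrac)
print("weight share of sample: ", weight_share)
print("full-sample total:      ", full_total)
print("total with count ratio: ", total_by_count)
print("total with weight ratio:", total_by_weight)
```

Because heavier records are more likely to be drawn, the subsample holds a larger share of the total weight than subfrac, so the count-ratio factor overshoots the full-sample total, while the weight-ratio factor restores it.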

martinholmer · 2017-06-27T14:22:14Z

@hdoupe said in issue #1439:

Let me make sure that I have this right. We are trying to choose an estimator for the factor that will be used to scale up the weights. For an unweighted sample, we expect this factor to be 1/subfrac. While both the ratio of the weights and the ratio of the sample sizes are the same in expected value, the ratio of the sample sizes has a lower variance. Thus, it is a better choice. That makes sense to me.

For the weighted sample, the ratio of the weights becomes a better estimator for the scaling factor. In my experience with PR #1429, the ratio of the weights was closer to 4% when subfrac was set to 2%. So I think this makes sense to me, too.

This is the way I understand these matters. So, yes, I think you "have this right."

Next question is whether you think pull request #1441 is sensible.

hdoupe · 2017-06-27T14:33:36Z

@martinholmer Ok, thanks for helping me understand this.

martinholmer mentioned this issue Jun 27, 2017

Treatment of weights in Records object constructor, Part 2 #1440

Closed

martinholmer changed the title ~~Treatment of weights in Records object constructor~~ Treatment of weights in Records object constructor, Part 1 Jun 27, 2017

martinholmer mentioned this issue Jun 27, 2017

Improve sample weight adjustment #1441

Merged

hdoupe closed this as completed Jun 27, 2017
