Making distance calculations manageable #22
Comments
Here is an approach I have used with large matching problems in the past that has worked well:
Anyway, this is one approach that should be practical. Variants are possible, including a variant that incorporates ideas from the predictive mean matching that OSPC does between the PUF and CPS. |
I think that's a great alternative. If it proves too slow, maybe then we
should reconsider the indexing.
On Fri, Dec 14, 2018 at 7:49 PM Max Ghenis wrote:
This makes sense but probably precludes usage of scipy's cdist
(scipy.spatial.distance.cdist), which runs all pairwise comparisons for two
tables. Since cdist is more optimized (written in C), how about adapting the
approach to still use buckets, and include the +/-1 buckets, like this (the
bucketing variable could be AGI or something simpler):
[image: image]
<https://user-images.githubusercontent.com/6076111/50036753-41f24400-ffc0-11e8-835b-bb557054a2f0.png>
|
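For reference, the +/-1 bucket idea above could be sketched roughly as follows. This is only an illustration under assumptions, not code from this repo: the function name `nearest_within_buckets`, the integer bucket arrays, and the choice of Euclidean distance (cdist's default) are all hypothetical. It calls cdist once per bucket on the rows in that bucket against the rows in the same or adjacent buckets, so no full pairwise matrix is ever built.

```python
import numpy as np
from scipy.spatial.distance import cdist


def nearest_within_buckets(synth, train, synth_bucket, train_bucket):
    """For each synthetic row, find the nearest training row whose bucket
    is the same or adjacent (+/-1). Hypothetical helper for illustration.

    Buckets are integer labels (e.g. AGI bins). Returns two arrays:
    the index into `train` and the distance. Rows with no candidates in
    the neighbouring buckets get index -1 and distance inf.
    """
    synth = np.asarray(synth, dtype=float)
    train = np.asarray(train, dtype=float)
    synth_bucket = np.asarray(synth_bucket)
    train_bucket = np.asarray(train_bucket)

    nearest_idx = np.full(len(synth), -1, dtype=int)
    nearest_dist = np.full(len(synth), np.inf)

    for b in np.unique(synth_bucket):
        rows = np.flatnonzero(synth_bucket == b)
        # Candidate training rows: same bucket or one bucket either side.
        cand = np.flatnonzero(np.abs(train_bucket - b) <= 1)
        if cand.size == 0:
            continue
        d = cdist(synth[rows], train[cand])  # Euclidean by default
        best = d.argmin(axis=1)
        nearest_idx[rows] = cand[best]
        nearest_dist[rows] = d[np.arange(len(rows)), best]
    return nearest_idx, nearest_dist
```

The bucket labels could come from something like `np.digitize(agi, edges)` with whatever bin edges make sense. The trade-off is that a record's true nearest neighbor is missed if it sits more than one bucket away.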
Could you share how you're running Tax-Calculator? Is it from the CLI? @andersonfrailey what would be the easiest way to call Tax-Calculator on synpuf to (a) ensure validity and (b) calculate AGI, from Python? |
I built a Windows system command to call the CLI from R, as follows. I presume something similar would be practical in Python.
|
@MaxGhenis asked:
I would say what @donboyd5 has done works. Alternatively, we could create a short Python script that takes the name of the file as an argument, runs it through Tax-Calculator, and saves a new file with AGI included. Would we also want to run the file through the reforms we've included in this repo? I'd be happy to work on this. |
Great item for our call today. I think it would be very valuable to stack one or more files together (e.g., puf and synpuf variants 1 and 2) and then run them through Tax-Calculator. It would be important to not simply duplicate what @feenberg is doing. A few thoughts about that:
|
Simply FYI: @MaxGhenis I suspect your blocking approach with +/- 1 income groups will be sufficiently fast, but here are two possible additional ideas to consider if it turns out to be too computationally-intensive:
|
Interesting, I'll try the ... Based on my Python kernel crashing, I suspect the bigger issue is the large matrices being stored in memory, rather than the computation time. From that perspective, it would be more efficient to find the nearest record for each synthetic record, rather than aggregating the full matrix at the end. This will need to be vectorized, since I'd expect loops to take forever, and parallelizing like parDist could help. This SO question could be relevant. |
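One way to keep memory bounded while staying vectorized might look like the sketch below. It is an assumption-laden illustration, not the approach from the linked SO question: the name `nearest_record_chunked` and the `chunk_size` parameter are hypothetical, and it uses plain Euclidean distance. It processes the synthetic records in blocks, keeps only each row's nearest index and distance, and discards each block's distance matrix before moving on.

```python
import numpy as np


def nearest_record_chunked(synth, train, chunk_size=1000):
    """Nearest training row for each synthetic row, computed in blocks.

    Only a (chunk_size x len(train)) distance block is held in memory at
    a time. Hypothetical helper for illustration; Euclidean distance.
    """
    synth = np.asarray(synth, dtype=float)
    train = np.asarray(train, dtype=float)
    train_sq = (train ** 2).sum(axis=1)

    nearest_idx = np.empty(len(synth), dtype=int)
    nearest_dist = np.empty(len(synth))

    for start in range(0, len(synth), chunk_size):
        block = synth[start:start + chunk_size]
        stop = start + len(block)
        # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
        d2 = ((block ** 2).sum(axis=1)[:, None]
              - 2.0 * block @ train.T
              + train_sq[None, :])
        np.maximum(d2, 0.0, out=d2)  # clip tiny negatives from rounding
        best = d2.argmin(axis=1)
        nearest_idx[start:stop] = best
        nearest_dist[start:stop] = np.sqrt(d2[np.arange(len(block)), best])
    return nearest_idx, nearest_dist
```

Peak memory scales with `chunk_size * len(train)` instead of `len(synth) * len(train)`, so the chunk size can be tuned to what the kernel tolerates. The blocks are independent, so they could also be farmed out to parallel workers.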
Issue:
We need a better way to reduce the problem size.