Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjusted code to allow users to modify mortality assumptions in cstwMPC #214

Closed
wants to merge 3 commits into from

Conversation

jkoosak1
Copy link

No description provided.

@mnwhite mnwhite changed the title Adjusted code to allow users to modify mortality assumptions Adjusted code to allow users to modify mortality assumptions in cstwMPC Dec 17, 2018
@mnwhite
Copy link
Contributor

mnwhite commented Dec 17, 2018

  1. File names should not have spaces in them.

  2. You deleted the 2 lines of code that specify the default behavior.

  3. You put new mortality data files into the cstwMPC project folder, with descriptive names, then direct the user to use a file specifically named alternativedata.txt in their downloads folder. No file will actually download with that name, and people do not usually do work in their downloads folder. Moreover, your code requires the user to copy the new mortality table data you added into their downloads folder and rename it to use it. That's not a normal workflow.

  4. Basically none of the code you added to SetupParamsCSTW.py is necessary. The file already has the capability to use a different mortality table: just change the name of the file in the one line where it appears! It's fine to add a line at the top of the file that says mortality_file_name = 'SSAactuarial.txt', and then have the line that opens the file refer to that variable, but the block of you added is unnecessary. Python will throw an error on its own if someone tries to open something that is not a file.

  5. The instructions for how to change the mortality file are not useful. You can trust that the user knows to put a file in a specific place and change a name. If you want to make it easier to change mortality assumptions, the relevant documentary information is the exact file format the code expects: tab-delimited text, survival probabilities in column X, starting from age Y, etc.

  6. More generally, I'm not sure we want to edit cstwMPC to add any new capabilities. That's a discussion for outside this PR thread, however.

@llorracc
Copy link
Collaborator

llorracc commented Dec 18, 2018 via email

@mnwhite
Copy link
Contributor

mnwhite commented Dec 18, 2018

This goes back to our discussion of what is in "HARK itself" vs what is an "application of HARK". Where I thought we landed this summer was that cstwMPC, SolvingMicroDSOPs, and (maybe?) StickyE would be distributed in HARK and initially live in the user's sitepackages directory (where they should not touch it).

BUT: There would be a top level function that we advertise, like HARK.installExamples(path), that the user should run right after pip installing HARK. This would copy all of the "example application" files into the specified path, in a working directory of the user's choosing. Then they could muck about and experiment with (say) alternate mortality tables or income risk sequences in cstwMPC, the exact sort of thing we do want people to do.

Didn't Nate or one of the OTS people write this function right around the time I was at SciPy? If so, your students can do things like add additional mortality processes into the project files (and no need to have a weird wonky workaround to the "don't touch code in your sitepackages" rule).

@@ -0,0 +1,120 @@
0 0.092840 100000.00 49.820000 0.082350 100000.00 53.600000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's problematic to distribute data without proper license information. It's not at all clear that this data can be shared like this, and it doesn't contain a source either. What website/researcher/other was this obtained from? In general, it can be much easier to supply a function to generate such a table to avoid these types of considerations.

@@ -0,0 +1,120 @@
0 0.00199 100000 80.98 0.00202 100000 87.13
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above (Matt already mentioned the no-spaces-in-filenames point.)

@@ -112,9 +112,24 @@
slope_prev = 1.0 # Initial slope of kNextFunc (aggregate shocks model)
intercept_prev = 0.0 # Initial intercept of kNextFunc (aggregate shocks model)

from pathlib import Path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pathlib is not part of Python prior to v 3.3, so as long as HARK hasn't dropped support for older versions of Python than v3.3, this has to be handled.

# or other countries, they could do so by naming the alternative data file "alternativedata.txt" and save this in the "/home/users/Downloads/" folder


README = Path("/home/users/Downloads/alternativedata.txt")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matt touched upon this already, but I'm not sure he mentioned the following: why not just allow for the user to provide a string? No reason to hardcode this. Also, there's no guarantee to even have a /home/users/Downloads folder on other OS's than the one this was developed on (MacOS? unless the user is called users and this is linux... even so, Windows wouldn't appreciate this!)

Edit: oh, and README is probably not a good name for this. A readme is typically a file expected to be read by the user before using the software. This is probably meant as the file saying: "Hey there, read me not the default file", but something to the tune of SurvivalPath or YourPath or whatever is probably preferable (or lower case versions depending on your likings).


README = Path("/home/users/Downloads/alternativedata.txt")
if README.is_file():
print ("File does exist")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a mistake. The file does exist, that was what that if README.is_file(): just tested

if README.is_file():
print ("File does exist")
data_location = os.path.dirname(os.path.abspath(__file__))
f = open(data_location + '/' + 'alternativedata.txt','r')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole purpose of pathlib is to provide object oriented path-handling functionality. I think something like this is more idiomatic

with YourPath.open() as f:
    actuarial_reader = csv.reader(f,delimiter='\t')

@@ -0,0 +1,7 @@
Instructions on modifying the mortality assumptions used in Section 5 of Carroll et al. (2017)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's preferable to put information like this in a docstring, manual and/or inline comment. It must also describe the format.

@pkofod
Copy link
Contributor

pkofod commented Dec 19, 2018

Welcome !

Seems like this is your first open source contribution. Do not get discouraged by the many many comments. Few PRs to larger projects get through without corrections and or discussion. Last year I made a PR for another project that managed to collect almost 147 comments in just over a month and I committed 39 changes 💫 JuliaLang/julia#22603

To the extent that this code is meant to be useful in a fashion beyond the application provided currently, it is of course interesting to modularize the data handling. I think there's a much simpler way to do it though: simply allow the user to provide a path when constructing the AgentType. If the path is provided (as a string, unless we want to add a pathlib2 dependency for python 2.x users (I guess it also works on 3.y<3.3?) use that data, if not use the default.

Now, if you want the users to be able to do this, then you need to document exactly how to do it. What are the columns, etc.

An alternative is simply to require the mortality information to be passed in vector by vector. These vectors can of course be produced from a csv file's columns, but also simulated in memory for educational/academic purposes.

Based on @mnwhite and @llorracc 's exchange above, I think the most important thing to agree upon is: is cstwMPC meant to be generalized like this, or is it meant to provide a very specific implementation and application for illustrative purposes.

@shaunagm shaunagm assigned llorracc and unassigned pkofod May 30, 2019
@llorracc llorracc closed this May 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants