Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review and test template #21

Closed
z-y-huang opened this issue May 9, 2019 · 21 comments
Closed

Review and test template #21

z-y-huang opened this issue May 9, 2019 · 21 comments
Assignees

Comments

@z-y-huang
Copy link
Contributor

Hi all,

The goal of this issue is for all assignees to:

  1. Read through the RA manual
  2. Check out a clean copy of template
  3. Following the RA manual and setup instructions in README.md, build the repository from start to finish
  4. Test template and attempt to break it; report any bugs or errors
  5. Provide feedback and identify room for improvement

Note that the current example scripts are pretty lacking and one of the items on our todos is to flesh them out to be more comprehensive. Let me know if you have any ideas here!

@z-y-huang
Copy link
Contributor Author

z-y-huang commented May 9, 2019

Notes:

  • Per discussions with @DavidRitzwoller, think through correct merge strategy for .log files in .gitattributes. Currently, merge strategy is set to binary (i.e., keep exact version in work tree, but leave file as conflicted). Alternatively, can set merge strategy to ours (i.e. resolve all conflicts by keeping exact version in work tree), but will require all clean checkouts to type git config merge.ours.driver true.

@DavidRitzwoller
Copy link

DavidRitzwoller commented May 9, 2019

Notes:

  • @z-y-huang. There's a file that throws a git-lfs error when we clone in to the repo.
    Screen Shot 2019-05-09 at 3 39 25 PM
  • Update the comment in the config_user_template to name the file config_user.yaml.
  • Initial set-up errors
      1. After executing python check_setup.py
        Screen Shot 2019-05-09 at 4 15 13 PM
      1. After adding future to requirements.txt and executing python check_setup.py
        Screen Shot 2019-05-09 at 4 20 51 PM
      1. We set my local version of tornado==4.5 to be compatible with my local version of python (2.7.3) and got
        Screen Shot 2019-05-09 at 4 27 11 PM

z-y-huang pushed a commit that referenced this issue May 9, 2019
z-y-huang pushed a commit that referenced this issue May 9, 2019
@z-y-huang
Copy link
Contributor Author

@DavidRitzwoller:

Addressed all the comments here except for (3)(c), where I'm not sure if there's a good non-local solution.

@DavidRitzwoller
Copy link

Small note: Let's add Rplots.pdf to the .gitignore.

@DavidRitzwoller
Copy link

Summary of conversation with @z-y-huang:

We wanted to clarify the protocol for the use of output_local, and how this differs from the build directory in the GSlab template.

Suppose we have a project with two active issues and two modules, data and analysis. I'm assigned to one issue and Zong is assigned to the other. In my issue, I change something about how we build our derived data. Zong is working on a set of plots that take the derived data as an input. The derived data are large enough that we do not want to commit. So, our build script should output to output_local, which should then be uploaded to external storage, and our analysis scripts should call the externally stored data.

But I don't want my work editing how we derive our data to interfere with Zong's plots, so until my work has been peer-reviewed, I shouldn't upload my output_local to the external storage. In a sense, we want the derived data stored externally to be a "master" branch. This suggests the following addendum to our Issue/ Pull Request work flow.

  1. Open issue branch
  2. Execute the requirements for a given issue
  3. Send to PR
  4. When PR is cleared, upload and relevant output_local to the external stroage
  5. Merge to master

Should an issue require work that spans both the analysis and data modules, the input path for the derived data in analysis should be changed to the output_local directory in data during development. Prior to merging to master, the derived data in the output_local directory in data should be uploaded to external storage, the input path should be reverted back to point to the external storage, and the analysis module should be rerun with the updated path.

@gentzkow
Copy link
Owner

Thanks @DavidRitzwoller. I like this workflow.

What I had been imagining was a different solution: everything on the external storage is manually versioned. So if there is an external directory called /reponame/analysis/ we don't store files directly there but rather store them in /reponame/analysis/1/, /reponame/analysis/2/, etc. Downstream modules call the versioned directories so if I add /reponame/analysis/3/ this does not break any existing code people might be working on. This has the additional advantage of making it more likely results are replicable in the future; it has the cost of chewing up more storage space.

In any case, I think it's a good idea to use something like your protocol to make sure data only gets "released" to the external drive once an issue is complete. We might allow some flexibility for cases where we want to release a version mid-issue, noting that this should only be when there is a good reason to do so.

z-y-huang pushed a commit that referenced this issue May 15, 2019
@mengsongouyang
Copy link

Hi @z-y-huang,

Here are my notes.

  • In practice, most projects are very big. I would imagine users want to have subdirectories under each module. For example, have /descriptive/ and /estimation/ under /analysis/. This is not difficult to implement. Users only need to create the subdirectories with similar structure and correctly specify the paths. Not sure if it would be better to add some sentences about this in the RA Manual.

  • In the Workflow page, I would prefer to put Merge after Peer Review, because merge is the step after peer review.

  • In the Repository Structure page, I would prefer to put Modules in front of Standard Modules, because Standard Modules are the higher-level structure. Also, I think it would be better to rename the titles of these two sections so that people can distinguish between them.

  • For the order of files/dirs in the Standard Module Structure, I would prefer to put externals.txt, inputs.txt, make.py to the front because inputs.txt and make.py are files that need to edit for every module.

Screen Shot 2019-05-17 at 5 43 46 PM

  • I think it would be better to be consistent in using plural or singular form for input, external, etc. Now, we use /input/ and /external/ for folders, inputs.txt and externals.txt for .txt files.

  • A typo in line 15 here:

Screen Shot 2019-05-17 at 5 20 47 PM

@DavidRitzwoller
Copy link

no progress in past two weeks

@z-y-huang
Copy link
Contributor Author

@mengsongouyang:

I addressed all of your comments. Thanks!

@DavidRitzwoller
Copy link

@z-y-huang. It seems like this thread is going quiet. Do you have any advice for how @mengsongouyang and I should proceed prior to wrapping this up? Would developing some example scripts be a useful contribution?

@z-y-huang
Copy link
Contributor Author

@DavidRitzwoller, @mengsongouyang:

It would be great if you two could try to break the template as much as possible to see if there are any bugs/errors I need to fix.

z-y-huang pushed a commit that referenced this issue Jul 6, 2019
z-y-huang pushed a commit that referenced this issue Jul 7, 2019
z-y-huang pushed a commit that referenced this issue Jul 8, 2019
z-y-huang pushed a commit that referenced this issue Jul 14, 2019
@z-y-huang
Copy link
Contributor Author

Changes implemented:

  • Added run_all.py so repository can be built with a single script
  • Migrated code in make.py scripts to gslab_make library to reduce clutter
  • Cleaned up traceback for Python 3
  • Added more description to error messages

z-y-huang pushed a commit that referenced this issue Jul 16, 2019
z-y-huang pushed a commit that referenced this issue Jul 18, 2019
@z-y-huang
Copy link
Contributor Author

Changes implemented:

  • Added run_module function to clean up run_all.py script
  • Added copy_file to facilitate exporting output locals

z-y-huang added a commit that referenced this issue Aug 13, 2019
* #21: Add Rplots.pdf to .gitignore

* Misc. quality of life changes

* Delete accidental file

* #21: Rename inputs.txt and externals.txt to singular

* #21: Standardize formatting and rebuild template from scratch

* Change traceback symbol

* #21: Simplified make.py, added run_all.py, and started removing redundant traceback for Python 3

* #21: Finish removing redundant traceback for Python 3, update documentation

* #21: Clean up codebase

* #21: Wordsmith error messages

* #21: Update move functions to pass file name for error messaging

* #21: Rename variables for move functions

* #21: Bugfix for new error messages

* #21: Add color to more error messages

* #21: Add run_module function

* #21: Test run_module function; add copy_file function

* #21: Clean up codebase

* #21: Clean up codebase; add try/exception handling
z-y-huang added a commit that referenced this issue Aug 13, 2019
z-y-huang added a commit that referenced this issue Aug 13, 2019
* #21: Add Rplots.pdf to .gitignore

* Misc. quality of life changes

* Delete accidental file

* #21: Rename inputs.txt and externals.txt to singular

* #21: Standardize formatting and rebuild template from scratch

* Change traceback symbol

* #21: Simplified make.py, added run_all.py, and started removing redundant traceback for Python 3

* #21: Finish removing redundant traceback for Python 3, update documentation

* #21: Clean up codebase

* #21: Wordsmith error messages

* #21: Update move functions to pass file name for error messaging

* #21: Rename variables for move functions

* #21: Bugfix for new error messages

* #21: Add color to more error messages

* #21: Add run_module function

* #21: Test run_module function; add copy_file function

* #21: Clean up codebase

* #21: Clean up codebase; add try/exception handling
@DavidRitzwoller
Copy link

no progress in past 2 weeks

@DavidRitzwoller
Copy link

no progress in past 2 weeks

@DavidRitzwoller
Copy link

no progress in past two weeks

2 similar comments
@DavidRitzwoller
Copy link

no progress in past two weeks

@DavidRitzwoller
Copy link

no progress in past two weeks

@gentzkow
Copy link
Owner

gentzkow commented Dec 10, 2019 via email

@DavidRitzwoller
Copy link

Thanks @gentzkow. @mengsongouyang and I completed initial rounds of testing this summer. @z-y-huang is there anything in particular you would like us to look at?

@z-y-huang
Copy link
Contributor Author

@DavidRitzwoller:

I'll close this for now--currently implementing some changes, but may recruit you all again once those are finished.

Summary:
First round of review by GSLab RAs. Incorporated feedback regarding repository structure and workflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants