Shift template compiles from LyX to Overleaf #84

Closed
snairdesai opened this issue Jul 25, 2023 · 42 comments

@snairdesai
Contributor

snairdesai commented Jul 25, 2023

The purpose of this issue (#84) is to address a request from @gentzkow following ongoing discussions around our approach to compiling .tex and .lyx documents in the template structure. Currently, we build the ~/paper_slides module using the LyX document processor and a local TeX interpreter, by converting .lyx files to .pdf files.

LyX is now infrequently updated and has presented frequent roadblocks as other template dependencies have evolved. To address this, @jc-cisneros and I will investigate the possibility of switching from LyX to Overleaf for our standard paper development process. Ideally, this would include the ability to combine our Overleaf workflow with pull requests and development branches.

@snairdesai
Contributor Author

@ShiqiYang2022, @simonepandit, and @Erick11293: To increase your familiarity with template, and to give you a sense of how we revamp lab development/workflow procedures, @jc-cisneros and I thought it might be a good idea to have you three support us with this task. Please see a description of the proposed deliverable in the comment above. A few notes:

  • My sense is this should be lower priority than any other ongoing issues you have been/will be assigned outside of anything related to template and gslab-make. Within any issues related to template and gslab-make, we should prioritize wrapping this one first (@jc-cisneros and I will also be working on this and other issues here).
  • We can't currently add you as assignees to this repository, as you need to be granted access to gentzkow/template.
    • Let us know if you can make commits on the associated development branch for this issue (84_shift_overleaf). If not, you may need to fork this branch and make edits separately, commenting your proposals in this thread.

Thanks for the help!

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Jul 25, 2023

Thanks, @snairdesai! I looked into this issue and suggest breaking it into three tasks, as listed:

Task List

@snairdesai @jc-cisneros Feel free to edit and comment your thoughts!

cc: @simonepandit @Erick11293

@snairdesai
Contributor Author

Thanks @ShiqiYang2022! The main purpose of this issue is simply to switch our compiler from LyX to Overleaf. Right now, Overleaf allows integration/synchronization with GitHub for premium accounts, but only for the master branch of repositories. Our goal would be to find a way to enable integration for development branches as well. To specify further: in LyX, we can run paper_slides for any branch, and see the outputs immediately populate in that branch. In Overleaf, we currently need to merge back to master before seeing edits populate on any shared Overleaf files.

This is not something that we need to touch code in template yet for (i.e., the bullets you have provided above are not relevant for now). We simply want to investigate the Overleaf to GitHub synchronization further, and see if there are any potential avenues to facilitate development edits. After we have a sense of what can be done, we can test this with template.

@ShiqiYang2022
Collaborator

Thanks @snairdesai for the clarification!

I am posting notes and updates on the to-do list here. Per a meeting with @simonepandit @Erick11293, we agreed on the following proposal, listed by priority:

  1. Investigate Overleaf-GitHub synchronization. Currently there are four proposals:

(1) Fork a repository from the development branch (so it becomes a master branch). Pull this repository into Overleaf and make edits, then push back to the forked branch, merge it back to the development branch, and then merge back to master.
(2) Create a new repository locally and pull the development branch, push it to Overleaf to make edits, then pull it back to the local repository and push back to the development branch.
(3) Split an issue into two sub-issues: one without and one with changes to .tex files. For the former, we can proceed as we normally do. For the latter, we can edit directly on Overleaf and sync to the master branch.
(4) Use the integrations provided by Overleaf that enable creating and synchronizing local copies of projects, so that we may not need to use Overleaf online. @Erick11293 proposed this, so please feel free to add anything here.

Proposals

Pros and cons of these four proposals are briefly discussed as follows:

(1) In general, we would like to make the synchronization as easy as possible, so ideally the development branch would be synchronized directly to Overleaf. But this is not technically feasible at present.
(2) Proposals (1) and (2) are most beneficial when (a) the issue is big (it may need tens or hundreds of discussions and replies), and (b) multiple issues are in progress at the same time.
(3) Proposal (3) is efficient when the issue is small, or the edits are mostly/only related to .tex files. But it is not recommended when there are multiple tasks in progress.
(4) For proposal (4), we should check whether there is a way to update the integrations regularly; otherwise they may become outdated, as LyX has.

Pros and Cons
  2. Switch the .lyx formats in paper_slides/code to .tex formats
  3. Edit and test the output .tex formats to ensure their compatibility in overleaf.

We decided to investigate 1 first. @snairdesai @jc-cisneros, please let us know your thoughts on that, thanks!
cc: @simonepandit @Erick11293

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Jul 26, 2023

Updates of Proposal (1)

I tested the feasibility of implementing (1) using my template. When there's an Overleaf issue, the actions can be broken down into the following parts:

  1. Open a new repository
  2. Clone the development branch to the new repository (so it becomes a master branch)
  • First switch to the branch you want to sync with:
    git checkout 7-link-the-issue-related-branch-to-other-repository
  • Add template_overleaf_sync as a new remote in the local repository:
    git remote add template_overleaf_sync https://github.com/ShiqiYang2022/template_overleaf_sync
  • Push the branch to template_overleaf_sync:
    git push template_overleaf_sync 7-link-the-issue-related-branch-to-other-repository
  3. Pull the master branch into Overleaf, make edits, and push back
  • On the Overleaf main page, click "New Project" - "Import from Github", and select the new repository "template_overleaf_sync".
  • Make edits. In my example, I edited the title of README.md from "README" to "README(Edited By Shiqi)".
  • Push back. In your project, click "Menu" - "Sync-GitHub" and then "Push Overleaf Changes to Github". Enter a commit message. You can view my records here.
  4. Merge the master branch in the new repository back to the development branch
  • Fetch the 7-link-the-issue-related-branch-to-other-repository branch from template_overleaf_sync into the local environment:
    git fetch template_overleaf_sync 7-link-the-issue-related-branch-to-other-repository
  • Create a new local branch called my-temp-branch that points at the fetched remote branch:
    git checkout -b my-temp-branch template_overleaf_sync/7-link-the-issue-related-branch-to-other-repository
  • Switch to the development branch, merge the content of my-temp-branch into it, and push back:
    git checkout 7-link-the-issue-related-branch-to-other-repository
    git merge my-temp-branch
    git push origin 7-link-the-issue-related-branch-to-other-repository
  • You can then view the commit in the development branch here.

Implementing (1) addresses the Overleaf-GitHub synchronization problem by adding a new parallel repository and a corresponding parallel Overleaf project.
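For anyone who wants to script the Git side of this walkthrough, here is a minimal sketch that only assembles the command lists; it does not run them. The function names and the example branch name are hypothetical, the remote name template_overleaf_sync and the temporary branch my-temp-branch follow the example above, and the Overleaf steps (import, edit, push) still happen by hand in between.

```python
import shlex
from typing import List


def mirror_branch_commands(branch: str, remote_name: str, remote_url: str) -> List[List[str]]:
    """Commands that publish a development branch to a separate mirror repository."""
    return [
        ["git", "checkout", branch],
        # The remote needs both a name and a URL.
        ["git", "remote", "add", remote_name, remote_url],
        ["git", "push", remote_name, branch],
    ]


def merge_back_commands(branch: str, remote_name: str,
                        temp_branch: str = "my-temp-branch") -> List[List[str]]:
    """Commands that bring Overleaf edits from the mirror back into the development branch."""
    return [
        ["git", "fetch", remote_name, branch],
        ["git", "checkout", "-b", temp_branch, f"{remote_name}/{branch}"],
        ["git", "checkout", branch],
        ["git", "merge", temp_branch],
        ["git", "push", "origin", branch],
    ]


if __name__ == "__main__":
    # Hypothetical branch name; the URL is the mirror repository from the example above.
    steps = mirror_branch_commands(
        "84_shift_overleaf",
        "template_overleaf_sync",
        "https://github.com/ShiqiYang2022/template_overleaf_sync",
    )
    steps += merge_back_commands("84_shift_overleaf", "template_overleaf_sync")
    for argv in steps:
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to try; each list could be passed to subprocess.run(argv, check=True) once the sequence looks right.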

@simonepandit @Erick11293 please feel free to test this proposal on your end and comment/discuss on that!
@snairdesai @jc-cisneros any suggestions with respect to this proposal are highly welcomed!

@Erick11293

@ShiqiYang2022 as I see it, you didn't need to fork the repository, right? You made a local copy of the repository and created a new one from the branch that you copied. So I think this approach is closer to (2) than (1).

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Jul 26, 2023

@ShiqiYang2022 as I see it, you didn't need to fork the repository, right? You made a local copy of the repository and created a new one from the branch that you copied. So I think this approach is closer to (2) than (1).

Thanks @Erick11293! I understand your point. Currently it is not feasible to directly fork a branch other than the master branch on GitHub, and we have a local copy of the repository almost all the time, so I used the local copy of the development branch to create the new remote repository.

@gentzkow
Owner

@ShiqiYang2022 Thanks for the great work here so far. I'm really interested to see where this goes. A couple of thoughts to throw in.

  1. In terms of the broader goals of this issue, we will want to maintain both Lyx and .tex versions of the template. I think the best way to do that will be to move the Lyx version of paper_slides into extensions/lyx.

  2. Your proposal (1)/(2) involving creating a new repository is interesting. Just to make sure I understand the proposed workflow, is the idea that the extra repository would be created by an individual user on their individual Github account and be used solely as a pipeline to Overleaf? I.e., these extra repositories would never show up on the lab Github and they would be invisible to everyone other than the lab user that created them? If so, I like that in principle.

  3. A question is how we handle things on the Overleaf side. My understanding is that a given Overleaf project only has a single branch. So if we have two issues open in parallel, with two associated development branches in Github, it won't be possible to be using a single shared project in Overleaf for both of them simultaneously. Is the idea that we'd need to create a separate Overleaf project for each issue/dev branch?

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Jul 28, 2023

@gentzkow Thanks! Replies to your thoughts:

  1. In terms of the broader goals of this issue, we will want to maintain both Lyx and .tex versions of the template. I think the best way to do that will be to move the Lyx version of paper_slides into extensions/lyx.

Agree! Will investigate on that.

  2. Your proposal (1)/(2) involving creating a new repository is interesting. Just to make sure I understand the proposed workflow, is the idea that the extra repository would be created by an individual user on their individual Github account and be used solely as a pipeline to Overleaf? I.e., these extra repositories would never show up on the lab Github and they would be invisible to everyone other than the lab user that created them? If so, I like that in principle.

Yes. This extra repository only appears on an individual GitHub account and is only used as a pipeline to Overleaf. I think we can set the repository visibility to "private" and only invite collaborators, to make sure it is used solely as a pipeline.

Will test its feasibility on my end!

  3. A question is how we handle things on the Overleaf side. My understanding is that a given Overleaf project only has a single branch. So if we have two issues open in parallel, with two associated development branches in Github, it won't be possible to be using a single shared project in Overleaf for both of them simultaneously. Is the idea that we'd need to create a separate Overleaf project for each issue/dev branch?

Yes. Your understanding is correct. Currently Overleaf does not support synchronization across projects. It would indeed be challenging to sync the development branches of multiple parallel issues simultaneously with a single Overleaf project. One workaround, as you mentioned, would be to create a separate Overleaf project for each issue/development branch.

Some challenges might come with this approach: (1) We need to manage multiple Overleaf projects concurrently, each potentially with its own issue collaborators. This might introduce additional errors and confusion. (2) It may also involve merge conflicts when trying to integrate the changes from the separate Overleaf projects/branches back into a single main branch on GitHub.

For (1), we cannot solve this due to the limitations of Overleaf itself. For (2), Overleaf (and LaTeX in general) allows you to split a document into multiple independent .tex files and combine them when compiling the final document. So in most cases, we might be able to have each issue worked on in its own .tex file, to ensure changes don't conflict when they are eventually merged.
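One way to check the "one .tex file per issue" idea before merging is to list which files each open issue has touched and flag any file claimed by more than one. The sketch below is only illustrative: the function name, issue names, and file paths are hypothetical, and in practice the file lists would have to come from something like git diff --name-only against master.

```python
from collections import defaultdict
from typing import Dict, List


def files_touched_by_multiple_issues(changes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each file edited by more than one issue to the sorted list of those issues."""
    owners = defaultdict(list)
    for issue, paths in changes.items():
        # De-duplicate so a file listed twice for one issue counts once.
        for path in set(paths):
            owners[path].append(issue)
    return {path: sorted(issues) for path, issues in owners.items() if len(issues) > 1}


if __name__ == "__main__":
    # Hypothetical issues and paths, for illustration only.
    changes = {
        "issue_a": ["paper_slides/intro.tex", "paper_slides/main.tex"],
        "issue_b": ["paper_slides/results.tex", "paper_slides/main.tex"],
    }
    print(files_touched_by_multiple_issues(changes))
```

An empty result means every file is being edited by a single issue, which is the situation where merges back to the main branch should not conflict.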

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Aug 1, 2023

Updates of Proposal (2)

I tested the feasibility of implementing (2) using my template. When there's an Overleaf issue, the actions can be broken down into the following parts:

  1. Clone the master branch of the repository into a new folder at local end, and follow the setup instructions.

  2. Create a parallel Overleaf project from the local master branch, following the instructions in Overleaf's Git integration.

  • Known limitations of git integration on overleaf can be viewed here.
  • Overleaf Git system does not support branching, so I have to use the master branch.
  • We have submodule lib/gslab_make in template, so we cannot directly push the whole project to overleaf. Instead, we need to move the submodule lib/gslab_make somewhere else when integrating to overleaf.
Limitations
  • In steps 5 and 7 of the instructions, replace "main" with "master".
  • Set git config pull.rebase false before step 5.
Minor details to be noticed
  3. Make edits on Overleaf, and pull the edits to the master branch.

  4. Merge the edits in the master branch into the development branch locally, and push the development branch to GitHub.

Implementing (2) addresses the Overleaf-GitHub synchronization problem by creating a new parallel local folder containing the master branch and a corresponding parallel Overleaf project.

@simonepandit @Erick11293 FYI.

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Aug 1, 2023

Discussion of Proposal (1) and (2)

Proposal (1) requires creating parallel repositories and parallel Overleaf projects, and syncs through GitHub Synchronization.
Proposal (2) requires creating new parallel local folders containing the master branch and parallel Overleaf projects, and syncs through Git integration.

Some thoughts on the two proposals:

Proposal (1) proposes the creation of an additional parallel repository that only appears on an individual's Github account and is solely used as a pipeline to Overleaf. This ensures any changes or edits made are confined to this specific repository and won't affect the main repository until they are explicitly merged.

In contrast, Proposal (2) inherently carries some risk due to its continuous reliance on syncing the original repository's master branch with Overleaf. The master branch is conventionally seen as the 'safe' branch, so any unintended changes or incorrect edits made while syncing with Overleaf could jeopardize its stability and disrupt the production environment.

Risks to the stability of the production-ready code

Both Proposal (1) and Proposal (2) do indeed require the creation of a separate Overleaf project for each issue or development branch. This requirement certainly adds an additional layer of complexity and potential for confusion.

Apart from that, Proposal (1) offers a higher level of isolation by creating a new parallel repository solely used as a pipeline to Overleaf. However, it does introduce an extra layer of complexity, as users need to manage an additional repository and ensure proper synchronization between the repositories.

In contrast, Proposal (2) keeps everything within the same repository, which may be simpler.

Workflow simplicity

For proposal (1), the security of this additional repository depends on the security practices of the individual user. This might include how carefully they manage the repository, the robustness of their password, and whether they've enabled two-factor authentication.

For proposal (2), the local repository on the user's machine is as secure as the machine itself, and it stays private to the user unless it is pushed and made public on GitHub.

Security and Privacy

For proposal (2), there are some known limitations of git integration on overleaf. For example, due to the existence of submodules, the entire project cannot be directly pushed to Overleaf. Proposal (2) also faces other challenges such as lack of support for Git Large File Storage, handling of file renames and movements, and potential loss of tracked changes and comments.

Limitations of Git Integration on Overleaf

@gentzkow
Owner

gentzkow commented Aug 7, 2023

Hi @ShiqiYang2022. Sorry for the slow reply on this thread. Thanks for the great work!

On balance given the factors you note above, I think we should set aside option (2). I would also vote we set aside option (3) from the original list.

I continue to think option (1) is worth considering.

The additional thing I'm most keen to explore is option (4) -- i.e., what is the easiest way to sync a local working directory without using Github integration on Overleaf at all. I've played around with Overleaf's Dropbox sync and found it pretty clunky, so I'm thinking that may not be the best option. Curious what other options might be available.

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Aug 7, 2023

Thanks @gentzkow! What a coincidence because I also want to update the current process on this issue.

On balance given the factors you note above, I think we should set aside option (2). I would also vote we set aside option (3) from the original list.

I agree and will set aside options (2) and (3) for now!

I continue to think option (1) is worth considering.

Yes, I will continue exploring that with @Erick11293 @simonepandit! Currently we have two remaining tests on (1) to complete:

  • Test 2+ people collaborating on edits in Overleaf and see whether it syncs well.
  • Test making a private repository using template and sharing it among collaborators.

The additional thing I'm most keen to explore is option (4) -- i.e., what is the easiest way to sync a local working directory without using Github integration on Overleaf at all. I've played around with Overleaf's Dropbox sync and found it pretty clunky, so I'm thinking that may not be the best option. Curious what other options might be available.

Yes will explore further on (4) in the original list with @Erick11293 @simonepandit!

On Overleaf's Dropbox sync: we had a group meeting (@snairdesai @jc-cisneros @Erick11293 @simonepandit) last Friday to discuss the Overleaf Dropbox sync proposal (thanks also to @Erick11293 for the initial discussion and contribution). We found that Dropbox sync has its own strength, mainly that it enables switching across branches while syncing to Overleaf. But it can be cumbersome and risky because of (1) inconvenience when collaborating on multiple parallel issues; (2) an extra layer of complexity; (3) the risk of Dropbox sync failures and lost files. I agree to set it aside, but I am taking notes here in case we need to reconsider this option.

By the way, could you give me permission to push edits to our template, in case we finish 1 in the three big bullets above and would like to investigate 2 and 3? Thanks!

cc: @snairdesai @jc-cisneros @Erick11293 @simonepandit

@gentzkow
Owner

gentzkow commented Aug 7, 2023

Great. Thanks!

I added you to template.

Feel free to keep Dropbox sync in the mix if you all feel it is promising.

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Aug 8, 2023

Update: Per RA meet, tasks are listed below by priority:

  • @Erick11293 @simonepandit will cross-check proposal (1) on:
    (Per 8.24 meet: @snairdesai @jc-cisneros have the bandwidth to cross-check this with @ShiqiYang2022)
      • Test 2+ people collaborating on edits in Overleaf and see whether it syncs well.
      • Test making a private repository using template and sharing it among collaborators.
  • @ShiqiYang2022 will draft the readable example on (1) based on the comments here.
  • @ShiqiYang2022 will explore ways to sync a local working directory without using Github, following the comments here.

We will work on shifting the template from the LyX version to a .tex version once the preceding steps are finished.

@ShiqiYang2022
Collaborator

Note to Self:

Consider using Github Actions for two-way synchronization in (1): the original repository with multiple branches and a secondary "Overleaf-Sync" repository mirroring a specific dev branch.

One workflow can be designed to push changes from a development branch in the original repo to the Overleaf-Sync repo. Conversely, another workflow in the Overleaf-Sync repo can be configured to push its updates back to the appropriate branch in the original repo. Proper git configuration and appropriate permissions using the GITHUB_TOKEN might be crucial for safe and seamless synchronization between the repositories.
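The core decision such a pair of workflows has to make can be sketched independently of GitHub Actions: compare each side's latest commit with the commit recorded at the last sync, and copy from whichever side moved. This is a hedged illustration, not the workflow itself; the function name, the return labels, and the idea of storing a "last synced" SHA are all assumptions.

```python
def sync_direction(last_synced: str, dev_head: str, mirror_head: str) -> str:
    """Decide which way to copy commits between a dev branch and its mirror.

    All three arguments are commit SHAs. Returns one of
    "in_sync", "dev_to_mirror", "mirror_to_dev", or "conflict".
    """
    if dev_head == mirror_head:
        return "in_sync"
    dev_changed = dev_head != last_synced
    mirror_changed = mirror_head != last_synced
    if dev_changed and mirror_changed:
        # Both sides moved since the last sync: a plain copy would lose work.
        return "conflict"
    if dev_changed:
        return "dev_to_mirror"
    return "mirror_to_dev"


if __name__ == "__main__":
    # Placeholder SHAs, for illustration only.
    print(sync_direction("aaa", "bbb", "aaa"))
    print(sync_direction("aaa", "aaa", "ccc"))
    print(sync_direction("aaa", "bbb", "ccc"))
```

The "conflict" case is the one that needs a human (or a real merge); the other cases are one-directional copies, which is what makes the automation comparatively safe.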

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Aug 16, 2023

Note to Self on ways to sync without using Github:

One way to sync without using Git/GitHub might be to use Python tools. Some already-developed tools are available. "Overleaf-Sync" is currently the most-used one for bidirectional synchronization between local files and Overleaf projects. It enables easy sync, PDF downloads, and project listing without requiring Git or Dropbox. (Sep. 5th note: in essence it is a package that makes the manual upload/download steps easier.)

I have tested it on my end but unfortunately encountered some unresolved errors. Maybe we can design an Overleaf-Sync package in our gslab_make if needed.

@arjunsrini

Hi gslab 👋 dropping in here with a fun little idea on an alternative to both lyx and Overleaf. I’ve been using the pdflatex command to locally compile .tex deliverables into .pdfs for the Commit Flex project. My idea:

  • compile .tex files locally by having make.py run the shell command pdflatex
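A minimal sketch of what "make.py runs pdflatex" could look like, using only the standard library. This is not a gslab_make function: the names pdflatex_command and compile_tex, the two-pass default, and the example paths are assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def pdflatex_command(tex_file: str, output_dir: str) -> List[str]:
    """Build a non-interactive pdflatex invocation for one .tex file."""
    return [
        "pdflatex",
        "-interaction=nonstopmode",  # never stop to prompt for input
        "-halt-on-error",            # exit at the first error
        f"-output-directory={output_dir}",
        tex_file,
    ]


def compile_tex(tex_file: str, output_dir: str, passes: int = 2) -> None:
    """Run pdflatex `passes` times so cross-references can settle."""
    if shutil.which("pdflatex") is None:
        raise RuntimeError("pdflatex was not found on PATH")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for _ in range(passes):
        subprocess.run(pdflatex_command(tex_file, output_dir), check=True)


if __name__ == "__main__":
    # Hypothetical paths; nothing is compiled here, the command is only shown.
    print(" ".join(pdflatex_command("paper.tex", "output")))
```

A make.py script could call compile_tex once per deliverable; documents with bibliographies would need an extra bibtex or biber step between passes, which this sketch leaves out.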

Comparison with Overleaf:

  • Live collaboration: Overleaf would allow for live collaboration on editing the .tex file; if we compile .tex files locally, then we could achieve live editing with a variety of other tools (for example, if most collaborators use VS Code as their text editor, it is very easy to collaborate using VS Code’s Live Share)

  • Control over .tex distribution: with Overleaf, we rely on Overleaf's TeXLive distribution, which gives us less control over its version and specific customizations. This is potentially a concern w.r.t. replicability, since Overleaf updates its TeX Live distribution periodically, so occasionally it may be the case that a document that compiled successfully in the past may run into issues if there are breaking changes in newer package versions. We could avoid this by being explicit about package versions or by keeping an archived version of the TeX Live distribution, but these options are more feasible on a local system than on Overleaf.

  • Offline compilation: not a huge deal, but if we rely on Overleaf, we give this up.

I asked our friend GPT-4 about their thoughts on this comparison, and here is their summary:

If your focus is on replicability, control, and independence from external platforms, local compilation might be the better choice.
If collaboration, ease of use, and platform independence are more important, and you're okay with some level of dependency on an external service, the Overleaf-GitHub integration could be beneficial.

Bonus idea:

While I like how clean and organized our .tex deliverables are on Commit Flex, I worry about the space each versioned .pdf takes from Git LFS (expensive 💸). Instead, we could employ a cheaper storage solution for the compiled .pdfs. If this is of interest, I have implemented a basic version of a tool that uses Git Hooks to upload deliverables with the associated commit SHA1 hash to an AWS S3 bucket right before each git push. We would need to figure out how to make the S3 bucket pdf urls public to collaborators only, but I think this is something we may want to do eventually.

cc @ShiqiYang2022 @snairdesai

@gentzkow
Owner

@arjunsrini Thanks! Great to have your input here.

We definitely want to support local compiling of .tex. That is supported in the run_latex command in gslab_make, which can be called by make.py. What run_latex does is essentially issue the pdflatex shell command, while doing some logging/housekeeping things at the same time.

@snairdesai I know we'd been debugging some issues w/ run_latex a while back. Can you remind me status on that?

In terms of workflow, I think we want to use run_latex for any compile that is going to be merged back to main at the end of an issue. But I also think we want to be able to use Overleaf for editing / compiling / collaborating during work on an issue. There are many different editors, but Overleaf seems to be the one that people have converged on liking as a default, so it's good to support it.

The issue w/ different tex installations is real and annoying. I don't know of a great solution. Even when we're doing local compiling we depend on the particular tex installations people have on their local machines.

As far as the PDFs go, we are not concerned about the incremental storage costs on LFS. I think sticking to LFS is going to be more robust than trying to roll our own.

@snairdesai
Contributor Author

Thanks @arjunsrini @gentzkow! Re this query:

@snairdesai I know we'd been debugging some issues w/ run_latex a while back. Can you remind me status on that?

At least on projects which I've worked on (and @jc-cisneros reports a similar experience) we have not utilized the run_latex command directly, as we tend to use either Overleaf or LyX (which requires run_lyx instead). We did/do have issues rendering PDFs using run_lyx, and the root of the problem is one you flagged: local TeX distributions.

LyX and TeXLive are not always in alignment, and this has created issues when compiling documents due to incompatibilities. The workaround solution we implemented was for all of @ShiqiYang2022, @Erick11293, and @simonepandit to install the 2022 version, as the 2023 version was not compatible with the latest version of LyX (@jc-cisneros and I both had the 2022 version as well from when we arrived, which continues to work as expected).

We have not found a solution to overcome this issue with the local TeX distributions, and that is part of why this shift to development in Overleaf might help. It is true (as @arjunsrini notes) that we would rely on Overleaf's native TeXLive distribution rather than our own -- but at least this would enable stability across machines in versioning.

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Aug 25, 2023

@gentzkow

I have explored more on proposal (1) of two-way Github-Overleaf synchronization. To be specific, I made progress in the following 3 aspects:

I tested the collaboration aspect of proposal (1) with @snairdesai. We've confirmed that collaborators can edit in Overleaf simultaneously, and can also commit their changes to the mirror repository of the development branch with Overleaf access alone (i.e., they do not need access to the mirror repo). So, when we collaborate on Overleaf edits, we can make the mirror repo private (and more secure).

One small note: Regardless of who edits the Overleaf, the commit will always belong to the owner of the Overleaf project.

1. Collaboration among multiple users

I explored automatic two-way synchronization between the development branch and the mirror repository using GitHub Actions. The GitHub workflow triggers on pushes to dev branches and copies the changes to the mirror repository; it also triggers when pushes are made to the mirror repo, in order to merge changes back into the original dev branch.

The specific details of this workflow can be found here. I tested the synchronization on my end and it worked well.

The strengths of this approach are:
(1) It keeps the development branch and the mirror repo synchronized on every push, so we can use the development branch for GitHub-Overleaf synchronization just like syncing the master branch; we can skip steps 2 and 4 in (1).
(2) It is uniform across different users and development environments. This consistency ensures that automated workflows execute in the same way for everyone involved in the project, minimizing the risk of synchronization failures or environment-specific bugs.

2. Automate the development branch <-> mirror repo synchronization

I created a wiki that writes down all the steps of proposal (1), as a summary of our thoughts. The wiki currently lives in the forked template on my personal GitHub and can be found here.

I tested all the steps on my local end to make sure they work.

After discussion, @snairdesai @jc-cisneros are happy to cross-check its reproducibility and give feedback.

3. Detailed instructions

Since we now have a relatively complete framework for synchronizing the development branch to Overleaf, maybe we can consider putting our deliverable somewhere? The appendix of lab-manual/wiki may be a suitable place. If this sounds reasonable to you, I can open an issue and pull request in lab-manual, and also let Jesse know and ask for his advice.

I have also explored other ways to sync without using GitHub. If we need to synchronize to Overleaf without GitHub, we can consider using Python tools (maybe designing a command in gslab_make); details can be found in this thread. If we need to collaborate outside Overleaf (locally), then live editing with VS Code’s Live Share is a decent idea, as @arjunsrini suggested. Also, huge thanks to @arjunsrini for contributing to our lab!


For next steps:

We have nearly completed step 1 and still have steps 2 and 3 in this thread to finish. This issue started with a manageable scope but expanded as new questions arose. At this point, maybe it is a good idea to carve off steps 2 and 3, the two subparts of upgrading paper_slides from .lyx compile to .tex compile, into a separate issue?

Please let us know your thoughts, thanks!

cc: @snairdesai @jc-cisneros @Erick11293 @simonepandit

@gentzkow
Owner

@ShiqiYang2022 Excellent work here! Definitely looks like a feasible option.

My main concerns are (1) the Github integration steps are still fairly elaborate and might be confusing to some coauthors; (2) the automation, while very cool, creates additional failure points that could end up being frustrating for users to debug.

One way or another, I think we can agree that this is the right high-level model:

  1. Each Github development branch corresponds to a separate project in Overleaf
  2. The assignee of the relevant issue creates that Overleaf project at the same time they create the development branch and then invites other collaborators to join
  3. When the issue is completed (or at major milestones along the way) the content is exported back from Overleaf and committed to Git
  4. Once the issue is closed, the Overleaf project is archived

What would you think about the following alternative?

  • User imports the relevant /paper_slides/ directory to Overleaf manually (using New Project -> Upload Project)
  • When ready to commit, User exports the content from Overleaf to the local clone of the development branch manually (using Download -> Source) and then commits

In terms of next steps, I agree carving off (2) and (3) to separate issues is a good idea. On (2), I think we want to keep Lyx in the template but create a separate .tex version in /extensions/.

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Aug 31, 2023

@gentzkow Thanks very much! Below are the replies:

To your concerns:

(1) the Github integration steps are still fairly elaborate and might be confusing to some coauthors

     Yes, I fully agree. But this might not be a big problem because:

  • Many projects have RAs. Coauthors only need to handle a small part of the procedure, i.e. syncing from Overleaf to the mirror repo (click Menu -> Sync -> GitHub, which is what we currently do in Overleaf projects); the rest can be done on the RA side. (Furthermore, I suggest putting one RA in charge of all Overleaf-related edits to avoid potential sync failures.)
  • Most of the steps can be simplified by safe, reliable automation. Further explanation is provided in the reply to (2).

(2) the automation, while very cool, creates additional failure points that could end up being frustrating for users to debug.

     I had the same concern, because I myself spent a lot of time debugging to make the automation run well. But I feel this can also be addressed reasonably well. Let me explain further:

  • First, the only function of the Github Action I used is to update the unchanged side to match the changed side, so unless there are simultaneous updates on both sides, it is unlikely to fail.
  • Second, one advantage of Github Actions is that, when the workflow is triggered, GitHub creates an isolated virtual environment for each job and executes the steps defined within the job there. That prevents many failures due to interference, dependency conflicts, environmental inconsistency, and accidental pollution of the workspace; it also ensures reproducibility across different users.
  • Third, the automation process can move smoothly from one issue to the next without many changes. The repo and the automation code are all reusable, so creating them is a one-time effort in each project. The only code change needed when moving from one issue to another is to update the name of the development branch and the name of the mirror repo. This keeps errors caused by unintentional changes to a minimum.
  • Finally, the automation process only serves as a bridge linking the dev branch to Overleaf. Maintaining it might not be a big issue, and it is more efficient for RAs than for coauthors to run the sync and to address an error if one does turn up (which is less likely to happen, for the reasons given above).

Admittedly, the ideal would be to synchronize the development branch directly to Overleaf. Unfortunately, this is not technically feasible, and this approach might be one of the most reasonable solutions for now.


One way or another, I think we can agree that this is the right high-level model:

  1. Each Github development branch corresponds to a separate project in Overleaf
  2. The assignee of the relevant issue creates that Overleaf project at the same time they create the development branch and then invites other collaborators to join
  3. When the issue is completed (or at major milestones along the way) the content is exported back from Overleaf and committed to Git
  4. Once the issue is closed, the Overleaf project is archived

Agree!


What would you think about the following alternative?

  • User imports the relevant /paper_slides/ directory to Overleaf manually (using New Project -> Upload Project)
  • When ready to commit, User exports the content from Overleaf to the local clone of the development branch manually (using Download -> Source) and then commits

This is also something I have considered before. Let me analyze in detail its pros and cons compared to the proposal we suggested previously:

Pros: Simplicity; easy to understand and implement; and each commit can belong to the real contributor, instead of always belonging to the owner of the Overleaf project as with the Github-Overleaf sync.

Cons:
(1) (Not a concern anymore) Downloading the entire project as a single compressed folder and then decompressing it over your existing local files would mark all files as changed, even if you only modified a single file on Overleaf.
(2) Manual sync hinders change tracking. Because manual synchronization requires several manual procedures, we had better reduce the frequency of sync, i.e. only sync at milestones. The granularity of the change log is then lost: it becomes difficult to track incremental changes and to understand the sequence in which changes were made.
(3) Loss of attribution. This approach risks attributing a range of changes to a single commit by one person (committing only at milestones), making it harder to dissect who did what and why. By contrast, with a more automated synchronization approach via GitHub (which makes committing much easier), we have the opportunity to include detailed commit messages that specify who made the changes in each small independent commit, although the owner of the commit would always be the owner of the Overleaf project.
(4) Some upload procedures could be complex. If we upload the directory (using New Project -> Upload Project), we probably cannot overwrite the old folder with the new one, so we would only upload it once, when the issue branch is created. After that, we need to manually upload to Overleaf all the figures, tables, and .tex files edited in this issue. Tracking and uploading those manually is clumsy and error-prone. Uploading the whole directory into the existing project and overwriting is hard because, unfortunately, Overleaf cannot update and overwrite folders (see a similar existing discussion here).
(5) Of course, manual operations are prone to accidental errors.
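
For the download direction of the manual option, part of the hand work could be scripted. The following is only a minimal sketch, not something we have adopted: `apply_overleaf_zip` is a hypothetical helper name, and it assumes the zip is the archive produced by Overleaf's Download -> Source with paths relative to /paper_slides/. It rewrites only files whose bytes differ, and overwrites existing files in place so that their permission bits stay as they are.

```python
import zipfile
from pathlib import Path


def apply_overleaf_zip(zip_path, target_dir):
    """Copy files from a source zip into target_dir (hypothetical helper).

    Only files whose bytes differ from the local copy are written.
    Returns the sorted list of relative paths that were written.
    """
    target = Path(target_dir).resolve()
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            dest = (target / info.filename).resolve()
            if target not in dest.parents:
                continue  # skip entries that would land outside target_dir
            data = archive.read(info)
            if dest.exists() and dest.read_bytes() == data:
                continue  # identical content: leave the file untouched
            dest.parent.mkdir(parents=True, exist_ok=True)
            # Writing to an existing file truncates it in place, so the
            # file keeps its current permission bits.
            dest.write_bytes(data)
            written.append(info.filename)
    return sorted(written)
```

The returned list gives a quick preview of what `git status` should report afterwards. It does not address the upload direction described in (4).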

In terms of next steps, I agree carving off (2) and (3) to separate issues is a good idea.

Perfect! Will open a new issue once this is closed.

On (2), I think we want to keep Lyx in the template but create a separate .tex version in /extensions/.

One small note: I noticed an inconsistency with your previous #84 (comment) No. 1. Could you please let me know which of the two you prefer now? My idea is that, since we want to replace .lyx with Overleaf, we had better move .lyx to /extensions/.

There are many thoughts that I wanted to convey in this thread (so it's a bit long). Thanks for your patience in reading it!

@gentzkow
Owner

@ShiqiYang2022 Thanks! This is all very clear and I agree very much.

I have one question on this

(1) Downloading the entire project as a single compressed folder and then decompressing it over your existing local files would mark all files as changed, even if you only modified a single file on Overleaf.

Are you sure about that? My understanding is that Git will recognize two files as the same if they have the same hash, regardless of metadata like date modified on the local system. So I would have thought if we download the entire project as a zip archive and then paste it over the existing local files Git would correctly see unchanged files as unchanged. (At least for text files -- there can be some trickier issues w/ binary files.)
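
To make the hash point concrete, here is a small sketch of how Git derives a blob's object ID in a default (SHA-1) repository: it hashes the header `blob <size>\0` followed by the file bytes, so timestamps never enter the hash. The function name `git_blob_hash` and the sample bytes are only for illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = b"@article{key,\n  title = {Example}\n}\n"
same_bytes = bytes(original)                   # same content, saved again later
crlf_bytes = original.replace(b"\n", b"\r\n")  # same text, Windows line endings

print(git_blob_hash(original) == git_blob_hash(same_bytes))  # True
print(git_blob_hash(original) == git_blob_hash(crlf_bytes))  # False
```

Note that this covers file content only. The file mode is recorded separately in the tree entry, and a byte-level difference such as line endings does change the hash.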

On this

One small note: I noticed an inconsistency with your previous #84 (comment) No. 1. Could you please let me know which of the two you prefer now? My idea is that, since we want to replace .lyx with Overleaf, we had better move .lyx to /extensions/.

Good point. Mea culpa! Let's defer to my previous self over my current self -- i.e., stick to what I'd said originally.

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Aug 31, 2023

@gentzkow Thanks! Replies are below:

I have one question on this

(1) Downloading the entire project as a single compressed folder and then decompressing it over your existing local files would mark all files as changed, even if you only modified a single file on Overleaf.

Are you sure about that? My understanding is that Git will recognize two files as the same if they have the same hash, regardless of metadata like date modified on the local system. So I would have thought if we download the entire project as a zip archive and then paste it over the existing local files Git would correctly see unchanged files as unchanged. (At least for text files -- there can be some trickier issues w/ binary files.)

That's my bad -- sorry! I should have been clearer. What I wanted to convey is: such compressing and decompressing will make some files (not all; that was a mistake) show as changed, even if you did not modify them.

I tested the manual procedure on the Newsworthy project, which compiles .tex files. The only edit I made was to replace the author in paper_slides/source/ondeck/condstate.tex, changing "Luis Armona" to "GSLab Boy". Then I decompressed the project to /paper_slides/ and ran git status. The procedure and results are shown below:

SIEPR-C02G50GUML86:github_folders shiqiyang$ unzip /Users/shiqiyang/Downloads/paper_slides.zip -d /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides
Archive:  /Users/shiqiyang/Downloads/paper_slides.zip
replace /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/config_global.yaml? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/config_global.yaml  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/ondeck/condstate.tex  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/paper/newsworthy.bib  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/paper/newsworthy.tex  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/muri_30m.tex  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/SConstruct  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/compile.py  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/compile_nobib.py  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/compile_ondeck.py  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/run.py  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/ondeck/SConscript  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/paper/SConscript  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/muri_30m.pdf  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/figures/MURI_blanc.png  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/figures/ana.jpg  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/figures/ana.png  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/figures/flow.jpg  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/figures/houda.jpg  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/figures/houda.png  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/figures/luis.jpeg  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/figures/maurice.png  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/figures/wildfire.jpg  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/figures/wildfire.pdf  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/source/slides/figures/wildfire.png  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/release/ondeck/condstate.pdf  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/release/paper/newsworthy.pdf  
  inflating: /Users/shiqiyang/Documents/github_folders/newsworthy/paper_slides/release/paper/text.pdf  
SIEPR-C02G50GUML86:github_folders shiqiyang$ cd newsworthy
SIEPR-C02G50GUML86:newsworthy shiqiyang$ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   paper_slides/source/ondeck/condstate.tex
	modified:   paper_slides/source/paper/newsworthy.bib

I found that, apart from condstate.tex, newsworthy.bib also shows as modified, though I made no edits to that file.

I also used "Download -> Source" to download the edited files on Windows (since some coauthors use Windows) and ran git status again. Below is what I found -- the number of files with "modified" status increased a lot.

PS C:\Users\Shiqi Yang\Desktop\GSLab> Expand-Archive -Path "C:\Users\Shiqi Yang\Downloads\paper_slides.zip" "C:\Users\Shiqi Yang\Desktop\GSLab\newsworthy\paper_slides" -Force
PS C:\Users\Shiqi Yang\Desktop\GSLab> git status
fatal: not a git repository (or any of the parent directories): .git
PS C:\Users\Shiqi Yang\Desktop\GSLab> cd newsworthy
PS C:\Users\Shiqi Yang\Desktop\GSLab\newsworthy> git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   paper_slides/SConstruct
        modified:   paper_slides/compile.py
        modified:   paper_slides/compile_nobib.py
        modified:   paper_slides/compile_ondeck.py
        modified:   paper_slides/config_global.yaml
        modified:   paper_slides/run.py
        modified:   paper_slides/source/ondeck/SConscript
        modified:   paper_slides/source/ondeck/condstate.tex
        modified:   paper_slides/source/paper/SConscript
        modified:   paper_slides/source/paper/newsworthy.bib
        modified:   paper_slides/source/paper/newsworthy.tex
        modified:   paper_slides/source/slides/muri_30m.tex

To summarize, the point I want to convey is that some files, though unchanged, can still show as "modified" in this procedure. Thanks for the great catch, and sorry for the confusion!

One small note: I noticed an inconsistency with your previous #84 (comment) No. 1. Could you please let me know which of the two you prefer now? My idea is that, since we want to replace .lyx with Overleaf, we had better move .lyx to /extensions/.

Good point. Mea culpa! Let's defer to my previous self over my current self -- i.e., stick to what I'd said originally.

Got it! I will then move .lyx to /extensions/.

@gentzkow
Owner

gentzkow commented Aug 31, 2023

Fascinating.

On the first instance of git status: What do you get if you git diff the file newsworthy.bib? Maybe it has something to do with a difference in encoding?

I wonder if it might be different once the files have all been exported from Overleaf at least once. Maybe fewer would show as changed when we re-export from Overleaf a second time?

Do any of these files register as changed if you sync them to Github via Overleaf (a la the automated method)?

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Aug 31, 2023

@gentzkow Thanks!

On the first instance of git status: What do you get if you git diff the file newsworthy.bib? Maybe it has something to do with a difference in encoding?

I wonder if it might be different once the files have all been exported from Overleaf at least once. Maybe fewer would show as changed when we re-export from Overleaf a second time?

The result is shown below. The output indicates that, between the two files compared, the file mode changed from executable permissions (100755, the original) to regular read-write permissions (100644, downloaded from Overleaf). If I re-export from Overleaf, the result is the same; see the details below.

SIEPR-C02G50GUML86:Documents shiqiyang$ git diff newsworthy/paper_slides/source/paper/newsworthy.bib github_folders/newsworthy/paper_slides/source/paper/newsworthy.bib
diff --git a/newsworthy/paper_slides/source/paper/newsworthy.bib b/github_folders/newsworthy/paper_slides/source/paper/newsworthy.bib
old mode 100755
new mode 100644
SIEPR-C02G50GUML86:Documents shiqiyang$ git diff newsworthy/paper_slides/source/paper/newsworthy.bib /Users/shiqiyang/Downloads/paper_slides_2/source/paper/newsworthy.bib
diff --git a/newsworthy/paper_slides/source/paper/newsworthy.bib b/Users/shiqiyang/Downloads/paper_slides_2/source/paper/newsworthy.bib
old mode 100755
new mode 100644

I also looked for the difference on Windows; there, the main difference is line endings rather than permissions.
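
The two causes seen so far (a mode change on Mac, line endings on Windows) can be told apart with a small check. This is only an illustrative sketch: `classify_change` is a hypothetical helper, and it assumes the caller already has the old and new file bytes and the mode strings in the form Git prints them.

```python
def classify_change(old_bytes, old_mode, new_bytes, new_mode):
    """Return the reasons a file would show as modified (empty if none)."""
    reasons = []
    if old_mode != new_mode:
        reasons.append("mode")
    if old_bytes != new_bytes:
        # Treat CRLF and LF as equivalent to detect pure line-ending changes.
        old_lf = old_bytes.replace(b"\r\n", b"\n")
        new_lf = new_bytes.replace(b"\r\n", b"\n")
        reasons.append("line endings" if old_lf == new_lf else "content")
    return reasons


# Same text, executable bit dropped (the Mac case above).
print(classify_change(b"x\n", "100755", b"x\n", "100644"))    # ['mode']
# Same text, CRLF line endings (the Windows case above).
print(classify_change(b"x\n", "100644", b"x\r\n", "100644"))  # ['line endings']
```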

Do any of these files register as changed if you sync them to Github via Overleaf (a la the automated method)?

For the automated method, I added the exact same newsworthy.bib file to my local template, pushed it to my Github template, and tested the automated sync between Overleaf and Github with simple text edits. The result: only the edited .tex file changed, not the .bib file.

The specific commit pushing new edits from Overleaf to the mirror repo can be tracked here. The commit pushing new edits from the mirror repo back to the original repository can be tracked here. The two-sided automation process can be tracked here (origin -> mirror) and here (mirror -> origin).

One small note: I added newsworthy.bib, rather than some other .bib file, to ensure all other features of the .bib file are the same. But it is in essence a private .bib file. I will use git reset to delete it after you have checked the automated process, so that no history is recorded.

@gentzkow
Owner

gentzkow commented Sep 1, 2023

Interesting. Do we have a sense of why that .bib file had executable permissions in the first place? If we set it to read-write from the start would that solve the problem (on Mac)?

@ShiqiYang2022
Collaborator

@gentzkow Thanks!

Do we have a sense of why that .bib file had executable permissions in the first place?

I traced the history of that .bib file in the newsworthy project. In commit 9c16b4d, made by a previous RA, Molly Wharton, the file permission changed (see the change here) from regular read-write permissions (100644) to executable permissions (100755). It seems she deleted the .bib file by mistake (see the comment here), and when she added it back, the permission changed.

An extra and deeper question is: why did the permission change at that specific time, for that specific RA, in that particular commit? I was also curious about that, so I did some further research. It's very hard to give a 100% certain answer, because I cannot know what happened on her computer 4 years ago, but I have attached some thoughts below that might help.

Some thoughts on permission changes

   1. When might unintended permission changes happen in GSLab?

       I searched for potential causes of unintended permission changes, and I think the following five are the most likely ones in our lab.

  • Operating System Differences: Windows and Linux/MacOS have different file permission models, which can lead to inconsistencies when files are accessed or modified across multiple OSs. Windows uses Access Control Lists (ACLs) offering fine-grained permissions, while Linux/MacOS use a simpler model based on user, group, and others. Linux/Mac permissions are set using "chmod" and are often represented as a three-digit code, like 755 or 644. Windows permissions are usually configured through graphical interfaces and can't be directly mapped to Linux/Mac permissions.

  • Development Tools and Editors: Tools like VS Code or Sublime Text can have settings or features that may unintentionally alter file permissions during saving or other operations. For example, the atomic_save option in text editors like Sublime Text creates a new temporary file when saving changes. This can lead to unintended changes in file permissions, as the new file may not inherit the original permissions. For further details on this example, please see here.

  • Docker and Other Container Technologies: Running applications in containers like Docker can result in changes to file permissions, especially when the container shares its filesystem with the host machine. Per conversation with @snairdesai, we stored the files in the Docker container during development, and we can store them locally after an image has been built. This might also cause permission changes for the files that interact with it on the local side. For an example, please see here.

  • User's umask Settings: These affect permissions when new files are checked out. umask is a command in Unix-like systems that sets the default permission mask for new files and directories, controlling which permissions are automatically removed when they are created. Setting umask adjusts default file creation permissions, which affects the way new files are tracked within repositories. I played with this on my side and found that it can downgrade permissions from executable to read-write, but it is hard to do the reverse. For further details, this is a small but helpful article I referred to.

  • Administrator Actions: Using administrative commands such as sudo may inadvertently change the permissions of files or directories. Here is an error reported by other users, for example.

   2. What might have happened in Molly's commit?

       My guess is that this was not a deliberate permission change. Of the potential causes listed above, I do not think Docker or administrator actions are likely to have caused this change. Default umask settings are also less likely, since they seem only able to downgrade permissions, not upgrade them. A change of operating system is also unlikely to be to blame, because there is no need to switch OS if you only delete and re-upload files.

I reached out to our lab alumni @DavidRitzwoller and @mengsongouyang (thanks!), who were Molly's colleagues on newsworthy and are still pursuing academic careers, to find out whether they could recall any details. @DavidRitzwoller mentioned that when Molly, @mengsongouyang, and @DavidRitzwoller initialized the GSLab template for Newsworthy, they were all primarily using Sublime Text, whereas Luis, the previous owner of this newsworthy.bib, was using Atom. So it might be the atomic_save option in Sublime Text that caused the problem (see details in the 2nd bullet in 1).

   3. Unintended changes to executable permissions have happened in many cases in GSLab

       I was not satisfied with a single case, so I checked the log files in newsworthy and ad-price-drivers. I found that the unintended upgrade of permissions (i.e. 100644 to 100755) happened in many cases in ad-price-drivers. The affected files include all kinds of files: .R scripts (example here), .txt files (example here), SConscript files (example here), .yaml files (example here), etc.

One thing that might be worth noting is that almost all of those commits are related to a change of ownership -- i.e. the owner of the file before the commit often differs from the owner of the commit. This might explain the unintended permission changes, because there are so many settings that can differ between users.

   4. Suggestions and ways to improve

       I think unintended permission changes can (1) complicate the process of auditing and tracing file history, and add an additional layer of complexity to the code review process; and (2) negatively affect the portability of the code across different systems, especially when some systems have requirements for file permissions. Taking some action to avoid this could be beneficial to our lab.

As for how to improve, I think one potential solution is to set git config core.fileMode false to make Git ignore such executable permission changes; see this thread for details. Maybe we can add it to the setup step to avoid any unintended permission changes.

If we set it to read-write from the start would that solve the problem (on Mac)?

Yes. If we git clone files that initially have read-write permissions, then upload them to Overleaf, edit, and Download -> Source, the problem is solved: the permissions are always read-write (100644), and the file will not show as modified.
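
If we did want to set such files to read-write from the start, a one-off cleanup could look like the sketch below. It is only an illustration under stated assumptions: `clear_exec_bits` and the suffix list are hypothetical, and the script only changes modes in the working tree on Mac/Linux. The resulting mode changes would still need to be committed once.

```python
import os
import stat
from pathlib import Path

# Hypothetical list of file types that should never be executable.
TEXT_SUFFIXES = {".tex", ".bib", ".yaml"}


def clear_exec_bits(root, suffixes=TEXT_SUFFIXES):
    """Remove executable bits from matching files under root.

    Returns the relative paths of the files whose mode was changed.
    """
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if ".git" in path.parts:
            continue  # never touch Git's own files
        if not path.is_file() or path.suffix not in suffixes:
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & exec_bits:
            os.chmod(path, mode & ~exec_bits)
            changed.append(path.relative_to(root).as_posix())
    return changed
```

Scripts that should stay executable are left alone as long as their suffix is not in the list.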


One note to retract one of my claims in this thread:

Do any of these files register as changed if you sync them to Github via Overleaf (a la the automated method)?

For the automated method, I added the exact same newsworthy.bib file to my local template, pushed it to my Github template, and tested the automated sync between Overleaf and Github with simple text edits. The result: only the edited .tex file changed, not the .bib file.

I would like to retract my claim here. The Github-Overleaf sync does change file permissions; this is stated by Overleaf in Known Limitations. It turns out that the newsworthy.bib file I used for the sync test did not have executable permissions (my mistake!).

To my knowledge, we currently have no easy way to work around this limitation automatically. This is because the executable permission change is committed together with other edits: when we commit from Overleaf to Github, the files that lost their executable permissions and the edited .tex files end up in one single commit made by Overleaf. I tried some simple approaches, but none of them helped. One option is to set this aside, and then check for and revert permission changes when we merge the dev branch into the master branch.
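
The check at merge time could be partly scripted. As a rough sketch: `git diff --summary` between master and the dev branch prints one `mode change <old> => <new> <path>` line per file whose mode changed, and the hypothetical helper `lost_exec_bit` below picks out the files that went from 100755 to 100644. It assumes the summary text has already been captured as a string.

```python
import re

# Matches lines such as " mode change 100755 => 100644 path/to/file".
MODE_CHANGE = re.compile(r"^\s*mode change (\d{6}) => (\d{6}) (.+)$")


def lost_exec_bit(summary_text):
    """Return the paths that changed from 100755 to 100644."""
    paths = []
    for line in summary_text.splitlines():
        match = MODE_CHANGE.match(line)
        if match and match.group(1) == "100755" and match.group(2) == "100644":
            paths.append(match.group(3))
    return paths


sample = (
    " mode change 100755 => 100644 paper_slides/source/paper/newsworthy.bib\n"
    " create mode 100644 paper_slides/source/paper/new_section.tex\n"
)
print(lost_exec_bit(sample))
# ['paper_slides/source/paper/newsworthy.bib']
```

Each listed path could then be restored with `git update-index --chmod=+x <path>` before merging, if the executable bit is actually wanted.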

@gentzkow
Owner

gentzkow commented Sep 4, 2023

@ShiqiYang2022 Thanks!

This is very thorough. As a small note of feedback, I'd say maybe even too thorough -- good to run this down, but you may not have needed to put that much time into analyzing the historical change. All I'd meant to ask was whether we had reason to expect many files would have executable permissions in the future that would lead this issue to recur. Sorry that wasn't clear. But very helpful in any case!

So it sounds like the bottom line is that the permission issue is not a strong reason to prefer Github<>Overleaf sync over the manual method I mention here. Do you agree?

If so, does that mean we can cross off (1) in your list of cons for the manual method here?

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Sep 5, 2023

@gentzkow Thanks!

As a small note of feedback, I'd say maybe even too thorough -- good to run this down, but you may not have needed to put that much time into analyzing the historical change.

Thanks! I will be more concise. I investigated further just out of fun and curiosity. Thanks for the reminder!

So it sounds like the bottom line is that the permission issue is not a strong reason to prefer Github<>Overleaf sync over the manual method I mention here. Do you agree?

Agreed! Both require manual checks.

If so, does that mean we can cross off (1) in your list of cons for the manual method here?

Yes. Permission worries can be set aside.
One reminder: across different operating systems, the manual method may produce line-ending issues.


Based on the discussion, I suggest it might be time for a decision. Some references are listed below for review:

  1. General framework we agreed on: ref link.

  2. Options we are still considering: mirror repo option (with/without automation, ref link), manual option (ref link).

  3. Comparison between the two methods: ref link.

  4. Potential deliverables: mirror repo option (ref link), manual option (not yet, but happy to write one if needed).

  5. Suggested place to put the deliverable: ref link.

  6. Some other options, not investigated much (Dropbox, ref link; Python packages, ref link).

@gentzkow
Owner

gentzkow commented Sep 5, 2023

Great. Thanks! I agree it's time for a decision.

The ideal from my perspective is that we offer both the Manual option and the Mirror option, allowing lab members to pick whichever they prefer on a case-by-case basis.

Am I correct that which option I choose when I am the assignee of an issue should not change the workflow for other lab members in any way?

I.e.,

  1. If the issue only requires me to touch the draft (i.e., no other collaborators need to be involved), I could choose either option, as well as skip Overleaf altogether and use a local text editor, and the results would be the same from other lab members' perspective, up to the fact that the commit history will look a bit different.
  2. If the issue requires multiple people to collaborate on the draft, I can be the one to handle the sync steps. The other collaborators only need to get access to the Overleaf project and make their edits there. They don't even need to know whether I'm doing Manual or Mirror. (Of course, if I'm doing Mirror I could ask them to sync their changes to Github themselves, but this isn't necessary.)

I agree that we should put what we agree on in the lab-manual appendix.

One open loop is how to manage differences in the Tex installation on Overleaf vs. our local machines. We definitely want to make sure the PDFs compile locally. Can you remind me the status of that issue?

@ShiqiYang2022
Collaborator

@gentzkow Thanks! Replies are below:

The ideal from my perspective is that we offer both the Manual option and the Mirror option, allowing lab members to pick whichever they prefer on a case-by-case basis.

Am I correct that which option I choose when I am the assignee of an issue should not change the workflow for other lab members in any way?

Yes! I agree with all. Let's keep both.

I agree that we should put what we agree on in the lab-manual appendix.

Great! Then I will open an issue and PR there.

One open loop is how to manage differences in the Tex installation on Overleaf vs. our local machines. We definitely want to make sure the PDFs compile locally. Can you remind me the status of that issue?

I think we currently do not have an issue that directly compares .tex compiles locally vs. on Overleaf, but we should definitely compare them. I will investigate this in the process of adding local .tex compiles to /paper_slides as part of the template.

I guess you were referring to the misalignment between LyX and TeX Live. We currently use LyX as the compiler, and LyX does not currently support MacTeX 2023; see #84 (comment) for details.

@gentzkow
Owner

gentzkow commented Sep 5, 2023

Excellent.

I'll leave you to open a new issue for the .tex compiles. For that, we need not worry about the Lyx issues. We should just aim to have a robust system where we can be confident local compilation and compilation on Overleaf will (almost always) produce the same result.

ShiqiYang2022 added a commit to ShiqiYang2022/template that referenced this issue Sep 7, 2023
@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Sep 7, 2023

Per the RA meeting, @Erick11293 and @simonepandit have bandwidth to test the current options for Overleaf <> Github sync. This is the last task to complete before wrapping up this issue.

The purpose of this test is to:
(1) Test the reproducibility of the two options we proposed across lab users; (2) Point out anything confusing in the deliverables of the mirror repo option (ref link); (3) Ensure the privacy and security of the sync process.

The detailed process is described below:

  1. Fork my template (see here), and create an issue + issue branch.
  2. Sync the project to Overleaf, make edits (anything you like) in paper_slides/tex_files_for_test/main.tex, and sync back. Please try both options:
    • Mirror repo option (ref link); please try it both with and without the automation method.
    • Manual option (ref link).
  3. Test collaboration on Overleaf. Ideally, if all goes well, the workflow should follow this comment.
  4. Merge the issue branch into master, but make sure you do not merge the automation workflow's .yml file into master.
  5. Try reusing the automation code across issues: open a second issue + issue branch but use the same mirror repository, change the name of the mirror repo, and modify the automation .yml code accordingly. Repeat steps 2-4.
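
Step 4 is easy to get wrong by hand, so a small pre-merge check may help. The sketch below assumes the automation file lives in `.github/workflows/`, the standard location for GitHub Actions workflows; the function name `workflow_files` and the example file name `sync.yml` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumption: the automation .yml sits in the standard GitHub Actions folder.
WORKFLOW_DIR = PurePosixPath(".github/workflows")


def workflow_files(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that are workflow .yml/.yaml files."""
    flagged = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.parent == WORKFLOW_DIR and path.suffix in {".yml", ".yaml"}:
            flagged.append(raw)
    return flagged


# Hypothetical list of files changed on an issue branch.
changed = [
    "paper_slides/tex_files_for_test/main.tex",
    ".github/workflows/sync.yml",
]
for path in workflow_files(changed):
    print(f"Do not merge into master: {path}")
```

The list of changed paths could come from `git diff --name-only master...<issue-branch>`, with the branch name filled in for the issue at hand.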

This is not urgent; please let me know if you have any questions. Huge thanks in advance for your time here!

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Sep 16, 2023

Note to self: When applying the mirror option in my personal project, I found that, contrary to the 2nd reply, 3rd bullet in #84 (comment), reusing the mirror repository for a different issue is not very smooth. I will investigate this.

Update Sep 18th: per #84 (comment), if the automation setup can be simplified, it does not matter much whether we reuse the mirror repo. So this is not worth worrying about.

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Sep 19, 2023

Per my meeting with @Erick11293, main progress and to-dos:

  1. Testing of the options is almost complete and is going well. @Erick11293 will comment on his test experience in this thread.

  2. Main new finding: the automation setup in the mirror option can be simplified. Previously, we had to manually fill in the names of the repos and branches, plus the username and email (in the black boxes of the figure below), which, per comment, might be a potential sync failure point. We can add commands within the same GitHub Action to retrieve that info.

Screenshot 2023-09-18 at 6 10 57 PM
  • @Erick11293 will update the Automation part of the Mirror option in the instructions (ref link) and propose edits based on other findings.
  • @ShiqiYang2022 will cross-check the updated Automation process.
  3. We will add the detailed workflow written in comment to the instructions. @ShiqiYang2022 will work on this.
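
As a minimal sketch of what "retrieving that info" inside a workflow run could look like, the function below reads GitHub Actions' default environment variables `GITHUB_REPOSITORY`, `GITHUB_REF_NAME`, and `GITHUB_ACTOR` instead of hand-typed values. The function name `sync_settings` and the returned keys are illustrative; this is not the lab's actual script. The commit email is left out because it is not among these default variables.

```python
import os
from typing import Dict, Mapping, Optional


def sync_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect repo, branch, and user from GitHub Actions default variables."""
    env = os.environ if env is None else env
    # GITHUB_REPOSITORY has the form "owner/repo".
    owner, _, repo = env["GITHUB_REPOSITORY"].partition("/")
    return {
        "owner": owner,
        "repo": repo,
        "branch": env["GITHUB_REF_NAME"],  # short name of the triggering branch
        "actor": env["GITHUB_ACTOR"],      # user who triggered the run
    }
```

Passing a plain dictionary as `env` makes the function easy to try outside of a workflow run.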

@Erick11293

Hi @gentzkow @ShiqiYang2022:

Here is the progress on the test:

  • I ran the test of the manual process without any problems.
  • @ShiqiYang2022 and I worked on fully automating the process by writing some additional bash code. The new files can be found here (mirror_repo.txt and commit_back.txt). Users no longer need to specify the names of the repos. The only requirements are correctly setting up the mirror repository and creating GitHub secrets for the SSH key and email.
  • The tests suggest we should spend more time improving the guide for the setup process and include some examples.
  • One main risk we found in the test is that we cannot set GitHub secrets in some repos. @ShiqiYang2022 is exploring potential solutions.

@ShiqiYang2022
Collaborator

ShiqiYang2022 commented Oct 9, 2023

@gentzkow

Following #84 (comment), we have tested the reproducibility of the two options (mirror and manual) over several rounds across lab users. Below is a short summary:

  • We improved the mirror option by further reducing its complexity. To set up the automation, we now only need to correctly set up SSH keys and paste in the provided .yml file, following the instructions. For details, see these two comments.
  • I edited the instructions to add details and clarifications based on feedback to avoid sync failures, added the manual option, and added the workflow, notices, and known limitations to make them more user-friendly. The updated version is here.

Since the test is complete, I will close this issue if the recent edits look reasonable to you. I will then open an issue to put the instructions into the lab manual, as we agreed in #84 (comment). Thanks!

Acknowledgements: Thanks very much to @Erick11293 for the joint work in the "test -> improve -> test" loop, and many thanks to @jc-cisneros for testing as an "outside user"!

@gentzkow
Owner

gentzkow commented Oct 9, 2023

@ShiqiYang2022 Sounds great!

I'd suggest that you wrap this issue up then open a separate issue for me to test the final procedure myself. Once that's done we can incorporate any suggestions I have then add to the lab manual.

@ShiqiYang2022
Collaborator

Summary + Deliverables

In this issue we worked on shifting template compiles from LyX to Overleaf. We mainly focused on exploring options that allow an Overleaf workflow on development branches.

The decisions can be found at #84 (comment) and #84 (comment).

The deliverable is this instruction for the GitHub-Overleaf workflow.

Issue #86 follows this issue, focusing on shifting the LyX compile to a .tex compile in /paper_slides/.
Issue #87 follows this issue, and is for Matt to test the final deliverable.

cc: @gentzkow @jc-cisneros @snairdesai @Erick11293 @simonepandit.

@ShiqiYang2022
Collaborator

Follow-up: We added our workflow to lab-manual/wiki. The final state of the mirror option is in mirror-repo-workflow.pdf.


6 participants