# DEV: Gitpod integration #47790

Open
3 of 5 tasks
noatamir opened this issue Jul 19, 2022 · 9 comments

Comments

@noatamir
Member

noatamir commented Jul 19, 2022

We would like to introduce gitpod integration, as a development environment quick start.

Gitpod can provide new contributors with quick, automated, ready-to-code development environments. Instead of sending them to read the setup documentation, how about telling them to click a button and pick an issue, so they can start working on their first PR right away?

It may also be useful for experienced contributors, who work on many projects. They might notice something they can quickly make a PR on, but not have the time to open your contributor guide just now.

A gitpod saves you setup time and gets you to contribute your changes faster.

pandas already has a working Docker image, so making the custom gitpod Docker image was relatively easy. I have prepared a docker image and yml file to get things going. There are still a few more steps to complete the integration setup.

Next steps

  • Open a pandas DockerHub/Quay.io organization account, or use the GitHub container registry, to add the gitpod docker image there
  • Add the gitpod yml file in the repository root
  • Create a GitHub Action for prebuilding the gitpod docker image and uploading it to DockerHub
  • Test that the workflow is working correctly and adjust as needed.
  • Write documentation for using Gitpod, as well as guidance on which account to use:
    • SciPy and NumPy Gitpod documentation
    • Any maintainer and active OSS contributor can apply for the open source account.
    • New contributors may use the free account, which provides 50 hours/month, up to 4 parallel workspaces, and a 30-minute timeout on inactivity.

Attachments

  • DockerfileGitpod
    • The original pandas docker file, but extended to become a Gitpod! We opted for a Gitpod with prebuilds for fast loading. This requires adding the GitHub action to generate the Docker image each time the repository is updated.
  • gitpod.yml
    • This file still needs tweaking based on where we decide to host the generated docker images. It also has a few configurations for vscode extensions, which can be pre-configured for the gitpod. We can make a few more tweaks as we finalize the setup.

The docker image was tested locally as follows: after replacing $gh_username in the dockerfile with your GitHub username, you should be able to build DockerfileGitpod with the command docker build . -f DockerfileGitpod (run from the directory the file is located in). It can be fiddly on M1 Macs 🚨.
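The local test described above could be scripted roughly as follows. This is only a sketch: it assumes the placeholder appears literally as $gh_username inside DockerfileGitpod, and the function names and the pandas-gitpod:local tag are made up for illustration, not taken from the attached files.

```shell
# Sketch only: assumes the placeholder is the literal text "$gh_username".

# Print a copy of the Dockerfile with the placeholder replaced.
# Usage: render_dockerfile <dockerfile> <github-username>
render_dockerfile() {
  local dockerfile="$1"
  local username="$2"
  # \$ makes sed match a literal dollar sign. GitHub usernames contain
  # only letters, digits and hyphens, so "/" is a safe sed delimiter.
  sed 's/\$gh_username/'"$username"'/g' "$dockerfile"
}

# Build the image from a rendered temporary copy, leaving the original
# DockerfileGitpod untouched. The default tag is a hypothetical name.
# Usage: build_gitpod_image <github-username> [tag]
build_gitpod_image() {
  local username="$1"
  local tag="${2:-pandas-gitpod:local}"
  local rendered
  rendered="$(mktemp)"
  render_dockerfile DockerfileGitpod "$username" > "$rendered"
  docker build . -f "$rendered" -t "$tag"
  local rc=$?
  rm -f "$rendered"
  return "$rc"
}
```

From the directory containing DockerfileGitpod, something like build_gitpod_image your-github-username would then run the build without editing the file by hand.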

@noatamir noatamir added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 19, 2022
@mroeschke
Member

Open a pandas DockerHub/Quay.io organization account, or use the GitHub container registry, to add the gitpod docker image there

Do you know how Gitpod manages image pulls and whether there are quotas?

@noatamir
Member Author

I asked on their discord. Will get back to you ☺️

@noatamir
Member Author

They haven't replied yet ⏳. I also sent an e-mail now.
But based on their pricing page, I suspect that there is no quota since all of their plans include the following:
prebuilds: Enable prebuilds to continuously build your Git branches, so you and your team can always start coding right away.

@noatamir
Member Author

And we got a reply!

Thank you for contacting Gitpod. As Gitpod does not host any publicly available Docker images ourselves there wouldn't be any limits you'd be subjected to there. You would need to check with whatever registry you're using to see if they have any limits.

@mroeschke
Member

Okay cool!

Docker Hub (free) accounts have some limits (100/200 pulls per 6 hours should be okay) https://docs.docker.com/docker-hub/download-rate-limit/

For the GitHub Container Registry, it isn't as clear to me what quotas exist, but it appears we have to pay for storing images? https://docs.github.com/en/billing/managing-billing-for-github-packages/about-billing-for-github-packages

@mroeschke mroeschke added Docs and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 22, 2022
This was referenced Aug 16, 2022
@datapythonista
Member

Sorry, a bit late to the party, didn't see this earlier.

I'd personally have this in a third party project. I think a similar discussion happened for VS code stuff, and that was the conclusion. The pandas project is already huge, and the CI is huge and very complex. I think it's great that things like this exist if contributors find them useful, but I don't think the pandas core team should be the one maintaining them, or that the pandas CI and codebase should become bigger, slower, and more complex, with new things that can break.

I don't think there is any drawback in using another repo, and we can use one in the pandas-dev org, although my preference would be to start in a personal repo first and move it to the pandas-dev org once the project matures.

@jorisvandenbossche
Member

I think a similar discussion happened for VS code stuff, and that was the conclusion.

I am not directly aware of such a discussion. We actually do have some VSCode-specific configuration already with .devcontainer.json, so this was added at some point, and there is #41721, where you indeed objected to further customizing the existing .devcontainer.json setup.
But the one discussion that I found on the gitpod topic is a previous PR (#34829), where people were actually OK with adding this; the PR just never got merged because the contributor stopped working on it.

As a small anecdote: I helped in two conference sprints the last two weeks. In the first, I had someone contributing using GitHub Codespaces, and she repeatedly said how amazing it was to be able to work on something directly without first having to set up the whole development environment. In the second, there was someone who struggled with the typical "needs Visual Studio Build Tools on Windows -> cannot install this on company laptop without devops involvement", and a setup like gitpod could have helped a lot.
To be clear, I know that this only supports that it would be nice to have such a gitpod integration set up, not that it necessarily has to live in the main repo. I do think that it will be more accessible (since that is the standard approach) and better integrated if it lives in the main repo though.

@jorisvandenbossche
Member

Some issues we have been running into related to not having write permissions outside of the pandas repo / mamba env:

  • Running pre-commit install doesn't work: "PermissionError: [Errno 13] Permission denied: '/home/gitpod/.cache/pre-commit'"
  • Installing an extra package with mamba doesn't work: "Non-writable cache error"
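Until the image itself is fixed, one possible workaround for the pre-commit error is to point pre-commit at a cache directory the workspace user can write to. The sketch below is not the fix applied to the image; the helper names are hypothetical. It relies only on PRE_COMMIT_HOME, the environment variable pre-commit reads to locate its cache, and it assumes a cache folder under the current checkout is acceptable.

```shell
# Print the first candidate directory that can be created and written to.
# Usage: first_writable_dir <dir> [<dir> ...]
first_writable_dir() {
  local dir
  for dir in "$@"; do
    if mkdir -p "$dir" 2>/dev/null && [ -w "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Prefer the default cache location; fall back to a directory under the
# current checkout (an assumed location) when the default is not writable.
use_writable_pre_commit_cache() {
  local cache
  cache="$(first_writable_dir "$HOME/.cache/pre-commit" "$PWD/.cache/pre-commit")" || return 1
  export PRE_COMMIT_HOME="$cache"
}
```

After calling use_writable_pre_commit_cache, running pre-commit install in the same shell should use the chosen directory. If the fallback under the checkout is used, it probably needs to be ignored by git.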

@noatamir
Member Author

The install issue in the last comment is addressed by #52700 and already fixed in the Gitpod we deployed to dockerhub today.
