Git Workflows
From @cpcloud:
My pandas workflow usually goes something like this:
- Always use `virtualenvwrapper`. Period. Then I do `workon pandas`.
- Depending on how much time I have, I'll try to prioritize based on some combination of which issues are high-priority, which ones have an easy fix, and which ones I think would be fun to work on.
- Once I've selected an issue I'll start hacking on it. I usually don't use any local branches unless I'm implementing something new and there's a bunch of things for that feature that can be worked on separately.
- I use `git stash` all the time, so much so that I have aliases for every `git stash` subcommand. Oftentimes I'll get interrupted and need to stash something, so I do, and I can come back to it later. If you end up accumulating a crap ton of stashes (which I do) then you can inspect them with `git stash list` followed by `git stash show -p stash@{0}`, or whatever stash number you want to see the diff of. I'll also occasionally prune the stash via visual inspection and/or my own memory of what's been merged recently, or by comparing stashes to `upstream/master`.
- Two things that I use a lot are the command `git push -u origin $(git rev-parse --abbrev-ref HEAD)` and `hub`. The first command pushes a remote branch with the same name as the one that I made locally to the `origin` remote (my fork). I have `git rev-parse --abbrev-ref HEAD` aliased to `gcb`, which I remember as "git current branch". The other command I use is `hub pull-request -b pydata:master`, which automatically submits a PR. I can essentially push the branch and open the PR in a single step on the command line without having to use the mouse. Pretty neat.
- `git reflog` is great for those times when you accidentally type `git reset --hard` and you need to recover a commit you threw away.
- Finally I'll rebase using `git rebase upstream/master` to pull in the latest from the `upstream` remote (which I have set to `http://github.com/pydata/pandas.git`).
- Throughout the dev process I'm constantly doing `git fetch && git rebase upstream/master`, although recently I've toned that down a bit and I've started to just do it when I need to.
- I also use `git log` all the time. Since the standard `git log` is a bit too verbose I use scm_breeze, which has a gorgeous standard `git log` (as long as you've set `git config --global color.ui auto`!).
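
The stash and reflog habits above can be tried safely in a throwaway repository. What follows is a minimal sketch, not part of the workflow described above: the temporary directory, the file `notes.txt`, the placeholder identity, and the commit messages are all made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing here touches a real checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the initial branch "master"
git config user.name "Example Dev"           # placeholder identity
git config user.email "dev@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Interrupted mid-edit: park the uncommitted change in the stash.
# (git stash push needs git >= 2.13; older versions use git stash save.)
echo "work in progress" >> notes.txt
git stash push -q -m "wip on notes"

# See what has piled up, then view one stash as a diff.
git stash list
git stash show -p 'stash@{0}'

# Come back to the work later and finish it.
git stash pop -q
git commit -q -am "Finish notes"
finished=$(git rev-parse HEAD)

# Simulate the accident: a hard reset drops the last commit.
git reset -q --hard HEAD~1

# The reflog still records where HEAD used to point. HEAD@{1} is the
# position just before the reset, so resetting to it restores the commit.
git reflog -n 3
git reset -q --hard 'HEAD@{1}'
recovered=$(git rev-parse HEAD)
```

Note that the reflog only helps with work that was committed at some point. Uncommitted changes wiped out by `git reset --hard` are not recorded there, which is one more reason to stash or commit early.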
`git merge <a-branch-name>` will take the union of the commits of the branch that you're currently on and the branch named `a-branch-name`. `git merge` is nice when you want to create a branch off of another branch, make changes there, and then go back to your original branch and bring those changes in.
For example,
    git checkout master
    git checkout -b new-branch
    # try not to break all the things!
    gvim pandas/core/frame.py
    git add .
    git commit -m 'ENH: made pandas awesomer!'
    # make a new branch based off the one you're currently on
    git checkout -b newer-branch new-branch
    # make some more awesome changes to frame.py, then commit them
    gvim pandas/core/frame.py
    git commit -am 'ENH: made pandas even awesomer!'
    # go back to the original branch and bring in the new commits
    git checkout new-branch
    git merge newer-branch
This can sometimes leave ugly messages like `Merged foo from bar` (but not always).
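
Whether a merge leaves such a message depends on whether the two branches have diverged. The sketch below shows both cases in a throwaway repository; the branch `diverged`, the files, the placeholder identity, and the commit messages are assumptions made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a single commit on master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"           # placeholder identity
git config user.email "dev@example.com"
echo "base" > frame.txt
git add frame.txt
git commit -q -m "Initial commit"

# Case 1: new-branch does not move while newer-branch gains a commit,
# so the merge is a fast-forward and no merge commit is recorded.
git checkout -q -b new-branch
git checkout -q -b newer-branch
echo "newer change" >> frame.txt
git commit -q -am "ENH: change on newer-branch"
git checkout -q new-branch
git merge -q newer-branch
# rev-list prints the commit followed by its parents; subtract the commit.
ff_parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))

# Case 2: both sides have their own commit, so git has to create a
# merge commit with two parents and an automatic "Merge branch" message.
git checkout -q -b diverged master
echo "other" > other.txt
git add other.txt
git commit -q -m "ENH: unrelated change"
git merge -q --no-edit new-branch
git --no-pager log --oneline --graph -n 4
merge_subject=$(git log -1 --format=%s)
merge_parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))

echo "fast-forward tip has $ff_parents parent(s)"
echo "merge commit has $merge_parents parent(s): $merge_subject"
```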
NOTE: It's worth spending a significant amount of time understanding some of the details of what `git` is doing, otherwise you'll always feel a bit lost. This is especially true when making changes to the history via `git rebase` or any other history-changing mechanism. A Google search for "Git book" will turn up plenty of useful resources.
`git rebase` allows you to change the history of a repository. Changing the history is not necessarily a good thing (some people hate it), especially if you've already pushed to a remote that a bunch of people depend on. For pandas pull requests, it's usually okay to rebase since very few people (if any) are going to base code off of your PR. Browsing the pandas repo, you'll see much talk along the lines of "I'll squash spam after I zip up the foo bar that does baz". This is just letting the core devs know that you're going to get rid of commits that are very small and/or have commit messages that are not meaningful.
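
Squashing is normally done interactively with `git rebase -i upstream/master`, marking the small commits as `squash` or `fixup` in the editor. Because that needs an editor, the sketch below swaps in a non-interactive alternative, `git reset --soft` followed by a single new commit, which gives the same result when everything on the branch should collapse into one commit. The branch name `my-feature`, the file, the placeholder identity, and the messages are hypothetical, and the first commit stands in for `upstream/master`.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the first commit plays the role of upstream/master.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"           # placeholder identity
git config user.email "dev@example.com"
echo "base" > frame.txt
git add frame.txt
git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# A feature branch with one real commit and two "spam" commits.
git checkout -q -b my-feature
echo "feature" >> frame.txt
git commit -q -am "ENH: add feature"
echo "typo fix" >> frame.txt
git commit -q -am "oops"
echo "another fix" >> frame.txt
git commit -q -am "fix again"

# Squash: move the branch pointer back to the base while keeping every
# change staged, then record all of it as one meaningful commit.
git reset -q --soft "$base"
git commit -q -m "ENH: add feature"

commits_on_branch=$(git rev-list --count "$base"..HEAD)
git --no-pager log --oneline
```

If the branch was already pushed to your fork, the rewritten history has to be force-pushed, which is why this is fine for a PR branch but not for a branch other people build on.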
Another thing you probably want to do on occasion is:
    git fetch upstream
    git rebase upstream/master
If a core developer asks you to rebase on top of master, this is what we mean.
Those commands will speed up the merging of your PR by making you deal with merge conflicts yourself. Your changes are most familiar to you, so you will resolve conflicts in them faster than someone who doesn't know why you did what you did. Note that if you made changes to `pandas/doc/source/release.rst` and changes have occurred upstream then you'll almost always get a merge conflict from that. Those are pretty much unavoidable, so don't be scared when you get one. Just remove the conflict markers, keeping your addition to `release.rst` along with the upstream lines that created the conflict.
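
The conflict-and-resolve cycle can be rehearsed in a throwaway repository. In this sketch the local `master` branch stands in for `upstream/master`, and the branch `my-feature`, the file contents, the placeholder identity, and the commit messages are made up for illustration. Stripping the markers with `grep` is only appropriate here because both sides' lines should be kept; in general you edit the file by hand.

```shell
#!/bin/sh
set -eu
export GIT_EDITOR=true    # keep existing commit messages, never open an editor

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"           # placeholder identity
git config user.email "dev@example.com"
echo "Release notes" > release.rst
git add release.rst
git commit -q -m "Initial commit"

# Your PR branch adds a release note...
git checkout -q -b my-feature
echo "- my new feature" >> release.rst
git commit -q -am "DOC: release note for my feature"

# ...while "upstream" (master here) adds a different note in the same spot.
git checkout -q master
echo "- someone else's fix" >> release.rst
git commit -q -am "DOC: release note for another fix"

# Rebasing stops with a conflict in release.rst.
git checkout -q my-feature
git rebase master || echo "rebase stopped on a conflict"

# Resolve by dropping only the marker lines, which keeps both entries.
# (The ||||||| marker only appears with the diff3 conflict style.)
grep -v -E '^(<<<<<<<|=======|>>>>>>>|\|\|\|\|\|\|\|)' release.rst > resolved.tmp
mv resolved.tmp release.rst

# Mark the file as resolved and let the rebase finish.
git add release.rst
git rebase --continue
cat release.rst
```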
The people around pandas have been very patient and helpful with my (lack of) knowledge about `git`, especially @y-p. If you're confused about something, even after reading the wiki and searching around for an answer, don't hesitate to ask. We won't bite 😄
Also keep in mind that this is just the tip of the `git`-berg, but it can be a useful jumping-off point to clarify a couple of conventions you might see around here.