Skip to content
Phillip Cloud edited this page Sep 6, 2013 · 9 revisions

From @cpcloud:

My pandas workflow usually goes something like this:

  1. Always use virtualenvwrapper. Period. Then I do workon pandas.
  2. Depending on how much time I have I'll try to prioritize based on some combination of which issues are high-priority and those for which I see an easy fix and those that I think would be fun to work on.
  3. Once I've selected an issue I'll start hacking on it. I usually don't use any local branches unless I'm implementing something new and there's a bunch of things for that feature that can be worked on separately.
  4. I use git stash all the time. In fact so much so that I have aliases for every git stash subcommand. Often times I'll get interrupted and I'll need to stash something so I do. I can then come back to it. If you end up accumulating a crap ton of stashes (which I do) then you can inspect them with git stash list followed by git stash -p stash@{0} or whatever stash number you want to see the diff of. I'll also occasionally prune the stash via visual inspection and/or my own memory of what's been merged recently or by comparison of stashes to upstream/master.
  5. Two things that I use a lot are the command git push -u origin $(git rev-parse --abbrev-ref HEAD) and hub. The first command pushes a remote branch with the same name as the one that I made locally to the origin remote (my fork). I have git rev-parse --abbrev-ref HEAD aliased to gcb which I remember as "git current branch". The other command I use is hub pull-request -b pydata:master which automatically submits a PR. I can essentially push the remote and the PR in a single step on the command line without having to use the mouse. Pretty neat.
  6. git reflog is great for those times when you accidentally type git reset --hard and you need to recover something.
  7. Finally I'll rebase using git rebase upstream/master to pull in the latest from the upstream remote (which I have set to http://github.com/pydata/pandas.git)
  8. Throughout the dev process I'm constantly doing git fetch && git rebase upstream/master, although recently I've toned that down a bit and I've started to just do it when I need to.
  9. I also use git log all the time. Since the standard git log is a bit too verbose I use scm_breeze and it has a gorgeous standard git log (as long as you've set git config --global color.ui auto!)

git merge and git rebase

git merge

git merge <a-branch-name> will take the union of the commits of the branch that you're currently on and the branch named a-branch-name. git merge is nice when you want to create a branch off of another branch and then you want to go back to your original branch and bring the changes from the branch you've just made changes in.

For example,

git checkout master
git checkout -b new-branch

# try to not to break all the things!
gvim pandas/core/frame.py

git add .
git commit -m'ENH: made pandas awesomer!'

# make a new branch based off the one you're current on
git checkout -b newer-branch new-branch

# make some more awesome changes to frame.py

git checkout new-branch
git merge newer-branch

This can sometimes leave ugly messages like Merged foo from bar (but not always).

NOTE: It's worth spending a significant amount of time understanding some of the details of what git is doing, otherwise you'll always feel a bit lost. This is especially true when making changes to the history via git rebase or any other history changing mechanism. A Google search for "Git book" will turn up plenty of useful resources.

git rebase

git rebase allows you to change the history of a repository. Changning the history is not necessarily a good thing (some people hate it) if you've pushed to a remote that a bunch of people depend on. For pandas pull requests, it's usually okay to rebase since very few people (if any) are going to base code off of your PR. Trolling the pandas repo, you'll see much talk about "I'll squash spam after I zip up the foo bar that does baz". This is just letting the core devs know that you're going to get rid of commits that are very small and/or have commit messages that are not meaningful.

Another thing you probably want to do on occasion is:

git fetch upstream
git rebase upstream/master

If a core developer asks you to rebase on top of master, this is what we mean.

Those commands will speed up the merging of your PR by making you deal with merge conflicts. Your changes are going to be most familiar to you and thus will be resolved faster than if someone who doesn't know why you did what you did tries to resolve the conflicts. Note that if you made changes to pandas/doc/source/release.rst and changes have occurred upstream then you'll almost always get a merge conflict from that. Those are pretty much unavoidable, so don't be scared when you get one. Just remove the conflict markers keeping your addition to release.rst along with the lines that created the conflict.

The people around pandas have been very patient and helpful with my (lack of) knowledge about git especially @y-p. If you're confused about something, even after reading the wiki and searching around for an answer, don't hesitate to ask. We won't bite 😄

Also keep in mind that this is the tip of git-berg, but this can be a useful jumping off point to clarify a couple of conventions you might see around here.