- Date and time: 26th September 2019, 14:00 - 15:30
- Location: Health Data Research UK, Wellcome Trust, London, UK
- Trainer: Sergio
- Host: Gabriella Rustici
- Course url: https://tinyurl.com/2019HDR
The aim for today is to learn the basics of GitHub so that you can use it for your own projects.
- Background
- What is version control? What is Git? What is GitHub?
- How can you use GitHub? How can it be useful for your work?
- Practical session: working with GitHub
- GitHub was originally developed to host software projects managed with Git, a version control system created for the development of the Linux kernel. Microsoft, one of GitHub's major users, acquired it in 2018.
- Although it was first designed for managing software, it is now used for many other purposes and disciplines, and is widely used in academia, industry and government.
- It lets you record and access the history of a project, keeping track of versions during development, e.g. the status of the project 10 days ago.
- It is findable (repositories are available online through www.github.com and have built-in search capabilities), accessible (via any internet browser) and interoperable (it works with any operating system: Mac, Linux, Windows).
Version control is the management of changes (a.k.a. revisions) to any type of information.
- It lets you effectively "save" your work at important points in time and come back to any of the saved points. If you lose information or make mistakes, you can recover earlier versions, and keeping a copy on a remote server also gives you an offsite backup.
- It makes collaboration much easier.
Version control can take different forms:
- In its simplest form, creating copies and changing file names, e.g. adding v1.0, v1.1, v2.0
- Using tools that (to some extent) incorporate version control functionality, e.g. Google Drive and Dropbox
- Using dedicated version control tools, e.g. Git and GitHub
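To make the idea of "save points" concrete, here is a minimal Python sketch of a toy history that stores a full snapshot each time you save and can restore any of them. It is a hypothetical illustration (the class `ToyHistory` and its methods are invented for this example); Git stores history far more efficiently, but the save-and-restore idea is the same.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ToyHistory:
    """Hypothetical toy model: a list of saved snapshots of one file.

    Not how Git stores data; it only illustrates saving and restoring.
    """
    snapshots: list = field(default_factory=list)

    def save(self, content: str, message: str) -> int:
        """Record a snapshot ("save point") and return its version number."""
        self.snapshots.append(
            {"content": content, "message": message,
             "time": datetime.now(timezone.utc)}
        )
        return len(self.snapshots) - 1

    def restore(self, version: int) -> str:
        """Return the file content exactly as it was at a given save point."""
        return self.snapshots[version]["content"]

    def log(self) -> list:
        """List (version, message) pairs, oldest first."""
        return [(i, s["message"]) for i, s in enumerate(self.snapshots)]


history = ToyHistory()
v0 = history.save("Draft introduction", "First draft")
v1 = history.save("Draft introduction\n(results accidentally deleted)", "Second draft")

# The mistake in v1 is not a disaster: go back to the earlier save point.
print(history.restore(v0))
print(history.log())
```

Each save point carries a message, which is the same role a commit message plays in Git.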
(http://phdcomics.com/comics/archive.php?comicid=1323)
The first version control systems were created by groups writing software and code. Fortunately, they can now be used not only by computer scientists (for developing computer code) but by anyone (for any type of file) 😄
There are two types of version control systems: centralised and distributed.
(adapted from http://lhzuigao.com/309note.html)
Advantages of distributed (right) over centralised (left) version control systems include:
- If the central repository (server) crashes, it can be restored from any of the local repositories, e.g. those of the researcher, collaborator or group leader.
- Each person can make changes to their local repository offline, and then integrate their individual changes into the central repository (server) once back online.
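The two advantages above can be sketched with a hypothetical toy model in Python (`ToyRepository` and its methods are invented for illustration, not Git's API): every clone holds the whole history, commits are recorded locally, and a push only sends what the server is missing.

```python
class ToyRepository:
    """Hypothetical toy model of a repository that stores its full history."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])

    def commit(self, message: str) -> None:
        # Recording a change only touches this local copy, so it works offline.
        self.commits.append(message)

    def clone(self) -> "ToyRepository":
        # In a distributed system every clone receives the whole history,
        # not just the latest version of the files.
        return ToyRepository(self.commits)

    def push_to(self, other: "ToyRepository") -> None:
        # Send the commits `other` is missing. Like Git, refuse if the two
        # histories have diverged (e.g. someone else pushed in the meantime).
        n = len(other.commits)
        if self.commits[:n] != other.commits:
            raise ValueError("histories have diverged: pull and merge first")
        other.commits.extend(self.commits[n:])


server = ToyRepository()
server.commit("Initial commit")

researcher = server.clone()
researcher.commit("Add analysis script")  # made offline, locally
researcher.push_to(server)                # integrated once back online

# If the server crashes, any full clone can be used to restore it.
server = researcher.clone()
print(server.commits)
```

In a centralised system the clients would only hold the latest files, so the final restore step would not be possible.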
Git is a distributed version control system to keep track of and compare the history of changes made to your scripts and files. It allows groups of people to work on the same documents at the same time without stepping on each other's toes. It was created by Linus Torvalds in 2005 for the development of the Linux kernel. It is free and open source and helps you with:
- Creating repositories to host your projects using the command line
- Tracking changes to the files and folders within your repositories
GitHub is an online platform to share and showcase your work with collaborators and a wider audience. It is a tool to help you build projects that are collaborative, well documented and version-controlled. It provides you with:
- A place to host and backup your repositories online
- A nice web interface to your repositories
- A strategy to collaborate with colleagues
Each version in Git and GitHub is called a commit and is identified by a unique revision ID, e.g. 60363b1. Each commit is associated with a timestamp and the person who made the change. Commits can be compared, restored and, for some types of files, merged.
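Revision IDs are not sequential numbers: Git derives them by hashing content with SHA-1 (in its default object format). As a minimal sketch, the Python function below computes the ID Git gives to the contents of a single file (a "blob" object). Commit IDs such as 60363b1 are computed in the same way over the commit object, which records the project snapshot, the parent commit(s), the author, the timestamp and the message. The function name `git_blob_id` is our own, and repositories using Git's newer SHA-256 format produce different IDs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a file with this content.

    Git hashes a short header (the word "blob", a space, the size in bytes
    and a NUL byte) followed by the raw file bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


full_id = git_blob_id(b"GitHub is fun!\n")
short_id = full_id[:7]  # abbreviated form, like the 60363b1 shown above
print(full_id, short_id)
```

Running `git hash-object <file>` inside a repository should print the same 40-character ID for a file with identical content. Because the ID depends only on the content, any change to a file, however small, gives it a new ID.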
There is other software for version control similar to Git, e.g. Subversion (svn). Similarly, there are other online platforms similar to GitHub for sharing and collaborating on code, e.g. GitLab.
There are two interfaces to GitHub:
- GitHub's online (web) interface
- GitHub Desktop (available for Mac and Windows)
(https://programminghistorian.org/lessons/getting-started-with-github-desktop)
Today, we will be using GitHub's online interface.
- To host and share outputs and software:
  - Laboratories sharing their own research, e.g. the Jefferis or Balasubramanian groups
  - Software source code, e.g. BWA, a bioinformatics tool to align DNA sequences to reference genomes
  - Or even to share slides, e.g. Bérénice Batut's work and Stephen J. Eglen's slides
- To create websites using GitHub Pages:
  - Personal research websites, e.g. Mike Love's site
  - Courses and activities, e.g.
- To share the contents of a book, e.g. Bioinformatics Data Skills or Happy Git and GitHub for the R user
- To write a PhD thesis, e.g. A reasoning framework for C4 photosynthesis research based on high-throughput analysis
- Even to change the law
- Communication is key, as most projects have both experimental and computational leaders.
- Building on the classical ways of sharing (conversations/meetings, email, Dropbox, shared folders...), we want to build an environment where:
  - Computational colleagues can share code, figures and tables, review others' work and get credit for their collaborative work
  - Experimental colleagues can follow computational developments, access results and learn methods of data analysis
- And ideally avoid situations like...
(http://phdcomics.com/comics.php?f=1689)
- Perhaps a happier life cycle for a research project:
(https://github.com/semacu/20170703_GitHubintheLab_CRUK-CI)
If you want to start creating repositories in GitHub, you first need to open an account:
- Public repositories are free, and can be browsed and downloaded by anyone
- Private repositories have associated costs depending on the number of collaborators (see GitHub's pricing plans). The individual Pro and Team plans cost $7/month and $9/user/month respectively, but they are free if you are a student or an academic (institution).
Alternatively, GitLab uses a different business strategy, offering free private repositories and charging for plans with additional features. There are also other alternatives, e.g. Bitbucket.
- GitHub uses Markdown for text formatting: a language with plain-text formatting syntax (bold, italics, checkboxes, lists, etc.) used to render pages online (like HTML, but easier). You can use this syntax in text files (file extension: .md), issues, comments, pull requests, blog posts and more.
- Markdown is important because GitHub automatically renders anything written in it. This can be specific files (e.g. README.md) or your comments and issues.
- Some examples of Markdown syntax are available here.
We have several tutorials:
- Create a GitHub account (+)
- Create your first repository (+)
- Explore your GitHub account and first repository (++)
- Create an issue and a branch (++)
- My first pull request (+++)
- Additional tutorials
If you don't have a GitHub account already:
- Go to https://github.com
- Fill in your Username, Email and Password. Then click on the green button "Sign up for GitHub".
- On the "Choose your personal plan" page, select "Free plan" and then click on "Continue".
- On the "Tailor your experience" page, tick the boxes that apply to you and click on "Submit". Otherwise, just click on "skip this step".
- You have created a GitHub account! 😄
- If you are not already signed in, sign in to GitHub using the Username/Email and Password created before.
- Click on the top-right "avatar icon" and select "Your profile". Have a quick browse through your page.
- Click on the top-right "+" icon and select "New repository". If you are asked to verify your email address, find the email GitHub has just sent to the address you provided and click on "Verify email address".
- On the "Create a new repository" page, fill in a "Repository name", e.g. "my_first_repository" or "my_analysis_script". Write a short description of your repository, e.g. "This is a test repository". For now, choose "Public" and tick the box to initialize this repository with a README. Finally, click on "Create repository".
- You created your first repository! 🚀
- Click on your top-right "avatar" icon and select "Settings".
- Explore the tabs "Profile", "Account" and "Emails".
- Click on README.md and then on the pencil icon ("Edit this file") on the right. Type anything to change the file, e.g. "GitHub is fun!".
- Scroll down, enter a commit message, e.g. "My first update", and select the radio button "Commit directly to the master branch". Then click on "Commit changes". Voilà!
- To view your history of commits for README.md, click on README.md and then on the "History" button on the right.
- Alternatively, to view your history of commits for your first repository, click on the name of your repository and select the tab depicting a small clock and the number of commits next to it.
Bonus points:
- Try creating a new file and adding some content to it
- In your new repository, have a look at the "Settings" tab, explore "Collaborators" and try adding one of your colleagues
Key glossary:
- Repository: it can be thought of as a project folder. A repository contains all of the project files, issues, wikis and more. It also stores the history and versions of each file.
- Commit: equivalent to saving your changes to a file. When you commit, you usually include a brief description of the changes you made, so you can identify versions later if you want to undo a change.
- Branch: an identical copy of a project at a particular point in time, kept separate from the 'master' branch (primary copy). This keeps your code in the 'master' branch safe while you make changes and experiment with code on the new branch. You can merge your new branch back into the 'master' branch when you want to publish your changes.
- Master: the default branch in your repository.
- Collaborator: someone with read and write privileges to a repository, as approved by the repository owner.
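Under the hood, Git makes creating a branch cheap: a branch is a named pointer to a commit, and each commit points back to its parent. The Python sketch below is a hypothetical toy model (`Commit`, `ToyRepo` and its methods are invented for illustration, not Git's API) showing why work on a new branch leaves 'master' untouched until you merge. It only handles the simple "fast-forward" case, where 'master' has not moved since the branch was created; real merges of diverged branches create a merge commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None  # None for the very first commit


class ToyRepo:
    """Hypothetical toy model: branches are names pointing at commits."""

    def __init__(self):
        self.branches = {"master": Commit("Initial commit")}
        self.current = "master"

    def create_branch(self, name: str) -> None:
        # A new branch starts as a pointer to the current commit; nothing is copied.
        self.branches[name] = self.branches[self.current]

    def checkout(self, name: str) -> None:
        self.current = name

    def commit(self, message: str) -> None:
        # The new commit remembers its parent; only the current branch moves.
        self.branches[self.current] = Commit(message, self.branches[self.current])

    def history(self, name: str) -> list:
        """Commit messages reachable from a branch, newest first."""
        messages, c = [], self.branches[name]
        while c is not None:
            messages.append(c.message)
            c = c.parent
        return messages

    def fast_forward_merge(self, source: str, target: str = "master") -> None:
        # Allowed only if target's commit is an ancestor of source's commit.
        c = self.branches[source]
        while c is not None and c is not self.branches[target]:
            c = c.parent
        if c is None:
            raise ValueError("not a fast-forward: a real merge commit is needed")
        self.branches[target] = self.branches[source]


repo = ToyRepo()
repo.create_branch("readme-edits")
repo.checkout("readme-edits")
repo.commit("Update README")
print(repo.history("master"))        # master is still untouched
repo.fast_forward_merge("readme-edits")
print(repo.history("master"))        # now includes the README update
```

Storing a pointer rather than a full copy is why creating and switching branches in Git is fast, even in large projects.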
(https://buildazure.com/introduction-to-git-version-control-workflow/)
Follow steps 5-8 on the following page
(https://www.dataschool.io/simple-guide-to-forks-in-github-and-git/)
Follow steps 1-7 on the following page. No need to complete the Stretch goal.
If you want to learn more, try the following later:
- Working locally using GitHub Desktop
- Using GitHub to make a webpage
- Markdown tutorial
Many thanks for your attention! Enjoy Git and GitHub!
Any questions/suggestions about this workshop or the materials? Just email me at: sermarcue@gmail.com
Courses:
- Open Scientific Code using Git and GitHub by Yo Yehudi
- A Friendly Introduction to GitHub
- Software Carpentry: Version Control with Git
- Resources to learn Git
- GitHub On Demand Training
- A quick introduction to Git and GitHub
Help:
- GitHub Help
Papers:
- Nature Methods 2018 editorial, Easing the burden of code review
- Perkel 2018:
- Silver 2018 Microsoft’s purchase of GitHub leaves some scientists uneasy
- Russell et al. 2018 A large-scale analysis of bioinformatics code on GitHub
- Perez-Riverol et al. 2016 Ten Simple Rules for Taking Advantage of Git and GitHub
- Perkel 2016 Democratic databases: science on GitHub
- Markowetz 2015 Five selfish reasons to work reproducibly