PM 566: Introduction to Health Data Science
[I]s the management of changes to documents […] Changes are usually identified by a number or letter code, termed the “revision number”, “revision level”, or simply “revision”. For example, an initial set of files is “revision 1”. When the first change is made, the resulting set is “revision 2”, and so on. Each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored, and with some types of files, merged. – Wikipedia
Have you ever:
Made a change to code, realised it was a mistake and wanted to revert back?
Lost code or had a backup that was too old?
Had to maintain multiple versions of a product?
Wanted to see the difference between two (or more) versions of your code?
Wanted to prove that a particular change broke or fixed a piece of code?
Wanted to review the history of some code?
Wanted to submit a change to someone else’s code?
Wanted to share your code, or let other people work on your code?
Wanted to see how much work is being done, and where, when and by whom?
Wanted to experiment with a new feature without interfering with working code?
In these cases, and no doubt others, a version control system should make your life easier.
– Stackoverflow (by si618)
During this class (and perhaps, the entire program) we will be using Git.
Git is the version control system most widely used by developers today.
A great reference about the tool can be found here
More on what’s stupid about git here.
There are several ways to include Git in your work-pipeline. A few are:
Through command line
Through one of the available Git GUIs:
More alternatives here.
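git pull: bring in the latest changes from the remote repository.
git add [target file]: stage a new or modified file for the next commit.
git checkout [target file]: discard the unstaged changes to a file.
git commit -m "Your comments go here.": record the staged changes.
git commit -a -m "Your comments go here.": stage all modified tracked files and record them in one step.
git push: send your commits to the remote repository.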
You can always check the current state of your repository with git status!
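The everyday commands above can be tried safely in a throwaway repository. The following is a minimal sketch, not a prescribed workflow: the temporary directory and the file name notes.txt are assumptions made for illustration, and the identity is set only inside the scratch repository so that the commits work on any machine.

```shell
set -eu

# Hypothetical scratch repository, so nothing here touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Juan Perez"
git config user.email "jperez@treschanchitos.edu"

# Stage a new file and record it.
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# An unwanted edit: git checkout restores the committed version of the file.
echo "a mistake" >> notes.txt
git checkout -- notes.txt

# A wanted edit: -a stages every modified tracked file before committing.
echo "second draft" >> notes.txt
git commit --quiet -a -m "Extend notes"

git status --short   # prints nothing when the working tree is clean
git log --oneline    # lists the two commits, newest first
```

Note that git checkout on a file throws the uncommitted edit away for good, so it is worth running git status first.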
Set up your Git installation with git config. Start by telling Git who you are:
$ git config --global user.name "Juan Perez"
$ git config --global user.email "jperez@treschanchitos.edu"
Try it yourself (5 minutes) (more on how to configure git here)
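To confirm that the settings took effect, the values can be read back with git config. The sketch below is one possible check, run in a hypothetical scratch repository: it sets the identity without --global, which stores it for that repository only, so the example does not alter your real global configuration.

```shell
set -eu

# Hypothetical scratch repository used only for this demonstration.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Without --global, the setting is written to this repository's .git/config only.
git config user.name "Juan Perez"
git config user.email "jperez@treschanchitos.edu"

# Read the values back; a repository-level value overrides a global one.
git config --get user.name
git config --get user.email

# Show which file each user.* value comes from (global vs. repository).
git config --list --show-origin | grep "user\."
```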
We will start by working on our very first project. To do so, you are required to start using Git and GitHub so you can share your code with your team. For this exercise, you need to:
Create a new repository on GitHub named PM566-first-project, tell GitHub to add a README file, and click “Create repository”.
Copy the repository URL and, in your command line, type git clone and then paste the URL.
You now have a local version of your repository!
Now, let’s make some changes!
Edit the README file. If you check the git status now, you’ll see that you have unstaged changes.
Stage the changes with git add README or git add --all. If you check the git status now, you’ll see that you have staged changes, ready to commit.
Note 1: We are assuming that you have already installed Git on your system.
Note 2: Need a text editor? Check out this website link.
Commit the changes using the git commit command, adding a message, e.g. git commit -m "Your comments go here." If you check the git status now, you’ll see that you are 1 commit ahead of the remote repository (GitHub).
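The edit, stage, commit cycle can be rehearsed without a GitHub account. In the sketch below, a bare repository on the local disk stands in for GitHub; that stand-in, the temporary directory, and the text appended to README are all assumptions for illustration. The final two commands preview the push step that completes the round trip.

```shell
set -eu
work="$(mktemp -d)"

# Stand-in for GitHub (an assumption): a seed repository with a README,
# copied into a bare repository that plays the role of the remote.
git init --quiet "$work/seed"
cd "$work/seed"
git config user.name "Juan Perez"
git config user.email "jperez@treschanchitos.edu"
echo "# PM566-first-project" > README
git add README
git commit --quiet -m "Initial commit"
git clone --quiet --bare "$work/seed" "$work/remote.git"

# Clone it, as you would with the URL copied from GitHub.
git clone --quiet "$work/remote.git" "$work/local"
cd "$work/local"
git config user.name "Juan Perez"
git config user.email "jperez@treschanchitos.edu"

echo "My first change." >> README
git status --short            # " M README": modified, not staged
git add README
git status --short            # "M  README": staged, ready to commit
git commit --quiet -m "Your comments go here."
git status --short --branch   # the branch line ends with [ahead 1]

git push --quiet
git status --short --branch   # the [ahead 1] marker is gone after the push
```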
Send the commit to GitHub using git push. If you check the git status now, you should see that you are fully up to date.
Check on GitHub that the changes to your README file are there!
Oops! It seems that I added the wrong file to the tree. You can remove files from the tree using git rm --cached. For example, imagine that you added the file class-notes.docx (which you are not supposed to track); then you can remove it using git rm --cached class-notes.docx
This will remove the file from the tree but not from your computer. You can go further and ask Git to avoid adding .docx files using the .gitignore file.
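The claim that git rm --cached leaves the file on disk can be checked directly. This is a minimal sketch in a hypothetical scratch repository; the file contents are made up, and only the file name class-notes.docx comes from the example above.

```shell
set -eu

# Hypothetical scratch repository for the demonstration.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Simulate the mistake: class-notes.docx gets staged along with the README.
echo "# PM566-first-project" > README
echo "private notes" > class-notes.docx
git add --all

# Take the file out of the tree (the index); the copy on disk is left alone.
git rm --quiet --cached class-notes.docx

git ls-files         # lists only README
ls                   # still shows class-notes.docx next to README
git status --short   # README is staged; class-notes.docx is now untracked (??)
```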
.gitignore use-case
I like to have my data and code for a project all in the same place, but I don’t want to upload the data to GitHub, as this would exceed the size limit on a repository.
Open (or create) the .gitignore file in a text editor and add the following line to ignore the directory called data:
data/
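One way to confirm that the rule works is to ask Git which files it still sees. The sketch below assumes a hypothetical project layout (data/cohort.csv and analysis.R are invented names) inside a scratch repository.

```shell
set -eu

# Hypothetical project: one script, one data file, and the ignore rule.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
mkdir data
echo "id,value" > data/cohort.csv
echo "print('hello')" > analysis.R
echo "data/" > .gitignore

# Untracked files that Git would offer to add: nothing under data/ appears.
git status --short

# Ask which rule causes a given path to be ignored.
git check-ignore -v data/cohort.csv
```

With -v, git check-ignore names the file and line of the pattern responsible, which helps when a .gitignore grows long.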
.gitignore
Telling git to ignore files is a good way to make sure you don’t go over your storage limit on GitHub. It’s also just a convenient way to avoid unnecessary clutter. Example based on Pro-Git (link).
# ignore specific file (something.pdf)
something.pdf
# ignore all .png files
*.png
# but do track bird.png, even though you're ignoring .png files
!bird.png
# only ignore the TODO file in the root directory, not subdir/TODO
/TODO
# ignore all files in any directory named build
build/
# ignore doc/notes.txt, but not doc/server/arch.txt
doc/*.txt
# ignore all .pdf files in the doc/ directory and any of its subdirectories
doc/**/*.pdf
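These patterns can be tested against real files before relying on them. The following sketch writes a subset of the patterns to a .gitignore in a scratch repository and lists which files end up ignored; the file names (cat.png, doc/server/spec.pdf, and so on) are hypothetical and chosen to exercise each rule.

```shell
set -eu

# Hypothetical scratch repository with a subset of the patterns above.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
cat > .gitignore <<'EOF'
*.png
!bird.png
/TODO
doc/*.txt
doc/**/*.pdf
EOF

# Empty files, one or two per rule.
mkdir -p subdir doc/server
touch cat.png bird.png TODO subdir/TODO
touch doc/notes.txt doc/server/arch.txt doc/server/spec.pdf

echo "Ignored:"
git ls-files --others --ignored --exclude-standard

echo "Not ignored:"
git ls-files --others --exclude-standard
```

The first listing should contain cat.png, TODO, doc/notes.txt and doc/server/spec.pdf, while bird.png, subdir/TODO and doc/server/arch.txt appear in the second.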
Git’s everyday commands: type man giteveryday in your terminal/command line. See also the very nice cheatsheet.
My personal choice for nightstand book: The Pro-git book (free online) (link)
GitHub’s website of resources (link)
The “Happy Git with R” book (link)
Roger Peng’s Mastering Software Development Book Section 3.9 Version control and Github (link)
Git exercises by Wojciech Frącz and Jacek Dajda (link)
Check out GitHub’s Training YouTube Channel (link)