How to use Git

This post is basically paraphrasing the Pro Git book and will tend to follow its structure.

Resources for this post:

Pro Git, Second Edition

Intro to Git

Git is a type of version control system (VCS). The purpose of version control is to record every change to a set of files so that the user can return to any earlier version. There are three different types of version control: local, centralized, and distributed. Git falls into the distributed category of version control, which means that each user of a repository has all of the recorded changes to all of the files stored on his local PC. This differs from centralized version control, where each user checks out only the latest copy, as opposed to the entire version history.

Linus Torvalds and the Linux community created Git to use as the version control for the Linux kernel.

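Other types of VCS typically maintain a list of deltas to the files. Git instead stores each commit as a snapshot of the full set of files; a file that has not changed is stored as a reference to the identical copy that was already recorded.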

There are three conceptual areas of your Git project: the Git directory (the repository), the working directory, and the staging area (the index). Once you modify a file that was previously committed to the repository, it enters the modified state and is part of the working directory. Once you git add this file, it moves to the staged state and is in the staging area. Only once you git commit does the file move to the committed state and enter the Git repository.
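The three states can be watched from the command line. This is a minimal sketch, assuming a throwaway repository in a temporary directory; the file name notes.txt and the user details are made up for illustration. It uses git status --porcelain, the script-friendly form of git status, where the first column describes the staging area and the second the working directory.

```shell
# Throwaway repository; every name here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"              # notes.txt is now committed

echo "second draft" >> notes.txt
state_modified=$(git status --porcelain)  # " M notes.txt": modified, not staged

git add notes.txt
state_staged=$(git status --porcelain)    # "M  notes.txt": staged for the next commit

git commit -q -m "Update notes"
state_committed=$(git status --porcelain) # empty: nothing left to commit

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' \
  "$state_modified" "$state_staged" "$state_committed"
```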

Git Basics

To initialize a repo for an existing or new project, cd into the root directory of the project and issue:

git init

The files in your working directory are either tracked or untracked. Tracked files are those that belonged to the prior snapshot; untracked are those that were not. After doing a git init, all the files in your project are untracked. Use git status to check the status of each file. This command tells you which branch you are on, the untracked files, and the modified files. You must manually add each of the files that you want git to track. This is to prevent git from adding a bunch of compiled files that you wouldn't want, for example, python .pyc files.

In the same root directory that you issued git init, create a file called .gitignore:

vi .gitignore

Here you can add lines such as *.pyc or *.class to tell Git to ignore the compiled files.
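As a rough illustration, the sketch below writes a two-line .gitignore and checks what git status still reports. The file names are invented for the example, and the patterns use a leading * so that they match any file with that extension.

```shell
# Throwaway repository; file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Ignore compiled Python and Java files anywhere in the project.
printf '%s\n' '*.pyc' '*.class' > .gitignore

touch app.py app.pyc Main.class
untracked=$(git status --porcelain)
echo "$untracked"    # lists .gitignore and app.py, but neither compiled file
```

Note that .gitignore itself shows up as untracked; it is normally added and committed like any other file.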

To add an untracked or modified file to your staging area, issue:

git add [filename]

It is important to note that git add will add files to your staging area; it does NOT add them to the committed repo.

If you have made some changes to your working directory that you have not staged yet, use git diff to show you all the changes. Once you git add these files to staging, git diff will no longer show the changes. To see the changes in your staged files that will go into the next commit, use git diff --staged.
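A small sketch of the difference, assuming a scratch repository and a made-up file greeting.txt. The --name-only option is used only to keep the output short; it prints the names of changed files instead of the full diff.

```shell
# Throwaway repository; names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

echo "hello, world" > greeting.txt
unstaged_before=$(git diff --name-only)           # greeting.txt
staged_before=$(git diff --staged --name-only)    # empty

git add greeting.txt
unstaged_after=$(git diff --name-only)            # empty
staged_after=$(git diff --staged --name-only)     # greeting.txt
```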

To commit your staged files to the repo, use git commit -m "[commit message]". Remember that this will only commit the files that have been staged via git add. Any untracked or modified files which have not been staged will not be committed to the repo during a git commit.

If you would rather not git add each file to the staging area and then git commit them to the repo, you can simply issue the following command, which will stage all the modified files that are tracked and then commit them to the repo:

git commit -a -m "[commit message]"

If you want to delete a file from your hard drive and notify git about the removal of that file, issue git rm [filename]. This command will remove it from the hard drive (if you haven't removed it already), and it will stage the removal of the file. You must git commit to commit the removal of the file to the repo.

If you want git to stop tracking a certain file, but you do not want to remove it from the file system, issue git rm --cached [filename]. This command will not erase the file from disk, but it will stage the removal of the file. Again, you must git commit to commit the removal of this file to the repository. After this, you will see that the file has now moved to the untracked state.
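For instance, here is a hedged sketch of untracking a file that was committed by mistake. The file name build.log is an assumption made for the example.

```shell
# Throwaway repository; names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "debug output" > build.log
git add build.log
git commit -q -m "Add build log by mistake"

git rm -q --cached build.log     # stage the removal, keep the file on disk
git commit -q -m "Stop tracking build.log"

git status --porcelain           # "?? build.log": the file is untracked again
ls build.log                     # still present in the file system
```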

To rename a file, issue git mv [filename] [new filename]. If you instead rename the file using the local file system commands, git will tell you that the newly renamed file is untracked, and the old file has been deleted. At this point you must git add [new filename] and git rm [old filename] to stage these changes, and git commit to commit the renaming of the file to the repo.

To view the commit history for this repository, issue git log. Without any options, this shows you the commits in reverse chronological order (most recent first), including the SHA-1 checksum, the author, the date, and the commit message. Issue git log -p to see the diffs. Issue git log -2, where 2 is any number, to limit the history to that many commits.

If you commit your staged files and realize that you wanted to use a different commit message than what you provided, you may issue git commit --amend, which will launch an editor and allow you to write a new commit message. The original commit message will be lost.

If you commit your staged files and realize that you forgot to add a modified file that you would have preferred to go into this commit, you can amend the previous commit by issuing:

git add [filename]  
git commit --amend

This will replace the previous commit with a new one that contains the forgotten file along with the files you committed originally. The history ends up with a single commit instead of two.
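The effect can be checked with a short sketch. The file names are hypothetical, and --no-edit is added only so that the example runs without opening an editor; it keeps the existing commit message.

```shell
# Throwaway repository; names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "a" > a.txt
echo "b" > b.txt
git add a.txt
git commit -q -m "Add a and b"       # oops: b.txt was never staged
before=$(git rev-parse HEAD)

git add b.txt
git commit -q --amend --no-edit      # fold b.txt into the same commit
after=$(git rev-parse HEAD)

git rev-list --count HEAD            # 1: still a single commit
git ls-tree --name-only HEAD         # a.txt and b.txt
```

The commit ID changes (before and after differ) because the amended commit is a new object that takes the place of the old one.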

To unstage a file, for example if you added a modified file to the staging area and decided to get rid of it, issue:

git reset HEAD [filename]

This will unstage the file and move its state back to modified.

If you modify a file and decide that you want to discard your changes and revert to the last committed version, you may issue the following command:

git checkout -- [filename]

This will permanently discard the changes and revert the file back.
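Both undo steps can be tried together. The following is a sketch under the assumption of a scratch repository and an invented file config.txt.

```shell
# Throwaway repository; names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > config.txt
git add config.txt
git commit -q -m "Add config"

echo "experimental change" > config.txt
git add config.txt
git reset -q HEAD config.txt               # unstage: the file is back to modified
after_reset=$(git status --porcelain)      # " M config.txt"

git checkout -- config.txt                 # discard the working-directory change
after_checkout=$(git status --porcelain)   # empty
cat config.txt                             # original
```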

To get information about remote repositories you are working with, use the git remote command. Without options, git remote will list the shortnames for each remote you have configured. At the minimum, assuming you cloned a repo, it will list origin; "origin" is the default shortname that git gives to the original repo you cloned from. To see the corresponding URLs for each repo, issue git remote -v.

To add a new remote and give it a shortname, issue:

git remote add [shortname] [url]

Now that you've given this remote a shortname, you can use that in place of the URL for future commands, for example git fetch [shortname].

For example, you may have seen commands like this before:

git init
git remote add origin https://server/repo/foo.git
git pull origin master

This initializes a repo, adds a remote named origin, fetches the changes and merges them. The following is a simple shortcut:

git clone https://server/repo/foo.git

A quick note regarding fetch, merge, and pull. When you git fetch [shortname], Git will download all the commits from the remote that you don't have, but it does not merge them with your files. You may inspect the new commits and decide what to do. If you want to merge them, you have to issue git merge [shortname]/[branch], for example git merge origin/master. However, git pull [shortname] is a shortcut that will first fetch and then merge the changes.
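To make the distinction concrete, here is a sketch that uses a bare repository on the local disk as a stand-in for a server. The directory names alice, bob, and remote.git are assumptions for the example, not anything Git requires.

```shell
# A local bare repository plays the role of the remote; names are hypothetical.
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git

# Alice creates the first commit and pushes it.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master    # make sure the branch is called master
git config user.name "Alice"
git config user.email "alice@example.com"
echo "version 1" > file.txt
git add file.txt
git commit -q -m "First commit"
git remote add origin "$work/remote.git"
git push -q origin master

# Bob clones, then Alice pushes a second commit.
cd "$work"
git clone -q -b master remote.git bob
cd "$work/alice"
echo "version 2" >> file.txt
git commit -q -a -m "Second commit"
git push -q origin master

# Bob fetches: origin/master advances, his own master does not.
cd "$work/bob"
git fetch -q origin
local_after_fetch=$(git rev-parse master)
remote_after_fetch=$(git rev-parse origin/master)

# Merging brings his master up to date (git pull would do both steps).
git merge -q origin/master
local_after_merge=$(git rev-parse master)
```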

Tags

Tags are typically used to declare release points (v0.1, v0.2, etc.). To list your available tags, issue git tag.

There are two types of tags: lightweight and annotated. A lightweight tag is just a pointer to a certain commit. Annotated tags are full-blown objects in the git repo, with their own tagger name, date, and message.

To create an annotated tag, issue git tag -a [tagname]. To create a lightweight tag, just issue git tag [tagname].

git push does not push tags to the remote repo by default. Instead, you have to push them explicitly via git push origin [tagname], or push all of your tags at once with git push origin --tags.
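The difference between the two tag types shows up when asking Git what kind of object each name refers to. This is a sketch with invented tag names; the -m option supplies the tag message so that no editor is opened.

```shell
# Throwaway repository; names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "release notes" > CHANGELOG
git add CHANGELOG
git commit -q -m "Prepare release"

git tag v0.1-light                    # lightweight: just a name for the commit
git tag -a v0.1 -m "First release"    # annotated: a tag object with its own message

light_type=$(git cat-file -t v0.1-light)       # commit
annotated_type=$(git cat-file -t v0.1)         # tag
git tag                                        # lists both tags
```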

Git Branching

What is a branch in Git? A branch is nothing but a movable pointer to a commit. When you first initialize a repo, it creates a branch with the default name of master. As you make commits, the branch points to each successive one. When people say "committing to a branch", they really mean "pointing the branch to the new commit." Incidentally, the name master is not special, it is simply the default name chosen for the first branch in a repo, and most people do not change it.

To create a new branch without switching to it, issue git branch [branch-name]. When a new branch is created, this new pointer is pointing to the current commit that the original branch points to. The HEAD pointer is used by git to point to the local branch you are currently using.

To switch to a new branch, issue git checkout [branch-name]. This moves the HEAD pointer to the new branch. At this point, your working directory looks identical to your previous branch. If you add or modify some files and commit to your new branch, and then you git checkout master, you will note that all the modifications in your working directory are gone.

To create a new branch and switch to it in a single command, issue:

git checkout -b [branch-name]

Branches are incredibly cheap to make and delete. You should use them often for any new change that you have, and then merge the branch back in with master, instead of making your changes directly to the master branch.
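The pointer behaviour can be observed directly with git rev-parse, which prints the commit a branch points to. A minimal sketch, assuming a scratch repository whose first branch is forced to be called master and a made-up branch named testing:

```shell
# Throwaway repository; names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is master
git config user.name "Example User"
git config user.email "user@example.com"

echo "line 1" > story.txt
git add story.txt
git commit -q -m "First commit"

git branch testing                          # new pointer, HEAD stays on master
master_tip=$(git rev-parse master)
testing_tip=$(git rev-parse testing)        # same commit as master
current=$(git symbolic-ref --short HEAD)    # master

git checkout -q testing
echo "line 2" >> story.txt
git commit -q -a -m "Work on testing"
testing_tip_after=$(git rev-parse testing)  # testing has moved forward

git checkout -q master
lines_on_master=$(grep -c "" story.txt)     # 1: the change lives only on testing
```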

Git Branching and Merging

The Basic Branching and Merging chapter is an excellent intro to branching and merging. I will attempt to summarize it here.

Pretend that you work with a website and have been assigned some story. You create a new branch and switch to it by issuing git checkout -b storyA. You make some changes to index.html and then your manager pings you with a P1 bug that needs to be fixed. At this point, your changes to index.html haven't been tested and you can't merge them with the master before fixing the bug. What do you do?

You simply switch back to master. Before that, however, make sure to commit your changes on branch storyA. Then, issue git checkout master and you are back on the master branch, with none of your changes to index.html apparent.

You create a new branch and switch to it via git checkout -b bugfix. The bug that you need to fix also happens to be in index.html. You make your changes and run your tests. Once satisfied, commit the changes and prepare to merge them with the master branch.

To merge your bugfix branch into your master branch, check out master and then merge:

git checkout master  
git merge bugfix

If the commit from the bugfix branch was directly ahead of the commit from master, git will inform you that a fast-forward occurred. A fast-forward merge is when git simply moves the master pointer forward to the bugfix commit, and no merge commit is created. There is no need for the bugfix branch any more, so you can delete it by issuing git branch -d bugfix. If you forgot to merge it, this command will fail.
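A compact sketch of the fast-forward case, using the branch names from the story above in a scratch repository. The file contents are invented.

```shell
# Throwaway repository; contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is master
git config user.name "Example User"
git config user.email "user@example.com"

echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "Add home page"

git checkout -q -b bugfix
echo "<h1>Home (bug fixed)</h1>" > index.html
git commit -q -a -m "Fix heading"

git checkout -q master
merge_output=$(git merge bugfix)
echo "$merge_output"                  # the report includes the line "Fast-forward"
master_tip=$(git rev-parse master)
bugfix_tip=$(git rev-parse bugfix)    # identical: master simply moved forward
git branch -d bugfix
```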

Now you are ready to switch back to storyA via git checkout storyA. You can finish your changes to index.html and commit them. When you want to merge these changes with master, do so by checking out master and then merging with storyA. At this point, you will not have a fast-forward merge, but you will see this instead (newer versions of Git name the 'ort' strategy rather than 'recursive'):

Merge made by the 'recursive' strategy

You do not have a fast-forward because master is no longer a direct ancestor of storyA. The bugfix branch that you merged earlier moved the master pointer forward, so the two branches have diverged. Instead, git performs a three-way merge, using the current master commit, the current storyA commit, and the common ancestor of both of these commits. To do this, git actually creates a new snapshot and a commit with two parents, which is called a merge commit.

You are now free to delete the branch by git branch -d storyA.

However, let's assume that your changes in storyA to index.html created a merge conflict with the changes in bugfix to index.html. After issuing git merge storyA, git will tell you that:

Automatic merge failed; fix conflicts and then commit the result.

When this occurs, the merge process is paused, meaning that git hasn't created the merge commit. To see which files haven't been merged successfully, issue git status. These files will now be modified by git and contain contents that look like this:

<<<<<<< HEAD
[Contents of the master commit]
=======
[Contents of the storyA commit]
>>>>>>> storyA

You must now edit the file and remove all of the git markup, and decide which code to use. After finishing this, add the file to staging and commit it to finish the merge commit:

git add index.html
git commit -m 'used storyA in merge'

You are now free to delete the storyA branch via git branch -d storyA.
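The whole conflict cycle can be rehearsed in a scratch repository. In this sketch the bug fix is committed directly on master to keep things short, the headings are invented, and overwriting index.html with echo stands in for editing the file by hand.

```shell
# Throwaway repository; contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is master
git config user.name "Example User"
git config user.email "user@example.com"

echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "Add home page"

# storyA and master both change the same line.
git checkout -q -b storyA
echo "<h1>Story A heading</h1>" > index.html
git commit -q -a -m "storyA: new heading"

git checkout -q master
echo "<h1>Home (bug fixed)</h1>" > index.html
git commit -q -a -m "bugfix: correct heading"

# The merge stops with a conflict and exits non-zero.
git merge storyA || echo "merge paused, conflicts to resolve"
conflicted=$(git diff --name-only --diff-filter=U)    # index.html
cat index.html                                         # shows the conflict markers

# Stand-in for a manual edit: keep the storyA heading.
echo "<h1>Story A heading</h1>" > index.html
git add index.html
git commit -q -m "Merge storyA, keeping its heading"

story_tip=$(git rev-parse storyA)
second_parent=$(git rev-parse HEAD^2)    # the merge commit's second parent
git branch -d storyA
```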

Branch Management

To see the available branches, issue git branch with no arguments. It lists the branches and indicates the branch you are currently on with an asterisk. To see the last commit on each branch, issue git branch -v.

If you merged some branches but forgot to delete them at the time, you can issue git branch --merged. Any branch in this list without an asterisk has already been merged into the current branch and can safely be deleted with git branch -d.
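As a sketch of how this plays out, the example below creates one branch that master already contains and one that it does not. The branch names are invented, and --no-merged is the counterpart option that lists branches with unmerged work.

```shell
# Throwaway repository; names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "First commit"

git branch finished-topic                  # points at a commit master already has
git checkout -q -b unfinished-topic
echo "work in progress" > wip.txt
git add wip.txt
git commit -q -m "Start unfinished work"
git checkout -q master

git branch --merged                        # finished-topic and * master
git branch --no-merged                     # unfinished-topic
git branch -d finished-topic               # succeeds
git branch -d unfinished-topic || echo "refused: branch is not fully merged"
```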

Branching Workflows

There are multiple approaches to designing branching workflows. One common approach is the use of long-running branches. In this approach, the master branch consists only of entirely stable, tested, and ready for production code. A parallel branch called development is used to test the stability of further subbranches called topics. When a developer gets a new story, he creates a topic branch off development. After finishing, he merges the topic into development and runs the tests. Assuming he wants to push to prod, he will then merge development into master.

Rebasing

Git has two methods of integrating two branches: merging and rebasing.

When you merge two divergent branches, git performs the three-way merge with the two latest commits on each branch plus the common ancestor of the two branches, which creates a new snapshot and commit.

When you rebase two divergent branches, you take the changes applied in one of the branches and replay them on top of the second branch. To do this, git finds the common ancestor of the two branches, computes the diffs between the common ancestor and the branch that you issued rebase from and saves them, resets the branch that you issued rebase from to the same commit as the branch you wish to rebase onto, and then applies the saved diffs to this branch. From here you can fast-forward the branch that you rebased onto to the branch you rebased from.

For example, create a new branch foo and switch to it via git checkout -b foo. Modify a file hai.txt and commit your changes. Switch back to master via git checkout master, modify a different file bai.txt, and commit your changes. Your master and foo branch are now diverged. Let's rebase from foo onto master.

git checkout foo  
git rebase master

While rebasing foo onto master, git found the common ancestor of foo and master, and computed the diff between this common ancestor and foo, which was the change to file hai.txt. Git then reset foo to the master commit. It then applied the changes from the diff to this newly reset branch. You can cat the bai.txt file that was modified in the master commit and see that it has been changed in your new foo branch. If you checkout master, you won't see any changes to the hai.txt file that was modified by the original foo branch. You can however now merge master and foo, and it will be a fast-forward merge, as opposed to the three-way merge that would have taken place had you not rebased foo onto master.

git checkout master  
git merge foo

The fact that this performs a fast-forward merge is proof that the foo branch was reset to the master branch during the rebase. Remember that git went to the common ancestor of foo and master and calculated the diff between the common ancestor and foo. It saves this diff, and then resets the foo branch to the master branch, meaning that it points the foo branch at the master commit, leaving the original foo commit unreachable from any branch. Now foo and master point to the same commit. Git takes the diff that it had saved and applies it, or replays it, onto the new foo branch, which moves the pointer of foo forward. The end result is that foo is now a direct descendant of master, as opposed to a divergent branch as it was originally. This is why a fast-forward merge takes place when we merge foo into master after the rebase.
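The walkthrough above can be verified end to end. This sketch follows the same branch and file names (foo, hai.txt, bai.txt) in a scratch repository; base.txt is an extra file added only so that there is a common ancestor commit.

```shell
# Throwaway repository; base.txt is a hypothetical starting file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Common ancestor"

git checkout -q -b foo
echo "hai" > hai.txt
git add hai.txt
git commit -q -m "Add hai on foo"
old_foo=$(git rev-parse foo)

git checkout -q master
echo "bai" > bai.txt
git add bai.txt
git commit -q -m "Add bai on master"
master_tip=$(git rev-parse master)

git checkout -q foo
git rebase -q master
new_foo=$(git rev-parse foo)        # a new commit: differs from old_foo
foo_parent=$(git rev-parse foo^)    # the master commit
ls                                  # bai.txt, base.txt, and hai.txt are all present

git checkout -q master
git merge -q foo                    # fast-forward: master now equals foo
```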

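It must be noted that rebasing can irritate people. Because a rebase replaces existing commits with new ones, anyone who has based work on the old commits has to reconcile their history with yours. You should only rebase your personal commits that have not been publicly released.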

Remote Repositories

This is a summary of chapter 12 from Version Control with Git, 2nd edition.

Repository Concepts
