Introduction to Git

Recommendation of a Strategy

Wayne Chen
2012.1.6

Git

Why Git?

Linus Torvalds hates CVS and SVN

Speed

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Data size: SVN occupied 143MB for a 140MB project

Used by Perl, Eclipse, Qt, Ruby on Rails, Android...

Linus Torvalds: Linux kernel project

At first, Linus Torvalds used diff, patch, and tar to maintain Linux.

Then he used BitKeeper, where everyone has a repository.

Fully distributed, local

143MB of metadata

Key concept

Nearly every operation is local

Not by file name but by hash value

The three states

Snapshots, not differences

Branches are cheap

Nearly every operation is local

After cloning a repository, you have the entire history locally.

No network is required, except for clone, pull, push, and fetch.

Not by file name but by hash value

Git uses four types of objects (blob, tree, commit, and tag) to store everything, and each object is identified by a unique 20-byte SHA-1 key.

What if two files have identical content but different file names?

A commit object is simple: it specifies the top-level tree for the snapshot.

A "tag" is a way to mark a specific commit as special in some way.
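The question about identical files can be checked directly. The sketch below assumes a scratch repository in a temporary directory; the file names are made up for illustration. It asks Git for the blob key of each file with git hash-object, and two files with the same content should get the same key, because the key is computed from content and the file name is kept in the tree object.

```shell
set -eu
work=$(mktemp -d)          # scratch area, an assumption of this sketch
cd "$work"
git init -q demo
cd demo

# Two files with different names but identical content, plus one that differs.
printf 'hello git\n' > first_name.txt
printf 'hello git\n' > second_name.txt
printf 'something else\n' > third_name.txt

# git hash-object prints the SHA-1 key Git assigns to the file content.
hash_first=$(git hash-object first_name.txt)
hash_second=$(git hash-object second_name.txt)
hash_third=$(git hash-object third_name.txt)

echo "first_name.txt  -> $hash_first"
echo "second_name.txt -> $hash_second"
echo "third_name.txt  -> $hash_third"
```

The first two lines print the same 40-character key, so Git stores that content only once.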

[Figure: an example tree with the entries root/, README, lib/, mylib.rb, tricks.rb, and inc/, stored as tree and blob objects.]

The three states

First, files that were in the last snapshot are called tracked. All other files are called untracked.

All tracked files are in one of three states: modified, staged, or committed.

Modified (working directory): You have changed a file but have not yet put it into the Git database.

Staged (index): You have put a modified file into the Git staging area.

Committed (objects): The data is safely stored in the local Git database after taking a snapshot.

What is committed is what is currently in the index, not what is in your working directory.

After cloning a repository, every file is tracked.
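The states above can be watched with git status. This is a minimal sketch in a scratch repository; the file name notes.txt and the identity settings are assumptions made for the example. It uses --porcelain so the output does not depend on user configuration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Engineer"        # hypothetical identity
git config user.email "engineer@example.com"

printf 'v1\n' > notes.txt
state_untracked=$(git status --porcelain)      # "?? notes.txt"

git add notes.txt
git commit -q -m "first snapshot"
state_committed=$(git status --porcelain)      # empty: nothing to report

printf 'v2\n' >> notes.txt
state_modified=$(git status --porcelain)       # " M notes.txt" (changed, not staged)

git add notes.txt
state_staged=$(git status --porcelain)         # "M  notes.txt" (staged in the index)

printf 'untracked: [%s]\ncommitted: [%s]\nmodified:  [%s]\nstaged:    [%s]\n' \
    "$state_untracked" "$state_committed" "$state_modified" "$state_staged"
```

In the two-column status code, the left column describes the index and the right column describes the working directory.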

The three states

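[Figure: three areas. Working directory (modified), staging area or index (staged), and Git repository or objects (committed). git add moves changes from the working directory to the staging area, git commit moves them from the staging area to the repository, and git checkout brings files from the repository back to the working directory.]

What gets committed is what is in the staging area, not what is in the workspace.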

Snapshots, not differences

Unlike other VCSs, Git is more like a mini file system.

[Figure: a timeline of snapshots. Each snapshot holds all of the tracked files (File 1 ... File N).]

Each commit object points to a root tree and to its parent commit.

Branches are cheap

Git is a content-addressable file system, and a branch is a pointer. Creating a branch is just storing a 41-byte file (a 40-character SHA-1 and a newline).

A branch tracks commits, not files.

The HEAD file points to the branch you're on.

The HEAD ref is special in that it actually points to another ref. It is a pointer to the currently active branch.
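A small sketch of the pointer idea, assuming a scratch repository that uses Git's default loose-ref storage and SHA-1 object names. The branch name new comes from the later slides; the file name and identity are invented for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master        # make sure the first branch is named master
git config user.name "Example Engineer"        # hypothetical identity
git config user.email "engineer@example.com"

printf 'hello\n' > README
git add README
git commit -q -m "first commit"

# Creating a branch only writes one small file under .git/refs/heads.
git branch new
ref_size=$(wc -c < .git/refs/heads/new | tr -d ' ')
ref_value=$(cat .git/refs/heads/new)
head_commit=$(git rev-parse HEAD)

# HEAD does not hold a SHA-1 here; it names the active branch.
head_before=$(cat .git/HEAD)
git checkout -q new
head_after=$(cat .git/HEAD)

echo "branch file size: $ref_size bytes"
echo "HEAD before: $head_before"
echo "HEAD after:  $head_after"
```

The branch file holds the same SHA-1 that git rev-parse HEAD reports, and switching branches only rewrites the one line in .git/HEAD.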

A common case

An engineer is doing his normal job:

Work on a project.

Create a branch in the current codebase he works on.

Work on this branch to implement a new feature.

At this moment, he receives a phone call from an angry customer who asks him to fix a terrible issue.

Switch back to the original production branch.

Create a branch to add the fix.

After the fix tests OK, merge it back.

Switch back to the branch he worked on at first.

Merge the new feature into the production branch.

Get a repository

First of all, you should get a repository.

$ git init

Create an empty repository in your working folder.

Git starts to track files once you add them and make the first commit.

$ git clone [url]

Creates a working folder with a .git/ directory inside.

Pulls the whole history from the server.

Checks out the newest code into your workspace.
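To see that a clone carries the whole history, the sketch below uses a local directory as a stand-in for the server URL, which is an assumption made so the example runs without a network. The repository names and file contents are hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# A small repository with three commits plays the role of the server.
git init -q origin_repo
cd origin_repo
git config user.name "Example Engineer"        # hypothetical identity
git config user.email "engineer@example.com"
for n in 1 2 3; do
    printf 'revision %s\n' "$n" >> file.txt
    git add file.txt
    git commit -q -m "revision $n"
done
origin_commits=$(git rev-list --count HEAD)

# Clone it: a working folder, a .git/ directory, and the full history.
cd "$work"
git clone -q origin_repo my_clone
cd my_clone
clone_commits=$(git rev-list --count HEAD)
clone_gitdir=$(git rev-parse --git-dir)

echo "commits in origin: $origin_commits"
echo "commits in clone:  $clone_commits"
```

After the clone, commands such as git log work in my_clone without contacting the source again.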

Add some modifications

Modify your files, then use git add to stage them.

git add -A: stage all changes, tracked or untracked

git add -u: update tracked files

git add -i: interactive selection

Git also handles binary files.

Flexible: .gitignore can live in the Git root or in any directory of a project, and it can be committed.

Commit change

After committing, the change you made is safe.

If something went wrong, use $ git commit --amend

It is not a good habit to use $ git commit -a

[Figure: git diff compares the workspace with the staging area, git diff --cached compares the staging area with the HEAD branch, and git diff HEAD compares the workspace with the HEAD branch.]

The better way:

Get used to running $ git add after making any change.

When the work is done:

- $ git diff to check for anything missed
- $ git diff --cached to check what will be committed
- $ git commit -m
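The three diff commands can be compared in one run. This sketch assumes a scratch repository; play.c is borrowed from the reset slide, and its contents are invented. One change is staged and one is left unstaged, then each diff is counted.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Engineer"        # hypothetical identity
git config user.email "engineer@example.com"

printf 'line 1\n' > play.c
git add play.c
git commit -q -m "first version"

printf 'line 2 (staged)\n' >> play.c
git add play.c                                  # line 2 is now in the index
printf 'line 3 (not staged)\n' >> play.c        # line 3 exists only in the workspace

# Count the added lines each comparison reports.
unstaged_added=$(git diff --no-color | grep -c '^+line')           # workspace vs index
staged_added=$(git diff --no-color --cached | grep -c '^+line')    # index vs HEAD
head_added=$(git diff --no-color HEAD | grep -c '^+line')          # workspace vs HEAD

# The commit takes what is in the index, so line 3 stays behind.
git commit -q -m "add line 2"
committed=$(git show HEAD:play.c)
left_over=$(git status --porcelain)

echo "git diff: $unstaged_added, git diff --cached: $staged_added, git diff HEAD: $head_added"
printf 'committed content:\n%s\n' "$committed"
```

The committed file ends at line 2, and git status still reports play.c as modified because line 3 was never staged.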

Step One, create branch

Create a branch to work on the new feature.

[Figure: commit history C0, C1, C2 with branches master and new.]

$ git checkout -b new

It's shorthand for:

$ git branch new

$ git checkout new

Step Two, commit something

Commit something on this new feature.

[Figure: commit history C0, C1, C2, C3 with branches master and new.]

After completing some functions, he makes a commit:

$ git commit -a -m "add a new api"

It's shorthand for:

$ git add -u

$ git commit -m "add a new api"

Step Three, receive an urgent issue

Return to the production version and create a branch for this urgent issue.

[Figure: commit history C1, C2, C3, C4 with branches master, new, and issue.]

To work on the issue, you have to save your current changes and switch back to the stable production branch.

$ git checkout master

and then create a new branch:

$ git checkout -b issue

After fixing it:

$ git commit -a -m "fixed the issue"

Step Four, test fix OK

After the fix passes testing, merge it into the master branch.

[Figure: commit history C1, C2, C3, C4 with branches master, new, and issue.]

Merge it back into the original master branch:

$ git checkout master

$ git merge issue

Merging a branch that is directly ahead of the current one is called a fast-forward.

Step Five, finish the feature

Delete the issue branch and switch back to the new feature branch.

[Figure: commit history C1, C2, C3, C4, C5 with branches master and new.]

Delete the existing branch:

$ git branch -d issue

Switch back to the work-in-progress branch and finish it.

$ git checkout new

$ git commit -a -m "finish it"

The whole commit steps

[Figure: commit history C0 to C6 with branches master and new.]

Checkout C3 and merge C5

After doing:

$ git checkout master

$ git merge new
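The five steps can be replayed end to end in a scratch repository. This is only a sketch: the file names, file contents, and identity are assumptions, and a single baseline commit stands in for C0 to C2. The feature and the fix touch different files so that the final merge has no conflict.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master        # make sure the first branch is named master
git config user.name "Example Engineer"        # hypothetical identity
git config user.email "engineer@example.com"

printf 'stable\n' > app.txt
git add app.txt
git commit -q -m "production baseline"

# Step one: branch for the new feature.
git checkout -q -b new
# Step two: commit some work on it.
printf 'draft api\n' > api.txt
git add api.txt
git commit -q -m "add a new api"

# Step three: go back to production and branch for the urgent issue.
git checkout -q master
git checkout -q -b issue
printf 'stable, fixed\n' > app.txt
git commit -q -a -m "fixed the issue"

# Step four: merge the fix; master only moves forward (fast-forward).
git checkout -q master
git merge -q issue
master_after_fix=$(git rev-parse master)
issue_tip=$(git rev-parse issue)

# Step five: delete the issue branch and finish the feature.
git branch -d issue
git checkout -q new
printf 'finished api\n' > api.txt
git commit -q -a -m "finish it"

# Finally merge the feature; this creates a merge commit with two parents.
git checkout -q master
git merge -q --no-edit new
merge_parents=$(git cat-file -p HEAD | grep -c '^parent ')
total_commits=$(git rev-list --count HEAD)
final_app=$(cat app.txt)
final_api=$(cat api.txt)

git log --oneline --graph
```

The graph printed at the end shows the fix and the feature as two lines of history joined by the merge commit.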


Undo

Git provides several mechanisms for developers to change their minds.

The latest commits are no longer needed.

A specific commit should be rolled back.

You want to undo your modifications or discard what is in the staging area.

Reset, revert, and checkout are easy to misuse.


Undo using Reset

Reset

Lets the developer reset the commit status, the staging area, or the workspace.

$ git reset --soft HEAD~N

Moves the branch back N commits without changing the index or any files.

$ git reset HEAD~N

Moves the branch back N commits and also resets the index, which undoes git add. E.g. $ git reset HEAD play.c unstages play.c.

$ git reset --hard HEAD~N

Resets not only the commits but also the staging area and the files.
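The difference between the three reset modes shows up in git status afterwards. The sketch below assumes a scratch repository with two commits; the file names and identity are invented. It undoes the second commit three times, once per mode, re-creating it in between.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Engineer"        # hypothetical identity
git config user.email "engineer@example.com"

printf 'one\n' > a.txt
git add a.txt
git commit -q -m "first"
printf 'two\n' > play.c
git add play.c
git commit -q -m "second"

# --soft: the commit is undone, play.c is still staged.
git reset -q --soft HEAD~1
after_soft=$(git status --porcelain)           # "A  play.c"
git commit -q -m "second again"

# default (mixed): the commit and the git add are undone, the file remains.
git reset -q HEAD~1
after_mixed=$(git status --porcelain)          # "?? play.c"
git add play.c
git commit -q -m "second once more"

# --hard: commit, index, and working file are all reset.
git reset -q --hard HEAD~1
after_hard=$(git status --porcelain)           # empty
if [ -e play.c ]; then file_left=yes; else file_left=no; fi

printf 'soft:  [%s]\nmixed: [%s]\nhard:  [%s] file left: %s\n' \
    "$after_soft" "$after_mixed" "$after_hard" "$file_left"
```

Only --hard touches the working directory, which is why it is the mode that can lose uncommitted work.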

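[Figure: what each reset mode changes. --soft moves only the master/HEAD pointer; no option also resets the index; --hard also resets the working directory.]

A soft reset can combine the last two commits into one:

$ git reset --soft HEAD^^

$ git commit -m "Fix version"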

Undo using Checkout

Checkout

Moves the HEAD pointer and checks out code.

$ git checkout [file]

Checks out the staged version of the file over the one in the workspace.

$ git checkout [branch_name]

Switches to the given branch and updates the staging area and workspace to match it.

'reset' changes the SHA-1 that the branch points to, but 'checkout' just moves HEAD.

Undo using Revert

Revert

Rolls back changes by creating a new rollback commit.

Reset moves history backward; revert moves it forward.

$ git revert HEAD~N

Creates a new commit that reverts the commit N steps before HEAD.

$ git revert SHA-1
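A short sketch of revert in a scratch repository, with an invented file name, contents, and identity. A bad commit is reverted, and the history grows by one commit instead of shrinking.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Engineer"        # hypothetical identity
git config user.email "engineer@example.com"

printf 'good\n' > config.txt
git add config.txt
git commit -q -m "good version"

printf 'good\nbad change\n' > config.txt
git commit -q -a -m "bad change"
bad_commit=$(git rev-parse HEAD)

# Revert the latest commit; --no-edit accepts the default message.
git revert --no-edit HEAD > /dev/null

content_after=$(cat config.txt)
total_commits=$(git rev-list --count HEAD)
previous_commit=$(git rev-parse HEAD~1)

echo "content after revert: $content_after"
echo "commits in history:   $total_commits"
```

The bad commit is still in the history, directly below the new revert commit, which is what makes revert suitable for commits that were already shared.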


Rescue mechanism

Git records every move of HEAD, so don't worry.

$ git reflog show master

$ git reset --hard master@{N}

But some commands are still dangerous; do not use them lightly:

$ git reset --hard

$ git checkout HEAD
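The rescue can be tried safely in a scratch repository. This sketch assumes the default setting in which reflogs are kept for a normal (non-bare) repository; the file name and identity are invented. A commit is thrown away with a hard reset and then recovered through the reflog of master.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master        # make sure the first branch is named master
git config user.name "Example Engineer"        # hypothetical identity
git config user.email "engineer@example.com"

printf 'first\n' > work.txt
git add work.txt
git commit -q -m "first"
printf 'first\nsecond\n' > work.txt
git commit -q -a -m "second"
lost_commit=$(git rev-parse HEAD)

# Throw the second commit away.
git reset -q --hard HEAD~1
count_after_reset=$(git rev-list --count HEAD)

# The reflog still remembers where master was one move ago.
git reflog show master
git reset -q --hard 'master@{1}'
recovered_commit=$(git rev-parse HEAD)

echo "lost:      $lost_commit"
echo "recovered: $recovered_commit"
```

The reflog only covers commits; changes that were never committed are not recorded there, which is why the hard reset stays on the dangerous list.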

Branches are cheap!

Git communication

[Figure: a local repository with three remotes named v_a, v_b, and v_c, which point to the remote repositories git://a..., git://b..., and git://c...; each link supports push and pull.]

add remote

If a local repository exists, you can add a remote (by giving a remote name and a remote repository URL), then use git pull or git fetch to get data from the remote repository and git push to put your contribution on it.

What you obtain is the latest branch and the whole history.

A remote name identifies where the project comes from, while tags and branches identify a point in time of the project snapshot.
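A remote can be tried without a server. In this sketch a bare repository in a temporary directory stands in for git://a..., which is an assumption made so the example runs offline; the remote name v_a is taken from the figure, and the other names are invented.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the remote server.
git init -q --bare server_a.git

git init -q local_repo
cd local_repo
git symbolic-ref HEAD refs/heads/master        # make sure the first branch is named master
git config user.name "Example Engineer"        # hypothetical identity
git config user.email "engineer@example.com"
printf 'contribution\n' > feature.txt
git add feature.txt
git commit -q -m "my contribution"

# Add the remote under a name, then push and fetch through that name.
git remote add v_a "$work/server_a.git"
git push -q v_a master
git fetch -q v_a

remote_names=$(git remote)
local_tip=$(git rev-parse master)
remote_tip=$(git ls-remote v_a refs/heads/master | cut -f1)
tracking_tip=$(git rev-parse v_a/master)

echo "remotes: $remote_names"
echo "local master:  $local_tip"
echo "remote master: $remote_tip"
```

More remotes can be added the same way under other names, and each one gets its own set of remote-tracking branches such as v_a/master.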

Look inside .git

Git: the stupid content tracker

Uses compressed objects named by their SHA-1 to store everything.

objects: Full objects (commits, trees, blobs, tags).

refs: Pointers to all of the branches and tags.

logs: A history of where your branches have been.

Current pointers

All your history is stored in the Git Directory; the working directory is simply a temporary checkout place where you can modify the files until your next commit.

Object folder

objects: stores all of the commit, tag, tree, and blob objects.

00/ 6d/ 9b/ ac/ b0/ info/ pack/

Loose objects

Stored in files named like:

a9dca9a0fe0c031c996d308ab8a781ab7f358f

which hold the objects compressed by zlib. The file name is 19 bytes (38 hex characters) of the SHA-1; the first byte names the directory.

Packed objects

Stored in files named like:

pack-a9dca9a0fe0c031c996d308ab8a781ab7f358f.idx

pack-a9dca9a0fe0c031c996d308ab8a781ab7f358f.pack

.pack: The contents of all the objects that were moved out of earlier loose objects.

.idx: Offsets into the pack file.

Refs folder

refs: stores all of the pointers.

heads/

Stores a file named after each branch; its content is the SHA-1 that the branch points to. Creating a new branch is as quick and simple as writing 41 bytes to a file (40 characters and a newline) in .git/refs/heads.

remotes/

Each folder stores the refs fetched from the branches of a remote.

tags/

After creating a tag, a file named after the tag is created here; its content is the SHA-1 that the tag points to. For an annotated tag, a tag object is also created in the objects folder.

Logs folder

logs: every move leaves its mark.

Remember $ git reflog show master?

Current status

HEAD: points to the currently active branch.

ORIG_HEAD: stores the previous HEAD before git pull or git merge.

$ git reset --hard ORIG_HEAD

FETCH_HEAD: records the branch you fetched.

index: stores staged data, the next proposed commit snapshot.

Add and commit at a low level

$ git add

Stores blobs for the changed files (adds a loose file to .git/objects/)

Updates the index (writes to the file .git/index)

$ git commit

Writes out tree objects (.git/objects)

Writes a commit object that references the top-level tree

Updates the branch pointer that HEAD refers to (.git/refs/heads and .git/logs/refs/heads)

Stores the commit message (.git/COMMIT_EDITMSG)
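The same steps can be carried out by hand with Git's plumbing commands. This is a rough sketch, not what git add and git commit literally execute: it assumes a scratch repository with default loose-object storage, and the file name, message, and identity are made up. It shows one blob, one tree, and one commit object being written, and then the branch pointer being moved.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master        # make sure the first branch is named master
git config user.name "Example Engineer"        # hypothetical identity
git config user.email "engineer@example.com"

printf 'int main(void) { return 0; }\n' > play.c

# What "git add" does: store a blob, then record it in the index.
blob=$(git hash-object -w play.c)
git update-index --add --cacheinfo 100644 "$blob" play.c

# The blob is a loose file: 2 hex characters of directory, 38 of file name.
dir=$(printf '%s' "$blob" | cut -c1-2)
rest=$(printf '%s' "$blob" | cut -c3-)
loose_path=".git/objects/$dir/$rest"
if [ -f "$loose_path" ]; then loose_exists=yes; else loose_exists=no; fi

# What "git commit" does: write a tree, write a commit, move the branch.
tree=$(git write-tree)
commit=$(printf 'add play.c\n' | git commit-tree "$tree")
git update-ref refs/heads/master "$commit"

blob_type=$(git cat-file -t "$blob")
tree_type=$(git cat-file -t "$tree")
commit_type=$(git cat-file -t "$commit")
head_now=$(git rev-parse HEAD)

echo "blob   $blob ($loose_path)"
echo "tree   $tree"
echo "commit $commit"
```

After the last command, git log shows the hand-made commit exactly as if it had been created with git commit.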

Compare it with SVN again

Centralized vs Distributed

SVN is one repo and lots of clients. Git is a repo with lots of client repos.

Checkout working copy vs whole repository.

Serial revision numbers or lots of branches

Corporate work or distributed version control?

Consistency vs Flexibility

SVN keeps everyone working on the same thing.

Ref: Please Stop Bugging Linus Torvalds About Subversion


Repo

The Android source involves over 160 projects.

repo init - to set up clone script

repo sync

repo start - create local branch

repo upload

The definition of open: "mkdir android ; cd android ; repo init -u git://android.git.kernel.org/platform/manifest.git ; repo sync ; make" (Andy Rubin)

repo init -u ssh://.../manifest.git -b xxx -m android.xml

- Verify your SSH public key
- Get repo
- Get a manifest.xml
- Clone the projects listed in manifest.xml
- Identify a specific project: repo sync kernel/linux
- Transmit the branch to Gerrit over an SSH connection
- Gerrit reviews each commit. It is better to run git rebase -i before repo upload.

Resource

Install git: Cygwin or MsysGit

Get repo: curl -l -k http://android.git.kernel.org/repo

Reference

Official: http://git-scm.com/

Git Reference: http://gitref.org/

Pro Git: http://progit.org/book/

Repo: http://source.android.com/source/using-repo.html#init

Get project to practice: http://www.kernel.org/pub/

Linus Torvalds talks about Git: http://www.youtube.com/watch?v=4XpnKHJAok8

Thanks!
