Recommendation of a Strategy
Wayne Chen
2012.1.6
Git
Why Git?
Linus Torvalds hates CVS & SVN
Speed
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Data size: SVN occupied 143MB of a 140MB project
Used by Perl, Eclipse, Qt, Ruby on Rails, Android...
Created by Linus Torvalds for the Linux kernel project
At first, Linus Torvalds used diff, patch, and tar to maintain Linux.
Then he used BitKeeper, in which everyone has a repository.
Fully distributed: everything is local.
The 143MB is SVN metadata.
Key concepts
Nearly every operation is local
Not by file name but the hash value
The three states
Snapshots, not differences
Branch is cheap
Nearly every operation is local
After cloning a repository, you have all of the history locally.
No network access is required, except for clone, pull, push, and fetch.
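The claim that history is fully local can be checked with a minimal sketch, assuming a POSIX shell with git installed. The temporary directories, the repository names `server` and `my-clone`, and the demo identity are all hypothetical; a local repository stands in for the server.

```shell
#!/bin/sh
# Sketch: after cloning, the whole history lives in the local .git directory,
# so history commands keep working even when the original repository is gone.
set -e
work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q server   # hypothetical stand-in for a server
cd server
git config user.name "Demo User"                  # hypothetical identity
git config user.email "demo@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "first"
echo "v2" > app.txt
git commit -q -a -m "second"

cd "$work"
git clone -q server my-clone
rm -rf server                 # the "server" disappears

cd my-clone
git log --oneline             # full history, read from .git only
git diff HEAD~1 HEAD          # comparing old commits needs no network either
```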
Not by file name but the hash value
Git uses four types of objects (blob, tree, commit, and tag) to store all information, and each object is identified by a unique 20-byte SHA-1 key.
What if two files have identical content but different file names? They share a single blob, because the key depends only on the content.
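A minimal sketch of content addressing, assuming a POSIX shell, git, and `sha1sum` from GNU coreutils (on macOS, `shasum` produces the same digest). A blob's key is the SHA-1 of the header `blob <size>`, a NUL byte, and the content, so the file name never enters the hash. The file names `a.txt` and `b.txt` are hypothetical.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q

# The key computed by Git ...
by_git=$(printf 'hello\n' | git hash-object --stdin)
# ... and by hand: header "blob 6", a NUL byte, then the 6 content bytes.
by_hand=$( { printf 'blob 6'; printf '\000'; printf 'hello\n'; } | sha1sum | cut -d' ' -f1 )
echo "$by_git $by_hand"

# Two files with identical content but different names:
printf 'hello\n' > a.txt
printf 'hello\n' > b.txt
git add a.txt b.txt
git rev-parse :a.txt :b.txt   # both index entries name the same blob
```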
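The commit object is simple: it specifies the top-level tree for the snapshot, the parent commit(s), the author and committer, and the commit message.
A "tag" is a way to mark a specific commit as special in some way. An annotated tag is stored as its own tag object.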
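To see these objects directly, here is a minimal sketch, assuming a POSIX shell and git. `git cat-file -t` prints an object's type and `-p` pretty-prints its content; the file, tag names, and identity are hypothetical.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "hello" > README
git add README
git commit -q -m "first commit"
git cat-file -t HEAD            # commit
git cat-file -p HEAD            # tree <sha>, author, committer, message

git tag v0.1                    # lightweight tag: only a ref to the commit
git tag -a v1.0 -m "release 1.0"  # annotated tag: creates a tag object
git cat-file -t v0.1            # commit
git cat-file -t v1.0            # tag
git cat-file -p v1.0            # object <commit sha>, type, tag name, tagger, message
```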
root/
  README
  lib/
    mylib.rb
    inc/
      tricks.rb
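A minimal sketch that rebuilds the example layout and inspects the resulting tree objects, assuming a POSIX shell and git. The file contents are placeholders, and the nesting of `inc/` under `lib/` is an assumption about the original figure.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

mkdir -p lib/inc
echo "readme" > README
echo "# mylib" > lib/mylib.rb
echo "# tricks" > lib/inc/tricks.rb
git add -A
git commit -q -m "example tree"

git cat-file -p "HEAD^{tree}"   # root tree: README (blob) and lib (tree)
git cat-file -p HEAD:lib        # mylib.rb (blob) and inc (tree)
git cat-file -p HEAD:lib/inc    # tricks.rb (blob)
```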
The three states
First, files that were in the last snapshot, or that have been newly staged, are called tracked; all other files are untracked.
Tracked files can be in one of three states: modified, staged, and committed.
Modified (working directory): You have changed a file but have not yet put it into the Git database.
Staged (index): You have marked a modified file to go into the next commit by adding it to the staging area.
Committed (objects): The data is safely stored in your local Git database as part of a snapshot.
What is committed is what is currently in the index, not what is in your working directory.
Right after you clone a repository, every file is tracked and unmodified.
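The states can be observed with `git status --porcelain`, whose two-letter codes show the index column and the working-directory column. A minimal sketch, assuming a POSIX shell and git; file names and identity are hypothetical.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "initial"                # tracked.txt is now committed

echo "new" > untracked.txt                # never added: untracked
echo "v2" > tracked.txt                   # changed in the working directory: modified
before_add=$(git status --porcelain)      # " M tracked.txt" and "?? untracked.txt"
echo "$before_add"

git add tracked.txt                       # now staged in the index
git status --porcelain                    # "M  tracked.txt" and "?? untracked.txt"
```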
The three states
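Figure: working directory (modified), staging area / index (staged), and Git repository / objects (committed); git add moves changes into the index, git commit moves them into the repository, and git checkout brings files back into the working directory.
What gets committed is what is in the staging area, not what is in the working directory.
Example: if you add a printk to the code after git add and then commit, the printk is not part of the commit until you git add it again.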
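A minimal sketch of the printk case, assuming a POSIX shell and git; `driver.c` and its contents are hypothetical placeholders. The line added after `git add` stays behind in the working directory.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo 'int init(void) { return 0; }' > driver.c
git add driver.c
git commit -q -m "initial driver"

echo '/* new feature */' >> driver.c
git add driver.c                          # stage the feature line
echo 'printk("debug");' >> driver.c       # debug line added AFTER git add
git commit -q -m "add feature"

git show HEAD:driver.c                    # feature line only, no printk
git diff                                  # the printk is still an unstaged change
```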
Snapshots, not differences
Unlike other VCSs, which store each file as a list of differences, Git is more like a mini file system: every commit is a snapshot of all tracked files.
Figure: a time line of snapshots (File 1 to File N); each snapshot contains all of the tracked files.
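Each commit creates a commit object that points to a root tree and to its parent commit; checking out a commit moves a pointer and restores the code from that snapshot.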
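Snapshots stay cheap because an unchanged file is not copied: the new tree simply points at the old blob again. A minimal sketch, assuming a POSIX shell and git; file names and identity are hypothetical.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "a v1" > a.txt
echo "b v1" > b.txt
git add a.txt b.txt
git commit -q -m "C1"
echo "a v2" > a.txt                       # only a.txt changes
git commit -q -a -m "C2"

git cat-file -p HEAD                      # tree <sha> and parent <sha of C1>
git ls-tree HEAD~1                        # snapshot C1
git ls-tree HEAD                          # snapshot C2: new blob for a.txt, same blob for b.txt
```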
Branch is cheap
Git is a content-addressable file system, and a branch is just a pointer to a commit. Creating a branch is as cheap as writing a 41-byte file (a 40-character SHA-1 plus a newline).
Git does not track individual files; it tracks commits.
The HEAD file points to the branch you're on.
The HEAD ref is special in that it actually points to another ref. It is a pointer to the currently active branch.
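A minimal sketch showing that a branch is a small file and HEAD is a symbolic reference, assuming a POSIX shell, git, and the default "files" ref storage (refs have not yet been packed by `git gc`, and the newer reftable format is not in use). The branch name `feature` is hypothetical.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "x" > f.txt
git add f.txt
git commit -q -m "C0"

git branch feature                        # writes one small file
cat .git/HEAD                             # ref: refs/heads/master
cat .git/refs/heads/feature               # the 40-character SHA-1 of C0
wc -c < .git/refs/heads/feature           # 41 bytes including the newline

git checkout -q feature
cat .git/HEAD                             # ref: refs/heads/feature
git symbolic-ref HEAD                     # refs/heads/feature
```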
A common case
An engineer is doing his normal job, working on a project.
He creates a branch from the codebase he is working on.
He works on this branch to implement a new feature.
At this moment, he receives a phone call from an angry customer who asks him to fix a terrible issue, so he switches back to the original production branch.
He creates a branch to add the fix.
After the fix is tested OK, he merges it back.
He switches back to the branch he was working on at first.
Finally, he merges the new feature into the production branch.
Get a repository
First of all, you should get a repository.
$ git init
Create an empty repository in your working folder.
Git starts tracking a file once you stage it with git add; the first commit then records it in the repository.
$ git clone [url]
Creates a working folder with a .git/ directory inside.
Downloads the whole history from the server.
Checks out the newest code into your workspace.
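A minimal sketch of what `git clone` leaves behind, assuming a POSIX shell and git; a local repository named `server` stands in for the remote URL, and all names are hypothetical.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q server   # empty repository, then one commit
cd server
git config user.name "Demo User"                  # hypothetical identity
git config user.email "demo@example.com"
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "C0"

cd "$work"
git clone -q server workspace
cd workspace
ls -a                 # .git plus the checked-out hello.txt
git remote -v         # origin points back at the cloned repository
git branch -a         # master and the remote-tracking origin/master
```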
Add some modifications
Modify your files, then use git add to stage them.
git add -A: stage all changes, in both tracked and untracked files
git add -u: stage changes to tracked files only
git add -i: interactively select what to stage
Git also handles binary files.
.gitignore is flexible: put it in the Git root or in any directory of the project, and commit it so everyone shares the same rules.
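A minimal sketch contrasting `git add -u` and `git add -A` in the presence of a `.gitignore`, assuming a POSIX shell and git; the `*.log` pattern and file names are hypothetical.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > tracked.txt
echo "*.log" > .gitignore                 # committed, so the rule is shared
git add tracked.txt .gitignore
git commit -q -m "initial"

echo "v2" > tracked.txt                   # modified tracked file
echo "new" > new.txt                      # new untracked file
echo "noise" > debug.log                  # matched by .gitignore

git add -u
after_u=$(git diff --cached --name-only)  # tracked.txt only
git add -A
git diff --cached --name-only             # new.txt and tracked.txt, never debug.log
```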
Commit changes
After you commit, your changes are safe. If something went wrong, use $ git commit --amend
It is not a good habit to use $ git commit -a, because it stages every modified tracked file without letting you review the changes first.
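A minimal sketch of fixing the last commit with `--amend`, assuming a POSIX shell and git; file names and identity are hypothetical. Amending replaces the commit with a new one (a new SHA-1), so it is best kept to commits that have not been pushed yet.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "code" > main.c
git add main.c
git commit -q -m "add main"

echo "notes" > README                     # forgot to include this file
git add README
git commit -q --amend --no-edit           # rewrite the last commit, keep its message
git log --oneline                         # still a single commit
```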
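Figure: git diff compares the workspace with the stage area, git diff --cached compares the stage area with the HEAD branch, and git diff HEAD compares the workspace with the HEAD branch.
The better way is:
Get used to running $ git add after making any change.
When the work is done:
- $ git diff to check for anything you missed
- $ git diff --cached to check what will be committed
- $ git commit -m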
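A minimal sketch of the three diffs, assuming a POSIX shell and git: one file has a staged change and another an unstaged change, and `--name-only` shows which comparison sees which. File names and identity are hypothetical.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > a.txt
echo "v1" > b.txt
git add a.txt b.txt
git commit -q -m "C0"

echo "v2" > a.txt
git add a.txt                             # staged change
echo "v2" > b.txt                         # unstaged change

git diff --name-only                      # workspace vs stage area: b.txt
git diff --cached --name-only             # stage area vs HEAD: a.txt
git diff HEAD --name-only                 # workspace vs HEAD: a.txt and b.txt
```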
Step One, create a branch
Create a branch to work on the new feature:
Figure: commits C0, C1, C2; master and new both point to C2.
$ git checkout -b new
It's shorthand for:
$ git branch new
$ git checkout new
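A minimal sketch of Step One, assuming a POSIX shell and git; the commits C0 to C2 and the file `history.txt` are placeholders. After `git checkout -b new`, both branches point at C2 and HEAD follows `new`.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

for c in C0 C1 C2; do                     # placeholder history C0 -> C1 -> C2
  echo "$c" > history.txt
  git add history.txt
  git commit -q -m "$c"
done

git checkout -q -b new                    # create the branch and switch to it
git branch                                # * new, master
git rev-parse master new                  # the same SHA-1 twice: C2
```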
Step Two, commit something
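Commit something for the new feature:
Figure: commits C0, C1, C2 (master), then C3 (new).
After completing some functions, he makes a commit:
$ git commit -a -m "add a new api"
It's shorthand for:
$ git add -u
$ git commit -m "add a new api"
Note that git commit -a stages only modified and deleted tracked files; unlike git add -A, it does not pick up new, untracked files.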
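A minimal sketch of the difference between `git commit -a` and `git add -A`, assuming a POSIX shell and git; `api.c` and `api.md` are hypothetical file names. The brand-new file is left out of the commit.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > api.c
git add api.c
git commit -q -m "C2"
git checkout -q -b new

echo "v2" > api.c                         # modified tracked file
echo "doc" > api.md                       # brand-new, untracked file
git commit -q -a -m "add a new api"
git status --porcelain                    # ?? api.md  (commit -a skipped it)
```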
Step Three, receive an urgent issue
Switch back to the production version and create a branch for this urgent issue:
Figure: commits C0, C1, C2 (master); C3 (new); C4 (issue), branched from C2.
To work on the issue, first save your current work (commit or stash it), then switch back to the stable production branch:
$ git checkout master
and then create a new branch:
$ git checkout -b issue
After fixing it:
$ git commit -a -m "fixed the issue"
Step Four, test the fix
After the fix passes testing, merge it back into the original master branch:
Figure: master and issue both point to C4; new points to C3.
$ git checkout master
$ git merge issue
Because master is a direct ancestor of issue, Git simply moves the master pointer forward to C4. This is called a fast-forward merge.
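A minimal sketch of Steps One to Four, assuming a POSIX shell and git. `mkcommit` is a hypothetical helper that writes its first argument into a file and commits it with that name as the message; the file names are placeholders.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

mkcommit() {                              # hypothetical helper: write $1 into $2, commit as "$1"
  echo "$1" > "$2"
  git add "$2"
  git commit -q -m "$1"
}

mkcommit C0 main.c
mkcommit C1 main.c
mkcommit C2 main.c
git checkout -q -b new                    # Step One
mkcommit C3 feature.c                     # Step Two: feature work in progress
git checkout -q master                    # Step Three: back to production ...
git checkout -q -b issue                  # ... and a branch for the fix
mkcommit C4 main.c
git checkout -q master                    # Step Four: merge the tested fix
git merge -q issue                        # fast-forward: master just moves to C4
```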
Step Five, finish the feature
Delete the issue branch and switch back to the new-feature branch:
Figure: master points to C4; new points to C5, whose parent is C3.
Delete the existing branch:
$ git branch -d issue
Switch back to the work-in-progress branch and finish it:
$ git checkout new
$ git commit -a -m "finish it"
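The whole commit history
Figure: C0, C1, C2, then C4 (master) on one line and C3, C5 (new) on the other, joined by the merge commit C6.
After doing:
$ git checkout master
$ git merge new
Git performs a three-way merge of C4 and C5, using their common ancestor C2, and records the result in a new merge commit C6. master now points to C6, while new still points to C5.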
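A minimal sketch of the whole story through the final three-way merge, assuming a POSIX shell and git. `mkcommit` is the same kind of hypothetical helper (write `$1` into file `$2` and commit it), and the merge commit is given the message "C6" only to match the figure.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

mkcommit() {                              # hypothetical helper: write $1 into $2, commit as "$1"
  echo "$1" > "$2"
  git add "$2"
  git commit -q -m "$1"
}

mkcommit C0 main.c; mkcommit C1 main.c; mkcommit C2 main.c
git checkout -q -b new;   mkcommit C3 feature.c
git checkout -q master
git checkout -q -b issue; mkcommit C4 main.c
git checkout -q master;   git merge -q issue      # fast-forward to C4
git branch -q -d issue                            # Step Five: allowed, issue is merged
git checkout -q new;      mkcommit C5 feature.c
git checkout -q master
git merge -q --no-edit -m "C6" new                # three-way merge -> merge commit C6
git log --graph --oneline --all
```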
Undo
Git provides several mechanisms for developers who change their minds:
The latest commits are no longer needed.
A specific commit should be rolled back.
You want to undo your modifications or discard the data in the stage area.
Reset, revert, and checkout are easy to misuse.
Example (soft reset): combine the last two commits into one with $ git reset --soft HEAD^^ followed by $ git commit -m "Fix version"
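A minimal sketch of combining the last two commits with a soft reset, assuming a POSIX shell and git; file name, messages, and identity are hypothetical. The resulting snapshot is identical; only the history is shorter.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

for v in 1 2 3; do
  echo "version $v" > app.txt
  git add app.txt
  git commit -q -m "v$v"
done
before=$(git rev-parse "HEAD^{tree}")     # snapshot we want to keep

git reset -q --soft "HEAD^^"              # branch back two commits; index and files untouched
git commit -q -m "Fix version"            # one commit now holds both changes
git log --oneline                         # v1, then "Fix version"
```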
Undo using Reset
Reset: lets developers reset the commit status, the stage area, or the workspace.
$ git reset --soft HEAD~N: move the branch back N commits without changing the stage area or any files.
$ git reset HEAD~N: move the branch back N commits and also reset the stage area, undoing git add; the files are kept. For example, $ git reset HEAD play.c unstages play.c.
$ git reset --hard HEAD~N: reset not only the commits, but also the stage area and the files.
Figure: what each reset mode changes: --soft moves only HEAD/master; the default mode also resets the index; --hard also resets the working directory.
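A minimal sketch of the three reset modes on one file, assuming a POSIX shell and git; `play.c` follows the slide's example, and its contents and the identity are hypothetical. The status after each step is captured so the effect on the index and working directory is visible.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > play.c
git add play.c
git commit -q -m "C1"
echo "v2" > play.c
git commit -q -a -m "C2"

git reset -q --soft HEAD~1                # branch back to C1; index and play.c keep v2
after_soft=$(git status --porcelain)      # "M  play.c": the change is staged
git reset -q HEAD play.c                  # unstage play.c; the file keeps v2
after_mixed=$(git status --porcelain)     # " M play.c": change only in the working directory
git reset -q --hard                       # index and working directory back to C1
cat play.c                                # v1
```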
Undo using Checkout
Checkout: moves the HEAD pointer and checks out code.
$ git checkout [file]: overwrite the working copy of the file with its staged version.
$ git checkout [branch_name]: switch to that branch, updating the stage area and workspace to match its commit.
'reset' changes the SHA-1 that the branch points to, but 'checkout' just moves HEAD.
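A minimal sketch of both checkout forms, assuming a POSIX shell and git; `play.c`, the branch name `stable`, and the identity are hypothetical. Restoring a file leaves every branch alone, and switching branches moves HEAD without changing master's SHA-1.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > play.c
git add play.c
git commit -q -m "C1"
git branch stable                         # a second branch at C1
echo "v2" > play.c
git commit -q -a -m "C2"

echo "oops" > play.c                      # unwanted edit in the working directory
git checkout -- play.c                    # restore play.c from the stage area
restored=$(cat play.c)                    # v2

master_before=$(git rev-parse master)
git checkout -q stable                    # HEAD moves to stable; master is untouched
```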
Undo using Revert
Revert: roll back changes by creating a new commit that undoes them.
Reset goes back; revert goes forward.
$ git revert HEAD~N: create a new commit that reverts the changes introduced by commit HEAD~N.
$ git revert SHA-1: revert the commit with that SHA-1.
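A minimal sketch of "revert goes forward", assuming a POSIX shell and git; file contents, messages, and identity are hypothetical. The bad commit stays in history and a new commit undoes it.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "C1"
echo "v2 (buggy)" > app.txt
git commit -q -a -m "C2"

git revert --no-edit HEAD                 # new commit that undoes C2
git log --oneline                         # Revert "C2", C2, C1
```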
Rescue mechanism
Git records every move of HEAD and of each branch, so don't worry:
$ git reflog show master
$ git reset --hard master@{N}
But some commands are still dangerous, because they throw away uncommitted changes that no log records; do not use them lightly:
$ git reset --hard
$ git checkout HEAD [file]
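A minimal sketch of rescuing a commit after a hard reset, assuming a POSIX shell, git, and the default reflog setting for non-bare repositories; file name and identity are hypothetical. `master@{1}` is where master pointed one move ago.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "C1"
echo "v2" > app.txt
git commit -q -a -m "C2"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1                # "lose" C2
git reflog show master                    # master@{0}: reset, master@{1}: commit C2, ...
git reset -q --hard "master@{1}"          # jump back to where master was one move ago
```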
Branch is cheap!
Git communication
Figure: a local repository with three remotes, v_a, v_b, and v_c, for the remote repositories git://a..., git://b..., and git://c...; the local repository pushes to and pulls from each of them.
add remote
If a local repository exists, you can add a remote (a remote name plus the remote repository URL), then use git pull or git fetch to get data from the remote repository and git push to put your contributions on it. What you obtain is the latest branches together with the whole history.
A remote name identifies where the project comes from, while tags and branches identify points in the history of the project's snapshots.
When you clone, the repository you cloned from is added automatically as the remote named origin.
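A minimal sketch of adding a remote and exchanging data with it, assuming a POSIX shell and git. A local bare repository plays the server, and its path is used as the remote URL; the remote name `v_a` follows the figure, and the other names are hypothetical.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q --bare shared.git   # plays the "server"
git -c init.defaultBranch=master init -q local
cd local
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "C0"

git remote add v_a "$work/shared.git"     # remote name + URL (a local path here)
git push -q v_a master                    # put your contribution on the remote
git fetch -q v_a                          # update remote-tracking branches
git remote -v
git branch -r                             # v_a/master
```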
Look inside .git
Git: "the stupid content tracker". It stores everything as compressed objects named by their SHA-1.
objects: Full objects (commits, trees, blobs, tags).
refs: Pointers to all of the branches and tags.
logs: A history of where your branches have been.
Current pointers
All your history is stored in the Git Directory; the working directory is simply a temporary checkout place where you can modify the files until your next commit.
Object folder
objects: stores all of the commit, tag, tree, and blob objects.
00/ 6d/ 9b/ ac/ b0/ info/ pack/
Loose objects: stored as files named like
a9dca9a0fe0c031c996d308ab8a781ab7f358f, each holding one object compressed with zlib. The directory name is the first two hex digits of the SHA-1.
Packed objects: stored as files named like
pack-a9dca9a0fe0c031c996d308ab8a781ab7f358f.idx
pack-a9dca9a0fe0c031c996d308ab8a781ab7f358f.pack
.pack: the contents of all the objects that were moved out of the loose object files.
.idx: offsets into the pack file.
A loose object's file name is 38 hex digits, i.e. 19 bytes; the two-digit directory name supplies the remaining byte of the 20-byte SHA-1.
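A minimal sketch that locates a loose object on disk and then packs it, assuming a POSIX shell and git; file name and identity are hypothetical. `git gc` repacks the reachable objects and removes the now-redundant loose files.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "hello" > hello.txt
git add hello.txt
git commit -q -m "C0"

sha=$(git rev-parse HEAD:hello.txt)       # the blob's 40-hex-digit SHA-1
dir=$(echo "$sha" | cut -c1-2)            # first 2 hex digits: directory name
file=$(echo "$sha" | cut -c3-)            # remaining 38 hex digits: file name
ls -l ".git/objects/$dir/$file"           # the zlib-compressed loose object
git cat-file -p "$sha"                    # Git decompresses it: hello

git gc -q                                 # pack the loose objects ...
ls .git/objects/pack/                     # ... into pack-<SHA-1>.pack and .idx
```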
Refs folder
refs: stores all of the pointers.
heads/: one file per branch, named after the branch; its content is the SHA-1 the branch points to.
remotes/: one folder per remote, storing the refs fetched from that remote's branches.
tags/: creating a tag creates a file named after the tag here, containing the SHA-1 the tag points to. An annotated tag also creates a tag object in the objects folder.
Creating a new branch is as quick and simple as writing 41 bytes (40 characters and a newline) to a file in .git/refs/heads.
Logs folder
logs: every move leaves its mark.
Remember $ git reflog show master?
Current status
HEAD: points to the current active branch.
ORIG_HEAD: stores the previous HEAD before operations such as git pull, git merge, or git reset, so that you can undo them:
$ git reset --hard ORIG_HEAD
FETCH_HEAD: records the branches you last fetched.
index: stores the staged data, i.e. the proposed snapshot for the next commit.
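A minimal sketch of undoing a merge with ORIG_HEAD, assuming a POSIX shell and git; branch, file names, and identity are hypothetical. The value is captured right after the merge, because `git reset` itself overwrites ORIG_HEAD.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "base" > a.txt
git add a.txt
git commit -q -m "C0"
git checkout -q -b topic
echo "topic" > b.txt
git add b.txt
git commit -q -m "C1"
git checkout -q master

before=$(git rev-parse HEAD)
git merge -q topic                        # Git saves the old HEAD in ORIG_HEAD first
orig=$(cat .git/ORIG_HEAD)                # SHA-1 of master before the merge
git reset -q --hard ORIG_HEAD             # undo the merge
cat .git/HEAD                             # ref: refs/heads/master
```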
Add and commit at a low level
$ git add
Stores a blob for each changed file (a loose file in .git/objects/)
Updates the index (.git/index)
$ git commit
Writes tree objects from the index (.git/objects)
Writes a commit object that references the top-level tree
Moves the branch pointer that HEAD refers to (.git/refs/heads) and logs the move (.git/logs/refs/heads)
Stores the commit message (.git/COMMIT_EDITMSG)
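The same steps can be replayed with Git's plumbing commands. This is a minimal sketch, assuming a POSIX shell and git; it approximates what `git add` and `git commit` do rather than reproducing them exactly (for instance, it does not write COMMIT_EDITMSG). File name, message, and identity are hypothetical.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "hello" > hello.txt
# "git add": store the blob, then record it in .git/index
blob=$(git hash-object -w hello.txt)
git update-index --add --cacheinfo 100644 "$blob" hello.txt

# "git commit": write the tree, write a commit object pointing at it,
# then move the branch that HEAD refers to (logged in .git/logs/refs/heads)
tree=$(git write-tree)
commit=$(echo "first commit" | git commit-tree "$tree")
git update-ref refs/heads/master "$commit"

git log --oneline                         # the hand-made commit
```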
Compare it with SVN again
Centralized vs. distributed: SVN is one repository with lots of clients; Git is a repository with lots of client repositories.
Checking out a working copy vs. cloning the whole repository.
Serial revision numbers vs. lots of branches: corporate work or distributed version control?
Consistency vs. flexibility: SVN makes everyone work on the same thing.
Ref: Please Stop Bugging Linus Torvalds About Subversion
Git: you work locally and choose what to push to the server; each user controls a local repository.
SVN: the code lives on the server, not locally.
Repo
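The Android source consists of over 160 Git projects.
repo init: set up the client, downloading repo and the manifest that lists the projects
repo sync: clone or update every project listed in the manifest
repo start: create a local branch
repo upload: send your changes to Gerrit for review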
The definition of open: "mkdir android ; cd android ; repo init -u git://android.git.kernel.org/platform/manifest.git ; repo sync ; make" Andy Rubin
repo init -u ssh://.../manifest.git -b xxx -m android.xml
- Verify your SSH public key
- Get repo
- Get a manifest.xml
- Clone the projects listed in manifest.xml
- Sync only a specific project: repo sync kernel/linux
- Transmit the branch to Gerrit over an SSH connection
- Gerrit reviews each commit, so it is better to run git rebase -i before repo upload.
Resources
Install Git: Cygwin or msysGit
Get repo: curl -l -k http://android.git.kernel.org/repo
Reference
Official: http://git-scm.com/
Git Reference: http://gitref.org/
Pro Git: http://progit.org/book/
Repo: http://source.android.com/source/using-repo.html#init
Get project to practice: http://www.kernel.org/pub/
How Linus Torvalds talk about GIT: http://www.youtube.com/watch?v=4XpnKHJAok8
Thanks!