Buddhika Siddhisena is the Co-Founder & CTO of THINKCube Systems, which specializes in collaboration technologies for the enterprise built by integrating widely used and proven FOSS technologies. He obtained his B.Sc. (Physical Science) from the University of Colombo and is a Member of the British Computer Society (MBCS). Buddhika has also been involved in the inception of several leading FOSS projects, such as the Sahana Disaster Management System and the Taprobane GNU/Linux distribution. He is an active member of the Lanka Linux User Group and the Sri Lankan FOSS community, and hosts a weekly podcast about FOSS in Sinhala at sinhalenfoss.org.

The smart GIT

11/24/2009 10:33 pm By Buddhika Siddhisena


Source: http://www.geekherocomic.com/2008/08/15/git-clone/

While "git" may be what you call a stupid and annoying person, that is not the case when it comes to Source Code Management (SCM) software. As you will soon find out, git is one of the smartest SCMs around, and it is capturing the mind share of programmers all over the world. After all, it was written by one of the smartest people around to support development of one of the largest Open Source projects, involving thousands of programmers, and its first usable version came together in a matter of days! git started life in 2005 when Linus Torvalds, the creator of the Linux kernel, was forced to switch away from a proprietary SCM called BitKeeper and, not being satisfied with the free alternatives, decided to write his own. I will leave it up to you, the reader, to read up on the interesting historical events that led up to git being built. But before we get into git, let's cover the basics of SCM.


SCM Software

If you've been developing software for a while, chances are you've heard of SCMs, also known as Revision Control Systems (RCS). SCMs are an invaluable software development tool because they keep track of the changes made to a project. These changes are called commits, and they are recorded in a source code repository. As a result, one is able to view the evolution of the project, and when a bug or code conflict turns up, the SCM can help track down its origin and resolve it. SCMs also provide a platform for programmers to collaboratively develop software without stepping on each other's code. As the software reaches certain milestones, those states can be tagged within the SCM and released. That way anyone can later get a copy (checkout) of the source code exactly as it was at release time. SCMs also make it possible to branch off from the main code base (called the trunk) to experiment with a new feature or two in a separate branch. Depending on whether the experiment succeeds, one can then either merge the branch back into the trunk or discard it. These are just some of the features you'd expect from SCM software.
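To make those ideas concrete in git terms, here is a minimal sketch, assuming a POSIX shell and a reasonably recent git. It commits, tags a release, branches off to experiment and merges the branch back, all in a throwaway directory. The file app.txt, the branch name experiment and the tag v1.0 are invented for illustration, and git's main branch plays the role of the trunk.

```shell
#!/bin/sh
# Sketch: commit, tag, branch and merge, in a throwaway directory.
# app.txt, the branch "experiment" and the tag "v1.0" are illustrative names.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "John Doe"            # local identity so commits work anywhere
git config user.email "john@example.com"   # placeholder address

echo 'first version' > app.txt
git add app.txt
git commit -q -m 'First version'
git tag v1.0                               # mark this state as a release

trunk=$(git rev-parse --abbrev-ref HEAD)   # name of the main line, e.g. master
git checkout -q -b experiment              # branch off to try a new feature
echo 'new feature' >> app.txt
git commit -q -a -m 'Try a new feature'

git checkout -q "$trunk"                   # back to the main line...
git merge -q experiment                    # ...and keep the experiment

git show v1.0:app.txt                      # the file exactly as it was released
```

Running git branch -D experiment from the main line, instead of merging, would discard the experiment altogether.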

But not all SCMs were created equal. In the world of SCMs there are primarily two types: centralized and decentralized. In a centralized SCM, the code evolves on a central server that everyone commits to and checks code out from. As a result, programmers should synchronize often with the central server or risk making changes that conflict with someone else's. It is therefore advised that one commit back to the server after implementing a feature or closing a ticket, or simply as often as possible. Unfortunately, this means the programmer needs to be connected to the central repository more often than not, and that central repository must be well protected because it is a single point of failure. Granting commit access to the central server is another issue altogether, as bad or malicious commits are painful to undo and repair. As you've probably guessed by now, centralized SCMs aren't very smart, and given the "git is smart" mantra, git is unlikely to fall into this category. Despite these shortcomings, centralized SCMs are still the most widely used, with CVS and Subversion (SVN) being particularly popular.

Git falls into the decentralized, or distributed, category of SCMs, where every programmer has a full copy (clone) of the repository that they can commit to locally or check out from at will. With git, one does not have to constantly think about what the other programmers are up to. As long as programmers have a rough idea of who is working on what, and no two programmers edit the same lines of the same file, everything will be fine. Even in the unlikely event of a code conflict, it is a lot easier to resolve, as there is no single central repository that has to be repaired. In essence, git empowers programmers by making each of them the master of their own cloned repository. As a result, they can work offline without network connectivity to a central repository and experiment with branches without having to rely on a repository admin to create them. The whole social issue of granting commit access to the repository, where a developer must first prove him/herself, is no longer a barrier.


Installing GIT

Before we get into git basics, now would be a good time to install git if you haven't already done so. There are several front-end clients for git, but in this article we will be covering the most widely used ones: the command line utilities. These command line tools can be installed on pretty much any platform and generally provide identical functionality everywhere. Git runs on Linux (where it was developed), BSD, Sun Solaris, Mac OS X, Windows and even on a jailbroken iPhone!

To install on Debian GNU/Linux and Debian-derived distributions such as Ubuntu (on newer releases the package is simply named git):

$ sudo apt-get install git-core

Similarly, on Fedora/RHEL one could use yum, and on openSUSE, YaST to install git. On Windows, one could try installing git on top of Cygwin or use msysgit, though I admit I have not tried either. For a how-to on setting up git on Windows, check out http://tinyurl.com/yam2sb2
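Before making any commits, it is worth telling git who you are, since every commit records an author name and e-mail address. Without this, git falls back to guessing them from your login and hostname, or newer versions may refuse to commit until you set them. The name and address below are placeholders; substitute your own.

```shell
#!/bin/sh
# Tell git who you are; this is recorded as the author of every commit you make.
# "John Doe" and the address are placeholders: use your own details.
git config --global user.name "John Doe"
git config --global user.email "john@example.com"

# Confirm that git is installed and that the setting took effect.
git --version
git config --global --get user.name
```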


Creating a repo and making the first commit

Now that you've installed git, let's go ahead and create our first git repository. Creating a git repository is quite simple. In fact, any existing directory containing files you'd like to track can be converted into a git repository. But for now, let's create a new directory and place a couple of files in it.

1. $ mkdir ~/mygit.repo

2. $ cd ~/mygit.repo

3. $ touch README AUTHORS

4. $ git init

5. $ git add .

6. $ git commit -m 'Initial commit.'

In lines 1 - 3, we create a directory called mygit.repo in the user's home directory to hold the repo, and create two empty files (README and AUTHORS) in it. Lines 4 and 5 initialize the mygit.repo directory as a git repo and stage all the files in it for the next commit. Instead of adding all files, it is possible to handpick the files to add by listing them one after the other. Finally, in line 6 we make our first local commit to the repository, with the fitting message "Initial commit."
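As a minimal sketch of handpicking, assuming a POSIX shell and a throwaway directory: only README and AUTHORS are staged and committed, while a hypothetical NOTES file is deliberately left untracked.

```shell
#!/bin/sh
# Sketch: stage only the files you name, instead of everything with "git add .".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Doe"            # local identity for the demo
git config user.email "john@example.com"   # placeholder address

touch README AUTHORS NOTES       # NOTES is a hypothetical file we don't want yet
git add README AUTHORS           # handpick files, listed one after the other
git status --short               # "A" marks staged new files, "??" untracked ones
git commit -q -m 'Initial commit.'
git ls-files                     # tracked files: AUTHORS and README only
```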

Now let's actually go ahead and put some text into the README and AUTHORS files.

$ echo 'This is my first git repo' > README

$ echo 'John Doe' > AUTHORS

$ git status

# modified: AUTHORS

# modified: README

When you issue the git status command, it should show that two files have been modified. If you'd like to actually see the changes you've made since the last commit, use the git diff command. Once you're satisfied, go ahead and commit the changes.

$ git commit -a -m 'Added details to AUTHORS and README file'

[master a9a7eb9] Added details to AUTHORS and README file

2 files changed, 2 insertions(+), 0 deletions(-)

The above command commits all modified files (the -a option), but you could just as well have omitted -a and listed the files to commit explicitly. That way you can split changes made to several files into several logically meaningful commits. Most people group commits so that each one implements or fixes a single piece of functionality, but there is no hard and fast rule about it.
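Here is a sketch of that idea, run in a throwaway directory with a local placeholder identity: the same two edits are committed as two separate commits by naming each file on the git commit command line. The commit messages are made up for illustration, and git diff --stat is included as a quick way to review what changed first.

```shell
#!/bin/sh
# Sketch: commit related files separately instead of using "git commit -a",
# so each commit carries one logical change. Runs in a throwaway directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Doe"            # local identity for the demo
git config user.email "john@example.com"   # placeholder address
touch README AUTHORS
git add .
git commit -q -m 'Initial commit.'

echo 'This is my first git repo' > README
echo 'John Doe' > AUTHORS
git diff --stat                  # which files changed, and by how much

# A path listed after the options commits only that (already tracked) file.
git commit -q -m 'Describe the project in README' README
git commit -q -m 'Add John Doe to AUTHORS' AUTHORS
git status --short               # prints nothing: everything is committed
```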

You can look at all the commits made so far by running:

$ git log

commit a9a7eb92a08e19bbf31b16320da832f4fb82b383

Author: Buddhika Siddhisena <bud@buds-macbook.(none)>

Date: Sun Nov 22 11:24:35 2009 +0530

Added details to AUTHORS and README file

commit f503536dcaabc94fcdcbc53f1baea12d5c00f0ed

Author: Buddhika Siddhisena <bud@buds-macbook.(none)>

Date: Sun Nov 22 11:10:20 2009 +0530

Initial commit.
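The full git log output gets long quickly. As a sketch, the script below recreates the two commits above in a throwaway directory (so it can be run anywhere, with a placeholder identity) and then uses a few standard options for more compact views of the same history.

```shell
#!/bin/sh
# Sketch: recreate the two commits from this section in a throwaway
# directory, then look at the history in a few different ways.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "John Doe"            # local identity for the demo
git config user.email "john@example.com"   # placeholder address
touch README AUTHORS
git add .
git commit -q -m 'Initial commit.'
echo 'This is my first git repo' > README
echo 'John Doe' > AUTHORS
git commit -q -a -m 'Added details to AUTHORS and README file'

git log --oneline        # one line per commit: abbreviated hash and message
git log --stat -1        # only the latest commit, with the files it touched
git show HEAD~1          # the commit before that (the initial one) in full
```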


Cloning a repository

Unlike centralized repositories, where there is only one master and everything else is a working copy, in git everyone is a master, or at least a cloned master :) If you have access to someone else's git repository, you can clone it and then start making your own modifications (a bit like introducing mutations into Dolly the cloned sheep). For instance, say you felt like improving the Linux kernel. You could easily clone the whole Linux kernel repository onto your hard drive, along with the full history of every change ever made to it!

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6

But before you rush off to clone the kernel, let's try to clone our previous simple example repository. Git can use a multitude of protocols, such as HTTP/HTTPS, SSH or the native git protocol, to access a remote git repository. For the sake of simplicity, I'll assume you are using SSH, as it is easy to set up on any *NIX box. On Windows, you could try to set up a web server such as Apache for cloning purposes, or Apache + WebDAV if you want full read/write support.

From a remote machine where you want to get a copy of the repo:

$ git clone bud@ .

Replace the above with your SSH login on the remote machine where the repo was created, along with that machine's IP address. After you enter your password, git should create a local clone of the remote repo. You can now do whatever you like to this cloned repo without having to worry about the repo you cloned from.
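If you don't have a second machine handy, git can just as well clone from an ordinary directory path, which makes it easy to experiment on one box. The sketch below assumes a POSIX shell; the directory names are hypothetical, and the commented SSH form uses placeholders for the user, host and path.

```shell
#!/bin/sh
# Sketch: clone from a local path instead of over SSH, to try things out
# on a single machine. Over SSH the same command takes the form
#   git clone user@host:path/to/mygit.repo      (all placeholders)
set -e
work=$(mktemp -d)
cd "$work"

git init -q mygit.repo                      # stand-in for the original repo
cd mygit.repo
git config user.name "John Doe"             # local identity for the demo
git config user.email "john@example.com"    # placeholder address
echo 'John Doe' > AUTHORS
git add AUTHORS
git commit -q -m 'Initial commit.'
cd ..

git clone -q mygit.repo my-clone            # full copy, history included
git -C my-clone log --oneline               # same commit as the original
git -C my-clone remote -v                   # "origin" points back to mygit.repo
```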


Collaborative development

All this power to clone and modify would be pointless if it were difficult to share those modifications with others. With git this is easy: you can either push your changes to where you cloned from (the origin) or ask the remote person to clone or pull from your repository. Using our previous demo repo, let's make a quick modification to the AUTHORS file by adding another name to it, and push that change to the origin.

1. $ echo 'Jane Smith' >> AUTHORS

2. $ git commit -a -m 'Added Jane Smith to list of authors.'

3. $ git pull

4. $ git push

In lines 1 and 2 we make the minor edit and commit it to the local repository. In line 4, we push all pending commits to the origin machine the clone was made from, but only after making sure our repository is in sync with the origin (line 3). Line 3 is required to bring in any changes someone else might have pushed since our last pull/clone and merge them into our repository. If nobody else has pushed in the meantime there is nothing to merge; if they have, git merges their commits with ours. In most cases the merge should go smoothly without any conflicts, but in the unfortunate event of a conflict you will have to review and alter the code to resolve it. Git has a helper command for this, git mergetool, which we won't get into in this article due to space/time constraints.
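One caveat when trying this today: newer versions of git refuse by default to push into the currently checked-out branch of a non-bare repository (one with a working tree), so shared repositories are usually created bare, for example with git clone --bare. Recent versions of git also ask you to choose between merging and rebasing when a pull finds that histories have diverged; --no-rebase keeps the merge behaviour described above. The following sketch, with hypothetical collaborators alice and bob working in temporary directories, plays out the same pull-then-push cycle against a bare repository.

```shell
#!/bin/sh
# Sketch of the edit/commit/pull/push cycle with a bare shared repository.
# "alice" and "bob" are hypothetical collaborators; all paths are temporary.
set -e
top=$(mktemp -d)
cd "$top"

# Seed some history, then publish it as a bare repo (no working tree).
git init -q seed
cd seed
git config user.name "John Doe"
git config user.email "john@example.com"
echo 'John Doe' > AUTHORS
echo 'This is my first git repo' > README
git add AUTHORS README
git commit -q -m 'Initial commit.'
cd ..
git clone -q --bare seed shared.git

git clone -q shared.git alice
git clone -q shared.git bob
for who in alice bob; do                  # give each clone a placeholder identity
  git -C "$who" config user.name "$who"
  git -C "$who" config user.email "$who@example.com"
done

# Alice adds an author and pushes first.
cd "$top/alice"
echo 'Jane Smith' >> AUTHORS
git commit -q -a -m 'Added Jane Smith to list of authors.'
git pull -q --no-rebase --no-edit         # nothing new yet, so nothing to merge
git push -q

# Bob commits locally, then must pull Alice's work before he can push.
cd "$top/bob"
echo 'See AUTHORS for contributors' >> README
git commit -q -a -m 'Mention AUTHORS in README.'
git pull -q --no-rebase --no-edit         # merges Alice's commit into Bob's history
git push -q
```

Bob's pull is a real merge rather than a fast-forward, because both he and Alice committed after cloning; shared.git ends up with four commits (the initial one, Alice's, Bob's and the merge).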

Besides pushing changes, you can also pull changes from another repo. For instance, you could ask a peer programmer to git clone your cloned repository and then run git pull to get the changes you make later. That person can then review them and push any changes back to you, or even directly to the original origin machine. Git is flexible enough for you to come up with such workflows.
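A sketch of that peer workflow, assuming both repositories live on the same machine: the hypothetical directory peer is a clone of your repository mine, and a plain git pull inside peer picks up whatever you commit afterwards. Because the peer has no commits of their own, the pull is a simple fast-forward.

```shell
#!/bin/sh
# Sketch: a peer clones *your* repository and later pulls your new commits.
# "mine" and "peer" are hypothetical directory names in a temporary location.
set -e
top=$(mktemp -d)
cd "$top"

git init -q mine
cd mine
git config user.name "John Doe"             # local identity for the demo
git config user.email "john@example.com"    # placeholder address
echo 'John Doe' > AUTHORS
git add AUTHORS
git commit -q -m 'Initial commit.'
cd ..

git clone -q mine peer                 # the peer clones your repository

cd mine                                # you keep working...
echo 'Jane Smith' >> AUTHORS
git commit -q -a -m 'Added Jane Smith to list of authors.'

cd ../peer                             # ...and the peer picks it up
git pull -q                            # fast-forwards to your new commit
git log --oneline
```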


What's next?

I hope this article has whetted your appetite for using git in your next project, or even for casually tracking document files. There is a lot more to git than I have explained, so I hope you will take the lead and learn more as you begin using it. I will end with a couple of web resources to get you started.

