Git

From HaskellWiki
Revision as of 22:06, 2 April 2013 by Henk-Jan van Tuyl (talk | contribs) (Added section "Further reading" and a link to "Understanding the Git Workflow")

This article is a stub. You can help by expanding it.

WORK IN PROGRESS

This page aims to introduce the concepts behind Git in a "Haskell way".


Introduction

Git is a distributed revision control system used by many Haskellers. Darcs is also popular, but tends to get slow when projects grow large. GitHub is a hosting site used by, among others, many open source Haskell projects.


The DAG

Git records history as a directed acyclic graph (DAG). Each node of the DAG is uniquely identified by a reference and represents an immutable point in history (a commit).
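As a rough illustration of the graph structure described above, the following sketch models the DAG as a map from each commit reference to its parent references. The names Ref, Dag, ancestors and example are hypothetical and chosen only for this page; real references are hashes, not short strings.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-in for a commit hash.
type Ref = String

-- The history graph: each commit reference maps to its parent references.
type Dag = Map.Map Ref [Ref]

-- All commits reachable from a starting reference, including the start itself.
ancestors :: Dag -> Ref -> Set.Set Ref
ancestors dag start = go Set.empty [start]
  where
    go seen [] = seen
    go seen (r : rs)
      | r `Set.member` seen = go seen rs
      | otherwise =
          let ps = Map.findWithDefault [] r dag
          in  go (Set.insert r seen) (ps ++ rs)

-- A small history: "d" is a merge of "b" and "c", which both descend from "a".
example :: Dag
example = Map.fromList
  [ ("a", [])
  , ("b", ["a"])
  , ("c", ["a"])
  , ("d", ["b", "c"])
  ]
```

Because nodes are immutable and only ever point to their parents, new commits can be added without touching existing ones, and no cycle can form.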


Branches and tags

Branches and tags point to the DAG through a reference. They provide a way to name entry points in the DAG that are meaningful to the user.

A branch contains a reference that usually changes as work is done on the branch.

A tag is essentially the same as a branch, except that by design it names one specific reference and usually does not change.
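The difference between the two kinds of names can be sketched as follows. This is only a model under simplifying assumptions: Refs, advanceBranch and createTag are hypothetical names, references are plain strings, and an attempt to recreate an existing tag is silently ignored here rather than reported.

```haskell
import qualified Data.Map.Strict as Map

type Ref  = String   -- hypothetical stand-in for a commit hash
type Name = String

-- Branches and tags are both just names bound to a reference.
data Refs = Refs
  { branches :: Map.Map Name Ref
  , tags     :: Map.Map Name Ref
  } deriving (Eq, Show)

-- Committing on a branch rebinds the branch name to the new commit.
advanceBranch :: Name -> Ref -> Refs -> Refs
advanceBranch name new refs =
  refs { branches = Map.insert name new (branches refs) }

-- Creating a tag binds a name once; an existing tag is left untouched.
createTag :: Name -> Ref -> Refs -> Refs
createTag name target refs =
  refs { tags = Map.insertWith (\_new old -> old) name target (tags refs) }
```

In this model the only distinction is the update policy: a branch name is expected to move, a tag name is expected to stay put.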


Objects

Kinds of objects

There are 4 kinds of objects: tag, blob, tree and commit.

- a blob contains data.

data Blob = Blob ByteString

- a tree associates names, by reference, with blobs or other trees. This represents a filesystem hierarchy, with trees representing directories and blobs representing files:

data TreeContent = T TreeReference | B BlobReference
type Tree = [(Name, TreeContent)]

- a tag object is a signed reference, together with the signature's author.

type SignatureBlob = ByteString
data Tag = Tag ObjectReference Name SignatureBlob

- a commit is essentially a node of the DAG with associated metadata (who created this commit, and when). A commit points to a file hierarchy through a tree reference, and may have parents, which are commit references:

data Commit = Commit
        { tree      :: TreeReference
        , parents   :: [CommitReference]
        , author    :: (Name, Time)
        , committer :: (Name, Time)
        , message   :: ByteString
        }
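To see how trees and blobs together describe a directory hierarchy, here is a small self-contained sketch that walks a tree by following references through a store. The names Entry, Object, Store, listFiles and example are hypothetical, references are short strings rather than hashes, and file contents are kept as String for brevity.

```haskell
import qualified Data.Map.Strict as Map

type Ref  = String   -- hypothetical stand-in for an object hash
type Name = String

data Entry  = T Ref | B Ref          -- a subtree or a blob, by reference
data Object = Blob String            -- file contents
            | Tree [(Name, Entry)]   -- directory listing

type Store = Map.Map Ref Object

-- List the paths of all files under the tree with the given reference.
-- A reference that is missing from the store, or that names a blob,
-- yields no paths.
listFiles :: Store -> Ref -> [FilePath]
listFiles store ref =
  case Map.lookup ref store of
    Just (Tree entries) -> concatMap entryPaths entries
    _                   -> []
  where
    entryPaths (name, B _) = [name]
    entryPaths (name, T r) = [name ++ "/" ++ p | p <- listFiles store r]

-- A root tree "t0" with one file and one subdirectory "src".
example :: Store
example = Map.fromList
  [ ("b1", Blob "main = pure ()\n")
  , ("b2", Blob "name: demo\n")
  , ("t1", Tree [("Main.hs", B "b1")])
  , ("t0", Tree [("demo.cabal", B "b2"), ("src", T "t1")])
  ]
```

With this store, listFiles example "t0" evaluates to ["demo.cabal", "src/Main.hs"].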

The object store

All the different objects in Git - individual files, entire directory trees, commits and other things - are stored in a repository-wide central store. Each object is identified by its SHA-1 hash, which is a function of only the object's kind and contents.

But... doesn't that mean that when I change a single line in a file, a whole new copy is stored?

Yes.

However, every once in a while Git compacts similar objects together into packfiles, storing some of them only as deltas against others. [TO BE EXPANDED?]
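The content-addressing idea described above can be sketched as follows. To keep the example free of cryptography libraries, it uses a toy 64-bit FNV-1a hash in place of SHA-1, so the identifiers it produces are not real Git object ids. The names ObjectId, Store, toyHash, blobId and storeBlob are hypothetical; the "blob <size>\0" header mirrors the framing Git hashes together with a blob's contents.

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified Data.Map.Strict as Map
import Data.Bits (xor)
import Data.Char (ord)
import Data.Word (Word64)
import Numeric (showHex)

type ObjectId = String
type Store    = Map.Map ObjectId BS.ByteString

-- Toy 64-bit FNV-1a hash. This is NOT SHA-1; it only stands in for it
-- so that the sketch needs no cryptography library.
toyHash :: BS.ByteString -> ObjectId
toyHash bytes = showHex (BS.foldl' step 14695981039346656037 bytes) ""
  where
    step :: Word64 -> Char -> Word64
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- The hash covers a short header (object kind and size) plus the contents.
blobId :: BS.ByteString -> ObjectId
blobId contents = toyHash (header <> contents)
  where
    header = BS.pack ("blob " ++ show (BS.length contents) ++ "\0")

-- Storing is idempotent: identical contents always land under the same key.
storeBlob :: BS.ByteString -> Store -> (ObjectId, Store)
storeBlob contents store = (oid, Map.insert oid contents store)
  where
    oid = blobId contents
```

Storing the same contents twice leaves the store unchanged, while changing a single character produces a different key and therefore a second, complete object.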

Garbage collection and git reflog

When objects are not reachable from any root (like a branch reference), they become dangling and are subject to garbage collection. However, garbage collection does not kick in immediately.

After a mistake, it is often helpful to look at commits by date regardless of whether they are still reachable, in order to restore them. You can use git reflog for that: it records where HEAD and the branch references have pointed in the past.
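Reachability from a set of roots can be sketched as a simple graph traversal. The names Objects, reachable, unreachable and example are hypothetical, and the model is deliberately small: every object is reduced to the list of references it contains.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Ref     = String            -- hypothetical stand-in for an object hash
type Objects = Map.Map Ref [Ref] -- each object with the objects it refers to

-- Every object reachable from the given roots (the "mark" phase).
reachable :: Objects -> [Ref] -> Set.Set Ref
reachable objs = go Set.empty
  where
    go seen [] = seen
    go seen (r : rs)
      | r `Set.member` seen = go seen rs
      | otherwise =
          go (Set.insert r seen) (Map.findWithDefault [] r objs ++ rs)

-- Objects that a collection would be allowed to delete.
unreachable :: Objects -> [Ref] -> Set.Set Ref
unreachable objs roots =
  Map.keysSet objs `Set.difference` reachable objs roots

-- Three commits in a line; imagine the branch was reset from "c3" to "c2".
example :: Objects
example = Map.fromList [("c1", []), ("c2", ["c1"]), ("c3", ["c2"])]
```

With only the branch as a root, unreachable example ["c2"] contains "c3". If the old position recorded in the reflog is also treated as a root, as in unreachable example ["c2", "c3"], nothing is collectable, which is why a commit that was recently reset away can usually still be restored.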


Further reading