Git

WORK IN PROGRESS

This page aims to introduce the concepts behind Git in a "Haskell way".


Introduction

Git is a distributed revision control system used by many Haskellers. Darcs is also popular, but it tends to get slow when projects grow large. GitHub is a hosting site for Git-based projects and is used by many open source Haskell projects, among others.


The DAG

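Git records history as a directed acyclic graph (DAG) of commits. Each node of the DAG is uniquely identified by a reference and represents an immutable point in history (a commit). Edges point from a commit to its parents.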
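
As a rough illustration of the idea (not Git's actual storage format), the DAG can be modelled as a map from each commit to its parents. The names Ref, Dag and addCommit are made up for this sketch, and plain strings stand in for real references:

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-in for a commit reference (a SHA-1 in real Git).
type Ref = String

-- The DAG as a map from each commit to the list of its parents.
type Dag = Map.Map Ref [Ref]

-- Add a commit. Existing nodes are never modified, and every parent must
-- already exist, so edges only point backwards and no cycle can form.
addCommit :: Ref -> [Ref] -> Dag -> Maybe Dag
addCommit ref parents dag
  | ref `Map.member` dag           = Nothing
  | all (`Map.member` dag) parents = Just (Map.insert ref parents dag)
  | otherwise                      = Nothing

-- A small history: "c" merges two lines of work that both start at "a".
example :: Maybe Dag
example =
  addCommit "a" [] Map.empty
    >>= addCommit "b1" ["a"]
    >>= addCommit "b2" ["a"]
    >>= addCommit "c" ["b1", "b2"]
```

In this model a history can only grow by adding nodes, which is one way to read "immutable": a commit is never edited once it exists.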


Branches and tags

Branches and tags point into the DAG through a reference. They give names that are meaningful to the user to entry points in the DAG.

A branch holds a reference that usually changes as work is done on the branch.

A tag is essentially the same as a branch, except that, by design, it names one specific reference and usually does not change.
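
The difference between the two can be sketched as two maps from names to references. This is a toy model, not Git's implementation; Refs, advance and tag are names invented here, and the choice to silently keep an existing tag is an assumption of the sketch:

```haskell
import qualified Data.Map as Map

type Ref  = String   -- hypothetical stand-in for a commit reference
type Name = String

-- Branches and tags are both just names bound to references.
data Refs = Refs
  { branches :: Map.Map Name Ref
  , tags     :: Map.Map Name Ref
  } deriving (Eq, Show)

-- Committing on a branch rebinds its name to the new commit.
advance :: Name -> Ref -> Refs -> Refs
advance branch new refs =
  refs { branches = Map.insert branch new (branches refs) }

-- Creating a tag binds a name once; an existing tag is left untouched.
tag :: Name -> Ref -> Refs -> Refs
tag name ref refs =
  refs { tags = Map.insertWith (\_new old -> old) name ref (tags refs) }
```

With this model, advancing a branch after tagging leaves the tag where it was, which is the behaviour the prose describes.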


Objects

Kinds of objects

There are four kinds of objects: tag, blob, tree and commit.

- a blob contains data.

data Blob = Blob ByteString

- a tree contains names, each associated by reference with a blob or another tree. This represents a filesystem hierarchy, with trees representing directories, and blobs representing files:

data TreeContent = T TreeReference | B BlobReference
type Tree = [(Name, TreeContent)]

- a tag object holds a reference to another object, together with a name and the author's signature.

type SignatureBlob = ByteString
data Tag = Tag ObjectReference Name SignatureBlob

- a commit is essentially a node of the DAG with associated metadata (who created this commit, and when). A commit points to a directory hierarchy through a tree reference, and may have parents, which are commit references:

data Commit = Commit
        { tree      :: TreeReference
        , parents   :: [CommitReference]
        , author    :: (Name, Time)
        , committer :: (Name, Time)
        , message   :: ByteString
        }
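
To see how trees and blobs together describe a filesystem hierarchy, here is a small sketch that lists every file path under a tree. It reuses the TreeContent and Tree definitions, with plain strings as stand-ins for references; TreeStore, files and exampleStore are hypothetical names introduced only for this example:

```haskell
import qualified Data.Map as Map

type Name          = String
type TreeReference = String   -- hypothetical stand-ins for SHA-1 references
type BlobReference = String

data TreeContent = T TreeReference | B BlobReference
type Tree = [(Name, TreeContent)]

-- Hypothetical lookup table from tree references to tree objects.
type TreeStore = Map.Map TreeReference Tree

-- List every file path under a tree, with the blob it refers to.
files :: TreeStore -> TreeReference -> [(FilePath, BlobReference)]
files store ref = concatMap entry (Map.findWithDefault [] ref store)
  where
    entry (name, B blob) = [(name, blob)]
    entry (name, T sub)  =
      [ (name ++ "/" ++ path, blob) | (path, blob) <- files store sub ]

-- A root directory with one file and one subdirectory.
exampleStore :: TreeStore
exampleStore = Map.fromList
  [ ("root", [("README", B "blob1"), ("src", T "tree2")])
  , ("tree2", [("Main.hs", B "blob2")])
  ]
```

Evaluating files exampleStore "root" in GHCi yields the paths README and src/Main.hs, each paired with its blob reference.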

The object store

All the different objects in Git - individual files, entire directory trees, commits and other things - are stored in a repository-wide central store. Each object is identified by computing a SHA-1 hash, which is a function of only the object's contents.

But... doesn't that mean that when I change a single line in a file, a whole new copy is stored?

Yes.

However, every once in a while, Git packs similar objects together into packfiles and stores only the differences between them. [TO BE EXPANDED?]
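
The idea of a store keyed by a hash of the contents can be sketched as follows. Note the loud assumption: toyHash is a 32-bit FNV-1a hash used purely as a stand-in so the example needs no extra packages. It is not SHA-1 and not what Git uses; it only shares the property that the key depends on the contents alone. Store, put and get are likewise names invented for this sketch:

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32)
import qualified Data.Map as Map

-- Toy stand-in for SHA-1 (32-bit FNV-1a). NOT what Git uses.
type Hash = Word32

toyHash :: String -> Hash
toyHash = foldl' step 2166136261
  where
    step h c = (h `xor` fromIntegral (ord c)) * 16777619

-- The store maps each key to the contents it was computed from.
type Store = Map.Map Hash String

-- Storing an object returns its key. Identical contents always produce
-- the same key, so they end up stored only once.
put :: String -> Store -> (Hash, Store)
put contents store = (key, Map.insert key contents store)
  where
    key = toyHash contents

get :: Hash -> Store -> Maybe String
get = Map.lookup
```

Putting the same contents twice leaves a single entry in the store, while a changed file gets a new key and therefore a whole new entry, as described above.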

Garbage collection and git reflog

When objects are not reachable from any root (like a branch reference), they become dangling and are subject to garbage collection. However, garbage collection does not kick in immediately.

After a mistake, it is often helpful to look up commit objects regardless of whether they are still reachable, in order to restore them. You can use git reflog for that: it lists the positions that HEAD and branches have recently held, newest first.
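
Reachability can be sketched as a graph traversal from the roots. This is a simplified model and not how Git implements collection; Objects, reachable and unreachable are hypothetical names, and the reflog is modelled as nothing more than a list of earlier branch positions:

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical stand-in for an object reference (a SHA-1 in real Git).
type Ref = String

-- Every object in the store, with the objects it refers to.
type Objects = Map.Map Ref [Ref]

-- Everything that can be reached by following references from the roots.
reachable :: Objects -> [Ref] -> Set.Set Ref
reachable objs = go Set.empty
  where
    go seen [] = seen
    go seen (r : rs)
      | r `Set.member` seen = go seen rs
      | otherwise =
          go (Set.insert r seen) (Map.findWithDefault [] r objs ++ rs)

-- Objects that no root can reach: candidates for garbage collection.
unreachable :: Objects -> [Ref] -> Set.Set Ref
unreachable objs roots =
  Map.keysSet objs `Set.difference` reachable objs roots

-- A branch was moved from "c" back to "a"; the log remembers both.
objects :: Objects
objects = Map.fromList [("a", []), ("b", ["a"]), ("c", ["b"])]

branchHeads, reflog :: [Ref]
branchHeads = ["a"]
reflog      = ["a", "c"]   -- newest position first
```

Here unreachable objects branchHeads contains "b" and "c", yet "c" is still listed in the reflog, so the branch can be pointed back at it before collection happens.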


Further reading