Git
WORK IN PROGRESS
This page aims to introduce the concepts behind Git in a "Haskell way".
The DAG
TODO
Branches and tags
TODO
Objects
Kinds of objects
There are 4 kinds of objects: tag, blob, tree and commit.
- a blob contains data.
data Blob = Blob ByteString
- a tree contains names, each associated by reference with a blob or another tree. This represents a filesystem hierarchy, with trees representing directories and blobs representing files:
data TreeContent = T TreeReference | B BlobReference
type Tree = [(Name, TreeContent)]
- a tag object is a signed reference to another object, together with the name of the signature's author:
type SignatureBlob = ByteString
data Tag = Tag ObjectReference Name SignatureBlob
- a commit is essentially a node of the DAG with associated metadata (who created this commit, and when). A commit points to a directory hierarchy through a tree reference, and may have parents, which are references to other commits:
data Commit = Commit
  { tree :: TreeReference
  , parents :: [CommitReference]
  , author :: (Name, Time)
  , committer :: (Name, Time)
  , message :: ByteString
  }
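The definitions above leave the reference types unspecified. As a minimal sketch, one could assume that a reference is simply the hash of the object it points to, and use a phantom type parameter to remember which kind of object is expected. The names Ref, files, subdirs and example are hypothetical, and the hashes in example are made-up placeholders:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)

-- Assumption: a reference is just the hash of the object it points to.
-- The phantom parameter `a` records which kind of object is expected there.
newtype Ref a = Ref ByteString deriving (Eq, Show)

type Name = ByteString

newtype Blob = Blob ByteString deriving (Eq, Show)

data TreeContent = T (Ref Tree) | B (Ref Blob) deriving (Eq, Show)
type Tree = [(Name, TreeContent)]

-- Names of the entries that are files (blobs).
files :: Tree -> [Name]
files t = [name | (name, B _) <- t]

-- Names of the entries that are subdirectories (trees).
subdirs :: Tree -> [Name]
subdirs t = [name | (name, T _) <- t]

-- A directory holding one file and one subdirectory (placeholder hashes).
example :: Tree
example = [("README", B (Ref "3b18e5")), ("src", T (Ref "9f2c1a"))]
```

With this encoding, the type checker rejects a blob reference used where a tree reference is expected, while the stored value remains a plain hash.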
The object store
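All the different objects in Git (individual files, entire directory trees, commits and other things) are stored in a repository-wide central store. Each object is identified by a SHA-1 hash computed from its contents, prefixed by a short header giving the object's kind and size. The identifier therefore depends on nothing else, such as a file name or a location: two objects with the same contents have the same identifier and are stored only once.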
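A content-addressed store of this kind can be sketched as a map from identifiers to encoded objects. This is an illustration under stated assumptions, not Git's implementation: the hash function is passed as a parameter standing in for SHA-1 (which is not available in base), the names ObjectId, Store, encode and put are hypothetical, and the store keeps the encoded bytes as they are, whereas Git compresses them with zlib on disk. The header format, the kind, a space, the size in decimal and a NUL byte, is the one Git hashes.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Map.Strict as Map

type ObjectId = BC.ByteString
type Store = Map.Map ObjectId BC.ByteString

-- The bytes that get hashed: "<kind> <size>\0<body>".
encode :: BC.ByteString -> BC.ByteString -> BC.ByteString
encode kind body =
  BC.concat [kind, BC.pack " ", BC.pack (show (BC.length body)), BC.pack "\0", body]

-- Store an object under the hash of its encoding and return its identifier.
-- `hash` is a stand-in for SHA-1, supplied by the caller.
put :: (BC.ByteString -> ObjectId)
    -> BC.ByteString -> BC.ByteString -> Store -> (ObjectId, Store)
put hash kind body store = (oid, Map.insert oid raw store)
  where
    raw = encode kind body
    oid = hash raw
```

Because the key is derived from the encoded bytes alone, inserting the same contents twice yields the same identifier and leaves the store with a single entry.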
But... doesn't that mean that when I change a single line in a file, a whole new copy is stored?
Yes.
However, every once in a while, Git packs objects together into packfiles, in which similar objects are stored as deltas against one another rather than as full copies. [TO BE EXPANDED?]
Garbage collection and git reflog
When objects are not reachable from any root (like a branch reference), they become dangling and are subject to garbage collection. However, garbage collection does not kick in immediately.
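Reachability can be sketched as a graph traversal. The example below assumes the object store has been reduced to a map from each object identifier to the identifiers it references (a commit references its tree and its parents, a tree references its entries); the names Graph, reachable and unreachable are hypothetical, and identifiers are plain strings for readability:

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type ObjectId = String
type Graph = Map.Map ObjectId [ObjectId]

-- All objects that can be reached from the given roots by following references.
reachable :: Graph -> [ObjectId] -> Set.Set ObjectId
reachable g = go Set.empty
  where
    go seen [] = seen
    go seen (x : xs)
      | x `Set.member` seen = go seen xs
      | otherwise = go (Set.insert x seen) (Map.findWithDefault [] x g ++ xs)

-- Objects present in the store that no root leads to:
-- these are the candidates for garbage collection.
unreachable :: Graph -> [ObjectId] -> Set.Set ObjectId
unreachable g roots = Map.keysSet g `Set.difference` reachable g roots
```

The roots would be the commits that branches, tags and similar references point to. An object shared between a reachable and an unreachable commit, such as an unchanged file, stays reachable and is kept.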
After a mistake, it is often helpful to find commits by when they were made or checked out, whether or not they are still reachable from a branch, in order to restore them. You can use git reflog for that: it lists the commits that HEAD has pointed to recently, most recent first.