Difference between revisions of "Git"

From HaskellWiki

Latest revision as of 06:26, 3 April 2013

This article is a stub. You can help by expanding it.

WORK IN PROGRESS

This page aims to introduce the concepts behind Git in a "Haskell way".


Introduction

Git is a distributed revision control system used by many Haskellers. Darcs is also popular, but it tends to get slow when projects grow large. GitHub is a hosting site for Git-based projects, including many open source Haskell projects.


The DAG

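Git history forms a DAG (directed acyclic graph). Each node of the DAG is uniquely identified by a reference and represents an immutable point in history (a commit); each edge leads from a commit to one of its parents.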
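
As a rough illustration of why such a graph stays acyclic, here is a minimal sketch. The names Ref, DAG and addCommit are made up for this page and are not part of Git; a plain String stands in for a real reference. A node can only be added if all its parents already exist, and an existing node is never modified, so no edge can ever point "forward" and close a cycle.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a real commit reference.
type Ref = String

-- The DAG, stored as a map from each commit to the references of its parents.
type DAG = Map.Map Ref [Ref]

-- Add a new commit. This fails (Nothing) if the reference is already taken,
-- because nodes are immutable, or if some parent is not in the graph yet.
addCommit :: Ref -> [Ref] -> DAG -> Maybe DAG
addCommit ref parents dag
  | ref `Map.member` dag           = Nothing
  | all (`Map.member` dag) parents = Just (Map.insert ref parents dag)
  | otherwise                      = Nothing

-- A small history: two lines of work starting at "root", merged in "c".
example :: Maybe DAG
example =
  Just Map.empty
    >>= addCommit "root" []
    >>= addCommit "a1" ["root"]
    >>= addCommit "b1" ["root"]
    >>= addCommit "c" ["a1", "b1"]
```

Here the merge commit "c" has two parents, which is what makes the history a graph rather than a simple list.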


Branches and tags

Branches and tags point to the DAG through a reference. They provide a way to name entry points in the DAG that are meaningful to the user.

A branch holds a reference that usually moves as work is done on the branch: each new commit makes the branch point to that commit.

A tag is essentially the same as a branch, except that, by design, it names one specific reference and usually does not change.
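
The difference between the two kinds of names can be sketched as follows. This is only an illustration: Refs, advanceBranch and createTag are hypothetical names chosen for this page, and String stands in for a real reference. Both branches and tags are name-to-reference bindings; the branch binding is overwritten on every commit, while the tag binding, once created, is left alone.

```haskell
import qualified Data.Map.Strict as Map

type Ref  = String  -- hypothetical stand-in for a commit reference
type Name = String

-- Branches and tags are both just names bound to a reference.
data Refs = Refs
  { branches :: Map.Map Name Ref
  , tags     :: Map.Map Name Ref
  } deriving (Eq, Show)

-- Committing on a branch rebinds the branch name to the new commit.
advanceBranch :: Name -> Ref -> Refs -> Refs
advanceBranch name new refs =
  refs { branches = Map.insert name new (branches refs) }

-- Creating a tag binds the name once; an already existing tag is kept as it is.
createTag :: Name -> Ref -> Refs -> Refs
createTag name ref refs =
  refs { tags = Map.insertWith (\_new old -> old) name ref (tags refs) }
```

For example, after tagging "c1" as "v1.0" and then advancing "master" to "c2", the tag still names "c1" while the branch names "c2".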


Objects

Kinds of objects

There are 4 kinds of objects: tag, blob, tree and commit.

- a blob contains data.

data Blob = Blob ByteString

- a tree contains names, each associated by reference with a blob or another tree. This represents a filesystem hierarchy, with trees representing directories and blobs representing files:

data TreeContent = T TreeReference | B BlobReference
type Tree = [(Name, TreeContent)]

- a tag object is a named reference to another object, together with the author's signature.

type SignatureBlob = ByteString
data Tag = Tag ObjectReference Name SignatureBlob

- a commit is essentially a node of the DAG with associated metadata (who created this commit, and at which time). A commit points to an arborescence through a tree reference, and may have parents, which are references to other commits:

data Commit = Commit
        { tree      :: TreeReference
        , parents   :: [CommitReference]
        , author    :: (Name, Time)
        , committer :: (Name, Time)
        , message   :: ByteString
        }

The object store

All the different objects in Git (individual files, entire directory trees, commits and other things) are stored in a repository-wide central store. Each object is identified by its SHA-1 hash, which is a function of only the object's contents, so two objects with identical contents get the same identifier and are stored only once.
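
The idea of a content-addressed store can be sketched in a few lines. Everything here is an assumption made for illustration: toyHash is a deliberately simple, non-cryptographic stand-in for SHA-1, and Store, put and get are hypothetical names, not Git's actual interface.

```haskell
import qualified Data.Map.Strict as Map
import Data.Char (ord)
import Data.List (foldl')

type Contents = String
type Key      = Int

-- Toy stand-in for SHA-1. NOT cryptographic, for illustration only.
toyHash :: Contents -> Key
toyHash = foldl' (\h c -> (h * 31 + ord c) `mod` 1000003) 7

-- The store maps each key to the contents it was computed from.
type Store = Map.Map Key Contents

-- Storing an object returns its key, which depends on the contents alone.
put :: Contents -> Store -> (Key, Store)
put contents store = (key, Map.insert key contents store)
  where
    key = toyHash contents

-- Looking an object up by key.
get :: Key -> Store -> Maybe Contents
get = Map.lookup
```

Putting the same contents twice yields the same key and leaves the store with a single entry, which is the deduplication property described above.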

But... doesn't that mean that when I change a single line in a file, a whole new copy is stored?

Yes.

However, every once in a while, Git compacts similar files together into packfiles, storing only the differences between them. [TO BE EXPANDED?]

Garbage collection and git reflog

When objects are not reachable from any root (like a branch reference), they become dangling and are subject to garbage collection. However, garbage collection does not kick in immediately.
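
Reachability from the roots can be sketched like this. The names Objects, reachable and dangling are hypothetical, String stands in for a real reference, and the code only computes which objects would be candidates for collection; it says nothing about when Git actually collects them.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Ref = String  -- hypothetical stand-in for an object reference

-- Every stored object, together with the references it points to.
type Objects = Map.Map Ref [Ref]

-- All references reachable from the given roots by following pointers.
reachable :: Objects -> [Ref] -> Set.Set Ref
reachable objs = go Set.empty
  where
    go seen [] = seen
    go seen (r : rs)
      | r `Set.member` seen = go seen rs
      | otherwise =
          go (Set.insert r seen) (Map.findWithDefault [] r objs ++ rs)

-- Stored objects that no root can reach: candidates for garbage collection.
dangling :: Objects -> [Ref] -> Set.Set Ref
dangling objs roots = Map.keysSet objs `Set.difference` reachable objs roots
```

With commits "c2" and "c3" both pointing to "c1", and a single branch pointing to "c2", only "c3" is dangling.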

After making a mistake, it is often helpful to look at commit objects by date, regardless of whether they are still reachable, in order to be able to restore them. You can use git reflog for that.


Further reading