User:Michiexile/MATH198/Lecture 1


This page now includes additional information based on the notes taken in class. Hopefully this will make the notes reasonably complete for everybody.


Welcome, administrivia

I'm Mikael Vejdemo-Johansson. I can be reached in my office 383-BB, especially during my office hours; or by email to mik@math.stanford.edu.

I strongly encourage student interactions.

I will be out of town September 24 - 29. I will monitor the forum and email closely, and recommend electronic ways of getting in touch with me during this week. I will be back again in time for the office hours on the 30th.

Introduction

Why this course?

An introduction to Haskell will usually come with pointers toward Category Theory as a useful tool, though not with much more than the mention of the subject. This course is intended to fill that gap, and provide an introduction to Category Theory that ties into Haskell and functional programming as a source of examples and applications.

What will we cover?

The definition of categories, special objects and morphisms, functors, natural transformations, (co-)limits and special cases of these, adjunctions, freeness and presentations as categorical constructs, monads and Kleisli arrows, and recursion with categorical constructs.

Maybe, just maybe, if we have enough time, we'll finish by looking at the definition of a topos, and how this encodes logic internal to a category, with applications to fuzzy sets.

What do we require?

Our examples will be drawn from discrete mathematics, logic, Haskell programming and linear algebra. I expect the following concepts to be at least vaguely familiar to anyone taking this course:

  • Sets
  • Functions
  • Permutations
  • Groups
  • Partially ordered sets
  • Vector spaces
  • Linear maps
  • Matrices
  • Homomorphisms

Good references

On reserve in the mathematics/CS library are:

  • Mac Lane: Categories for the Working Mathematician
This book is technical, written for a mathematical audience, and puts in more work than is strictly necessary in many of the definitions. When Awodey and Mac Lane deviate, we will give Awodey priority.
  • Awodey: Category Theory
This book is also available as an ebook, accessible from the Stanford campus network. The coursework webpage has links to the ebook under Materials.

Monoids

In order to settle notation and ensure everybody's seen a definition before:

Definition A monoid is a set M equipped with an associative binary operation * (in Haskell: mappend) and an identity element ∅ (in Haskell: mempty).

A semigroup is a monoid without the requirement for an identity element.

A function f : M → N is a monoid homomorphism if the following conditions hold:

  • f(∅) = ∅
  • f(m*m') = f(m)*f(m')
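
In Haskell terms, this is the Monoid class from base (mempty for the identity, mappend/(<>) for the operation). As a hedged illustration of a monoid homomorphism, length maps the monoid of lists under concatenation to the integers under addition; lengthHom below is an illustrative name, not a library function.

    import Data.Monoid (Sum(..))

    -- length, wrapped so that the target monoid (addition on Int) is explicit.
    lengthHom :: [a] -> Sum Int
    lengthHom = Sum . length

    -- The homomorphism conditions, to check on sample values:
    --   lengthHom []         == mempty                        -- identity to identity
    --   lengthHom (xs ++ ys) == lengthHom xs <> lengthHom ys  -- respects the operation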

Examples

  • Any group is a monoid. Thus, specifically, the integers with addition form a monoid.
  • The natural numbers, with addition.
  • Strings L* over an alphabet L form a monoid, with string concatenation as the operation and the empty string as the identity.
  • Non-empty strings form a semigroup.

For more information, see the Wikipedia page on Monoids.

Awodey: p. 10.

Partially ordered set

Definition A partially ordered set, or a partial order, or a poset is a set P equipped with a binary relation ≤ which is:

  • Reflexive: x ≤ x for all x ∈ P
  • Anti-symmetric: x ≤ y and y ≤ x implies x = y for all x, y ∈ P.
  • Transitive: x ≤ y and y ≤ z implies x ≤ z for all x, y, z ∈ P.

If x ≤ y or y ≤ x, we call x, y comparable. Otherwise we call them incomparable. A poset where all elements are mutually comparable is called a totally ordered set or a total order.

If we drop the requirement for anti-symmetry, we get a pre-ordered set or a pre-order.

If we have several posets, we may indicate which poset we're comparing in by writing the poset as a subscript to the relation symbol.

A monotonic map of posets is a function f : P → Q such that x ≤_P y implies f(x) ≤_Q f(y).
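
As a small Haskell-flavoured sketch of these definitions, a poset can be modelled by its comparison predicate, and monotonicity checked on a finite sample. The names divides and monotoneOn are made up for illustration; they are not library functions.

    -- The divisibility order on positive integers: divides a b means a | b.
    divides :: Integer -> Integer -> Bool
    divides a b = b `mod` a == 0

    -- f is monotone (on the given finite sample) if x ≤_P y implies f(x) ≤_Q f(y).
    monotoneOn :: [a] -> (a -> a -> Bool) -> (b -> b -> Bool) -> (a -> b) -> Bool
    monotoneOn xs leqP leqQ f =
      and [ leqQ (f x) (f y) | x <- xs, y <- xs, leqP x y ]

    -- Example: multiplication by 3 is monotone for the divisibility order.
    --   monotoneOn [1..20] divides divides (* 3)  ==  True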

Examples

  • The reals, the natural numbers, and the integers are all posets with the usual comparison relation; in each of these all elements are comparable, so they are total orders.
  • The natural numbers, excluding 0, form a poset with a ≤ b if a | b.
  • Any family of subsets of a given set forms a poset with the order given by inclusion.

For more information, see the Wikipedia page on Partially ordered set.

Awodey: p. 6. Preorders are defined on pages 8-9.

Category

Awodey has a slightly different exposition. Relevant pages in Awodey for this lecture are: sections 1.1-1.4 (except Def. 1.2), 1.6-1.8.

Graphs

We recall the definition of a (directed) graph. A graph G is a collection of edges (arrows) and vertices (nodes). Each edge is assigned a source node and a target node.

source → target

Given a graph G, we denote the collection of nodes by G0 and the collection of arrows by G1. These two collections are connected, and the graph given its structure, by two functions: the source function s : G1 → G0 and the target function t : G1 → G0.

We shall not, in general, require either of the collections to be a set, but will happily accept larger collections; dealing with set-theoretical paradoxes as and when we have to. A graph where both nodes and arrows are sets shall be called small. A graph where either is a class shall be called large. If both G0 and G1 are finite, the graph is called finite too.

The empty graph has G0 = G1 = ∅.

A discrete graph has G1 = ∅.

A complete graph has G1 = {(v,w) | v, w ∈ G0}.

A simple graph has at most one arrow between each pair of nodes. Any relation on a set can be interpreted as a simple graph.

  • Show some examples.
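
For instance, a small finite graph can be written down directly in Haskell. This is a hedged sketch; the record-of-functions encoding below is just one of many possible representations, and the names are illustrative.

    -- A directed graph, stored exactly as in the definition: nodes G0,
    -- edges G1, and the source and target functions s, t : G1 -> G0.
    data Graph node edge = Graph
      { nodes  :: [node]        -- G0
      , edges  :: [edge]        -- G1
      , source :: edge -> node  -- s
      , target :: edge -> node  -- t
      }

    -- Example: the graph  a --f--> b --g--> c  with a loop h at c.
    example :: Graph Char Char
    example = Graph { nodes = "abc", edges = "fgh", source = src, target = tgt }
      where
        src 'f' = 'a'; src 'g' = 'b'; src _ = 'c'
        tgt 'f' = 'b'; tgt 'g' = 'c'; tgt _ = 'c'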

A homomorphism f : G → H of graphs is a pair of functions f0 : G0 → H0 and f1 : G1 → H1 such that sources map to sources and targets map to targets, or in other words:

  • s(f1(e))=f0(s(e))
  • t(f1(e))=f0(t(e))

By a path in a graph G from the node x to the node y of length k, we mean a sequence of edges (f_1, f_2, …, f_k) such that:

  • s(f_1) = x
  • t(f_k) = y
  • s(f_i) = t(f_{i-1}) for all other i.

Paths with start and end point identical are called closed. For any node x, there is a unique closed path () starting and ending in x of length 0.

For any edge f, there is a unique path from s(f) to t(f) of length 1: (f).

We denote by Gk the set of paths in G of length k.
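
Building on the Graph sketch above, the paths of a given length out of a node can be enumerated directly; pathsFrom is an illustrative name, not a library function.

    -- All paths of length k starting at node x, as lists of edges in the
    -- order they are traversed (so Gk is the union over all start nodes).
    pathsFrom :: Eq node => Graph node edge -> Int -> node -> [[edge]]
    pathsFrom _ 0 _ = [[]]   -- the unique length-0 path () at x
    pathsFrom g k x =
      [ e : p
      | e <- edges g, source g e == x            -- a first edge out of x
      , p <- pathsFrom g (k - 1) (target g e) ]  -- followed by a shorter path

    -- With the example graph above:  pathsFrom example 2 'a' == ["fg"]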

Categories

We are now ready to define a category. A category is a graph G equipped with an associative composition operation ∘ : G2 → G1, and an identity element for composition 1_x for each node x of the graph.

Note that G2 can be viewed as a subset of G1×G1, the set of all pairs of arrows. It is intentional that we define the composition operator on only a subset of the set of all pairs of arrows - the composable pairs. Whenever you'd want to compose two arrows that don't line up to a path, you'll get nonsense, and so any statement about the composition operator has an implicit "whenever defined" attached to it.

The definition is not quite done yet - the composition operator and the identity arrows both have a few rules to fulfill, and before I state these rules, there is some notation we need to cover.

Backwards!

If we have a path given by the arrows (f,g) in G2, we expect f : A → B and g : B → C to compose to something that goes A → C. The origin of all these ideas lies in geometry and algebra, and so the abstract arrows in a category are supposed to behave like functions under function composition, even though we don't say it explicitly.

Now, we are used to writing function application as f(x) - and possibly, from Haskell, as f x. This way, the composition of two functions would read g(f(x)).

On the other hand, the way we write our paths, we'd read f then g. This juxtaposition makes one of the two ways we write things seem backwards. We can resolve it either by making our paths in the category go backwards, or by reversing how we write function application.

In the latter case, we'd write x.f, say, for the application of f to x, and then write x.f.g for the composition. It all ends up looking a lot like Reverse Polish Notation, and has its strengths, but feels unnatural to most. It does, however, have the benefit that we can write out function composition as (f,g) ↦ f.g and have everything still make sense in all notations.

In the former case, which is the most common in the field, we accept that paths as we read along the arrows and compositions look backwards, and so, if f : A → B and g : B → C, we write g∘f : A → C, remembering that elements are introduced from the right, and the functions have to consume the elements in the right order.
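
A concrete Haskell illustration of this ordering issue (the names f, g and gAfterF exist only for the example):

    f :: Int -> Int
    f = (+ 1)          -- think of f : A -> B

    g :: Int -> Int
    g = (* 2)          -- think of g : B -> C

    -- Haskell's (.) follows the g∘f convention: f is applied first, then g.
    gAfterF :: Int -> Int
    gAfterF = g . f    -- gAfterF 3 == g (f 3) == 8

    -- The diagrammatic "f then g" order is also available, as (>>>) from
    -- Control.Category:  (f >>> g) 3 == 8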


The existence of the identity maps can be captured in the language of functions as well: it is the existence of a function u : G0 → G1.

Now for the remaining rules for composition. Whenever defined, we expect associativity - so that h∘(g∘f) = (h∘g)∘f. Furthermore, we expect:

  1. Composition respects sources and targets, so that:
    • s(g∘f) = s(f)
    • t(g∘f) = t(g)
  2. s(u(x))=t(u(x))=x
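
These data are packaged up in Haskell by the Category class in Control.Category (part of base). The sketch below restates essentially that class, with the laws as comments, and gives the instance for ordinary functions, which is the prototypical example.

    {-# LANGUAGE NoImplicitPrelude #-}

    -- Objects are types; an arrow of type 'cat a b' goes from a to b.
    class Category cat where
      id  :: cat a a                        -- the identity arrow 1_a
      (.) :: cat b c -> cat a b -> cat a c  -- composition g . f, "f then g"

    -- Laws, unchecked by the compiler:
    --   f . id == f,   id . f == f        -- identities
    --   h . (g . f) == (h . g) . f        -- associativity

    -- Ordinary Haskell functions form a category; base provides this
    -- instance for the real Control.Category class.
    instance Category (->) where
      id x      = x
      (.) g f x = g (f x)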

In a category, arrows are also called morphisms, and nodes are also called objects. This ties in with the algebraic roots of the field.

We denote by HomC(A,B), or if C is obvious from context, just Hom(A,B), the set of all arrows from A to B. This is the hom-set or set of morphisms, and may also be denoted C(A,B).

If a category is large or small or finite as a graph, it is called a large/small/finite category.

A category whose objects are a collection of sets and whose morphisms are a selection from all possible functions between those sets, such that the identity function on each object is a morphism and composition in the category is just composition of functions, is called concrete. Concrete categories form a very rich source of examples, though far from all categories are concrete.

Again, the Wikipedia page on Category (mathematics) is a good starting point for many things we will be looking at throughout this course.

New Categories from old

As with most other algebraic objects, one essential part of our toolbox is to take known objects and form new examples from them. This allows us to generate a wealth of examples from the ones that shape our intuition.

Typical things to do here would be to talk about subobjects, products and coproducts, sometimes obvious variations on the structure, and what a typical object looks like. Remember from linear algebra how subspaces, cartesian products (which for finite-dimensional vector spaces cover both products and coproducts) and dual spaces show up early, as well as the theorems giving dimension as a complete descriptor of a vector space.

We'll go through the same sequence here, with some small but significant variations.

A category D is a subcategory of the category C if:

  • D0 ⊆ C0
  • D1 ⊆ C1
  • D1 contains 1_X for all X ∈ D0
  • sources and targets of all the arrows in D1 are in D0
  • the composition in D is the restriction of the composition in C.

Written this way, it does look somewhat obnoxious. It does become easier though, with the realization - studied closer in homework exercise 2 - that the really important part of a category is the collection of arrows. Thus, a subcategory is a subcollection of the collection of arrows - with identities for all objects present, and with at least all objects that the existing arrows imply.

A subcategory D ⊆ C is full if D(A,B) = C(A,B) for all objects A, B of D. In other words, a full subcategory is completely determined by the selection of objects in the subcategory.

A subcategory D ⊆ C is wide if the collection of objects is the same in both categories. Hence, a wide subcategory picks out a subcollection of the morphisms.

The dual of a category is to a large extent inspired by vector space duals. In the dual C* of a category C, we have the same objects, and the morphisms are given by the equality C*(A,B)=C(B,A) - every morphism from C is present, but it goes in the wrong direction. Dualizing has a tendency to add the prefix co- when it happens, so for instance coproducts are the dual notion to products. We'll return to this construction many times in the course.
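
In Haskell terms, the dual construction can be sketched for any Category instance; the Dual newtype below is illustrative, not library API (base has a related type, Op, in Data.Functor.Contravariant).

    import Prelude hiding (id, (.))
    import Control.Category

    -- The dual of a category: same objects, every arrow reversed.
    newtype Dual cat a b = Dual (cat b a)

    instance Category cat => Category (Dual cat) where
      id              = Dual id
      Dual g . Dual f = Dual (f . g)   -- composition turns around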

Given two categories C,D, we can combine them in several ways:

  1. We can form the category that has as objects the disjoint union of all the objects of C and D, and that sets Hom(A,B) = ∅ whenever A, B come from different original categories. If A, B come from the same original category, we simply take over the hom-set from that category. This yields a categorical coproduct, and we denote the result by C+D. Composition is inherited from the original categories.
  2. We can also form the category with objects (A,B) for every pair of objects A ∈ C, B ∈ D. A morphism in Hom((A,B), (A',B')) is simply a pair (f : A → A', g : B → B'). Composition is defined componentwise. This category is the categorical correspondent to the cartesian product, and we denote it by C×D.

In these three constructions - the dual, the product and the coproduct - the arrows in the categories are formal constructions, not functions; even if the original category was given by functions, the result is no longer given by a function.

Given a category C and an object A of that category, we can form the slice category C/A. Objects in the slice category are arrows B → A for some object B in C, and an arrow ϕ : f → g is an arrow s(f) → s(g) such that f = g∘ϕ. Composites of arrows are just the composites in the base category.

Notice that the same arrow ϕ in the base category C represents potentially many different arrows in C/A: it represents one arrow for each choice of source and target compatible with it.

There is a dual notion: the coslice category A\C, where the objects are paired with maps A → B.

Slice categories can be used, among other things, to specify the idea of parametrization. The slice category C/A gives a sense to the idea of objects from C labeled by elements of A.

We get this characterization by interpreting the arrow representing an object as representing its source and a type function. Hence, in a way, the Typeable type class in Haskell builds a slice category on an appropriate subcategory of the category of datatypes.

Alternatively, we can phrase the importance of the arrow in slice categories of, say, Set, by looking at preimages of the slice functions. That way, an object f : B → A gives us a family of (disjoint) subsets of B indexed by the elements of A.

Finally, any graph yields a category by just filling in the arrows that are missing. The result is called the free category generated by the graph, and is a concept we will return to in some depth. Free objects have a strict categorical definition, and they serve to give a model of thought for the things they are free objects for. Thus, categories are essentially graphs, possibly with restrictions or relations imposed; and monoids are essentially strings in some alphabet, with restrictions or relations.
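
For the monoid half of that analogy, a hedged Haskell sketch: the list type is the free monoid construction, with the empty list as identity and concatenation as the operation. The names inject and extend below are illustrative, not library API (though extend is just foldMap specialised to lists).

    -- The inclusion of generators into the free monoid [a].
    inject :: a -> [a]
    inject x = [x]

    -- Freeness in action: any map of generators into a monoid m extends to
    -- a monoid homomorphism from [a]; in base this extension is foldMap.
    extend :: Monoid m => (a -> m) -> ([a] -> m)
    extend = foldMap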

Examples

  • The empty category.
    • No objects, no morphisms.
  • The one object/one arrow category 1.
    • A single object and its identity arrow.
  • The categories 2 and 1+1.
    • Two objects, A, B, with identity arrows; in 2 there is in addition a unique arrow A → B, while 1+1 has no arrows between the two objects.
  • The category Set of sets.
    • Sets for objects, functions for arrows.
  • The category FSet of finite sets.
    • Finite sets for objects, functions for arrows.
  • The category PFn of sets and partial functions.
    • Sets for objects. Arrows are pairs (S' ⊆ S, f : S' → T) ∈ PFn(S,T).
    • PFn(A,B) is a partially ordered set: (S_f, f) ≤ (S_g, g) precisely if S_f ⊆ S_g and f = g|S_f.
    • The exposition at Wikipedia uses this construction.
  • There is an alternative way to define a category of partial functions: for objects, we take sets, and for morphisms S → T, we take subsets F ⊆ S×T such that each element of S occurs in at most one pair in the subset. Composition is by interpreting these subsets as partial functions, matching the previous description. We'll call this category PFn. (A Haskell-flavoured sketch of partial functions follows after this list of examples.)
  • Every partial order is a category. Each hom-set has at most one element.
    • Objects are the elements of the poset. Arrows are unique, with an arrow A → B precisely if A ≤ B.
  • Every monoid is a category. Only one object. The elements of the monoid correspond to the endo-arrows of the one object.
  • The category of Sets and injective functions.
    • Objects are sets. Morphisms are injective functions between the sets.
  • The category of Sets and surjective functions.
    • Objects are sets. Morphisms are surjective functions between the sets.
  • The category of k-vector spaces and linear maps.
  • The category with objects the natural numbers and Hom(m,n) the set of m×n-matrices.
    • Composition is given by matrix multiplication.
  • The category of Data Types with Computable Functions.
    • Our ideal programming language has:
      • Primitive data types.
      • Constants of each primitive type.
      • Operations, given as functions between types.
      • Constructors, producing elements from data types, and producing derived data types and operations.
    • We will assume that the language is equipped with
      • A do-nothing operation for each data type. Haskell has id.
      • A unit type 1, with the property that each type has exactly one function to this type. Haskell has (). We will use this to define the constants of type t as functions 1 → t. Thus, constants end up being 0-ary functions.
      • A composition constructor, taking an operator f : A → B and another operator g : B → C and producing an operator g∘f : A → C. Haskell has (.).
    • This allows us to model a functional programming language with a category.
  • The category with objects logical propositions and arrows proofs.
  • The category Rel has objects finite sets and morphisms A → B being subsets of A×B. Composition is given by (a,c) ∈ g∘f if there is some b ∈ B such that (a,b) ∈ f and (b,c) ∈ g. The identity morphism is the diagonal {(a,a) : a ∈ A}.
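
As promised above, here is a hedged Haskell sketch of the partial-function category, under the assumption that an arrow S → T of PFn is modelled as a function into Maybe. The names Partial, composeP, idP and fromMap are illustrative; (<=<) and Data.Map.lookup are real library functions.

    import Control.Monad ((<=<))
    import qualified Data.Map as Map

    -- A partial function from s to t: Nothing marks "undefined here".
    type Partial s t = s -> Maybe t

    -- Composition is defined exactly where both steps are defined; this is
    -- Kleisli composition for Maybe.
    composeP :: Partial b c -> Partial a b -> Partial a c
    composeP g f = g <=< f

    -- The identity arrow: the everywhere-defined identity function.
    idP :: Partial a a
    idP = Just

    -- A finite partial function tabulated with Data.Map, as hinted at in the
    -- homework below.
    fromMap :: Ord s => Map.Map s t -> Partial s t
    fromMap m x = Map.lookup x m

Checking that composeP is associative and that idP is a two-sided identity for it is essentially the same exercise as checking that PFn is a category.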



Homework

For a passing mark, a written, acceptable solution to at least 3 of the 6 questions should be given no later than midnight before the next lecture.

For each lecture, there will be a few exercises marked with the symbol *. These will be more difficult than the other exercises given, will require significant time and independent study, and will aim to complement the course with material not covered in lectures, but nevertheless interesting for the general philosophy of the lecture course.

  1. Prove the general associative law: that for any path, and any bracketing of that path, the same composition results.
  2. Which of the following form categories? Give a proof or disproof for each:
    • Objects are finite sets, morphisms are functions such that |f⁻¹(b)| ≤ 2 for all morphisms f, objects B and elements b.
    • Objects are finite sets, morphisms are functions such that |f⁻¹(b)| ≥ 2 for all morphisms f, objects B and elements b.
    • Objects are finite sets, morphisms are functions such that |f⁻¹(b)| < ∞ for all morphisms f, objects B and elements b.
Recall that f⁻¹(b) = {a ∈ A : f(a) = b}.
  3. Suppose u : A → A in some category C.
    1. If g∘u = g for all g : A → B in the category, then u = 1_A.
    2. If u∘h = h for all h : B → A in the category, then u = 1_A.
    3. These two results completely characterize the objects in a category by the properties of their corresponding identity arrows. Specifically, there is a way to rephrase the definition of a category such that everything is stated in terms of arrows.
  4. For as many of the examples given as you can, prove that they really do form a category. Passing mark is at least 60% of the given examples.
    • Which of the categories are subcategories of which other categories? Which of these are wide? Which are full?
  5. For this question, all parts are required:
    1. For which sets is the free monoid on that set commutative?
    2. Prove that for any category C, the set Hom(A,A) is a monoid under composition for every object A.
For details on the construction of a free monoid, see the Wikipedia pages on the Free Monoid and on the Kleene star.
  6. * Read up on ω-complete partial orders. Suppose S is some set and P is the set of partial functions S → S - in other words, an element of P is some pair (S0, f : S0 → S) with S0 ⊆ S. We give this set a poset structure by (S0, f) ≤ (S1, g) precisely if S0 ⊆ S1 and f(s) = g(s) for all s ∈ S0.
    • Show that P is a strict ω-CPO.
    • An element x of S is a fixpoint of f : S → S if f(x) = x. Let N be the ω-CPO of partially defined functions on the natural numbers. We define a function ϕ : N → N by sending some h : N → N to a function k defined by
      1. k(0) = 1
      2. k(n) is defined only if h(n-1) is defined, and then by k(n) = n*h(n-1).
Describe ϕ(n ↦ n^2) and ϕ(n ↦ n^3). Show that ϕ is continuous. Find a fixpoint (S0, f) of ϕ such that any other fixpoint of the same function is larger than this one.
Find a continuous endofunction on some ω-CPO that has the Fibonacci function F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2) as its least fixed point.
Implement a Haskell function that finds fixed points in an ω-CPO. Implement the two fixed points above as Haskell functions - using the ω-CPO fixed point approach in the implementation. It may well be worth looking at Data.Map to provide a Haskell context for a partial function for this part of the task.