HaskellWiki - User contributions [en]

MapReduce with CloudHaskell

2011-11-01T16:31:33Z

Julianporter: /* Storage */

[[Category:Applications]][[Category:Monad]][[Category:Libraries]][[Category:Concurrency]][[Category:Parallel]][[Category:Research]]

==Introduction==

This is documentation of my work on developing a proof-of-concept demonstrator for MapReduce using [[GHC/CloudAndHPCHaskell|CloudHaskell]] to provide a framework for distributed applications, and the monadic approach to MapReduce described [[MapReduce_as_a_monad|here]].

==Status==

===Storage===

I have developed a very simple distributed storage service that provides what is needed to ship data to each processing node at the start of each round of processing, and then to assemble their outputs ready to form the input for the next round.

* Description of what I have done [http://jpembeddedsolutions.files.wordpress.com/2011/10/storage.pdf here]
* A working application can be found at [http://github.com/Julianporter/Distributed-Haskell git://github.com/Julianporter/Distributed-Haskell.git]

Code to be placed on Hackage when more robust.

===Distributed job scheduler===

Not yet started.

[[User:Julianporter|julianporter]] 18:20, 31 October 2011 (UTC)

MapReduce with CloudHaskell

2011-10-31T23:00:29Z

Julianporter: /* Dstributed job scheduler */

[[Category:Applications]][[Category:Monad]][[Category:Libraries]][[Category:Concurrency]][[Category:Parallel]][[Category:Research]]

==Introduction==

This is documentation of my work on developing a proof-of-concept demonstrator for MapReduce using [[GHC/CloudAndHPCHaskell|CloudHaskell]] to provide a framework for distributed applications, and the monadic approach to MapReduce described [[MapReduce_as_a_monad|here]].

==Status==

===Storage===

I have developed a very simple distributed storage service that provides what is needed to ship data to each processing node at the start of each round of processing, and then to assemble their outputs ready to form the input for the next round.

* Description of what I have done [http://jpembeddedsolutions.files.wordpress.com/2011/10/storage.pdf here]
* A working application can be found at git://github.com/Julianporter/Distributed-Haskell.git

Code to be placed on Hackage when more robust.

===Distributed job scheduler===

Not yet started.

[[User:Julianporter|julianporter]] 18:20, 31 October 2011 (UTC)

MapReduce as a monad

2011-10-31T18:22:05Z

Julianporter: /* The monad transformer approach */

[[Category:Applications]][[Category:Monad]][[Category:Libraries]][[Category:Concurrency]][[Category:Parallel]][[Category:Research]]

==Introduction==

MapReduce is a general technique for massively parallel programming developed by Google. It takes its inspiration from ideas in functional programming, but has moved away from that paradigm to a more imperative approach.
I have noticed that MapReduce can be expressed naturally, using functional programming techniques, as a form of monad. The standard implementation of MapReduce is the Java-based Hadoop framework, which is very complex and somewhat temperamental. Moreover, it is necessary to write Hadoop-specific code into mappers and reducers. My prototype library takes about 100 lines of code and can wrap generic mapper / reducer functions.

Having shown that we can implement MapReduce as a generalised monad, it transpires that in fact, we can generalise this still further and define a <hask>MapReduceT</hask> monad transformer, so there is a MapReduce type and operation associated to any monad. In particular, it turns out that the <hask>State</hask> monad is just the MapReduce type of the monad <hask>Hom a</hask> of maps <hask>h -> a</hask> where <hask>h</hask> is some fixed type.

==Initial Approach==

===Why a monad?===

What the monadic implementation lets us do is the following:
*Map and reduce look the same.
*You can write a simple wrapper function that takes a mapper / reducer and wraps it in the monad, so authors of mappers / reducers do not need to know anything about the MapReduce framework: they can concentrate on their algorithms.
*All of the guts of MapReduce are hidden in the monad's <hask>bind</hask> function
*The implementation is naturally parallel
*Making a MapReduce program is trivial: 
<hask>
... >>= wrapMR mapper >>= wrapMR reducer >>= ...
</hask> 

===Details===
Full details of the implementation and sample code can be found [http://jpembeddedsolutions.wordpress.com/2011/04/02/mapreduce/ here]. I'll just give highlights here.

====Generalised mappers / reducers====
One can generalise MapReduce a bit, so that each stage (map, reduce, etc) becomes a function of signature 
<hask>
a -> ([(s,a)] -> [(s',b)])
</hask> 
where <hask>s</hask> and <hask>s'</hask> are data types and <hask>a</hask> and <hask>b</hask> are key values.
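
As a rough illustration of this signature (not taken from the linked implementation), a stage can be written as an ordinary function that is handed one key and the whole keyed data set. The names <hask>Stage</hask>, <hask>splitStage</hask> and <hask>example</hask> are hypothetical and exist only for this sketch.

```haskell
-- A generalised stage: given a key, turn keyed input into keyed output.
-- 'Stage' is a hypothetical alias introduced for this sketch only.
type Stage s a s' b = a -> [(s, a)] -> [(s', b)]

-- Hypothetical stage: keep the lines tagged with the key we were handed,
-- split them into words, and use each word as both the new data item
-- and the new key.
splitStage :: Eq a => Stage String a String String
splitStage k input = [ (w, w) | (line, k') <- input, k' == k, w <- words line ]

-- Only the lines tagged with key 1 are processed.
example :: [(String, String)]
example = splitStage (1 :: Int) [("a b", 1), ("c", 2), ("d", 1)]
```

Here <hask>example</hask> evaluates to <hask>[("a","a"),("b","b"),("d","d")]</hask>: the line tagged with key 2 is ignored.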

====Generalised Monad====
Now, this is suggestive of a monad, but we can't use a monad ''per se'', because the transformation changes the key and value types, and we want to be able to access them separately. Therefore we do the following.

Let <hask>m</hask> be a <hask>Monad'</hask>, a type with four parameters: <hask>m s a s' b</hask>.

Generalise the monadic <hask>bind</hask> operation to: 
<hask>
m s a s' b -> ( b -> m s' b s'' c ) -> m s a s'' c
</hask> 

See [http://blog.sigfpe.com/2009/02/beyond-monads.html Parametrized monads].

Then clearly the generalised mapper/reducer above can be written as a <hask>Monad'</hask>, meaning that we can write MapReduce as 
<hask>
... >>= mapper >>= reducer >>= mapper' >>= reducer' >>= ...
</hask>

====Implementation details====

<hask>
class Monad' m where
   return :: a -> m s x s a
   (>>=)  :: (Eq b) => m s a s' b -> ( b -> m s' b s'' c ) -> m s a s'' c

newtype MapReduce s a s' b = MR { runMR :: ([(s,a)] -> [(s',b)]) }

retMR :: a -> MapReduce s x s a
retMR k = MR (\ss -> [(s,k) | s <- fst <$> ss])

bindMR :: (Eq b,NFData s'',NFData c) => MapReduce s a s' b -> (b -> MapReduce s' b s'' c) -> MapReduce s a s'' c
bindMR f g = MR (\s ->
   let
      fs = runMR f s
      gs = P.map g $ nub $ snd <$> fs
   in
   concat $ map (\g' -> runMR g' fs) gs)
</hask> 
The key point here is that <hask>P.map</hask> is a parallel version of the simple <hask>map</hask> function.

Now we can write a wrapper function 
<hask>
wrapMR :: (Eq a) => ([s] -> [(s',b)]) -> (a -> MapReduce s a s' b)
wrapMR f = (\k -> MR (g k))
   where
      g k ss = f $ fst <$> filter (\s -> k == snd s) ss
</hask> 
which takes a conventional mapper / reducer and wraps it in the <hask>Monad'</hask>. Note that this means that the mapper / reducer functions ''do not need to know anything about the way MapReduce is implemented''. So a standard MapReduce job becomes 
<hask>
mapReduce :: [String] -> [(String,Int)]
mapReduce state = runMapReduce mr state
   where
      mr = return () >>= wrapMR mapper >>= wrapMR reducer
</hask> 
I have tested the implementation with the standard word-counter mapper and reducer, and it works perfectly (full code is available via the link above).
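
For readers who want to try the idea without the full library, here is a minimal sequential sketch of the word-count job. It makes several assumptions: the ordinary <hask>map</hask> stands in for the parallel <hask>P.map</hask> (so the <hask>NFData</hask> constraints are dropped), <hask>bindMR</hask> is used directly instead of an overloaded <hask>>>=</hask>, and the definitions of <hask>runMapReduce</hask>, <hask>mapper</hask>, <hask>reducer</hask> and <hask>wordCount</hask> are my guesses at something suitable rather than the code from the linked post.

```haskell
import Data.List (nub)

newtype MapReduce s a s' b = MR { runMR :: [(s, a)] -> [(s', b)] }

-- Tag every data item with the same key.
retMR :: a -> MapReduce s x s a
retMR k = MR (\ss -> [ (s, k) | s <- fst <$> ss ])

-- Sequential bind: run f, then run g once per distinct key f produced.
-- (Plain 'map' is used here where the original uses a parallel map.)
bindMR :: Eq b => MapReduce s a s' b -> (b -> MapReduce s' b s'' c) -> MapReduce s a s'' c
bindMR f g = MR (\s ->
  let fs = runMR f s
      gs = map g (nub (snd <$> fs))
  in  concatMap (\g' -> runMR g' fs) gs)

-- Wrap a conventional mapper / reducer so it only sees data for its key.
wrapMR :: Eq a => ([s] -> [(s', b)]) -> (a -> MapReduce s a s' b)
wrapMR f = \k -> MR (\ss -> f (fst <$> filter (\s -> k == snd s) ss))

-- Assumed runner: give every input item the unit key and run the pipeline.
runMapReduce :: MapReduce s () s' b -> [s] -> [(s', b)]
runMapReduce m ss = runMR m [ (s, ()) | s <- ss ]

-- Assumed mapper: split lines into words, keyed by the word itself.
mapper :: [String] -> [(String, String)]
mapper ls = [ (w, w) | l <- ls, w <- words l ]

-- Assumed reducer: all items share one key, so just count them.
reducer :: [String] -> [(String, Int)]
reducer []         = []
reducer ws@(w : _) = [(w, length ws)]

wordCount :: [String] -> [(String, Int)]
wordCount = runMapReduce (retMR () `bindMR` wrapMR mapper `bindMR` wrapMR reducer)
```

With these definitions, <hask>wordCount ["a b a", "b c"]</hask> evaluates to <hask>[("a",2),("b",2),("c",1)]</hask>. Note that neither <hask>mapper</hask> nor <hask>reducer</hask> mentions <hask>MapReduce</hask> at all; the grouping by key happens entirely in <hask>bindMR</hask> and <hask>wrapMR</hask>.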

==The monad transformer approach==

Define the monad transformer type <hask>MapReduceT</hask> by: 

<hask>
newtype (Monad m) => MapReduceT m t u = MR {run :: m t -> m u}
</hask>

with operations 

<hask>
lift :: (Monad m) => m t -> MapReduceT m t t
lift x = MR (const x)

return :: (Monad m) => t -> MapReduceT m t t
return x = lift (return x)

bind :: (Monad m) => MapReduceT m u u -> MapReduceT m t u -> (u -> MapReduceT m u v) -> MapReduceT m t v
bind p f g = MR (\ xs -> ps xs >>= gs xs)
   where
      ps xs   = (f >>> p) -< xs
      gs xs x = (f >>> g x) -< xs
</hask>

where <hask> >>> </hask> and <hask> -< </hask> are the obvious arrow operations on <hask>MapReduceT</hask> types.

Then we show in [http://media.jpembeddedsolutions.com/pdf/mrmonad.pdf this paper] that:
* <hask>MapReduce = MapReduceT []</hask> with <hask> (>>=) = bind nub</hask>
* For a suitable choice of <hask>p</hask> the standard <hask>State</hask> monad is <hask>MapReduceT Hom</hask> where

:<hask>
data Hom a b = H {run :: (a -> b)}

return x = H (const x)
f >>= g = H (\ x -> g' (f' x) x)
   where
      f' = run f
      g' x y = run (g x) y
</hask>
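
The <hask>Hom</hask> monad above behaves like the familiar reader (function) monad: every step is handed the same fixed argument. The following sketch spells this out as compilable code. It assumes a recent GHC, which requires <hask>Functor</hask> and <hask>Applicative</hask> instances alongside <hask>Monad</hask>; it also uses <hask>newtype</hask> rather than <hask>data</hask>, and the name <hask>sumAndProduct</hask> is hypothetical.

```haskell
-- Maps out of a fixed type a, wrapped so we can give them instances.
newtype Hom a b = H { run :: a -> b }

instance Functor (Hom a) where
  fmap f (H h) = H (f . h)

instance Applicative (Hom a) where
  pure x      = H (const x)
  H f <*> H h = H (\x -> f x (h x))

instance Monad (Hom a) where
  -- Same definition as above: feed the shared argument x to both steps.
  f >>= g = H (\x -> run (g (run f x)) x)

-- Hypothetical usage: both steps read the same fixed argument.
sumAndProduct :: Hom Int (Int, Int)
sumAndProduct = do
  s <- H (+ 3)
  p <- H (* 2)
  return (s, p)
```

For example, <hask>run sumAndProduct 5</hask> evaluates to <hask>(8,10)</hask>.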

==Future Directions==

*My code so far runs concurrently and in multiple threads within a single OS image. It won't work on clustered systems. I have started work on this; see [[MapReduce_with_CloudHaskell|here]].
*Currently all of the data is sent to all of the mappers / reducers at each iteration. This is okay on a single machine, but may be prohibitive on a cluster.

I would be eager for collaborative working on taking this forward.

[[User:Julianporter|julianporter]] 18:10, 31 October 2011 (UTC)

MapReduce with CloudHaskell

2011-10-31T18:20:37Z

Julianporter: New page: Category:ApplicationsCategory:MonadCategory:LibrariesCategory:ConcurrencyCategory:ParallelCategory:Research ==Introduction== This is documentation of my work on d...

[[Category:Applications]][[Category:Monad]][[Category:Libraries]][[Category:Concurrency]][[Category:Parallel]][[Category:Research]]

==Introduction==

This is documentation of my work on developing a proof-of-concept demonstrator for MapReduce using [[GHC/CloudAndHPCHaskell|CloudHaskell]] to provide a framework for distributed applications, and the monadic approach to MapReduce described [[MapReduce_as_a_monad|here]].

==Status==

===Storage===

I have developed a very simple distributed storage service that provides what is needed to ship data to each processing node at the start of each round of processing, and then to assemble their outputs ready to form the input for the next round.

* Description of what I have done [http://jpembeddedsolutions.files.wordpress.com/2011/10/storage.pdf here]
* A working application can be found at git://github.com/Julianporter/Distributed-Haskell.git

Code to be placed on Hackage when more robust.

===Distributed job scheduler===

Not yet started.

[[User:Julianporter|julianporter]] 18:20, 31 October 2011 (UTC)

User:Julianporter

2011-10-31T18:13:54Z

Julianporter: /* MapReduce */

=About my work=
==Background==
My particular areas of interest in programming are:
*Functional programming
*Formal modelling / model based programming
*Concurrency / cloud programming
*Embedded systems
I am also establishing a small business developing control systems and software for robots. The key idea is to make the robot part of the cloud rather than a stand-alone device. Further information:
*[http://www.jpembedded.co.uk Company website]
*[http://jpembeddedsolutions.wordpress.com Blog] (also contains research papers on Haskell and functional programming)

==Current projects==

===MapReduce===

I am looking at ways of implementing MapReduce-type algorithms using the functional approach. The key insight is that a generalised MapReduce algorithm is simply the repeated application of a sequence of <hask> >>= </hask> operations in a suitable monad. There are two strands of activity:

*Development of a [[MapReduce_as_a_monad|monadic view of MapReduce]]
*Developing a [[MapReduce_with_CloudHaskell|proof-of-concept demonstrator for monadic MapReduce]], using [[GHC/CloudAndHPCHaskell|CloudHaskell]] as a framework for distributed Haskell applications.

The second activity is undertaken with the support of the authors of CloudHaskell. I would be very happy if others joined in the development effort.

===Catskell===

I'm defining and then coding a language ([[Catskell]]) in the spirit of [http://lolcode.com/ LOLCODE] which is basically a feline-friendly subset of Haskell. My intention is to write a Catskell-to-Haskell translator. This should be a good exercise in making sure I really understand the language.

=About me=

By training I am a mathematician. I have been programming computers of some form or other since the early 1980s. I also have a keen interest in philosophy and music. My personal website is [http://www.porternet.org here].

User:Julianporter

2011-10-31T18:12:43Z

Julianporter: /* MapReduce */

=About my work=
==Background==
My particular areas of interest in programming are:
*Functional programming
*Formal modelling / model based programming
*Concurrency / cloud programming
*Embedded systems
I am also establishing a small business developing control systems and software for robots. The key idea is to make the robot part of the cloud rather than a stand-alone device. Further information:
*[http://www.jpembedded.co.uk Company website]
*[http://jpembeddedsolutions.wordpress.com Blog] (also contains research papers on Haskell and functional programming)

==Current projects==

===MapReduce===

I am looking at ways of implementing MapReduce-type algorithms using the functional approach. The key insight is that a generalised MapReduce algorithm is simply the repeated application of a sequence of <hask> >>= </hask> operations in a suitable monad. There are two strands of activity:

*Development of a [[MapReduce_as_a_monad|monadic view of MapReduce]]
*Developing a [[MapReduce_with_CloudHaskell|proof-of-concept demonstrator for monadic MapReduce]], using [[CloudAndHPCHaskell|CloudHaskell]] as a framework for distributed Haskell applications.

The second activity is undertaken with the support of the authors of CloudHaskell. I would be very happy if others joined in the development effort.

===Catskell===

I'm defining and then coding a language ([[Catskell]]) in the spirit of [http://lolcode.com/ LOLCODE] which is basically a feline-friendly subset of Haskell. My intention is to write a Catskell-to-Haskell translator. This should be a good exercise in making sure I really understand the language.

=About me=

By training I am a mathematician. I have been programming computers of some form or other since the early 1980s. I also have a keen interest in philosophy and music. My personal website is [http://www.porternet.org here].

MapReduce as a monad

2011-10-31T18:10:17Z

Julianporter: New material about monad transformer

[[Category:Applications]][[Category:Monad]][[Category:Libraries]][[Category:Concurrency]][[Category:Parallel]][[Category:Research]]

==Introduction==

MapReduce is a general technique for massively parallel programming developed by Google. It takes its inspiration from ideas in functional programming, but has moved away from that paradigm to a more imperative approach.
I have noticed that MapReduce can be expressed naturally, using functional programming techniques, as a form of monad. The standard implementation of MapReduce is the Java-based Hadoop framework, which is very complex and somewhat temperamental. Moreover, it is necessary to write Hadoop-specific code into mappers and reducers. My prototype library takes about 100 lines of code and can wrap generic mapper / reducer functions.

Having shown that we can implement MapReduce as a generalised monad, it transpires that in fact, we can generalise this still further and define a <hask>MapReduceT</hask> monad transformer, so there is a MapReduce type and operation associated to any monad. In particular, it turns out that the <hask>State</hask> monad is just the MapReduce type of the monad <hask>Hom a</hask> of maps <hask>h -> a</hask> where <hask>h</hask> is some fixed type.

==Initial Approach==

===Why a monad?===

What the monadic implementation lets us do is the following:
*Map and reduce look the same.
*You can write a simple wrapper function that takes a mapper / reducer and wraps it in the monad, so authors of mappers / reducers do not need to know anything about the MapReduce framework: they can concentrate on their algorithms.
*All of the guts of MapReduce are hidden in the monad's <hask>bind</hask> function
*The implementation is naturally parallel
*Making a MapReduce program is trivial: 
<hask>
... >>= wrapMR mapper >>= wrapMR reducer >>= ...
</hask> 

===Details===
Full details of the implementation and sample code can be found [http://jpembeddedsolutions.wordpress.com/2011/04/02/mapreduce/ here]. I'll just give highlights here.

====Generalised mappers / reducers====
One can generalise MapReduce a bit, so that each stage (map, reduce, etc) becomes a function of signature 
<hask>
a -> ([(s,a)] -> [(s',b)])
</hask> 
where <hask>s</hask> and <hask>s'</hask> are data types and <hask>a</hask> and <hask>b</hask> are key values.

====Generalised Monad====
Now, this is suggestive of a monad, but we can't use a monad ''per se'', because the transformation changes the key and value types, and we want to be able to access them separately. Therefore we do the following.

Let <hask>m</hask> be a <hask>Monad'</hask>, a type with four parameters: <hask>m s a s' b</hask>.

Generalise the monadic <hask>bind</hask> operation to: 
<hask>
m s a s' b -> ( b -> m s' b s'' c ) -> m s a s'' c
</hask> 

See [http://blog.sigfpe.com/2009/02/beyond-monads.html Parametrized monads].

Then clearly the generalised mapper/reducer above can be written as a <hask>Monad'</hask>, meaning that we can write MapReduce as 
<hask>
... >>= mapper >>= reducer >>= mapper' >>= reducer' >>= ...
</hask>

====Implementation details====

<hask>
class Monad' m where
   return :: a -> m s x s a
   (>>=)  :: (Eq b) => m s a s' b -> ( b -> m s' b s'' c ) -> m s a s'' c

newtype MapReduce s a s' b = MR { runMR :: ([(s,a)] -> [(s',b)]) }

retMR :: a -> MapReduce s x s a
retMR k = MR (\ss -> [(s,k) | s <- fst <$> ss])

bindMR :: (Eq b,NFData s'',NFData c) => MapReduce s a s' b -> (b -> MapReduce s' b s'' c) -> MapReduce s a s'' c
bindMR f g = MR (\s ->
   let
      fs = runMR f s
      gs = P.map g $ nub $ snd <$> fs
   in
   concat $ map (\g' -> runMR g' fs) gs)
</hask> 
The key point here is that <hask>P.map</hask> is a parallel version of the simple <hask>map</hask> function.

Now we can write a wrapper function 
<hask>
wrapMR :: (Eq a) => ([s] -> [(s',b)]) -> (a -> MapReduce s a s' b)
wrapMR f = (\k -> MR (g k))
   where
      g k ss = f $ fst <$> filter (\s -> k == snd s) ss
</hask> 
which takes a conventional mapper / reducer and wraps it in the <hask>Monad'</hask>. Note that this means that the mapper / reducer functions ''do not need to know anything about the way MapReduce is implemented''. So a standard MapReduce job becomes 
<hask>
mapReduce :: [String] -> [(String,Int)]
mapReduce state = runMapReduce mr state
   where
      mr = return () >>= wrapMR mapper >>= wrapMR reducer
</hask> 
I have tested the implementation with the standard word-counter mapper and reducer, and it works perfectly (full code is available via the link above).

==The monad transformer approach==

Define the monad transformer type <hask>MapReduceT</hask> by: 

<hask>
newtype (Monad m) => MapReduceT m t u = MR {run :: m t -> m u}
</hask>

with operations 

<hask>
lift :: (Monad m) => m t -> MapReduceT m t t
lift x = MR (const x)

return :: (Monad m) => t -> MapReduceT m t t
return x = lift (return x)

bind :: (Monad m) => MapReduceT m u u -> MapReduceT m t u -> (u -> MapReduceT m u v) -> MapReduceT m t v
bind p f g = MR (\ xs -> ps xs >>= gs xs)
   where
      ps xs   = (f >>> p) -< xs
      gs xs x = (f >>> g x) -< xs
</hask>

where <hask> >>> </hask> and <hask> -< </hask> are the obvious arrow operations on <hask>MapReduceT</hask> types.

Then we show in [http://media.jpembeddedsolutions.com/pdf/mrmonad.pdf this paper] that:
* <hask>MapReduce = MapReduceT []</hask> with <hask> (>>=) = bind nub</hask>
* For a suitable choice of <hask>p</hask> the standard <hask>State</hask> monad is <hask>MapReduceT Hom</hask> where

:<hask>
data Hom a b = H {run :: (a -> b)}

return x = H (const x)
f >>= g = H (\ x -> g' (f' x) x)
   where
      f' = run f
      g' x y = run (g x) y
</hask>

==Future Directions==

*My code so far runs concurrently and in multiple threads within a single OS image. It won't work on clustered systems. I have started work on this; see [[MapReduce_with_CloudHaskell|here]].
*Currently all of the data is sent to all of the mappers / reducers at each iteration. This is okay on a single machine, but may be prohibitive on a cluster.

I would be eager for collaborative working on taking this forward.

[[User:Julianporter|julianporter]] 18:10, 31 October 2011 (UTC)

MapReduce as a monad

2011-10-31T18:06:08Z

Julianporter: New material about monad transformer

User:Julianporter

2011-10-31T17:47:22Z

Julianporter: /* Current projects */

=About my work=
==Background==
My particular areas of interest in programming are:
*Functional programming
*Formal modelling / model based programming
*Concurrency / cloud programming
*Embedded systems
I am also establishing a small business developing control systems and software for robots. The key idea is to make the robot part of the cloud rather than a stand-alone device. Further information:
*[http://www.jpembedded.co.uk Company website]
*[http://jpembeddedsolutions.wordpress.com Blog] (also contains research papers on Haskell and functional programming)

==Current projects==

===MapReduce===

I am looking at ways of implementing MapReduce-type algorithms using the functional approach. The key insight is that a generalised MapReduce algorithm is simply the repeated application of a sequence of <hask> >>= </hask> operations in a suitable monad. There are two strands of activity:

*Development of a [[MapReduce_as_a_monad|monadic view of MapReduce]]
*Developing a [[MapReduce_with_CloudHaskell|proof-of-concept demonstrator for monadic MapReduce]], using [[CloudHaskell]] as a framework for distributed Haskell applications.

The second activity is undertaken with the support of the authors of CloudHaskell. I would be very happy if others joined in the development effort.

===Catskell===

I'm defining and then coding a language ([[Catskell]]) in the spirit of [http://lolcode.com/ LOLCODE] which is basically a feline-friendly subset of Haskell. My intention is to write a Catskell-to-Haskell translator. This should be a good exercise in making sure I really understand the language.

=About me=

By training I am a mathematician. I have been programming computers of some form or other since the early 1980s. I also have a keen interest in philosophy and music. My personal website is [http://www.porternet.org here].

Catskell

2011-04-03T09:21:40Z

Julianporter: The CATSKELL manifesto

[[Category:Humor]]
==O HAI!==
For too long, the feline programming community has been restricted to imperative languages. Kittehs have been mewing, nay yowling, for a chance to use the paradigm of the future: functional programming.

And now their time has come. For I am pleased to announce the foundation of the Catskell project. My intention is, in the spirit of [http://lolcode.com LOLCODE], to create a feline-friendly language which provides the core functionality of a pure functional language. Which might be suspiciously like Haskell if you happen to look at it closely enough . . . And when it is done, I will have proved that I am not just any old nerd, but a nerd who is a friend to kittehs everywhere.

==O RLY?==
And no, this is no joke. In technical terms, what I intend to do is to:

*Define a language which corresponds with core Haskell and its basic types (atomic types, algebraic types, list), monads (including the IO monad) and basic library functions (folds, map, filter, list manipulation)
*Write a translator, which will convert Catskell code into Haskell, allowing it to be compiled and executed
*(If I’m feeling really brave) Write an interpreter like GHCi

==SRSLY==

So there. Work has already begun. If you want to take part, get in touch, and we will see what we can do.

==KTHXBAI==

MapReduce as a monad

2011-04-03T09:18:24Z

Julianporter: /* Implementation details */

[[Category:Applications]][[Category:Monad]][[Category:Libraries]][[Category:Concurrency]][[Category:Parallel]][[Category:Research]]

==Introduction==

MapReduce is a general technique for massively parallel programming developed by Google. It takes its inspiration from ideas in functional programming, but has moved away from that paradigm to a more imperative approach.
I have noticed that MapReduce can be expressed naturally, using functional programming techniques, as a form of monad.

The standard implementation of MapReduce is the Java-based Hadoop framework, which is very complex and somewhat temperamental. Moreover, it is necessary to write Hadoop-specific code into mappers and reducers. My prototype library takes about 100 lines of code and can wrap generic mapper / reducer functions.

==Why a monad?==

What the monadic implementation lets us do is the following:
*Map and reduce look the same.
*You can write a simple wrapper function that takes a mapper / reducer and wraps it in the monad, so authors of mappers / reducers do not need to know anything about the MapReduce framework: they can concentrate on their algorithms.
*All of the guts of MapReduce are hidden in the monad's <hask>bind</hask> function
*The implementation is naturally parallel
*Making a MapReduce program is trivial: 
<hask>
... >>= wrapMR mapper >>= wrapMR reducer >>= ...
</hask> 

==Details==
Full details of the implementation and sample code can be found [http://jpembeddedsolutions.wordpress.com/2011/04/02/mapreduce/ here]. I'll just give highlights here.

===Generalised mappers / reducers===
One can generalise MapReduce a bit, so that each stage (map, reduce, etc) becomes a function of signature 
<hask>
a -> ([(s,a)] -> [(s',b)])
</hask> 
where <hask>s</hask> and <hask>s'</hask> are data types and <hask>a</hask> and <hask>b</hask> are key values.

===Generalised Monad===
Now, this is suggestive of a monad, but we can't use a monad ''per se'', because the transformation changes the key and value types, and we want to be able to access them separately. Therefore we do the following.

Let <hask>m</hask> be a <hask>Monad'</hask>, a type with four parameters: <hask>m s a s' b</hask>.

Generalise the monadic <hask>bind</hask> operation to: 
<hask>
m s a s' b -> ( b -> m s' b s'' c ) -> m s a s'' c
</hask> 
Then clearly the generalised mapper/reducer above can be written as a <hask>Monad'</hask>, meaning that we can write MapReduce as 
<hask>
... >>= mapper >>= reducer >>= mapper' >>= reducer' >>= ...
</hask>

===Implementation details===

<hask>
class Monad' m where
   return :: a -> m s x s a
   (>>=)  :: (Eq b) => m s a s' b -> ( b -> m s' b s'' c ) -> m s a s'' c

newtype MapReduce s a s' b = MR { runMR :: ([(s,a)] -> [(s',b)]) }

retMR :: a -> MapReduce s x s a
retMR k = MR (\ss -> [(s,k) | s <- fst <$> ss])

bindMR :: (Eq b,NFData s'',NFData c) => MapReduce s a s' b -> (b -> MapReduce s' b s'' c) -> MapReduce s a s'' c
bindMR f g = MR (\s ->
   let
      fs = runMR f s
      gs = P.map g $ nub $ snd <$> fs
   in
   concat $ map (\g' -> runMR g' fs) gs)
</hask> 
The key point here is that <hask>P.map</hask> is a parallel version of the simple <hask>map</hask> function.

Now we can write a wrapper function 
<hask>
wrapMR :: (Eq a) => ([s] -> [(s',b)]) -> (a -> MapReduce s a s' b)
wrapMR f = (\k -> MR (g k))
   where
      g k ss = f $ fst <$> filter (\s -> k == snd s) ss
</hask> 
which takes a conventional mapper / reducer and wraps it in the <hask>Monad'</hask>. Note that this means that the mapper / reducer functions ''do not need to know anything about the way MapReduce is implemented''. So a standard MapReduce job becomes 
<hask>
mapReduce :: [String] -> [(String,Int)]
mapReduce state = runMapReduce mr state
   where
      mr = return () >>= wrapMR mapper >>= wrapMR reducer
</hask> 
I have tested the implementation with the standard word-counter mapper and reducer, and it works perfectly (full code is available via the link above).

==Future Directions==

*My code so far runs concurrently and in multiple threads within a single OS image. It won't work on clustered systems. This is clearly where work should go next.
*Currently all of the data is sent to all of the mappers / reducers at each iteration. This is okay on a single machine, but may be prohibitive on a cluster.

I would be eager for collaborative working on taking this forward.

[[User:Julianporter|julianporter]] 18:32, 2 April 2011 (UTC)

MapReduce as a monad

2011-04-03T09:18:02Z

Julianporter: /* Generalised Monad */

[[Category:Applications]][[Category:Monad]][[Category:Libraries]][[Category:Concurrency]][[Category:Parallel]][[Category:Research]]

==Introduction==

MapReduce is a general technique for massively parallel programming developed by Google. It takes its inspiration from ideas in functional programming, but has moved away from that paradigm to a more imperative approach.
I have noticed that MapReduce can be expressed naturally, using functional programming techniques, as a form of monad.

The standard implementation of MapReduce is the Java-based Hadoop framework, which is very complex and somewhat temperamental. Moreover, it is necessary to write Hadoop-specific code into mappers and reducers. My prototype library takes about 100 lines of code and can wrap generic mapper / reducer functions.

==Why a monad?==

What the monadic implementation lets us do is the following:
*Map and reduce look the same.
*You can write a simple wrapper function that takes a mapper / reducer and wraps it in the monad, so authors of mappers / reducers do not need to know anything about the MapReduce framework: they can concentrate on their algorithms.
*All of the guts of MapReduce are hidden in the monad's <hask>bind</hask> function
*The implementation is naturally parallel
*Making a MapReduce program is trivial: 
<hask>
... >>= wrapMR mapper >>= wrapMR reducer >>= ...
</hask> 

==Details==
Full details of the implementation and sample code can be found [http://jpembeddedsolutions.wordpress.com/2011/04/02/mapreduce/ here]. I'll just give highlights here.

===Generalised mappers / reducers===
One can generalise MapReduce a bit, so that each stage (map, reduce, etc) becomes a function of signature 
<hask>
a -> ([(s,a)] -> [(s',b)])
</hask> 
where <hask>s</hask> and <hask>s'</hask> are data types and <hask>a</hask> and <hask>b</hask> are key values.

===Generalised Monad===
Now, this is suggestive of a monad, but we can't use a monad ''per se'', because the transformation changes the key and value types, and we want to be able to access them separately. Therefore we do the following.

Let <hask>m</hask> be a <hask>Monad'</hask>, a type with four parameters: <hask>m s a s' b</hask>.

Generalise the monadic <hask>bind</hask> operation to: 
<hask>
m s a s' b -> ( b -> m s' b s'' c ) -> m s a s'' c
</hask> 
Then clearly the generalised mapper/reducer above can be written as a <hask>Monad'</hask>, meaning that we can write MapReduce as 
<hask>
... >>= mapper >>= reducer >>= mapper' >>= reducer' >>= ...
</hask>

===Implementation details===

<hask>
class Monad' m where
   return :: a -> m s x s a
   (>>=)  :: (Eq b) => m s a s' b -> ( b -> m s' b s'' c ) -> m s a s'' c

newtype MapReduce s a s' b = MR { runMR :: ([(s,a)] -> [(s',b)]) }

retMR :: a -> MapReduce s x s a
retMR k = MR (\ss -> [(s,k) | s <- fst <$> ss])

bindMR :: (Eq b,NFData s'',NFData c) => MapReduce s a s' b -> (b -> MapReduce s' b s'' c) -> MapReduce s a s'' c
bindMR f g = MR (\s ->
   let
      fs = runMR f s
      gs = P.map g $ nub $ snd <$> fs
   in
   concat $ map (\g' -> runMR g' fs) gs)
</hask> 
The key point here is that <hask>P.map</hask> is a parallel version of the simple <hask>map</hask> function.

Now we can write a wrapper function 
<hask>
wrapMR :: (Eq a) => ([s] -> [(s',b)]) -> (a -> MapReduce s a s' b)
wrapMR f = (\k -> MR (g k))
   where
      g k ss = f $ fst <$> filter (\s -> k == snd s) ss
</hask> 
which takes a conventional mapper / reducer and wraps it in the <hask>Monad'</hask>. Note that this means that the mapper / reducer functions ''do not need to know anything about the way MapReduce is implemented''. So a standard MapReduce job becomes 
<hask>
mapReduce :: [String] -> [(String,Int)]
mapReduce state = runMapReduce mr state
   where
      mr = return () >>= wrapMR mapper >>= wrapMR reducer
</hask>
I have tested the implementation with the standard word-counter mapper and reducer, and it works perfectly (full code is available via the link above).

==Future Directions==

*My code so far runs concurrently and in multiple threads within a single OS image. It won't work on clustered systems. This is clearly where work should go next.
*Currently all of the data is sent to all of the mappers / reducers at each iteration. This is okay on a single machine, but may be prohibitive on a cluster.

I would welcome collaborators to help take this forward.

[[User:Julianporter|julianporter]] 18:32, 2 April 2011 (UTC)

MapReduce as a monad

2011-04-03T09:17:39Z

Julianporter: /* Why a monad? */

[[Category:Applications]][[Category:Monad]][[Category:Libraries]][[Category:Concurrency]][[Category:Parallel]][[Category:Research]]

==Introduction==

MapReduce is a general technique for massively parallel programming developed by Google. It takes its inspiration from ideas in functional programming, but has moved away from that paradigm to a more imperative approach.
I have noticed that MapReduce can be expressed naturally, using functional programming techniques, as a form of monad.

The standard implementation of MapReduce is the Java-based Hadoop framework, which is very complex and somewhat temperamental. Moreover, it is necessary to write Hadoop-specific code into mappers and reducers. My prototype library takes about 100 lines of code and can wrap generic mapper / reducer functions.

==Why a monad?==

What the monadic implementation lets us do is the following:
*Map and reduce look the same.
*You can write a simple wrapper function that takes a mapper / reducer and wraps it in the monad, so authors of mappers / reducers do not need to know anything about the MapReduce framework: they can concentrate on their algorithms.
*All of the guts of MapReduce are hidden in the monad's <hask>bind</hask> function
*The implementation is naturally parallel
*Making a MapReduce program is trivial: 
<hask>
... >>= wrapMR mapper >>= wrapMR reducer >>= ...
</hask> 

==Details==
Full details of the implementation and sample code can be found [http://jpembeddedsolutions.wordpress.com/2011/04/02/mapreduce/ here]. This page gives only the highlights.

===Generalised mappers / reducers===
One can generalise MapReduce a bit, so that each stage (map, reduce, etc) becomes a function of signature 
<hask>
a -> ([(s,a)] -> [(s',b)])
</hask> 
where <hask>s</hask> and <hask>s'</hask> are data types and <hask>a</hask> and <hask>b</hask> are key types.
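
To make the signature concrete, here is a minimal sketch of one stage. The names <hask>Stage</hask> and <hask>sumStage</hask> are hypothetical and do not appear in the library. Given a key, the stage picks out the data items tagged with that key and emits a single new pair.

```haskell
-- Hypothetical alias for the stage signature a -> ([(s,a)] -> [(s',b)]).
type Stage s a s' b = a -> [(s, a)] -> [(s', b)]

-- Hypothetical stage: for key k, sum the Int data items tagged with k
-- and emit the total, keyed by whether that total is even.
sumStage :: Stage Int String Int Bool
sumStage k pairs = [(total, even total)]
  where
    total = sum [n | (n, k') <- pairs, k' == k]
```

For example, <hask>sumStage "x" [(1,"x"),(2,"y"),(3,"x")]</hask> evaluates to <hask>[(4,True)]</hask>: the data type stays <hask>Int</hask> while the key type changes from <hask>String</hask> to <hask>Bool</hask>.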

===Generalised Monad===
Now, this is suggestive of a monad, but we can't use a monad ''per se'', because the transformation changes the key and value types, and we want to be able to access them separately. Therefore we do the following.

Let <hask>m</hask> be a <hask>Monad'</hask>, a type with four parameters: <hask>m s a s' b</hask>, where <hask>s, s'</hask> are data types and <hask>a, b</hask> are key types.

Generalise the monadic <hask>bind</hask> operation to: 
<hask>
m s a s' b -> ( b -> m s' b s'' c ) -> m s a s'' c
</hask> 
Then clearly the generalised mapper/reducer above can be written as a <hask>Monad'</hask>, meaning that we can write MapReduce as 
<hask>
... >>= mapper >>= reducer >>= mapper' >>= reducer' >>= ...
</hask>

===Implementation details===

<hask>
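import Prelude hiding (return, (>>=))
import Control.Applicative ((<$>))
import Control.DeepSeq (NFData)
import Data.List (nub)
-- P is assumed to be a qualified import of a module providing a parallel map.

class Monad' m where
    return :: a -> m s x s a
    (>>=)  :: (Eq b) => m s a s' b -> ( b -> m s' b s'' c ) -> m s a s'' c

-- A stage turns a list of (data, key) pairs into a new list of (data, key) pairs.
newtype MapReduce s a s' b = MR { runMR :: ([(s,a)] -> [(s',b)]) }

-- Keep every data item and tag it with the key k.
retMR :: a -> MapReduce s x s a
retMR k = MR (\ss -> [(s,k) | s <- fst <$> ss])

-- Run f, apply g to each distinct key in its output (via P.map),
-- then run every resulting stage over the whole of that output.
bindMR :: (Eq b,NFData s'',NFData c) => MapReduce s a s' b -> (b -> MapReduce s' b s'' c) -> MapReduce s a s'' c
bindMR f g = MR (\s ->
    let
        fs = runMR f s
        gs = P.map g $ nub $ snd <$> fs
    in
        concat $ map (\g' -> runMR g' fs) gs)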
</hask> 
The key point here is that <hask>P.map</hask> is a parallel version of the simple <hask>map</hask> function.

Now we can write a wrapper function 
<hask>
wrapMR :: (Eq a) => ([s] -> [(s',b)]) -> (a -> MapReduce s a s' b)
wrapMR f = (\k -> MR (g k))
    where
        -- Pass f only the data items tagged with key k.
        g k ss = f $ fst <$> filter (\s -> k == snd s) ss
</hask> 
which takes a conventional mapper / reducer and wraps it in the <hask>Monad'</hask>. Note that this means that the mapper / reducer functions ''do not need to know anything about the way MapReduce is implemented''. So a standard MapReduce job becomes 
<hask>
mapReduce :: [String] -> [(String,Int)]
mapReduce state = runMapReduce mr state
    where
        mr = return () >>= wrapMR mapper >>= wrapMR reducer
</hask>
I have tested the implementation with the standard word-counter mapper and reducer, and it works perfectly (full code is available via the link above).
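
As a rough end-to-end sketch of the pipeline above, the following can be loaded on its own. Several parts are assumptions rather than the author's code: the <hask>mapper</hask>, <hask>reducer</hask> and <hask>runMapReduce</hask> definitions are not shown on this page, so the versions here are hypothetical, and plain <hask>map</hask> stands in for <hask>P.map</hask>, which makes this version sequential. The functions are called by name rather than through the <hask>Monad'</hask> class.

```haskell
import Data.List (nub)

-- One stage: a list of (data, key) pairs in, a list of (data, key) pairs out.
newtype MapReduce s a s' b = MR { runMR :: [(s, a)] -> [(s', b)] }

-- Tag every data item with the key k.
retMR :: a -> MapReduce s x s a
retMR k = MR (\ss -> [(s, k) | (s, _) <- ss])

-- Sequential stand-in for bindMR: plain map replaces P.map.
bindMR :: Eq b
       => MapReduce s a s' b -> (b -> MapReduce s' b s'' c) -> MapReduce s a s'' c
bindMR f g = MR (\s ->
    let fs = runMR f s
        gs = map g (nub (map snd fs))
    in  concatMap (\g' -> runMR g' fs) gs)

-- Wrap a plain mapper / reducer so that it sees only the data for one key.
wrapMR :: Eq a => ([s] -> [(s', b)]) -> (a -> MapReduce s a s' b)
wrapMR f = \k -> MR (\ss -> f [s | (s, k') <- ss, k' == k])

-- Assumed definition: give every input item the unit key, then run the job.
runMapReduce :: MapReduce s () s' b -> [s] -> [(s', b)]
runMapReduce m ss = runMR m [(s, ()) | s <- ss]

-- Hypothetical mapper: emit each word as both the data item and its key.
mapper :: [String] -> [(String, String)]
mapper ls = [(w, w) | l <- ls, w <- words l]

-- Hypothetical reducer: receives every occurrence of a single word.
reducer :: [String] -> [(String, Int)]
reducer []       = []
reducer ws@(w:_) = [(w, length ws)]

wordCount :: [String] -> [(String, Int)]
wordCount = runMapReduce (retMR () `bindMR` wrapMR mapper `bindMR` wrapMR reducer)
```

With these definitions, <hask>wordCount ["a b a"]</hask> evaluates to <hask>[("a",2),("b",1)]</hask>. Note that each bind passes the whole intermediate list to every per-key stage, which then filters it.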

==Future Directions==

*My code so far runs concurrently and in multiple threads within a single OS image. It won't work on clustered systems. This is clearly where work should go next.
*Currently all of the data is sent to all of the mappers / reducers at each iteration. This is okay on a single machine, but may be prohibitive on a cluster.

I would welcome collaborators to help take this forward.

[[User:Julianporter|julianporter]] 18:32, 2 April 2011 (UTC)

MapReduce as a monad

2011-04-03T09:17:01Z

Julianporter:

User:Julianporter

2011-04-03T09:10:30Z

Julianporter: /* Current projects */

=About my work=
==Background==
My particular areas of interest in programming are:
*Functional programming
*Formal modelling / model based programming
*Concurrency / cloud programming
*Embedded systems
I am also establishing a small business developing control systems and software for robots. The key idea is to make the robot part of the cloud rather than a stand-alone device. Further information:
*[http://www.jpembedded.co.uk Company website]
*[http://jpembeddedsolutions.wordpress.com Blog] (also contains research papers on Haskell and functional programming)

==Current projects==

*I have developed a simple implementation of MapReduce using a modified form of monad. It's described [[MapReduce_as_a_monad|here]]. I would be very happy if others joined in the development effort.
*I'm defining and then coding a language ([[Catskell]]) in the spirit of [http://lolcode.com/ LOLCODE] which is basically a feline-friendly subset of Haskell. My intention is to write a Catskell-to-Haskell translator. This should be a good exercise in making sure I really understand the language.

=About me=

By training I am a mathematician. I have been programming computers of some form or other since the early 1980s. I also have a keen interest in philosophy and music. My personal website is [http://www.porternet.org here].

Talk:MapReduce as a monad

2011-04-02T21:16:17Z

Julianporter: /* Monads */

== Monads ==

Why is it mapreduce as a ''monad''? Map just requires Functor, and reduce sounds like `mappend`, so it'd just be MapReduce as a monoid. --[[User:Gwern|Gwern]] 20:31, 2 April 2011 (UTC)

Because the key point is that both Map and Reduce can be seen as monadic functions, and so then MapReduce is just a matter of repeated bind operations. Think of it as a generalised State monad. [[User:Julianporter|julianporter]] 21:16, 2 April 2011 (UTC)

User:Julianporter

2011-04-02T18:37:38Z

Julianporter: /* Background */

MapReduce as a monad

2011-04-02T18:35:54Z

Julianporter: /* Details */

MapReduce as a monad

2011-04-02T18:34:41Z

Julianporter: /* Implementation details */

User:Julianporter

2011-04-02T18:33:40Z

Julianporter: /* Current projects */

=About my work=
==Background==
I have been working in IT for about 20 years. My particular areas of interest in programming are:
*Functional programming
*Formal modelling / model based programming
*Concurrency / cloud programming
*Embedded systems
I am also establishing a small business developing control systems and software for robots. The key idea is to make the robot part of the cloud rather than a stand-alone device. Further information:
*[http://www.jpembedded.co.uk Company website]
*[http://jpembeddedsolutions.wordpress.com Blog] (also contains research papers on Haskell and functional programming)

==Current projects==

*I have developed a simple implementation of MapReduce using a modified form of monad. It's described [[MapReduce_as_a_monad|here]]. I would be very happy if others joined in the development effort.

=About me=

By training I am a mathematician. I have been programming computers of some form or other since the early 1980s. I also have a keen interest in philosophy and music. My personal website is [http://www.porternet.org here].

MapReduce as a monad

2011-04-02T18:32:34Z

Julianporter: A description of a prototype MapReduce library

User:Julianporter

2011-04-02T18:03:00Z

Julianporter: New page: =About my work= ==Background== I have been working in IT for about 20 years. My particular areas of interest in programming are: *Functional programming *Formal modelling / model based pr...

=About my work=
==Background==
I have been working in IT for about 20 years. My particular areas of interest in programming are:
*Functional programming
*Formal modelling / model based programming
*Concurrency / cloud programming
*Embedded systems
I am also establishing a small business developing control systems and software for robots. The key idea is to make the robot part of the cloud rather than a stand-alone device. Further information:
*[http://www.jpembedded.co.uk Company website]
*[http://jpembeddedsolutions.wordpress.com Blog] (also contains research papers on Haskell and functional programming)

==Current projects==

*I have developed a simple implementation of MapReduce using a modified form of monad. It's described [http://jpembeddedsolutions.wordpress.com/2011/04/02/mapreduce/ here]. I'll be posting code, etc on this wiki soon. I would be very happy if others joined in the development effort.

=About me=

By training I am a mathematician. I have been programming computers of some form or other since the early 1980s. I also have a keen interest in philosophy and music. My personal website is [http://www.porternet.org here].