Talk page for "Comprehensive Comprehensions"
This is a discussion page for the paper Comprehensive Comprehensions.
If you are kind enough to read this paper, you may like to jot down any thoughts it triggers off, and see what others have written. This talk-page lets you do just that.
You can identify your entries by preceding them with four tildes. Doing so adds your name, and the date. Thus:
- Simonpj 08:42, 19 April 2007 (UTC) Note from Simon
If you say who you are in this way, we'll be able to acknowledge your help in a revised version of the paper.
MichaelAdams 14:51, 19 June 2007 (UTC)
In theory these operators (order-by and group-by) should generalize to monads and once you do several other design options open up. These two operators could even be unified into a single operator.
Starting with sort-by, I think the monadic version is fairly obvious. Take something like the following code.
do a <- ma ... b <- mb c <- mc sort by (b, c) using foo d <- md ... return (a, b, c, d)
It would de-sugar to:
((do a <- ma ... b <- mb c <- mc return ((b, c), (a, b, c)) ) `foo` fst) >>= \result -> do let (a, _, _) = result (_, b, _) = result (_, _, c) = result d <- md ... return (a, b, c, d)
Where we have:
foo :: forall a. (a -> t) -> m a -> m a
Generalizing Order-by to Group-by
In fact after a few tweaks it turns out that order-by and group-by could operate under the exact same de-sugaring.
Suppose we let the type of foo be:
foo :: (Functor n) => forall a. (a -> t) -> m a -> m (n a)
Notice that I said "m (n a)" and not "m (m a)". The group-by de-sugaring that I'm going to show works for any Functor "n", and if we imagine "n" to be some type like the following then order-by is just a special case of group-by.
type Ident a = a
(It wouldn't actually be valid Haskell code to use a type synonym in that way though, but it conveys the idea.)
The de-sugaring to support this would take something like the following (inspired by arrow syntax).
do a <- ma b <- mb c <- mc foo args $-< (a, b) -- group by (a, b) using (foo args) g <- mg return (a, b, c, d, e, f, g)
It would produce:
(ma >>= \a -> mb >>= \b -> mc >>= \c -> return ((a, b), (a, b, c))) `foo` args >>= \result -> let a = fmap (\(_, (a, _, _)) -> a) result b = fmap (\(_, (_, b, _)) -> b) result c = fmap (\(_, (_, _, c)) -> c) result in md >>= \d -> return (a, b, c, d)
(Note that after the "foo", the types of "a", "b" and "c" have changed. Their types before "foo" get wrapped by "n" after "foo".)
Even more generalization
Once we have done this, another possibility opens up. Notice that the first of each pair in "result" was never used except from within "foo". It doesn't do any work in the result. Now suppose we change the type of foo to:
foo :: (Functor n) => forall a. m (t, a) -> m (n (u, a))
Now "foo" can not only read existing bindings from "t", but it can also create new bindings with "u". (This hard wires the extraction function to always be "fst", but it simplifies the presentation a bit. The other form with a general extraction could still be used.)
The previous syntax could then be extended to something like:
do a <- ma b <- mb c <- mc (d, e, f) <-$ foo args $-< (a, b) g <- mg return (a, b, c, d, e, f, g)
This would then de-sugar to:
(ma >>= \a -> mb >>= \b -> mc >>= \c -> return ((a, b), (a, b, c))) `foo` args >>= \result -> let d = fmap (\((d, _, _), (_, _, _)) -> d) result e = fmap (\((_, e, _), (_, _, _)) -> e) result f = fmap (\((_, _, f), (_, _, _)) -> f) result a = fmap (\((_, _, _), (a, _, _)) -> a) result b = fmap (\((_, _, _), (_, b, _)) -> b) result c = fmap (\((_, _, _), (_, _, c)) -> c) result in mg >>= \g -> return (a, b, c, d, e, f, g)
The above might have gone too far down the generalization road and put to many bells and whistles on the thing, so it may be worth trimming it down. I also haven't given any thought to what applications would need this. I just wanted to consider how far these operations
Just a few little things:
- Page 2, "sortBy is part of the Haskell Prelude" - it's actually in the List module. (I just spotted you've got the same thing in SYB with Class, 7.1)
- The Down trick is very neat, perhaps that should be a part of the standard libraries.
- Page 5, MSFT is in 'quotes', but should be in "quotes".
Your syntax requires four new keywords, at least one of which is already a standard function (group). Plus with the knowledge of the keywords the parse tree is entirely different. From your paper:
order by x >= y using takeWhile
At first reading I parsed >= as the root node, since in Haskell that would be the way it works. Your 'then' syntax in 6.1 seems preferable as it doesn't take any additional keywords.
--Neil Mitchell 16:04, 19 June 2007 (UTC)
The paper also mentions a function "the." I wasn't able to find this function through hoogle or ":t the" in ghci. Perhaps you could add a one line description the way "nub" is described.
- see section 3.4, its a custom function complete with implementation --Neil Mitchell 23:28, 19 June 2007 (UTC)
- Thanks Neil, I missed it.
As a non-expert, the sense I get is that whenever I see "by" in a list comprehension, I should expect functions or expressions that operate on lists and not on individual elements of lists.
When I first started reading the paper, I was going to recommend another extension (for time series) to allow things like moving averages...imagine my surprise when I find exactly the example I was going to suggest :) Falcon 17:10, 19 June 2007 (UTC)
Maybe I didn't get some part of the paper, but is it really necessary to have 'order by' or 'group by' as syntax extensions? Isn't it possible to allow developers to use any function as long it matches the types of 'order' or 'group'? Falcon 18:53, 20 June 2007 (UTC)
The type given for sortBy on page 2 should be sortBy :: (a -> a -> Ordering) -> [a] -> [a]. Botje 18:50, 5 July 2007 (UTC)
Janis 11:10, 17 July 2007 (UTC) :
In the middle of the left side of page nine, I was confused by "... we can relate the operation of f on the two different encodings of tuples discussed above ...". It seems those different encodings are actrually discussed only later ("below" rather than "above"). Or I did miss something here.
Section 4.6 claims that patterns built of tuples are irrefutable. I don't see how this is the case in Haskell, in which tuple types are lifted. It is very well possible to refute an (a,b)-pattern: just match it against undefined.
A comparison with HaskellDB would be interesting; they're obviously rather different, as HaskellDB restricts you to writing queries with a direct translation to SQL, and field values are passed around in a kind of record rather than as explicit Haskell values (since they end up being implemented as bits of SQL). But if you are suggesting that these comprehensions could be used for direct database access, HaskellDB is a relevant "competitor". GaneshSittampalam 07:59, 20 July 2007 (UTC)
- For section 8: there is also GROUP BY in XSLT 2.0.
- I think you should consider showing examples with records. The unextended Haskell examples in chapter 2 become a lot less complex with records. See 
Sjoerd visscher 09:54, 28 July 2007 (UTC)
I think this is quite nifty. I don't really have any comments on the substance of the paper, but I think I can put my nit-picky perfectionist editing skills to good use:
- There are duplicate commas on page 5 (in the definitions of p4 and p6) and page 6 (definition of p4').
- The commas in the example at the end of p.3 don't follow the same style as all the rest of the examples, if you care about that.
- You may want to insert a space after [1,1,2] at the top of page 6.
- Unless I'm missing something, the function runs defined at the end of page 4 doesn't work as described; e.g. runs 2 [1,2,3] = [[1,2],[2,3],,,,,,...], not [[1,2],[2,3]].
- In section 4.1, you mention that the cartesian product of qualifiers is associative; isn't zip associative as well? If so, perhaps you should mention that fact.
- In the LET rule in Figure 2, should the conclusion say \Gamma \vdash let w = e => \Delta ? A let could bind a whole pattern, not just a single variable; and otherwise I don't see what the second condition involving w is for.
- In the GROUP2 rule at the bottom of Figure 2, "by e" should be omitted.
- I may be missing something, but I don't understand the notation used at the top of Figure 3 for the equation (w <- e)_v = w (and by extension, the equation immediately following). w could be any general pattern, so putting w on the right seems strange when the right-hand side should be a tuple of bound variables. Maybe some sort of notation like w_v should be introduced, with the obvious interpretation?
Byorgey 21:32, 7 August 2007 (UTC) (Brent Yorgey)
I am pleasantly surprised to see Philip Wadler tackling list comprehension as a database query language more than 18 years after his first paper on the subject.
Interestingly, I recently considered ways to include an 'order by' construct in a list comprehension language called BioVelo (available at biocyc.org/query.html) to query biological databases.
So far, I have taken the simple approach of applying a sort after the list creation, which is not the most efficient when dealing on tuples where a sort is desired on subgroups; but I thought it was direct enough to be easily understandable.
This solution to include the 'order by' and 'group by' directly as qualifiers is syntactically appealing since it is more in line with list comprehension. Yet, the temptation to generalise the meaning of 'order by' to functions that are not related to sorting is bound to confuse users. Section 6 of the paper, that deals with a different wording, points to a very simple solution to sorting: provide the possibility to apply a function as a qualifier to the output of any generators. I think that the approach presented generalise to this since it may be desirable to sort on subgroups at different points during the construction of the list. (You would be allowed several "order by", "group by", etc.)
By the way: nice winter picture!
Sumimus 22:47, 24 September 2007 (UTC)Mario Latendresse