Difference between revisions of "Accessible layout proposal"
(Initial draft) |
(Adjusted desugaring to deal with existing record syntax) |
||
Line 3: | Line 3: | ||
The character sequences <code>#(</code>, <code>#[</code>, <code>#$</code>, and <code>#</code> would become reserved tokens that start a new layout block with the following desugaring: |
The character sequences <code>#(</code>, <code>#[</code>, <code>#$</code>, and <code>#</code> would become reserved tokens that start a new layout block with the following desugaring: |
||
<pre> |
<pre> |
||
− | #( {a;b;c} == |
+ | #( {a;b;c} ==> (a,b,c) |
− | #[ {a;b;c} == |
+ | #[ {a;b;c} ==> [a,b,c] |
− | epct #$ {a;b;c} == |
+ | epct #$ {a;b;c} ==> epct (a) (b) (c) |
− | # {a;b;c} == |
+ | # {a;b;c} ==> {a,b,c} |
</pre> |
</pre> |
||
where <code>epct</code> represents an expression, pattern, data constructor, type constructor, or class name. The above tokens would be added to the list of tokens that start layout blocks, and appropriate rules would be added to the grammar itself to perform the above desugarings, for example: |
where <code>epct</code> represents an expression, pattern, data constructor, type constructor, or class name. The above tokens would be added to the list of tokens that start layout blocks, and appropriate rules would be added to the grammar itself to perform the above desugarings, for example: |
||
<pre> |
<pre> |
||
fexp -> [fexp] aexp |
fexp -> [fexp] aexp |
||
− | fexp -> fexp "#$" block< |
+ | fexp -> fexp "#$" block<exp> |
</pre> |
</pre> |
||
− | where <code>block< |
+ | where <code>block<q></code> expands to <code>"{" [q (";" q)*] "}"</code> in EBNF. It is important to realise that <code>#$</code> above is not an operator: it's part of the grammar itself, and should be thought of as a syntactic sugar that "happens" before the "real" grammar is used, even though it should be able to be implemented by adding appropriate extra rules to the grammar itself as above. |
+ | |||
+ | Note that <code>#$</code> effectively puts parentheses round each element of the block where parentheses would be needed in the usual grammar so we can convert a block of <code>exp</code> to a sequence of <code>aexp</code> for example. |
||
==Motivation== |
==Motivation== |
||
Line 38: | Line 40: | ||
putStrLn "hello" |
putStrLn "hello" |
||
</haskell> |
</haskell> |
||
− | <code>#</code> |
+ | <code>#</code> allows us to write records: |
<haskell> |
<haskell> |
||
let p = Person { personName = "Zarathustra Aurelio" |
let p = Person { personName = "Zarathustra Aurelio" |
||
− | + | , personAddress = "Ancient Persia" |
|
− | + | , personAge = 4000 |
|
} |
} |
||
Line 49: | Line 51: | ||
personAge = 4000 |
personAge = 4000 |
||
</haskell> |
</haskell> |
||
+ | Remember that <code>#</code> desugars a block of things (sequence separated by semicolons enclosed in braces) into a sequence separated by commas enclosed in braces. We can call the former a '''semicolon-block''' and the latter a '''comma-block'''. |
||
− | The above illustrates the power of choosing a single construct - the block which is a semicolon separated list of elements enclosed in braces - in many different situations in the language, which is a very natural way of thinking if you try to write a recursive descent parser for Haskell, but is not supported by the current method of describing the grammar using a CFG, where piecemeal ad-hoc syntax can slip past undetected due to the need to keep duplicating a mapping from syntax to the same concept ("a (possibly ordered) set of things") in many different places in the grammar. |
||
<code>#(</code> would be useful in import and export lists (and everywhere else tuple syntax is used): |
<code>#(</code> would be useful in import and export lists (and everywhere else tuple syntax is used): |
||
Line 78: | Line 80: | ||
where |
where |
||
</haskell> |
</haskell> |
||
− | + | We could also add some rules to the grammar to allow comma-blocks to be used anywhere that the tuple notation is currently used for non-tuples. Then the example above could be written as: |
|
<haskell> |
<haskell> |
||
module Foo # |
module Foo # |
||
Line 91: | Line 93: | ||
Not only does the above look less cluttered than the current syntax which uses parentheses and commas, it also allows you to easily reorder things without having to bother with that pesky comma that always ends up in the wrong place when you cut and paste... |
Not only does the above look less cluttered than the current syntax which uses parentheses and commas, it also allows you to easily reorder things without having to bother with that pesky comma that always ends up in the wrong place when you cut and paste... |
||
− | Also, we could add rules to allow data types to be defined using blocks instead (or as well as) the <code>|</code> notation: |
+ | Also, we could add rules to allow data types to be defined using comma-blocks instead (or as well as) the <code>|</code> notation: |
<haskell> |
<haskell> |
||
data T = One | Two | Three |
data T = One | Two | Three |
||
Line 97: | Line 99: | ||
could also be written using block notation as: |
could also be written using block notation as: |
||
<haskell> |
<haskell> |
||
− | data T = {One |
+ | data T = {One, Two, Three} |
</haskell> |
</haskell> |
||
which then allows: |
which then allows: |
||
Line 107: | Line 109: | ||
==Implications== |
==Implications== |
||
− | <code>#(</code>, <code>#[</code>, <code>#$</code>, and <code>#</code> would become reserved tokens so <code>#$</code> and <code>#</code> could no longer be used as symbols (operators). Allowing blocks to be used in place of parenthesised import/export/deriving/predicate lists should not have any negative effects because such uses occupy a hitherto unused space in the grammar (as far as I can tell [[User:Brianh|Brianh]] 16:12, 3 February 2007 (UTC)) |
+ | <code>#(</code>, <code>#[</code>, <code>#$</code>, and <code>#</code> would become reserved tokens so <code>#$</code> and <code>#</code> could no longer be used as symbols (operators). Allowing comma-blocks to be used in place of parenthesised import/export/deriving/predicate lists should not have any negative effects because such uses occupy a hitherto unused space in the grammar (as far as I can tell [[User:Brianh|Brianh]] 16:12, 3 February 2007 (UTC)). |
+ | |||
+ | ==Why are there two kinds of block?== |
||
+ | In the above we've had to make a distinction between blocks with comma-separated elements and blocks with semicolon-separated elements. This distinction was necessary because the Haskell98 record syntax requires a comma-separated block. Perhaps if the original grammar had been designed with recursive descent parsing in mind, the concept of "block of things" would have arisen and been given only one syntax, namely semicolon separated elements in braces. However as it is the <code>#</code> rule easily allows us to use semicolon-blocks where comma-blocks are needed as the record example shows. |
Revision as of 11:29, 4 February 2007
Description
The character sequences #(
, #[
, #$
, and #
would become reserved tokens that start a new layout block with the following desugaring:
#( {a;b;c} ==> (a,b,c) #[ {a;b;c} ==> [a,b,c] epct #$ {a;b;c} ==> epct (a) (b) (c) # {a;b;c} ==> {a,b,c}
where epct
represents an expression, pattern, data constructor, type constructor, or class name. The above tokens would be added to the list of tokens that start layout blocks, and appropriate rules would be added to the grammar itself to perform the above desugarings, for example:
fexp -> [fexp] aexp fexp -> fexp "#$" block<exp>
where block
expands to "{" [q (";" q)*] "}"
in EBNF. It is important to realise that #$
above is not an operator: it's part of the grammar itself, and should be thought of as a syntactic sugar that "happens" before the "real" grammar is used, even though it should be able to be implemented by adding appropriate extra rules to the grammar itself as above.
Note that #$
effectively puts parentheses round each element of the block where parentheses would be needed in the usual grammar so we can convert a block of exp
to a sequence of aexp
for example.
Motivation
You might be thinking the above just looks totally far out and mad but the reason for wanting it is that it would allow you to make use of layout to avoid many commas and parentheses. For example consider:
main = do
a <- getChar
bracket_
(enter a)
(exit a)
(do
putChar a
putStrLn "hello")
The programmer has used indentation and newlines to make it clear that bracket_
has 3 arguments. However the compiler can't see this indentation and newlines, because bracket_
does not introduce layout. However, by using #$
, we can eliminate the parentheses:
main = do
a <- getChar
bracket_ #$
enter a
exit a
do
putChar a
putStrLn "hello"
#
allows us to write records:
let p = Person { personName = "Zarathustra Aurelio"
, personAddress = "Ancient Persia"
, personAge = 4000
}
let p = Person # personName = "Zarathustra Aurelio"
personAddress = "Ancient Persia"
personAge = 4000
Remember that #
desugars a block of things (sequence separated by semicolons enclosed in braces) into a sequence separated by commas enclosed in braces. We can call the former a semicolon-block and the latter a comma-block.
#(
would be useful in import and export lists (and everywhere else tuple syntax is used):
module Foo
( Zap(Con1, Con2, Con3)
, mkZap
, count
) where
could instead be written as:
module Foo #(
Zap(Con1, Con2, Con3)
mkZap
count
where
or even:
module Foo #(
Zap #(
Con1
Con2
Con3
mkZap
count
where
We could also add some rules to the grammar to allow comma-blocks to be used anywhere that the tuple notation is currently used for non-tuples. Then the example above could be written as:
module Foo #
Zap #
Con1
Con2
Con3
mkZap
count
where
Not only does the above look less cluttered than the current syntax which uses parentheses and commas, it also allows you to easily reorder things without having to bother with that pesky comma that always ends up in the wrong place when you cut and paste...
Also, we could add rules to allow data types to be defined using comma-blocks instead (or as well as) the |
notation:
data T = One | Two | Three
could also be written using block notation as:
data T = {One, Two, Three}
which then allows:
data T = # One
Two
Three
Implications
#(
, #[
, #$
, and #
would become reserved tokens so #$
and #
could no longer be used as symbols (operators). Allowing comma-blocks to be used in place of parenthesised import/export/deriving/predicate lists should not have any negative effects because such uses occupy a hitherto unused space in the grammar (as far as I can tell Brianh 16:12, 3 February 2007 (UTC)).
Why are there two kinds of block?
In the above we've had to make a distinction between blocks with comma-separated elements and blocks with semicolon-separated elements. This distinction was necessary because the Haskell98 record syntax requires a comma-separated block. Perhaps if the original grammar had been designed with recursive descent parsing in mind, the concept of "block of things" would have arisen and been given only one syntax, namely semicolon separated elements in braces. However as it is the #
rule easily allows us to use semicolon-blocks where comma-blocks are needed as the record example shows.