Difference between revisions of "Accessible layout proposal"

From HaskellWiki
Jump to navigation Jump to search
(Initial draft)
 
(Fixed grammar rule for #$ so lhs is an exp0 not just fexp)
 
(One intermediate revision by the same user not shown)
Line 3: Line 3:
 
The character sequences <code>#(</code>, <code>#[</code>, <code>#$</code>, and <code>#</code> would become reserved tokens that start a new layout block with the following desugaring:
 
The character sequences <code>#(</code>, <code>#[</code>, <code>#$</code>, and <code>#</code> would become reserved tokens that start a new layout block with the following desugaring:
 
<pre>
 
<pre>
#( {a;b;c} === (a,b,c)
+
#( {a;b;c} ==> (a,b,c)
#[ {a;b;c} === [a,b,c]
+
#[ {a;b;c} ==> [a,b,c]
epct #$ {a;b;c} === epct a b c
+
epct #$ {a;b;c} ==> epct (a) (b) (c)
# {a;b;c} === {a;b;c}
+
# {a;b;c} ==> {a,b,c}
 
</pre>
 
</pre>
 
where <code>epct</code> represents an expression, pattern, data constructor, type constructor, or class name. The above tokens would be added to the list of tokens that start layout blocks, and appropriate rules would be added to the grammar itself to perform the above desugarings, for example:
 
where <code>epct</code> represents an expression, pattern, data constructor, type constructor, or class name. The above tokens would be added to the list of tokens that start layout blocks, and appropriate rules would be added to the grammar itself to perform the above desugarings, for example:
 
<pre>
 
<pre>
fexp -> [fexp] aexp
+
exp -> exp0 "#$" block<exp>
fexp -> fexp "#$" block<aexp>
 
 
</pre>
 
</pre>
where <code>block<aexp></code> expands to <code>"{" [aexp (";" aexp)*] "}"</code> in EBNF. It is important to realise that <code>#$</code> above is not an operator: it's part of the grammar itself, and should be thought of as a syntactic sugar that "happens" before the "real" grammar is used, even though it should be able to be implemented by adding appropriate extra rules to the grammar itself as above.
+
where <code>block<q></code> expands to <code>"{" [q (";" q)*] "}"</code> in EBNF. It is important to realise that <code>#$</code> above is not an operator: it's part of the grammar itself, and should be thought of as a syntactic sugar that "happens" before the "real" grammar is used, even though it should be able to be implemented by adding appropriate extra rules to the grammar itself as above.
  +
  +
Note that <code>#$</code> effectively puts parentheses round each element of the block where parentheses would be needed in the usual grammar so we can convert a block of <code>exp</code> to a sequence of <code>aexp</code> for example.
   
 
==Motivation==
 
==Motivation==
Line 38: Line 39:
 
putStrLn "hello"
 
putStrLn "hello"
 
</haskell>
 
</haskell>
<code>#</code> would allow records to be laid out if we allowed records to also be written using semicolons instead of commas:
+
<code>#</code> allows us to write records:
 
<haskell>
 
<haskell>
 
let p = Person { personName = "Zarathustra Aurelio"
 
let p = Person { personName = "Zarathustra Aurelio"
; personAddress = "Ancient Persia"
+
, personAddress = "Ancient Persia"
; personAge = 4000
+
, personAge = 4000
 
}
 
}
   
Line 49: Line 50:
 
personAge = 4000
 
personAge = 4000
 
</haskell>
 
</haskell>
  +
Remember that <code>#</code> desugars a block of things (sequence separated by semicolons enclosed in braces) into a sequence separated by commas enclosed in braces. We can call the former a '''semicolon-block''' and the latter a '''comma-block'''.
The above illustrates the power of choosing a single construct - the block which is a semicolon separated list of elements enclosed in braces - in many different situations in the language, which is a very natural way of thinking if you try to write a recursive descent parser for Haskell, but is not supported by the current method of describing the grammar using a CFG, where piecemeal ad-hoc syntax can slip past undetected due to the need to keep duplicating a mapping from syntax to the same concept ("a (possibly ordered) set of things") in many different places in the grammar.
 
   
 
<code>#(</code> would be useful in import and export lists (and everywhere else tuple syntax is used):
 
<code>#(</code> would be useful in import and export lists (and everywhere else tuple syntax is used):
Line 78: Line 79:
 
where
 
where
 
</haskell>
 
</haskell>
Similarly to the case for records, we could note that import and export lists are really ''sets'' of things, and so perhaps we could allow a simple block to be used instead of using the "tuple" notation. Then the example above could be written as:
+
We could also add some rules to the grammar to allow comma-blocks to be used anywhere that the tuple notation is currently used for non-tuples. Then the example above could be written as:
 
<haskell>
 
<haskell>
 
module Foo #
 
module Foo #
Line 91: Line 92:
 
Not only does the above look less cluttered than the current syntax which uses parentheses and commas, it also allows you to easily reorder things without having to bother with that pesky comma that always ends up in the wrong place when you cut and paste...
 
Not only does the above look less cluttered than the current syntax which uses parentheses and commas, it also allows you to easily reorder things without having to bother with that pesky comma that always ends up in the wrong place when you cut and paste...
   
Also, we could add rules to allow data types to be defined using blocks instead (or as well as) the <code>|</code> notation:
+
Also, we could add rules to allow data types to be defined using comma-blocks instead (or as well as) the <code>|</code> notation:
 
<haskell>
 
<haskell>
 
data T = One | Two | Three
 
data T = One | Two | Three
Line 97: Line 98:
 
could also be written using block notation as:
 
could also be written using block notation as:
 
<haskell>
 
<haskell>
data T = {One; Two; Three}
+
data T = {One, Two, Three}
 
</haskell>
 
</haskell>
 
which then allows:
 
which then allows:
Line 107: Line 108:
   
 
==Implications==
 
==Implications==
<code>#(</code>, <code>#[</code>, <code>#$</code>, and <code>#</code> would become reserved tokens so <code>#$</code> and <code>#</code> could no longer be used as symbols (operators). Allowing blocks to be used in place of parenthesised import/export/deriving/predicate lists should not have any negative effects because such uses occupy a hitherto unused space in the grammar (as far as I can tell [[User:Brianh|Brianh]] 16:12, 3 February 2007 (UTC)). A possible parsing challenge would be allowing semicolons to be used as the separator for record field lists (thus turning a record field list into a standard block of fields) while at the same time also allowing commas to be used here for backwards compatibility, though this should hopefully not be insurmountable.
+
<code>#(</code>, <code>#[</code>, <code>#$</code>, and <code>#</code> would become reserved tokens so <code>#$</code> and <code>#</code> could no longer be used as symbols (operators). Allowing comma-blocks to be used in place of parenthesised import/export/deriving/predicate lists should not have any negative effects because such uses occupy a hitherto unused space in the grammar (as far as I can tell [[User:Brianh|Brianh]] 16:12, 3 February 2007 (UTC)).
  +
  +
==Why are there two kinds of block?==
  +
In the above we've had to make a distinction between blocks with comma-separated elements and blocks with semicolon-separated elements. This distinction was necessary because the Haskell98 record syntax requires a comma-separated block. Perhaps if the original grammar had been designed with recursive descent parsing in mind, the concept of "block of things" would have arisen and been given only one syntax, namely semicolon separated elements in braces. However as it is the <code>#</code> rule easily allows us to use semicolon-blocks where comma-blocks are needed as the record example shows.

Latest revision as of 02:11, 5 February 2007

Description

The character sequences #(, #[, #$, and # would become reserved tokens that start a new layout block with the following desugaring:

    #( {a;b;c}        ==>     (a,b,c)
    #[ {a;b;c}        ==>     [a,b,c]
    epct #$ {a;b;c}   ==>     epct (a) (b) (c)
    # {a;b;c}         ==>     {a,b,c}

where epct represents an expression, pattern, data constructor, type constructor, or class name. The above tokens would be added to the list of tokens that start layout blocks, and appropriate rules would be added to the grammar itself to perform the above desugarings, for example:

    exp -> exp0 "#$" block<exp>

where block expands to "{" [q (";" q)*] "}" in EBNF. It is important to realise that #$ above is not an operator: it's part of the grammar itself, and should be thought of as a syntactic sugar that "happens" before the "real" grammar is used, even though it should be able to be implemented by adding appropriate extra rules to the grammar itself as above.

Note that #$ effectively puts parentheses round each element of the block where parentheses would be needed in the usual grammar so we can convert a block of exp to a sequence of aexp for example.

Motivation

You might be thinking the above just looks totally far out and mad but the reason for wanting it is that it would allow you to make use of layout to avoid many commas and parentheses. For example consider:

    main = do
              a <- getChar
              bracket_
                  (enter a)
                  (exit a)
                  (do
                      putChar a
                      putStrLn "hello")

The programmer has used indentation and newlines to make it clear that bracket_ has 3 arguments. However the compiler can't see this indentation and newlines, because bracket_ does not introduce layout. However, by using #$, we can eliminate the parentheses:

    main = do
              a <- getChar
              bracket_ #$
                  enter a
                  exit a
                  do
                      putChar a
                      putStrLn "hello"

# allows us to write records:

    let p = Person { personName = "Zarathustra Aurelio"
                   , personAddress = "Ancient Persia"
                   , personAge = 4000
                   }

    let p = Person # personName = "Zarathustra Aurelio"
                     personAddress = "Ancient Persia"
                     personAge = 4000

Remember that # desugars a block of things (sequence separated by semicolons enclosed in braces) into a sequence separated by commas enclosed in braces. We can call the former a semicolon-block and the latter a comma-block.

#( would be useful in import and export lists (and everywhere else tuple syntax is used):

    module Foo
        ( Zap(Con1, Con2, Con3)
        , mkZap
        , count
        ) where

could instead be written as:

    module Foo #(
        Zap(Con1, Con2, Con3)
        mkZap
        count
        where

or even:

    module Foo #(
        Zap #(
            Con1
            Con2
            Con3
        mkZap
        count
        where

We could also add some rules to the grammar to allow comma-blocks to be used anywhere that the tuple notation is currently used for non-tuples. Then the example above could be written as:

    module Foo #
        Zap #
            Con1
            Con2
            Con3
        mkZap
        count
        where

Not only does the above look less cluttered than the current syntax which uses parentheses and commas, it also allows you to easily reorder things without having to bother with that pesky comma that always ends up in the wrong place when you cut and paste...

Also, we could add rules to allow data types to be defined using comma-blocks instead (or as well as) the | notation:

    data T = One | Two | Three

could also be written using block notation as:

    data T = {One, Two, Three}

which then allows:

    data T = # One
               Two
               Three

Implications

#(, #[, #$, and # would become reserved tokens so #$ and # could no longer be used as symbols (operators). Allowing comma-blocks to be used in place of parenthesised import/export/deriving/predicate lists should not have any negative effects because such uses occupy a hitherto unused space in the grammar (as far as I can tell Brianh 16:12, 3 February 2007 (UTC)).

Why are there two kinds of block?

In the above we've had to make a distinction between blocks with comma-separated elements and blocks with semicolon-separated elements. This distinction was necessary because the Haskell98 record syntax requires a comma-separated block. Perhaps if the original grammar had been designed with recursive descent parsing in mind, the concept of "block of things" would have arisen and been given only one syntax, namely semicolon separated elements in braces. However as it is the # rule easily allows us to use semicolon-blocks where comma-blocks are needed as the record example shows.