Performance/Floating point
{{Performance infobox}} 

[[Category:Performance|Floating point]]

== Don't use <hask>Float</hask> ==

<hask>Float</hask>s (probably 32 bits) are almost always a bad idea, unless you Really Know What You Are Doing. Use <hask>Double</hask>s. There's rarely a speed disadvantage: modern machines will use the same floating-point unit for both. With <hask>Double</hask>s, you are much less likely to hang yourself with numerical errors.

One time when <hask>Float</hask> might be a good idea is if you have a ''lot'' of them, say a giant array of <hask>Float</hask>s. An unboxed array of <hask>Float</hask> (see [[Performance/Arrays]]) takes up half the space in the heap compared to an unboxed array of <hask>Double</hask>. However, boxed <hask>Float</hask>s will only take up less space than boxed <hask>Double</hask>s if you are on a 32-bit machine (on a 64-bit machine, a <hask>Float</hask> still takes up 64 bits).
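As a quick illustration of the numerical-error point (an illustrative sketch, not part of the original page; the exact <hask>Float</hask> result can vary slightly between platforms), repeatedly adding a value that is not exactly representable in binary shows how quickly single-precision error accumulates:

```haskell
-- Illustrative sketch: sum 0.1 one million times in each precision.
-- 0.1 has no exact binary representation, so every addition rounds.
import Data.List (foldl')

main :: IO ()
main = do
  let n = 1000000 :: Int
  -- single precision: the accumulated error is plainly visible
  print (foldl' (+) 0 (replicate n (0.1 :: Float)))
  -- double precision: very close to the exact answer, 100000
  print (foldl' (+) 0 (replicate n (0.1 :: Double)))
```

The <hask>Float</hask> sum drifts visibly away from 100000, while the <hask>Double</hask> sum is correct to many decimal places.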
The speed claims may not be true due to Doubles not necessarily being aligned as the machine wishes. We could do with some benchmarking on various platforms to see what's what.
== GHC-specific advice ==

On x86 (and other platforms with GHC prior to version 6.4.2), use the <tt>-fexcess-precision</tt> flag to improve performance of floating-point intensive code (up to 2x speedups have been seen). This will keep more intermediates in registers instead of memory, at the expense of occasional differences in results due to unpredictable rounding. See the [http://www.haskell.org/ghc/docs/latest/html/users_guide/options-optimise.html#options-f GHC documentation] for more details. Switching on GCC's <tt>-ffast-math</tt> and <tt>-O3</tt> can also help (use <tt>-optc-ffast-math</tt> and <tt>-optc-O3</tt>).

Where available, the <tt>-optc-march=pentium4 -optc-mfpmath=sse</tt> flags may also help.
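For a concrete picture (a hypothetical program, not from the page; flag spellings as in the GHC documentation above), this is the kind of floating-point-heavy inner loop those flags target, with a suggested command line in the header comment:

```haskell
-- Hypothetical example. Compile with, e.g.:
--   ghc -O2 -fexcess-precision -optc-O3 -optc-ffast-math Norm.hs
-- (the -optc* flags only apply when GHC compiles via C)
import Data.List (foldl')

-- Naive Euclidean norm: a tight loop of multiplies and adds,
-- exactly the shape of code that benefits from excess precision.
norm :: [Double] -> Double
norm = sqrt . foldl' (\acc x -> acc + x * x) 0

main :: IO ()
main = print (norm [1 .. 1000])
```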

Note that the <tt>-fexcess-precision</tt> flag may make programs behave oddly: e.g. after falling into an <hask>if x < 0</hask> branch you may find that <hask>x</hask> is now not less than zero, as it has been written out to memory and some precision has been lost in the meantime.
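The effect itself depends on the x87 FPU spilling 80-bit registers to narrower memory cells, so it is hard to demonstrate portably. The following sketch (not from the original page) mimics it by explicitly narrowing a <hask>Double</hask> through <hask>Float</hask>: the round trip rounds the value across a comparison threshold, just as a register spill at a different precision can:

```haskell
main :: IO ()
main = do
  let x = 0.2999999975 :: Double
      -- round-trip through a narrower type, analogous to a
      -- high-precision register being spilled to narrower memory
      spilled = realToFrac (realToFrac x :: Float) :: Double
  print (x < 0.3)        -- True
  print (spilled < 0.3)  -- False: narrowing rounded x past 0.3
```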
Latest revision as of 08:34, 15 June 2007