I think there is another caveat in that last example, which is supposed to run in constant space.
These lines from the last code example
    x' <- readSTRef x
    y' <- readSTRef y
    writeSTRef x y'
    writeSTRef y (x'+y')
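will build up a long chain of unevaluated additions (1+1+2+3+5+8+...) in the STRef, and forcing that chain at the end needs stack space proportional to its length. When compiled, I get a stack overflow running fibST 1100000 (1.1 million) with an 8MB stack. There might be some hidden strictness in the statement "The ST monad provides support for strict state threads.", but that isn't explained on this page. As far as I can tell, "strict" there refers to how the state-passing actions are sequenced, not to the values stored in an STRef: writeSTRef does not evaluate its argument.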
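A quick way to check that the strict ST monad still stores unevaluated values: write `undefined` into an STRef and never force it. This is a small sketch assuming GHC's `Control.Monad.ST` and `Data.STRef` from base; the name `lazyWrite` is just for illustration.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

-- Illustrative only: writeSTRef stores its argument as an unevaluated thunk,
-- even in the strict ST monad, so writing `undefined` does not throw.
lazyWrite :: String
lazyWrite = runST $ do
  r <- newSTRef (0 :: Int)
  writeSTRef r undefined   -- the value is stored, not evaluated
  _ <- readSTRef r         -- reading hands back the thunk without forcing it
  return "no exception: the stored value was never forced"

main :: IO ()
main = putStrLn lazyWrite
```

The same thing happens with `x'+y'` in the loop: each write stores a new `(+)` thunk that refers to the previous ones, and nothing forces them until the very end.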
Forcing evaluation with seq prevents the stack overflow, e.g. by replacing the last line with
x' `seq` writeSTRef y (x'+y')
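A self-contained version of the loop with that fix might look like the following. It is a sketch, not the page's exact code: the wrapper and the helper name `loop` are my own reconstruction, and it assumes GHC with `Control.Monad.ST` and `Data.STRef` from base.

```haskell
import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

-- Reconstructed sketch of fibST; `loop` is a hypothetical helper name.
fibST :: Integer -> Integer
fibST n
  | n < 2     = n
  | otherwise = runST $ do
      x <- newSTRef 0   -- holds F(i)
      y <- newSTRef 1   -- holds F(i+1)
      loop n x y
  where
    loop :: Integer -> STRef s Integer -> STRef s Integer -> ST s Integer
    loop 0 x _ = readSTRef x
    loop k x y = do
      x' <- readSTRef x
      y' <- readSTRef y
      writeSTRef x y'
      -- Forcing x' (the sum written two steps ago) before storing the new
      -- sum keeps every thunk left in the STRefs shallow.
      x' `seq` writeSTRef y (x' + y')
      loop (k - 1) x y

main :: IO ()
main = print (map fibST [0 .. 10])
```

Another option is `writeSTRef y $! x' + y'`, which forces the new sum itself before storing it; `Data.STRef` also exports `modifySTRef'`, a strict variant of `modifySTRef`.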