UnicodeByteString

From HaskellWiki
Revision as of 16:43, 24 September 2007 by JohanTibell (talk | contribs) (Added motiviation section.)


This draft proposal for a new Unicode layer on top of ByteString is still being written.

Motivation

ByteString provides a faster and more memory-efficient data type than [Word8] for processing raw bytes. By creating a Unicode layer on top of ByteString that deals in characters instead of bytes, we can achieve similar performance improvements over String for text processing. A Unicode data type also removes the error-prone bookkeeping required when encoded text is stored as raw bytes in a ByteString, where the programmer has to remember which encoding each value uses. Functions such as length then just work on a Unicode string and count characters, even though different encodings use different numbers of bytes to represent a character.
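The gap between byte counts and character counts can be illustrated with a small sketch. It is not part of the proposal: utf8Length is a hypothetical helper written only for this example, and it assumes its input is well-formed UTF-8 (it performs no validation). Only Data.ByteString functions from the existing bytestring package are used.

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.))
import Data.Word (Word8)

-- The UTF-8 encoding of "naïve". The character 'ï' (U+00EF) is encoded
-- as the two bytes 0xC3 0xAF; every other character takes one byte.
naiveUtf8 :: B.ByteString
naiveUtf8 = B.pack [0x6E, 0x61, 0xC3, 0xAF, 0x76, 0x65]

-- Hypothetical helper (an assumption for this example, not a proposed API):
-- count characters in well-formed UTF-8 by skipping continuation bytes,
-- which always have the bit pattern 10xxxxxx.
utf8Length :: B.ByteString -> Int
utf8Length = B.foldl' step 0
  where
    step :: Int -> Word8 -> Int
    step n w
      | w .&. 0xC0 == 0x80 = n      -- continuation byte: same character
      | otherwise          = n + 1  -- first byte of a new character

main :: IO ()
main = do
  print (B.length naiveUtf8)   -- number of bytes: 6
  print (utf8Length naiveUtf8) -- number of characters: 5
```

With a plain ByteString the caller has to know that the bytes are UTF-8 and pick the right counting function. A Unicode layer would carry that knowledge in the type, so that a single length function could return the character count.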

Specification

Open Issues

References

  1. http://www.python.org/dev/peps/pep-0358/ - PEP 358 -- The "bytes" Object
  2. http://python.org/dev/peps/pep-3116/ - PEP 3116 -- New I/O