UnicodeByteString

From HaskellWiki
Revision as of 16:43, 24 September 2007 by JohanTibell.

This draft proposal for a new Unicode layer on top of ByteString is still being written.

Motivation

ByteString provides a faster and more memory-efficient data type than [Word8] for processing raw bytes. By creating a Unicode layer on top of ByteString that deals in units of characters instead of units of bytes, we can achieve similar performance improvements over String for text processing. A Unicode data type also removes the error-prone bookkeeping needed when text is stored as raw encoded bytes in a ByteString and the programmer must keep track of which encoding was used. Functions such as length then count characters rather than bytes, even though different encodings use different numbers of bytes to represent a character.
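To illustrate the difference between counting bytes and counting characters, here is a minimal sketch. It assumes the Unicode layer stores its text as UTF-8 inside a strict ByteString. This draft does not yet specify an internal encoding. The names Utf8Text, isLeadByte, charLength, byteLength and naive are hypothetical and not part of any proposed API. Every UTF-8 character begins with exactly one byte that is not a continuation byte (bit pattern 10xxxxxx), so counting those lead bytes gives the number of characters in valid UTF-8 input.

```haskell
import qualified Data.ByteString as BS
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical wrapper: text stored as UTF-8 bytes in a strict ByteString.
-- UTF-8 is an assumption made for this sketch only.
newtype Utf8Text = Utf8Text BS.ByteString

-- A byte starts a new character unless it is a continuation byte (10xxxxxx).
isLeadByte :: Word8 -> Bool
isLeadByte b = (b .&. 0xC0) /= 0x80

-- Number of characters; assumes the bytes are valid UTF-8 (no validation).
charLength :: Utf8Text -> Int
charLength (Utf8Text bs) = BS.foldl' step 0 bs
  where
    step n b = if isLeadByte b then n + 1 else n

-- Number of bytes, i.e. what BS.length reports for the raw ByteString.
byteLength :: Utf8Text -> Int
byteLength (Utf8Text bs) = BS.length bs

-- "naïve" encoded as UTF-8: 'ï' (U+00EF) takes the two bytes 0xC3 0xAF.
naive :: Utf8Text
naive = Utf8Text (BS.pack [0x6E, 0x61, 0xC3, 0xAF, 0x76, 0x65])
```

For this input, byteLength naive is 6 while charLength naive is 5. A real implementation would also have to validate or reject malformed byte sequences, which this sketch does not attempt.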

Specification

Open Issues

References

  1. http://www.python.org/dev/peps/pep-0358/ - PEP 358 -- The "bytes" Object
  2. http://python.org/dev/peps/pep-3116/ - PEP 3116 -- New I/O