UnicodeByteString
Revision as of 16:43, 24 September 2007 by JohanTibell (talk | contribs) (Added motiviation section.)
This draft proposal for a new Unicode layer on top of ByteString is still being written.
Motivation
ByteString provides a faster and more memory-efficient data type than [Word8] for processing raw bytes. By creating a Unicode layer on top of ByteString that deals in units of characters instead of units of bytes, we can achieve similar performance improvements over String for text processing. A Unicode data type also removes the error-prone process of keeping track of strings encoded as raw bytes stored in ByteStrings. Functions such as length just work on a Unicode string, counting characters rather than bytes, even though different encodings use different numbers of bytes to represent a character.
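The difference between counting bytes and counting characters can be illustrated with a minimal sketch. It assumes well-formed UTF-8 input, and utf8Length is a hypothetical helper written for this illustration, not part of the proposed API or of the bytestring library.

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical helper (not part of any library): counts the characters in a
-- ByteString that is assumed to hold well-formed UTF-8. Every character
-- contributes exactly one byte that is not a continuation byte (10xxxxxx).
utf8Length :: B.ByteString -> Int
utf8Length = B.length . B.filter isLeadByte
  where
    isLeadByte :: Word8 -> Bool
    isLeadByte w = w .&. 0xC0 /= 0x80

-- The word "naïve" encoded as UTF-8; the "ï" takes two bytes (0xC3 0xAF).
naive :: B.ByteString
naive = B.pack [0x6E, 0x61, 0xC3, 0xAF, 0x76, 0x65]

main :: IO ()
main = do
  print (B.length naive)    -- number of bytes: 6
  print (utf8Length naive)  -- number of characters: 5
```

With a raw ByteString the caller has to remember which encoding is in use and pick the matching counting function; a Unicode layer would make length return the character count directly.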
Specification
Open Issues
References
- http://www.python.org/dev/peps/pep-0358/ - PEP 358 -- The "bytes" Object
- http://python.org/dev/peps/pep-3116/ - PEP 3116 -- New I/O