UnicodeByteString
Revision as of 16:43, 24 September 2007 by JohanTibell (Added motiviation section.)
This draft proposal for a new Unicode layer on top of ByteString is still being written.
Motivation
ByteString provides a faster and more memory-efficient data type than [Word8] for processing raw bytes. By creating a Unicode layer on top of ByteString that deals in units of characters instead of units of bytes, we can achieve similar performance improvements over String for text processing. A Unicode data type also removes the error-prone process of keeping track of how strings stored as raw bytes in ByteStrings are encoded. Functions such as length just work on a Unicode string: they count characters, even though different encodings use different numbers of bytes to represent a character.
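The difference between counting bytes and counting characters can be illustrated with a minimal sketch. It is not part of the proposal: utf8Length is a hypothetical helper, and it assumes the ByteString holds valid UTF-8. It counts every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx), which gives the number of code points.

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical helper (an assumption, not a proposed API): counts the
-- code points in a ByteString that is assumed to contain valid UTF-8.
utf8Length :: B.ByteString -> Int
utf8Length = B.foldl' step 0
  where
    step :: Int -> Word8 -> Int
    step n b
      | b .&. 0xC0 == 0x80 = n      -- continuation byte: same character
      | otherwise          = n + 1  -- first byte of a new character

-- The word "naive" with a diaeresis on the i, encoded as UTF-8.
-- U+00EF takes two bytes: 0xC3 0xAF.
naive :: B.ByteString
naive = B.pack [0x6E, 0x61, 0xC3, 0xAF, 0x76, 0x65]

main :: IO ()
main = do
  print (B.length naive)    -- number of bytes: 6
  print (utf8Length naive)  -- number of characters: 5
```

With a raw ByteString the caller has to remember the encoding and pick the matching counting function. A Unicode layer would hide that bookkeeping behind a single length.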
Specification
Open Issues
References
- http://www.python.org/dev/peps/pep-0358/ - PEP 358 -- The "bytes" Object
- http://python.org/dev/peps/pep-3116/ - PEP 3116 -- New I/O