Strings
BASE has two primary classes for handling strings: String and WideString. When you develop a new project, it is highly recommended that you use Unicode throughout.
For String the assumed encoding is UTF-8. You can pass it any byte string, but it is then up to you to handle the encoding yourself. Having \0 inside the string is also fine. However, anything after the first \0 will often be discarded, e.g. when calling system APIs.
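The effect of an embedded \0 can be sketched with the standard library alone. Here std::string is only a stand-in for a byte string (BASE's String API is not shown), std::strlen stands in for a C-style system API, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds a 7-byte string with an embedded NUL: "abc", '\0', "def".
// std::string is a stand-in here; this is not BASE's String class.
inline std::string makeWithEmbeddedNul() {
    return std::string("abc\0def", 7);
}

// Length as seen by a C-style API that stops at the first NUL byte.
inline std::size_t lengthSeenByCApi(const std::string& s) {
    return std::strlen(s.c_str());
}

// Usage check: the container keeps all 7 bytes, the C API sees only 3.
inline void embeddedNulExample() {
    const std::string s = makeWithEmbeddedNul();
    assert(s.size() == 7);
    assert(lengthSeenByCApi(s) == 3);
}
```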
For WideString the encoding is UCS-4. Note that it doesn't use wchar. This ensures that every character is stored as exactly one code. On Windows this differs from std::wstring, which is encoded as UTF-16 and can therefore need 2 codes per character.
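To see where the second code comes from, here is a minimal sketch of standard UTF-16 encoding for a single code point. It uses only char32_t and std::u16string from the standard library; toUtf16 is a hypothetical helper, not part of BASE.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point as UTF-16.
// Code points above 0xFFFF need a surrogate pair (two 16-bit units).
inline std::u16string toUtf16(char32_t cp) {
    // Precondition: a valid Unicode scalar value (not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp <= 0xFFFF) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        const char32_t v = cp - 0x10000;  // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high 10 bits
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low 10 bits
    }
    return out;
}

// Usage check: 'A' fits in one unit, U+1F600 needs two.
inline void utf16Example() {
    assert(toUtf16(0x41).size() == 1);
    assert(toUtf16(0x1F600).size() == 2);
}
```

In a UCS-4 string the same U+1F600 occupies a single 32-bit element, which is the property described above.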
For String you have to remember that the size/length of the string is not the same as the number of characters in the string. You can use WideString for code where this is important.
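The difference between byte length and character count can be sketched as follows. The example uses std::string as a stand-in for a UTF-8 byte string and assumes the input is valid UTF-8; countCodePoints is a hypothetical helper, not a BASE function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 byte string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
inline std::size_t countCodePoints(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte starts a new code point
        }
    }
    return n;
}

// Usage check: "naïve" is 6 bytes in UTF-8 but only 5 characters,
// because 'ï' (U+00EF) is encoded as the two bytes C3 AF.
inline void countExample() {
    const std::string s = "na" "\xC3\xAF" "ve";
    assert(s.size() == 6);
    assert(countCodePoints(s) == 5);
}
```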
String and WideString have all the relevant casts, so you do not need to do conversions explicitly. However, if you need to convert from std::string or std::wstring you will need to use the StdString class.
String and WideString are so common that they do not use a template implementation. This makes it easier to read compiler errors, stack traces, and similar.
WideString supports ISO codes up to 0x7ffffff. When integrating with other applications, however, you may need to ensure that you only pass Unicode codes up to 0x10ffff.
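A range check before handing text to another application could look like the sketch below. It uses std::u32string as a stand-in for a UCS-4 string; the helper names are hypothetical. Rejecting surrogates (0xD800 to 0xDFFF) and substituting U+FFFD are assumptions of this example, not something BASE is stated to do.

```cpp
#include <cassert>
#include <string>

// True if cp is a Unicode scalar value: at most 0x10FFFF and not a
// surrogate. The surrogate check is an extra assumption of this sketch.
inline bool isUnicodeScalar(char32_t cp) {
    return cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF);
}

// Returns a copy in which every out-of-range code is replaced by
// U+FFFD (REPLACEMENT CHARACTER).
inline std::u32string replaceInvalid(std::u32string text) {
    const char32_t kReplacement = 0xFFFD;
    for (char32_t& cp : text) {
        if (!isUnicodeScalar(cp)) {
            cp = kReplacement;
        }
    }
    return text;
}

// Usage check: a code above 0x10FFFF is replaced, valid codes are kept.
inline void replaceExample() {
    std::u32string in;
    in.push_back(0x41);
    in.push_back(static_cast<char32_t>(0x7FFFFFF));
    in.push_back(0x10FFFF);
    const std::u32string out = replaceInvalid(in);
    assert(out.size() == 3);
    assert(out[0] == 0x41 && out[1] == 0xFFFD && out[2] == 0x10FFFF);
}
```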
String and WideString use reference counting internally, which makes copying by value essentially free. If you modify the content, a copy is made automatically. This also means that you should not request the internal buffer for mutable access if you only need non-modifying access, or you will trigger an expensive copy.
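The general copy-on-write technique can be sketched as below. CowString and its members are hypothetical and are not BASE's implementation; the sketch is single-threaded and only meant to show why mutable access to a shared buffer forces a copy while read-only access does not.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write string; not BASE's implementation.
class CowString {
public:
    explicit CowString(std::string s)
        : data_(std::make_shared<std::string>(std::move(s))) {}

    // Non-modifying access: never copies the buffer.
    const std::string& view() const noexcept { return *data_; }

    // Mutable access: detaches (deep-copies) when the buffer is shared.
    std::string& mutableBuffer() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
        return *data_;
    }

    // True if both objects currently share one buffer.
    bool sharesBufferWith(const CowString& other) const noexcept {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::string> data_;
};

// Usage check: copies share a buffer until one of them asks to modify it.
inline void cowExample() {
    CowString a("hello");
    CowString b = a;                 // only bumps the reference count
    assert(a.sharesBufferWith(b));
    (void)b.view();                  // read-only access keeps sharing
    assert(a.sharesBufferWith(b));
    b.mutableBuffer() += "!";        // mutable access triggers the copy
    assert(!a.sharesBufferWith(b));
    assert(a.view() == "hello" && b.view() == "hello!");
}
```

Note that in this sketch the copy happens as soon as mutable access is requested, even if nothing is written afterwards, which is why the read-only accessor is the better default.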
Another benefit is that, since copying strings does NOT throw any exceptions, you can preserve noexcept on many of your methods.
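As an illustration of the noexcept point, the sketch below uses a std::shared_ptr-backed handle as a stand-in for a reference-counted string. SharedText and Document are hypothetical types, not BASE classes; the only fact relied on is that copying a std::shared_ptr does not throw.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical reference-counted handle; copying only bumps a counter.
struct SharedText {
    std::shared_ptr<const std::string> data;
};

// Copying the handle cannot throw, because no buffer is allocated.
static_assert(std::is_nothrow_copy_constructible<SharedText>::value,
              "copying a reference-counted handle must not throw");

class Document {
public:
    explicit Document(std::string title)
        : title_{std::make_shared<const std::string>(std::move(title))} {}

    // Returning by value can stay noexcept: the copy is a refcount bump.
    SharedText title() const noexcept { return title_; }

private:
    SharedText title_;
};

// Usage check: the getter is noexcept and returns the shared content.
inline void noexceptExample() {
    const Document d("Strings");
    assert(noexcept(d.title()));
    assert(*d.title().data == "Strings");
}
```

A getter that returned a deep-copying string type by value would have to allocate, so it could not make the same promise.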