
Strings

René Fonseca edited this page Feb 3, 2020 · 8 revisions

BASE has 2 primary classes for handling strings: String and WideString. When you develop a new project, it is highly recommended that you use Unicode throughout.

For String, the assumed encoding is UTF-8. You can pass it any byte string, but it is then up to you to handle the encoding yourself. Having \0 inside the string is also fine. However, anything after the first \0 will often be discarded, e.g. when the string is passed to system APIs.
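The effect of an embedded \0 can be sketched with the standard library alone. The example below uses std::string rather than BASE's String, and the helper name bytesSeenByCApi is hypothetical; it only illustrates why C-style APIs stop at the first \0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Number of bytes a C-style API sees: everything before the first '\0'.
// Hypothetical helper for illustration; not part of BASE.
std::size_t bytesSeenByCApi(const std::string& text) {
  return std::strlen(text.c_str());
}

// Example: the string owns 7 bytes, but a C-style API only sees "abc".
void example() {
  const std::string text("abc\0def", 7); // explicit length keeps the embedded '\0'
  assert(text.size() == 7);
  assert(bytesSeenByCApi(text) == 3);
}
```

The string object itself keeps all 7 bytes; the loss only happens at the boundary where the data is handed over as a null-terminated pointer.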

For WideString the encoding is UCS-4. Note that it doesn't use wchar. This ensures that every character is stored as exactly one code. This differs from Windows, where std::wstring is encoded as UTF-16, which can require 2 codes per character.
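To see where the 2 codes per character come from, here is a minimal sketch of UTF-16 encoding using only standard C++ types (std::u32string and std::u16string, not BASE classes). The function name toUtf16 is hypothetical, and the sketch assumes every input code is a valid Unicode code.

```cpp
#include <cassert>
#include <string>

// Encodes UCS-4 codes as UTF-16. Codes above 0xFFFF need a surrogate pair,
// i.e. 2 UTF-16 codes for 1 character.
// Assumes every input code is at most 0x10FFFF and is not itself a surrogate.
std::u16string toUtf16(const std::u32string& codes) {
  std::u16string result;
  for (char32_t code : codes) {
    if (code < 0x10000) {
      result.push_back(static_cast<char16_t>(code));
    } else {
      const char32_t offset = code - 0x10000;
      result.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));   // high surrogate
      result.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF))); // low surrogate
    }
  }
  return result;
}

// Example: 2 characters in UCS-4 become 3 codes in UTF-16.
void example() {
  const std::u32string wide = {U'A', char32_t(0x1F600)};
  const std::u16string utf16 = toUtf16(wide);
  assert(wide.size() == 2);
  assert(utf16.size() == 3);
}
```

With UCS-4 the size of the string always equals the number of characters, which is the property WideString relies on.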

For String, remember that the size/length of the string is not the same as the number of characters in the string. You can use WideString for code where this is important.
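The difference between size and character count can be shown with a small sketch in standard C++. It uses std::string as a stand-in for a UTF-8 String, and countCodePoints is a hypothetical helper that assumes valid UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts characters (code points) in UTF-8 text by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
// Assumes the input is valid UTF-8.
std::size_t countCodePoints(const std::string& utf8) {
  std::size_t count = 0;
  for (unsigned char byte : utf8) {
    if ((byte & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}

// Example: "naïve" occupies 6 bytes but holds 5 characters.
void example() {
  const std::string text = "na" "\xC3\xAF" "ve"; // ï is the 2 bytes C3 AF
  assert(text.size() == 6);
  assert(countCodePoints(text) == 5);
}
```

Counting requires a pass over the whole string, which is one reason to prefer WideString when code indexes or counts characters frequently.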

String and WideString have all the relevant casts, so you do not need to do conversions explicitly. However, if you need to convert from std::string or std::wstring, you will need to use the StdString class.

String and WideString are so common that they do not use a template implementation. This makes compiler errors, stack traces, and similar output easier to read.

WideString supports ISO codes up to 0x7ffffff. However, when integrating with other applications you may need to ensure that you only pass Unicode codes up to 0x10ffff.
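A check along these lines could be applied before handing codes to another application. This is a minimal sketch in standard C++ with a hypothetical function name. Besides the 0x10ffff limit mentioned above, it also rejects the surrogate range 0xD800..0xDFFF, which is an added assumption based on Unicode reserving those codes for UTF-16.

```cpp
#include <cassert>

// True if the code can be passed on as a Unicode character:
// at most 0x10FFFF and outside the surrogate range 0xD800..0xDFFF.
// Hypothetical helper for illustration; not part of BASE.
bool isUnicodeScalar(char32_t code) {
  const bool inRange = code <= 0x10FFFF;
  const bool isSurrogate = code >= 0xD800 && code <= 0xDFFF;
  return inRange && !isSurrogate;
}

// Example: codes beyond the Unicode range are rejected.
void example() {
  assert(isUnicodeScalar(U'A'));
  assert(isUnicodeScalar(char32_t(0x10FFFF)));
  assert(!isUnicodeScalar(char32_t(0x110000)));
  assert(!isUnicodeScalar(char32_t(0xD800)));
}
```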

Implementation

String and WideString use reference counting internally, so copy-by-value does not copy the character data and is effectively no-cost. If you modify the content, a copy will be made automatically. This also means that you should not request the internal buffer for mutable access when you only need non-modifying access, or you will get an expensive copy.

Another benefit is that copying of strings does NOT cause any exceptions, so you can preserve noexcept on many of your methods.
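The behavior described in this section can be sketched as a generic copy-on-write wrapper. This is not BASE's implementation: the class CowString and its methods view, edit, and sharesBufferWith are hypothetical, it is built on std::shared_ptr, and it ignores thread safety.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical copy-on-write string; a sketch, not BASE's String.
class CowString {
public:
  explicit CowString(std::string text)
    : data(std::make_shared<std::string>(std::move(text))) {}

  // Read-only access never copies the buffer.
  const std::string& view() const noexcept { return *data; }

  // Mutable access first detaches from any other owners, which costs a
  // full copy even if the caller never actually modifies anything.
  std::string& edit() {
    if (data.use_count() > 1) {
      data = std::make_shared<std::string>(*data);
    }
    return *data;
  }

  bool sharesBufferWith(const CowString& other) const noexcept {
    return data == other.data;
  }

private:
  std::shared_ptr<std::string> data;
};

// Copying only bumps a reference count, so it cannot throw.
static_assert(std::is_nothrow_copy_constructible<CowString>::value,
              "copying must not throw");

void example() {
  CowString a("hello");
  CowString b = a;                 // shares the buffer, no character data copied
  assert(a.sharesBufferWith(b));
  b.edit() += "!";                 // mutable access detaches b first
  assert(!a.sharesBufferWith(b));
  assert(a.view() == "hello");
}
```

In this sketch, calling edit() on a shared string always pays for the copy, which mirrors the advice to use non-modifying access whenever the content is only read.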
