This repository has been archived by the owner on Sep 21, 2020. It is now read-only.

Is UTF-8 supported? #5

Open
magiblot opened this issue Jul 13, 2018 · 10 comments

@magiblot

I wrote a very simple Turbo Vision application to check for UTF-8 support. I put the strings "Alt-X Ñ" and "F10 Ç" in the source code and saved it as UTF-8 text. The LANG environment variable refers to a UTF-8 encoding. The program, compiled with g++ 8.1.1, looks like this in an X11 window:
[Screenshot: screenshot_20180713_144043]

And like this in the console:
[Screenshot: mpv-shot0020]

Am I missing a command-line option, or is this an issue in the source code? If the latter, which part of the source code is involved?

Many thanks.

@windoze
Owner

windoze commented Jul 14, 2018

To be honest, I have no idea right now. The code was taken from http://sourceforge.net/projects/tvision, and it's pretty ancient. I guess it doesn't support UTF-8 very well, especially since it uses ISO8859-1 symbols for borders and glyphs.
I will take a look into the source code and get back to you.
Thanks.

@magiblot
Author

magiblot commented Jul 15, 2018 via email

@set-soft

Hi!
The code doesn't support Unicode. And it isn't a simple detail.
You should convert your strings to a code page, ISO 8859-1 (aka latin1) for example.
I wrote a couple of classes that can handle some Unicode, but that work is experimental and I never did the same for the rest of the classes.
Regards, SET
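
As an illustration of the conversion suggested above, here is a minimal sketch that turns UTF-8 text into ISO 8859-1 bytes. The function name `utf8ToLatin1` is hypothetical and is not part of Turbo Vision; the sketch assumes C++17 and rejects anything Latin-1 cannot represent instead of substituting a replacement character.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of Turbo Vision): converts UTF-8 text to
// ISO 8859-1. Returns std::nullopt if the input is malformed or contains a
// code point above U+00FF, which Latin-1 cannot represent.
std::optional<std::string> utf8ToLatin1(std::string_view in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (b0 < 0x80) {
            // ASCII: the byte is identical in both encodings.
            out.push_back(static_cast<char>(b0));
            i += 1;
        } else if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < in.size()) {
            // Two-byte sequences led by 0xC2 or 0xC3 encode U+0080..U+00FF.
            unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            if ((b1 & 0xC0) != 0x80)
                return std::nullopt; // second byte is not a continuation byte
            out.push_back(static_cast<char>(((b0 & 0x03) << 6) | (b1 & 0x3F)));
            i += 2;
        } else {
            return std::nullopt; // outside Latin-1, or malformed input
        }
    }
    // Sanity check: Latin-1 output is never longer than its UTF-8 input.
    assert(out.size() <= in.size());
    return out;
}
```

With this sketch, the UTF-8 bytes `C3 91` for "Ñ" become the single Latin-1 byte `D1`, while a character such as "€" (three bytes in UTF-8) makes the function return `std::nullopt`.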

@magiblot
Author

Oh, how unfortunate!
Well, it is what it is. Thanks for your reply!

@magiblot
Author

magiblot commented Aug 6, 2020

In case anyone may be interested, I managed to add Unicode support in my fork of Turbo Vision and released a text editor making use of it.

@windoze
Owner

windoze commented Aug 6, 2020

Wow that's awesome, looks like a nice codebase for a TUI version of VSCode.
I tried it on my Mac, but besides the ncurses/ncursesw issue it still doesn't compile because of some strange static_assert failures. From any perspective the class TCellAttribs should be trivial and standard_layout, but Apple clang doesn't agree.
I'll dig deeper tomorrow.
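
For readers unfamiliar with this kind of check, the sketch below shows what such static asserts typically look like. `CellAttribsSketch` and `WithDefaultInit` are hypothetical stand-ins; the real TCellAttribs may be defined quite differently, and nothing here is claimed to be the cause of the Apple clang failure. The second struct only illustrates one well-known way a type can stop being trivial.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a cell-attribute type (not the real TCellAttribs).
struct CellAttribsSketch {
    std::uint8_t fg;
    std::uint8_t bg;
    std::uint8_t style;
    std::uint8_t reserved;
};

// Compile-time checks of the kind described in the comment above.
static_assert(std::is_trivial<CellAttribsSketch>::value,
              "CellAttribsSketch should be trivial");
static_assert(std::is_standard_layout<CellAttribsSketch>::value,
              "CellAttribsSketch should be standard-layout");

// A default member initializer makes the default constructor non-trivial,
// so this type is no longer trivial, although it is still trivially
// copyable and standard-layout.
struct WithDefaultInit {
    std::uint8_t fg = 0;
};

static_assert(!std::is_trivial<WithDefaultInit>::value,
              "a default member initializer makes the type non-trivial");
static_assert(std::is_trivially_copyable<WithDefaultInit>::value,
              "but the type can still be copied with memcpy");
```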

@magiblot
Author

magiblot commented Aug 6, 2020

Well, you can just disable the static asserts and not worry about it. We can investigate what went wrong some other time.

@magiblot
Author

magiblot commented Aug 6, 2020

Compilation issues under clang have been fixed in both tvision and turbo.

@set-soft

> In case anyone may be interested, I managed to add Unicode support in my fork of Turbo Vision and released a text editor making use of it.

You did very good work!!! Congratulations!!!
I just compiled the examples and tried it. The way it works on a terminal emulator is very impressive.
I'll take a deeper look.

@magiblot
Author

magiblot commented Aug 22, 2020

Thanks, Salvador!

I would say the key points in Unicode support are:

  • Relying on UTF-8 instead of UTF-32/wchar_t. This avoids having to break the API and existing code. The most common operations involving strings are copying and parameter passing; it is only in very specific places that text encoding matters. The UTF-8 Everywhere Manifesto is very right.
  • Abstracting common encoding-dependent operations on strings. The TText interface offers this functionality. For instance, the equivalent of ptr++ is ptr += TText::next(ptr). This requires little refactoring and even results in code that makes fewer assumptions.
  • Backward compatibility with Borland C++. Together with my preference for avoiding #ifdefs as much as possible, this requirement leads to keeping things simple and building more abstractions.
  • Modern language features: I find it very important to be able to do a lot of stuff by typing little. C++ has a lot more tools today than when Borland developed Turbo Vision or when you began to write your port.
  • Unicode in modern systems: being able to switch codepages is not an interesting feature nowadays, but it was when you wrote your port. A more complex solution could have been necessary if single-byte encodings were still a thing in modern Linux.
  • Getting rid of fixed-length arrays. How many times have I run across char buf[256]? At first it may look like a larger array will be necessary for larger display sizes, but it turns out that those arrays are not necessary at all in most cases. The TDrawBuffer interface only accepted null-terminated strings, so these arrays were used to generate substrings. It is hilarious to see how these arrays, which restrict functionality and are very prone to stack overruns, can easily be replaced with string views (std::string_view in the STL, or TStringView in my port of Turbo Vision), resulting in even simpler code. See this commit, for example. This is relevant to Unicode support because with fixed-length arrays you would have to make sure not to split multibyte characters when copying text around. Not developing a string view class was probably the biggest mistake of the engineers at Borland.
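
Two of the points above, stepping through text by character length instead of `ptr++` and using string views instead of fixed-length buffers, can be sketched together. The helpers `utf8Next`, `prefixWithinBytes` and `countCodePoints` below are hypothetical and are only loosely modeled on the idea behind TText::next; the actual signatures in the fork may differ. The sketch assumes C++17 and does not validate continuation bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the byte length of the UTF-8 sequence that
// starts at s[0], or 0 if s is empty. Invalid lead bytes and sequences cut
// off by the end of the text count as 1 byte, so callers always advance.
std::size_t utf8Next(std::string_view s)
{
    if (s.empty())
        return 0;
    unsigned char b = static_cast<unsigned char>(s[0]);
    std::size_t len = b < 0x80           ? 1
                    : (b & 0xE0) == 0xC0 ? 2
                    : (b & 0xF0) == 0xE0 ? 3
                    : (b & 0xF8) == 0xF0 ? 4
                                         : 1;
    return len <= s.size() ? len : 1;
}

// Returns the longest prefix of `text` that fits in `maxBytes` bytes without
// splitting a multibyte character. No copy and no char buf[256] is involved:
// the result is a view into the original text.
std::string_view prefixWithinBytes(std::string_view text, std::size_t maxBytes)
{
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t n = utf8Next(text.substr(pos));
        assert(n > 0); // text.substr(pos) is non-empty inside the loop
        if (pos + n > maxBytes)
            break;
        pos += n;
    }
    return text.substr(0, pos);
}

// Counts characters (code points) by advancing one whole sequence at a time,
// the string-view analogue of ptr += next(ptr).
std::size_t countCodePoints(std::string_view text)
{
    std::size_t count = 0;
    for (std::size_t pos = 0; pos < text.size(); ++count)
        pos += utf8Next(text.substr(pos));
    return count;
}
```

For the string "F10 Ç" stored as UTF-8 (6 bytes), `prefixWithinBytes(text, 5)` yields "F10 " rather than a prefix ending in half of "Ç", which is the failure mode a naive copy into a fixed-length array would risk.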

Other comments:

  • By offering backward compatibility, the legacy code needs fewer changes in general. For instance, adding commonly used STL classes into the global namespace (e.g. this header) prevents us from having to deal with any further workarounds. Another key feature is the implementation of Borland's filesystem operations.
  • Unlike earlier versions, Turbo Vision 2.0 has 32-bit support and does not require purging DOS dependencies (far pointers, video interrupts, etc.). It is even a Win32 console app! Backward compatibility would not have been possible without this. It looks like your port derives from Turbo Vision 1.03 instead, so breaking compatibility was necessary from the very beginning.
