Variables with non-UTF8 characters in their name get the wrong name after transmission #32

Open
Timendus opened this issue Sep 21, 2021 · 6 comments · May be fixed by #37

Comments

@Timendus
Owner

Names containing θ, the small L (list prefix) and probably Pic / Str names get garbled. The same may be true in the other direction, for the names returned by getDirectory.

As described by @debrouxl in Timendus/ticalc.link#8:

Dealing with file name encoding / tokenization matters for both display and transfer functionality, for instance (and probably not limited to):

  • sending to an 84+ through DUSB data from an 83+ / 84+ program stored in an 83+ format with which the 84+ is compatible. For instance, the names of Pic0-9 (reasonably frequently used in games or school programs), GDB0-9 and Str0-9 are tokenized.
  • sending or receiving FlashApps that perform language localization: at least the name of the Spanish localization FlashApp contains a special international character, ñ (n tilde);
  • sending or receiving variables whose names contain Greek characters other than θ, some of which are definitely valid in file names, at least on the TI-68k series. I've just created two variables named (Greek beta) and (Greek gamma) in TIEmu, then used TILP to perform a dirlist through the virtual cable and DBUS protocol: TILP displayed the expected Greek letters.

The files of libticonv most relevant to you are charset.cc and filename.cc: https://github.com/debrouxl/tilibs/tree/experimental2/libticonv/trunk/src .
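Tokenized names such as Pic0-9, GDB0-9 and Str0-9 are not plain text at all, so a charset table alone cannot render them. A minimal sketch of the detokenizing side follows. The prefix bytes (0x60, 0x61, 0xAA) and the digit order (0x00 means 1, 0x09 means 0) are assumptions taken from the TI-83+ token table and should be checked against libticonv's sources; `detokenize_name` and `NAME_TOKEN_PREFIXES` are hypothetical names.

```python
from typing import Optional

# ASSUMED prefix bytes from the TI-83+ token table; verify against libticonv.
# 0x60 = picture, 0x61 = graph database, 0xAA = string variable.
NAME_TOKEN_PREFIXES = {0x60: "Pic", 0x61: "GDB", 0xAA: "Str"}


def detokenize_name(raw: bytes) -> Optional[str]:
    """Return a readable name such as 'Pic1' for a tokenized variable name.

    The second byte selects the digit: 0x00..0x08 mean 1..9, 0x09 means 0.
    Returns None when `raw` does not start with one of the prefixes above,
    so ordinary (non-tokenized) names can be left to a charset table.
    Trailing NUL padding is deliberately not stripped first, because 0x00
    is itself a valid index byte here (Pic1).
    """
    if len(raw) >= 2 and raw[0] in NAME_TOKEN_PREFIXES and raw[1] <= 0x09:
        digit = (raw[1] + 1) % 10
        return f"{NAME_TOKEN_PREFIXES[raw[0]]}{digit}"
    return None


# A name field NUL-padded to 8 bytes, as assumed for a .8x? variable entry.
print(detokenize_name(bytes([0x60, 0x00]) + bytes(6)))  # Pic1
print(detokenize_name(bytes([0xAA, 0x09])))             # Str0
print(detokenize_name(b"PRGM"))                         # None
```

Returning None rather than raising lets a caller try detokenizing first and fall back to the regular charset conversion for everything else.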

To do: write bi-directional mapping functions between TI encoding and UTF-8.
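One way to keep both directions consistent is to define a single byte-to-code-point table and derive the reverse map from it. The sketch below is illustrative only: apart from 0x5B mapping to U+03B8 (θ), which is the libticonv charset.cc entry discussed further down this thread, it assumes bytes 0x00-0x7F are plain ASCII and leaves 0x80-0xFF unmapped. A real port would copy the full 256-entry table. All names are hypothetical.

```python
# Partial, illustrative TI-83+ table (byte -> Unicode code point).
# Only 0x5B -> U+03B8 (theta) comes from libticonv's charset.cc; bytes
# 0x00-0x7F are otherwise ASSUMED to be ASCII, and 0x80-0xFF are unmapped.
TI83P_TO_UNICODE = {b: b for b in range(0x80)}
TI83P_TO_UNICODE[0x5B] = 0x03B8

# The reverse direction is derived from the same table, so the two stay in sync.
UNICODE_TO_TI83P = {cp: b for b, cp in TI83P_TO_UNICODE.items()}


def ti83p_to_str(raw: bytes) -> str:
    """Decode TI-83+ charset bytes to a Python (Unicode) string."""
    out = []
    for b in raw:
        if b not in TI83P_TO_UNICODE:
            raise ValueError(f"no mapping for TI byte 0x{b:02X}")
        out.append(chr(TI83P_TO_UNICODE[b]))
    return "".join(out)


def str_to_ti83p(name: str) -> bytes:
    """Encode a Unicode string back to TI-83+ charset bytes."""
    out = bytearray()
    for ch in name:
        if ord(ch) not in UNICODE_TO_TI83P:
            raise ValueError(f"no TI-83+ byte for {ch!r}")
        out.append(UNICODE_TO_TI83P[ord(ch)])
    return bytes(out)


print(ascii(ti83p_to_str(b"A\x5b")))  # 'A\u03b8'
print(str_to_ti83p("A\u03b8").hex())  # 415b
```

With this partial table, '[' has no byte at all, because its ASCII position is taken by θ. That is exactly the kind of collision that shows up as "[ instead of θ" when the table is applied on one side of the link but not the other.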

@debrouxl
Collaborator

Between multiple TI encodings and UTF-8, even :)
From day 1, and even without implementing support for the TI-68k series or the older TI-Z80 models, the design needs to allow for multiple charsets, due to the classic DBUS encodings and the newer DUSB / CARS encodings, both relevant to the newer TI-Z80 & TI-eZ80 models.

Eventually, for ticalc-usb+ticalc.link to become a high-quality alternative, you'll have to implement nearly all of the layers provided by libti* anyway ;)

Timendus transferred this issue from Timendus/ticalc.link on Sep 27, 2021
@Timendus
Owner Author

I've been spending some time on this issue today, without making much progress. After porting over some of charset.cc and trying a couple of things, I can now parse theta properly as θ instead of [ on the PC side of things. That's nice, but now I'm not sure what to send to the calculator to get it to show the right thing too.

I'm assuming I have to write the name back into the byte stream exactly as found in the file, without touching it*. But that should be exactly what I'm already doing, so that doesn't seem to be the answer. Or there's a bug in my logic somewhere that is transforming the data where it shouldn't. Can someone confirm that variable names in (for example) a *.8xg file are indeed stored in the TI format that the calculator expects, and not in UTF-8 or some other weirdness?

*) unless we're opening a TI-83+ file and sending it to a TI-84+, if I understand your remark above correctly. In that case I assume the correct approach would be to convert from the TI-83+ format to UTF-16 as an intermediary, and then from UTF-16 to the TI-84+ format.
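The pivot described in the footnote can be sketched as: decode with the source charset table, then encode with the target charset's reverse table. In the sketch below, "ti83p" is a partial table (ASCII plus θ at 0x5B, the one entry confirmed in this thread), while "hypothetical_target" is invented purely to show a second charset that stores θ at a different byte. Python strings stand in for the UTF-16 intermediary used by libticonv; `CHARSETS` and `transcode` are hypothetical names.

```python
from typing import Dict

# Charset tables: byte -> Unicode code point.
# "ti83p": ASSUMED ASCII for 0x00-0x7F, except theta at 0x5B.
# "hypothetical_target": made up, stores theta at 0x80 to show the pivot.
CHARSETS: Dict[str, Dict[int, int]] = {
    "ti83p": {**{b: b for b in range(0x80)}, 0x5B: 0x03B8},
    "hypothetical_target": {**{b: b for b in range(0x80)}, 0x80: 0x03B8},
}


def transcode(raw: bytes, src: str, dst: str) -> bytes:
    """Convert a name between two TI charsets by pivoting through Unicode.

    Raises KeyError if a byte has no mapping in `src` or a character has
    no byte in `dst`.
    """
    to_unicode = CHARSETS[src]
    from_unicode = {cp: b for b, cp in CHARSETS[dst].items()}
    text = "".join(chr(to_unicode[b]) for b in raw)     # TI bytes -> Unicode
    return bytes(from_unicode[ord(ch)] for ch in text)  # Unicode -> TI bytes


print(transcode(b"A\x5b", "ti83p", "hypothetical_target").hex())  # 4180
```

Letting an unmappable character raise, rather than silently substituting something, makes a missing table entry visible instead of producing another "[ instead of θ" surprise.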

@Timendus
Owner Author

Timendus commented Nov 18, 2021

Adding insult to injury, θ is represented as 91 (0x5B) in the file, which is also what the tables say, both here:

https://github.com/debrouxl/tilibs/blob/a4a638df4494aa8d80819e485c4e3316a158f1ef/libticonv/trunk/src/charset.cc#L689

(0x3b8 being element 91, zero-indexed) and here:

https://github.com/debrouxl/tilibs/blob/a4a638df4494aa8d80819e485c4e3316a158f1ef/libticonv/trunk/src/charset.cc#L771

Yet sending 91 to my test TI-84+ still renders [ instead of θ in the PRGM menu.

@Timendus
Owner Author

Sending 0x3b8 renders θ though... 😂
What the hell. Does this mean that these conversions are just from "insane TI file format" to "normal space, including on the calculator"? I thought they were a mapping from the calculator charset to UTF-16, as it says on the tin..?

@Timendus
Owner Author

Wait... the conversion is just for the "nonusb" part..? So for theta, 91 is the value to send through the link port, but through the USB port it expects 0x03b8 in UTF-16..?

@debrouxl
Collaborator

For checking the sequences of bytes flowing through the cable, the output of libti*'s packet logging code, written to ~/.ticables after closing TILP, is better (although not quite perfect, see debrouxl/tilibs#12) than capturing raw USB packets with the likes of usbmon, USBPcap or similar software and viewing them in Wireshark or similar. That approach does not work for the virtual cables supported by TILP and TilEm/TIEmu anyway.
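The thread leaves open whether DUSB expects θ as the TI charset byte 0x5B or as the code point 0x03B8 in some Unicode encoding. When reading a packet log, it can help to have the candidate byte sequences for a name side by side. The helper below is a hypothetical sketch: it only formats the possibilities and does not claim which one the calculator uses. It relies on `bytes.hex` with a separator argument (Python 3.8+).

```python
def candidate_encodings(name: str, ti_bytes: bytes) -> dict:
    """Hex strings for several plausible on-the-wire forms of one name.

    `ti_bytes` is the name as stored in the TI file; `name` is the same
    name as a Unicode string. Which form DUSB really uses is not settled here.
    """
    return {
        "TI charset (as stored in the file)": ti_bytes.hex(" "),
        "UTF-16BE": name.encode("utf-16-be").hex(" "),
        "UTF-16LE": name.encode("utf-16-le").hex(" "),
        "UTF-8": name.encode("utf-8").hex(" "),
    }


for label, hex_bytes in candidate_encodings("A\u03b8", b"A\x5b").items():
    print(f"{label:36} {hex_bytes}")
```

For a name "Aθ" this prints 41 5b, 00 41 03 b8, 41 00 b8 03 and 41 ce b8 respectively, so a match in the log (or the lack of one) points directly at the encoding in use.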

Timendus linked a pull request on Nov 22, 2021 that will close this issue