Multi-language support design #16
Comments
I think if we are serious about i18n, we should implement Unicode support because it clearly has some advantages. Otherwise, we can simply support GBK and forget about this topic until the next need (if any) arises. @RottenBlock any thoughts? I don't know much about Sys0Decompiler and I have no experience translating games, so I may be overlooking something. @silas1037 If we decide to implement Unicode support, are you willing to use it in your zh translation? |
Yeah, of course.
I think nothing is better than system3.ini. Right now I am still confused about unpacking and repacking Nise~tower, because of a small incompatibility between the compiler and the emulator. |
I agree that unicode support is best. silas1037 isn't the first to come asking about localization support in other languages, though they are the first to go the extra mile to implement it! It would be better to provide support for everyone if we're willing to take those steps. I think you're right that command line arguments / system3.ini will work without the need for a language tag. I don't think there are any huge problems on Sys0Decompiler's side, and I have the luxury of simply adding a Unicode radio button to the layout and so don't even have to worry about command line arguments. I don't think you're overlooking much. There are some languages that don't cooperate nicely with monospace fonts, but the end result is more "ugly" than "broken." Here's a Reddit thread arguing about it, for what that's worth. Devanagari is apparently especially bad. But even in those cases, supporting unicode is still a necessary first step. One problem that might be unavoidable: are there any latter-day System 3.5 games that send their text through a DLL? Those might take additional work, if you haven't already handled them. Sorry, I haven't experimented much with xsystem35. "It would be nice if the compiler had some abstract notation for Gaiji, so that users don't have to care about the actual representation in the target character encoding." At present, Sys0Decompiler has gaiji turned into hexadecimal notation during decompilation, e.g. "0xEBBA", and turned back during compilation. It could be a little clearer what's going on, though, especially in a Unicode environment. Unicode uses the notation U+#### to refer to character codes, so I'm half-tempted to change it to G+####. |
Okay, sounds like we all agreed that Unicode without a language tag is the best option. I'll give it a shot this weekend.
Yeah text rendering is a hard problem. Even English has a problem -- system3-sdl2 can render proportional (non-monospace) fonts, but still draws characters one-by-one, so kerning isn't working.
xsystem35 has its own implementation of the DLLs, and the Unicode mode is supported.
G+#### sounds good. |
I've implemented the Unicode support in (Very hacky) compiler change is here: kichikuou/Sys0Decompiler@4f303f6 With these, I was able to recompile and run Rance 4.1 zh version in Unicode. How to test:
Caveats:
|
Good! I will test it soon, hopefully. EDIT: I have tested it on new compiled game successfully. Good job. |
@silas1037 Thank you for testing! I merged the changes into the @RottenBlock Are you interested in integrating the Unicode output support code to Sys0Decompiler? |
Yes, I'd like to integrate these updates soon, but I'm busy for the next few days. I'm sorry if that slows you down, you're making serious progress! |
Sorry this is going so slowly. After working on the decompiler for a few days, I finally got around to testing the code you put up, which I probably should have done from the start. Unfortunately, I'm getting invalid characters when system3-sdl tries to run compiled code. Just to be sure, I tried using your version of Sys0Decompiler and the results are the same. Here's what I'm doing: I've compiled Little Vampire to include the word "兰斯" (Rance in Simplified Chinese) in both the opening text and the startup menu (i.e. using both a page file and in the AG00 file). In both cases, it comes out looking like this: Any ideas what's going wrong? I'll probably have to send you my code, but I'll wait to hear which files you want instead of sending everything in a big wave. Maybe the problem is that I haven't set the -encoding config to a value, but the program currently doesn't use the -encoding config. Or maybe I'm misunderstanding? |
The string looks like the result of decoding Did you specify |
System1 and 2 do not work well either. I failed to start Alice Yakata2 and Little Vampire in Unicode. |
Ah, my bad, for some reason I was convinced the -encoding parameter wasn't hooked up yet. Not sure how I expected it to work, really. So that's entirely on me. That said, now I'm getting this: So the second character is correct but the first has become a dot, for some reason? Again, both in the text and the menu. |
It's a font issue. You can use simhei.ttf for Simplified Chinese. |
Oh, of course, thank you! |
Do you see any error messages? |
No, it just crashed. I'm sure the game option is correct. Maybe I could send you the pack. |
Ah another thing I forgot to mention. Sys0Decompiler in #16 (comment) had a bug that AG00.DAT was not generated correctly. It's fixed in kichikuou/Sys0Decompiler@0fb6d95, please try it if you haven't yet. |
Here's the updated version of Sys0Decompiler so far. It includes some features from 0.7.6 (including the removal of the REV tag) and can compile to UTF8, complete with updated GUI. The decompile form is also updated, but decompiling still doesn't work. https://www.mediafire.com/file/evlkrp0u0k6wead/Sys0Decompiler_0.8_%2528WIP%2529.zip/file |
Thank you, solved. |
Great progress! It seems the "Text Output Encoding" decompile option is not working (output is Shift-JIS regardless of the selection). This is not a bug, but I noticed that Mugen Houyou fails to (re)compile in UTF-8 because a page overflows the 64KB address space. Messages take much more space in UTF-8: a hiragana character consumes only 1 byte in SJIS when stored as hankaku, but 3 bytes in UTF-8, whether zenkaku or hankaku. This is unfortunate, but I don't think it's fixable. |
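To make the size difference concrete, here is a minimal Python sketch that compares encoded lengths. It assumes the 1-byte hankaku form can be represented by a half-width katakana character (the single-byte kana of Shift-JIS); how System3 actually maps zenkaku characters to hankaku bytes is not shown here, and the helper name `encoded_sizes` is made up for illustration.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded byte length of `text` in Shift-JIS and UTF-8."""
    return {enc: len(text.encode(enc)) for enc in ("shift_jis", "utf-8")}


zenkaku = "あ"  # full-width hiragana, U+3042
hankaku = "ｱ"   # half-width katakana, U+FF71 (stand-in for the 1-byte form)

print(encoded_sizes(zenkaku))  # {'shift_jis': 2, 'utf-8': 3}
print(encoded_sizes(hankaku))  # {'shift_jis': 1, 'utf-8': 3}
```

In the worst case a message stored as 1-byte hankaku triples in size after conversion to UTF-8, which is consistent with a text-heavy page no longer fitting in a 64KB address space.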
An idea: is it possible to implement a backlog function for system3? |
Ah, yes, I haven't gotten around to that yet, but I can probably do it before anything else!
Yes, I saw that in Rance 4.1 when it tries to compile the Dangerous Tengu Legend novella. Because the novella is so unusual, I was hoping it wouldn't affect other games, but I guess Mugen Houyou clinches it. I'm afraid the most I can do is add an error message, but I'll add that all the same. |
It seems that 0xEBAB (12) is not shown on the SaveLoad page, but displays normally in the main message. This looks like system3-sdl2's problem, since I put the 1995 ADISK into it and the bug still exists. |
I think the reason for the above is that system3-sdl2 no longer supports the original ShiftJIS gaiji range, even when running in ShiftJIS mode. Or am I mistaken? |
But the new ADISK was compiled by the 0.8 compiler, so it should have been converted to U+XX. |
EDIT: Nevermind this. |
I think the issue is that a save data created at 12:45pm is displayed as This is not an issue of system3-sdl2 but a bug of Rance4.1 script. In |
Oh right, that! That's fixed in my version of rance41 and 42, so if silas1037 wants, they can check that for my fix. silas, check that and see if it fixes the problem you posted on the wiki, as well, please! |
Great, it's back to normal ♪ |
Now game titles and language-dependent string constants can be overridden by system3.ini (see this commit message for details). With this and the Unicode support, I believe it's no longer necessary to modify the system3-sdl2 code for translation. @RottenBlock I'll keep |
Sure, that sounds good to me. The decompiler should be fully compatible now! I'd appreciate any tests anyone wants to run. I just need to update the manuals at this point to account for UTF-8 and the like. https://www.mediafire.com/file/lsmmyce9d362ql7/Sys0Decompiler+0.8+(WIP2).zip/file Concerning the max page size we discussed up the thread: I assumed that it caps out at a full 65535 bytes, does that sound correct? It's possible I've forgotten some small detail that might change things by a byte or two. |
Yay decompilation of Unicode game worked! AFAICT system3-sdl2 should be able to handle 65535-byte pages. The error message for the max page size worked for Rance 4.1, but for Mugen Houyou it raised an unhandled labelMap[strLabel].DestinationAddress = Convert.ToUInt16(outputStream.Position);// curAddress; Maybe the check for Other than that, it's working perfectly so far. :) |
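As an illustration of the kind of check discussed here, the following Python sketch validates a label address before it is stored as a 16-bit value. It is not Sys0Decompiler's code: the names `PageOverflowError` and `to_uint16_address` are hypothetical, and the 65535 limit is taken from the page-size discussion in this thread.

```python
MAX_PAGE_ADDRESS = 0xFFFF  # largest address a 16-bit field can hold


class PageOverflowError(Exception):
    """Raised when compiled output no longer fits in a 16-bit address."""


def to_uint16_address(position: int, label: str) -> int:
    """Return `position` unchanged if it fits in 16 bits, else raise a clear error."""
    if not 0 <= position <= MAX_PAGE_ADDRESS:
        raise PageOverflowError(
            f"label {label!r} at offset {position} exceeds the "
            f"{MAX_PAGE_ADDRESS}-byte page limit"
        )
    return position


print(to_uint16_address(65535, "end_of_page"))  # 65535
try:
    to_uint16_address(65536, "too_far")
except PageOverflowError as err:
    print("compile error:", err)
```

Routing every address conversion through one helper like this turns an unhandled overflow into a message that names the offending label.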
Thanks, I think I've got that now, I've added checks to every compile-side ToInt16 in the program: https://www.mediafire.com/file/lsmmyce9d362ql7/Sys0Decompiler+0.8+(WIP2).zip/file Edit: Somehow I forgot to check for escape characters in messages. This has also been fixed. |
Confirmed that the unhandled exception in Mugen Houyou has been fixed, thanks! |
https://www.mediafire.com/file/6llezbie6koe84s/Sys0Decompiler_0.8_Source.zip/file Additional fixes in this one. Officially launched it over at the wiki. |
Congrats on the release! It seems manuals are not included in the binary distribution ( |
I've updated I think we can close this. @RottenBlock @silas1037 Thank you for your cooperation! |
Background
Originally System1-3 only supported Japanese characters, encoded in Shift-JIS. ASCII characters are included in Shift-JIS, but they were interpreted as commands rather than text messages.
SysEng (by @RottenBlock) enabled ASCII characters in messages, by enclosing them with quotation marks (' or "). It was merged to system3-sdl2.
@silas1037 is working on GBK character encoding support, for Simplified Chinese translation. The code change is fairly simple, thanks to the similarity between GBK and Shift-JIS.
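The similarity can be sketched in Python: in both encodings a character is either one byte or a lead byte followed by a trail byte, and only the lead-byte range differs. The function names below are hypothetical and this is not the system3-sdl2 implementation; the ranges are the standard lead-byte ranges of Shift-JIS and GBK.

```python
def is_sjis_lead_byte(b: int) -> bool:
    """Shift-JIS lead bytes: 0x81-0x9F and 0xE0-0xFC."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def is_gbk_lead_byte(b: int) -> bool:
    """GBK lead bytes: 0x81-0xFE."""
    return 0x81 <= b <= 0xFE


def split_chars(data: bytes, is_lead) -> list:
    """Split a byte string into 1-byte and 2-byte characters."""
    chars = []
    i = 0
    while i < len(data):
        # A lead byte consumes the following byte as well, if there is one.
        width = 2 if is_lead(data[i]) and i + 1 < len(data) else 1
        chars.append(data[i:i + width])
        i += width
    return chars


print(split_chars("兰斯".encode("gbk"), is_gbk_lead_byte))
print(split_chars("ランス".encode("shift_jis"), is_sjis_lead_byte))
```

The same splitting loop serves both encodings; swapping the predicate is the only change, which fits the remark that the GBK change is fairly simple.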
Now let's step back a bit, and explore possible alternative designs for multi-language support.
Unicode vs non-unicode
Unicode
xsystem35-sdl2 and xsys35c have complete UTF-8 support, as described in the "Unicode mode" document. I think a similar approach is possible in system3-sdl2.
Pros:
Cons:
Non-unicode
Instead of using Unicode, we may support different encodings for each character set (Shift-JIS for Japanese, GBK for Simplified Chinese, Big5 for Traditional Chinese, etc.).
Pros:
is_[12]byte_message(). Many non-unicode encodings have similar structure so could be supported in a similar way.
Cons:
0xeb9f-0xebfc and 0xec40-0xec9e are used for Gaiji in Shift-JIS, but they overlap with regular character ranges in GBK.
Language tag
Currently, there is no way to declare the character encoding used in ADISK.DAT. For the interpreter (or decompiler), the character encoding has to be provided separately, e.g. by a command-line flag. Should we have a language / character-encoding tag in the head of the scenario file, like the (now deprecated) REV tag of SysEng?
This may not be so beneficial, since many games require the correct game ID to work properly, and the character encoding can be determined from the game ID.
Compiler / decompiler
I'm not familiar with Sys0Decompiler internals, but it seems simply changing the encoding name was enough for GBK compilation, and it would possibly work for UTF-8 or other encodings too. Decompilation would be straightforward as well, if the character encoding is known.
It would be nice if the compiler had some abstract notation for Gaiji, so that users don't have to care about the actual representation in the target character encoding.
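One reason an abstract notation would help is that the byte values used for Gaiji in Shift-JIS are ordinary characters elsewhere. The Python sketch below checks this for GBK using the two Gaiji ranges mentioned in the non-unicode section; the helper names are hypothetical and the code only relies on Python's built-in `gbk` codec.

```python
# Shift-JIS code ranges used for Gaiji, as listed in the non-unicode section.
GAIJI_RANGES = [(0xEB9F, 0xEBFC), (0xEC40, 0xEC9E)]


def is_sjis_gaiji(code: int) -> bool:
    """True if `code` falls in one of the Shift-JIS Gaiji ranges."""
    return any(lo <= code <= hi for lo, hi in GAIJI_RANGES)


def decodes_as_gbk(code: int) -> bool:
    """True if the same two bytes form a valid GBK character."""
    try:
        code.to_bytes(2, "big").decode("gbk")
        return True
    except UnicodeDecodeError:
        return False


for code in (0xEB9F, 0xEBBA, 0xEC40):
    print(hex(code), is_sjis_gaiji(code), decodes_as_gbk(code))
```

A source-level notation that names the Gaiji itself, rather than its byte value in one encoding, would let the compiler pick a non-conflicting representation per target encoding.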
Zenkaku-Hankaku conversion
System3 games store certain Zenkaku (2-byte) characters as Hankaku (1-byte) characters, to save precious floppy disk space. There is no reason to perform such conversion in non-Shift-JIS encodings.