
UTF-8 encoding issues (OpenLyrics XML to ProPresenter) #59

Open
dkomrska opened this issue Mar 4, 2024 · 14 comments

Comments

@dkomrska

dkomrska commented Mar 4, 2024

Dear all,
I am migrating from OpenSong 3.1.0 to ProPresenter 7.16.1 and need to convert my song library.
I have experienced issues with UTF-8 encoding in the verses.

The songs are exported as OpenLyrics XML
test.xml.zip

The file converted to ProPresenter 6 with LyricConverter looks OK. The CCLIAuthor and other metadata are converted correctly, with no UTF-8 issues.
test.pro6.zip

Once opened in ProPresenter 7, the UTF-8 characters in verses are broken.
Screenshot 2024-03-04 at 13 24 20

Any ideas how to overcome the issue?
Is the problem related to the conversion or to ProPresenter 7 import of the document?

@ChrisMBarr
Owner

So you are saying opening those converted files in ProPresenter 6 works as expected?

@dkomrska
Author

dkomrska commented Mar 4, 2024

Hi @FiniteLooper,
So far I had only tried opening the file in ProPresenter 7,
but you gave me the idea to try version 6.
Version 6.5.3 is unable to import the file correctly.
Once the file is imported, a warning icon is shown next to its name. When the warning is clicked, an "Incompatible File" message is shown.
Screenshot 2024-03-04 at 14 26 03

The encoding is broken in the same way as in version 7.
Screenshot 2024-03-04 at 14 28 02

@ChrisMBarr
Owner

Hmm, ok that's odd.

Yes, I have no support at all for Pro7 since its file format is totally different, so importing a Pro6 file is the only way to make this work in Pro7. So this does look like an issue with how the encoding is handled; I think you are correct.

I have no idea when I'll get to take a look at this but it's now on my radar!

@ChrisMBarr
Owner

I am taking a look at this now. It appears that LyricConverter reads the file correctly, because I can use "display slides", which appears to show everything correctly:


and I can convert it to a plain text file, which also appears to be formatted correctly:

Title: 1 Bože, tebe chválíme
ccliNo: 11109
Author: Neznámý autor
Song Book: Kancionál CB


v1:
Bože, tebe chválíme, před tebou se skláníme 
s neskonalou radostí, se srdečnou vděčností,
neb nám sebe dáváš znáti. Něžně miluješ nás vždy,
otec milý v Kristu jsi, dětmi ráčíš nás v něm zváti.

v2:
Chválíme tě s vroucností, žes nám poslal z výsosti 
Syna svého drahého,
hřích náš vložils na něho a
my svobodni jsme nyní. Vzhůru, srdce, vznes se, vznes,
vesele ať zní tvůj ples z Božího tu dobrodiní!

v3:
Chválíme tě uctivě, vděčně, vroucně, ohnivě. Díky tobě vzdáváme, nad tvou láskou jásáme,
vždyť jsme lid tvůj vykoupený. Tys nás v poušti nenechal, tys nám život věčný dal. Bože náš, buď velebený!

v4:
Z Ducha znovuzrozeni, s tebou v Kristu spojeni, spásy máme jistotu, tvoji známe dobrotu, tebe, Bože, velebíme.
Díky tobě vzdáváme, že jsme tvoji, jásáme, tebe, Otče, vděčně ctíme.

Next, I will look into how the conversion/encoding process works for ProPresenter.

@ChrisMBarr
Owner

I ran your original OpenLyrics file through my tests and found no issues. It correctly encodes and decodes the UTF-8 characters. Then I realized I never tried reproducing the issue by opening the file you sent.

I was able to open the test.pro6 file you attached in ProPresenter 6 just fine.


So I opened ProPresenter 7 and imported the same test.pro6 file you sent, and again I had no issues at all.


PP7 converts files to the .pro format, so here is the working file it generated for me; you can test it out: test 1.zip


Honestly, I'm not sure what to tell you here. Everything works for me and I cannot reproduce your issue: I can import the file you sent me just fine, and when I generate the same file from the original OpenLyrics file you sent me, the two are identical apart from the timestamp of when they were created.

The only difference between us seems to be that I am using the Windows version of ProPresenter and you are using a Mac. I do not own a Mac so I can't test that, but I would be very surprised if their software didn't open/import the same file types on every platform.

@ChrisMBarr closed this as not planned on Mar 12, 2024
@piotrtobolski

@ChrisMBarr can you please reopen this issue? I think I found the problem.

It looks like ProPresenter 7 for Mac uses the RTFData key to read the slide data, and that data is encoded incorrectly. The RTF header states that the text is encoded in ansicpg1252, aka Windows-1252, but the text in the RTF data is actually encoded in UTF-8. Source: https://github.com/ChrisMBarr/ProPresenter-Parser/blob/main/src/utils.ts#L19
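A quick way to check this diagnosis on any pro6 file is to compare the code page the RTF header declares with what the bytes actually look like. The TypeScript sketch below assumes you have already Base64-decoded an `RTFData` value into a `Uint8Array`; `diagnoseRtfEncoding` is a hypothetical helper, not part of ProPresenter-Parser or LyricConverter. Non-ASCII bytes that form valid UTF-8 under a `\ansicpg1252` header are a strong hint of the mismatch, not proof.

```typescript
// Hypothetical diagnostic helper (not part of ProPresenter-Parser):
// compares the code page declared in an RTF header with how the body bytes look.
interface RtfEncodingReport {
  declaredCodePage: number | null; // value of \ansicpgNNNN, or null if absent
  hasNonAscii: boolean;            // any byte >= 0x80 in the data
  validUtf8: boolean;              // whole buffer decodes as strict UTF-8
  likelyMismatch: boolean;         // non-UTF-8 code page declared, but UTF-8 bytes found
}

function diagnoseRtfEncoding(bytes: Uint8Array): RtfEncodingReport {
  // The RTF header is plain ASCII, so a one-char-per-byte view is enough to find \ansicpg.
  let ascii = '';
  for (const b of bytes) ascii += String.fromCharCode(b);
  const match = /\\ansicpg(\d+)/.exec(ascii);
  const declaredCodePage = match ? Number(match[1]) : null;

  const hasNonAscii = bytes.some((b) => b >= 0x80);

  let validUtf8 = true;
  try {
    // fatal: true makes the decoder throw on malformed UTF-8 instead of inserting U+FFFD.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    validUtf8 = false;
  }

  // 65001 is the Windows code page number for UTF-8; any other declared code page
  // combined with multi-byte UTF-8 text is suspicious.
  const likelyMismatch =
    declaredCodePage !== null && declaredCodePage !== 65001 && hasNonAscii && validUtf8;

  return { declaredCodePage, hasNonAscii, validUtf8, likelyMismatch };
}

// Simulates the current output: a cp1252 header with UTF-8 text (TextEncoder always emits UTF-8).
const rtf = '{\\rtf1\\ansi\\ansicpg1252 Bože, tebe chválíme}';
const report = diagnoseRtfEncoding(new TextEncoder().encode(rtf));
console.log(report.likelyMismatch); // true
```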

I verified this by replacing the Base64-encoded data in the pro6 file with a Windows-1252-encoded version, and the file is then imported correctly in ProPresenter 7.
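The manual fix described above (re-encoding the existing `RTFData` as Windows-1252) can be sketched without any dependency. This assumes the current `RTFData` value is Base64 of UTF-8 bytes and that every character in it exists in Windows-1252; `reencodeRtfDataToCp1252` and the other helper names are hypothetical. Windows-1252 is identical to Latin-1 except for the 0x80-0x9F block, which the table below covers; `atob`/`btoa` are available in browsers and in Node 16+.

```typescript
// Unicode code point -> Windows-1252 byte for the 0x80-0x9F block
// (bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Windows-1252).
const UNICODE_TO_CP1252 = new Map<number, number>([
  [0x20ac, 0x80], [0x201a, 0x82], [0x0192, 0x83], [0x201e, 0x84], [0x2026, 0x85],
  [0x2020, 0x86], [0x2021, 0x87], [0x02c6, 0x88], [0x2030, 0x89], [0x0160, 0x8a],
  [0x2039, 0x8b], [0x0152, 0x8c], [0x017d, 0x8e], [0x2018, 0x91], [0x2019, 0x92],
  [0x201c, 0x93], [0x201d, 0x94], [0x2022, 0x95], [0x2013, 0x96], [0x2014, 0x97],
  [0x02dc, 0x98], [0x2122, 0x99], [0x0161, 0x9a], [0x203a, 0x9b], [0x0153, 0x9c],
  [0x017e, 0x9e], [0x0178, 0x9f],
]);

function encodeCp1252(text: string): Uint8Array {
  const out: number[] = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    const mapped = UNICODE_TO_CP1252.get(cp);
    if (cp < 0x80 || (cp >= 0xa0 && cp <= 0xff)) {
      out.push(cp); // ASCII and U+00A0-U+00FF map 1:1 in Windows-1252
    } else if (mapped !== undefined) {
      out.push(mapped); // the block where Windows-1252 differs from Latin-1
    } else {
      const hex = cp.toString(16).toUpperCase().padStart(4, '0');
      throw new Error(`U+${hex} (${ch}) has no Windows-1252 equivalent`);
    }
  }
  return Uint8Array.from(out);
}

function base64ToBytes(b64: string): Uint8Array {
  return Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
}

function bytesToBase64(bytes: Uint8Array): string {
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b); // one char per byte for btoa
  return btoa(binary);
}

// Hypothetical repair step: UTF-8 bytes -> text -> Windows-1252 bytes -> Base64,
// so the bytes finally match the \ansicpg1252 header.
function reencodeRtfDataToCp1252(rtfDataBase64: string): string {
  const rtf = new TextDecoder('utf-8', { fatal: true }).decode(base64ToBytes(rtfDataBase64));
  return bytesToBase64(encodeCp1252(rtf));
}

console.log(reencodeRtfDataToCp1252('xb4=')); // "ng==" ("ž": UTF-8 C5 BE -> cp1252 9E)
```

Characters outside Windows-1252 (for example Czech "ř" or Polish "ł") make `encodeCp1252` throw here rather than silently substituting, so a file that cannot be represented is detected instead of corrupted.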

The Windows version may be using some other key, e.g. WinFlowData, which could be why it works correctly there.

@ChrisMBarr reopened this on Aug 8, 2024
@ChrisMBarr
Owner

Ah, interesting, OK. I do not have a Mac (but I've been meaning to get one...). I'll see if I can look into this on the weekend.

@ChrisMBarr
Owner

@piotrtobolski I began to look at this tonight... but I'm kind of at a loss here. In the ProPresenter-Parser project I simply build an RTF string in JavaScript: https://github.com/ChrisMBarr/ProPresenter-Parser/blob/main/src/v6/builder.ts#L214-L219

and then a little further down I encode that as Base64: https://github.com/ChrisMBarr/ProPresenter-Parser/blob/main/src/v6/builder.ts#L260

I am using this project https://github.com/dankogai/js-base64 to handle the Base64 encoding so it works both in the browser and in Node. I don't see any way to specify an encoding. It is possible to specify the overall file encoding of the final output document, but that's just something for ProPresenter to read: https://github.com/ChrisMBarr/ProPresenter-Parser/blob/main/src/v6/builder.ts#L85
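The reason there is no encoding option is that Base64 only ever encodes bytes. When js-base64's `Base64.encode` is given a JavaScript string, it first converts that string to UTF-8 bytes, which is how UTF-8 ends up under a `\ansicpg1252` header. The sketch below (plain `TextEncoder` and `btoa`; `bytesToBase64` is a hypothetical helper) shows the same character producing different Base64 depending on which bytes are fed in.

```typescript
// Base64 encodes bytes, not characters: the text encoding is chosen before this step.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b); // one char per byte for btoa
  return btoa(binary);
}

const utf8Bytes = new TextEncoder().encode('á'); // UTF-8: [0xC3, 0xA1]
const cp1252Bytes = Uint8Array.from([0xe1]);     // 'á' in Windows-1252: [0xE1]

console.log(bytesToBase64(utf8Bytes));   // "w6E="
console.log(bytesToBase64(cp1252Bytes)); // "4Q=="
```

Because Base64 output is pure ASCII, the XML document's declared encoding has no effect on the RTF bytes inside it. The fix therefore has to happen before Base64: produce bytes in the code page the RTF header declares, then pass those bytes to something like js-base64's `Base64.fromUint8Array`.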

@piotrtobolski

I think an additional dependency is required, such as https://github.com/ashtuchkin/iconv-lite

Then you could do something like this:

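```ts
import * as iconv from 'iconv-lite';
import { Base64 } from 'js-base64';

// Build the slide's RTF string exactly as the builder already does
const rtfText = Utils.formatRtf(
  text,
  this.options.slideTextFormatting.fontName,
  this.options.slideTextFormatting.textSize,
  Utils.normalizeColorToRgbObj(this.options.slideTextFormatting.textColor)
);
// ...
// Encode to Windows-1252 bytes so they match the \ansicpg1252 RTF header
const rtfBuffer = iconv.encode(rtfText, 'windows-1252');
// ...
// Base64-encode the raw bytes (a Node Buffer is a Uint8Array) instead of a UTF-8 string
{ '@rvXMLIvarName': 'RTFData', '#text': Base64.fromUint8Array(rtfBuffer) },
```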

But this isn't a perfect solution. Unfortunately, windows-1252 doesn't contain all the characters required for every language. In my case (Polish) I would need everything converted to windows-1250 :-/ I can't find any good RTF-writing libraries, so I need to investigate this further. I guess I can try importing my files on Windows, exporting them, and then importing them on a Mac.
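One possible way around the code-page limit, not tried anywhere in this thread, is the RTF `\uN` control word: it writes a character by its signed 16-bit Unicode value, followed by a fallback character for readers that ignore it (one character by default, per `\uc1`). Escaping every non-ASCII character this way keeps the RTF body pure ASCII, so the `\ansicpg` value no longer affects it. Whether ProPresenter 7 for Mac honors `\uN` in `RTFData` is an untested assumption, and `escapeRtfText` is a hypothetical helper. Line breaks are passed through untouched, on the assumption that the surrounding RTF builder already handles them.

```typescript
// Hypothetical helper: escape plain text for an RTF body using \uN escapes.
// Assumes the default \uc1, i.e. exactly one fallback character after each \uN.
function escapeRtfText(text: string, fallback = '?'): string {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // UTF-16 code unit; astral characters become two \uN escapes
    const ch = text[i];
    if (ch === '\\' || ch === '{' || ch === '}') {
      out += '\\' + ch; // RTF control characters must be backslash-escaped
    } else if (unit < 0x80) {
      out += ch; // plain ASCII passes through unchanged
    } else {
      // \uN takes a signed 16-bit decimal, so values above 32767 wrap to negative
      const signed = unit > 0x7fff ? unit - 0x10000 : unit;
      out += `\\u${signed}${fallback}`;
    }
  }
  return out;
}

console.log(escapeRtfText('Łódź {1}')); // \u321?\u243?d\u378? \{1\}
```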

@piotrtobolski

Importing your file (test 1.zip) didn't help, so I guess importing on Windows and exporting won't help either. It keeps the incorrectly encoded RTF data in the file.

@piotrtobolski

I guess we could try asking for the output encoding during the file import (e.g. windows-1252 or windows-1250) and then use iconv-lite to encode the RTF (and change the encoding declared in the header).
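The proposal above might look roughly like this. It is a sketch under assumptions: `RtfCodePage`, `Encoder`, `setAnsiCodePage`, and `buildRtfBytes` are hypothetical names, and the byte encoding is injected as a function so that iconv-lite (or any other encoder) can be plugged in without this snippet depending on it. Windows code pages 1250 (Central European) and 1252 (Western European) correspond to the labels `windows-1250` and `windows-1252`.

```typescript
// Hypothetical option the user would pick at import time.
type RtfCodePage = 1250 | 1252;

// Any function that turns text into bytes in a named encoding,
// e.g. iconv-lite's iconv.encode(text, label) in the real builder.
type Encoder = (text: string, encodingLabel: string) => Uint8Array;

// Rewrites the \ansicpgNNNN control word so the header matches the chosen code page.
function setAnsiCodePage(rtf: string, codePage: RtfCodePage): string {
  const pattern = /\\ansicpg\d+/;
  if (!pattern.test(rtf)) {
    throw new Error('RTF header has no \\ansicpg control word to update');
  }
  return rtf.replace(pattern, `\\ansicpg${codePage}`);
}

// Header and bytes are updated together, so they can no longer disagree.
function buildRtfBytes(rtf: string, codePage: RtfCodePage, encode: Encoder): Uint8Array {
  return encode(setAnsiCodePage(rtf, codePage), `windows-${codePage}`);
}
```

In the builder, `encode` could be `(text, label) => iconv.encode(text, label)`; iconv-lite returns a Node `Buffer`, which is a `Uint8Array`, so the result can go straight into js-base64's `Base64.fromUint8Array`.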

@ChrisMBarr
Owner

@piotrtobolski Give this another try. Someone brought an RTF formatting/encoding issue to my attention on the ProPresenter parser, and that fix has now been applied here as well. It's here if you're curious: ChrisMBarr/ProPresenter-Parser#72

@piotrtobolski

That didn't change anything about how songs are encoded and made no difference for me, sorry.

@ChrisMBarr
Owner

Bah, OK, sorry. I was hoping that would fix it. Oh well. Sorry, I just haven't had time to look into this yet.
