No warnings found in CI #185

Closed
giulianorasper opened this issue Aug 15, 2021 · 8 comments

Comments

@giulianorasper

giulianorasper commented Aug 15, 2021

No warnings are found when the tool runs in CI.
Running the following command locally, in the root of the repo, works and warnings are found:

java -jar textidote.jar --check de --output html PuE.tex > language_report.html

The CI log does not state that the file was skipped, but no warnings are reported either.
Hardcoding the correct file name "PuE.tex" in the CI script instead of using the variable has no effect.

gitlab-ci.txt

@sylvainhalle
Owner

Could it be a file encoding problem? I recall this issue from last year:

#120 (comment)

If the encoding of the file does not match what TeXtidote expects, nothing is being read and that would explain the absence of warnings.

@sylvainhalle
Owner

To debug the issue, may I suggest you try this command:

java -jar textidote.jar --clean --check de --output html PuE.tex > cleaned.txt

If cleaned.txt is empty, then we'd have a hint about what is going on.
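Beyond checking for emptiness, the raw bytes of the output can be inspected directly. The following is a minimal Python sketch, not part of TeXtidote; the helper name `summarize_bytes` and the sample inputs are hypothetical and only stand in for the real content of cleaned.txt.

```python
def summarize_bytes(data: bytes) -> dict:
    """Summarize raw file content for a quick encoding sanity check."""
    return {
        # True when the file has no content at all
        "empty": len(data) == 0,
        # Bytes >= 0x80 never occur in pure ASCII text
        "non_ascii_bytes": sum(1 for b in data if b >= 0x80),
        # Literal '?' (0x3F) bytes; a high count may hint at replaced characters
        "question_marks": data.count(b"?"),
    }


# Hypothetical samples standing in for the content of cleaned.txt
print(summarize_bytes(b""))
print(summarize_bytes("Grüße".encode("utf-8")))
print(summarize_bytes(b"Gr??e"))
```

For a real file, the same helper could be called as `summarize_bytes(open("cleaned.txt", "rb").read())`.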

@giulianorasper
Author

I just created a minimal example for this issue in this repository.
The pipeline of this repository generated the following artifacts.
This artifacts.zip was generated on my local system (Ubuntu in WSL).
Notably, in the cleaned.txt generated by GitLab CI, the characters 'ü' and 'ß' have been replaced with '?' whereas this did not happen in the locally generated version.

@sylvainhalle
Owner

Thanks for providing these artifacts. I opened the files in a hex editor to see how the characters have been encoded. Here is what I found:

Source file (main.tex):

  • ü: C3BC -> UTF-8
  • ß: C39F -> UTF-8

CI (cleaned.txt):

  • ü: 3F -> "?"
  • ß: 3F -> "?"

Local (cleaned.txt):

  • ü: FC -> latin-1
  • ß: DF -> latin-1

I am a bit puzzled by what I see. The source file is a valid UTF-8 document. When processed locally, it ends up transcoded into latin-1 (visible from the fact that the two characters end up with different hex values). I don't know how this is possible, as TeXtidote always assumes the default encoding of the OS it runs in. Finally, when it is run in the CI pipeline, the characters are garbled, which again indicates that the program does not assume UTF-8 as the input encoding. However, looking at your CI configuration, I see that you use a Debian OS, so UTF-8 input should not be a problem.
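The byte values listed above can be reproduced outside TeXtidote. This is a minimal Python sketch for illustration only: it shows which output encodings yield those hex values for 'ü' and 'ß', and it does not model how TeXtidote or the JVM actually reads and writes files. The variable names are hypothetical.

```python
text = "üß"

# The same two characters under three different output encodings
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
# ASCII cannot represent either character; errors="replace" substitutes '?'
ascii_bytes = text.encode("ascii", errors="replace")

for label, data in [
    ("UTF-8", utf8_bytes),
    ("latin-1", latin1_bytes),
    ("ASCII (replace)", ascii_bytes),
]:
    # Print the encoded bytes as uppercase hex, as a hex editor would show them
    print(label, data.hex().upper())
```

Comparing this output with the hex dump of each cleaned.txt suggests which encoding was in effect when each file was written.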

A workaround for your problem would be to explicitly tell TeXtidote to use UTF-8, by adding the --encoding UTF-8 command line switch when you call it. Tell me if this changes anything.

@giulianorasper
Author

giulianorasper commented Aug 19, 2021

Thanks for the help so far! As suggested, I added the --encoding UTF-8 parameter in the CI script.
However, this did not affect the resulting CI artifacts.

To confirm that main.tex is not altered by Git in some unexpected way when pushing or pulling, I also tried downloading my local version of main.tex as part of the pipeline on another branch, which yielded the same results.
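Another way to rule out silent changes to the file is to compare checksums of the raw bytes on both sides. This is a minimal Python sketch under that assumption; the helper `sha256_hex` and the sample byte strings are hypothetical stand-ins for main.tex as read locally and in the pipeline.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for main.tex as read locally and in the pipeline
local_copy = "Grüße".encode("utf-8")
ci_copy = "Grüße".encode("utf-8")
transcoded_copy = "Grüße".encode("latin-1")

# Identical bytes give identical digests
print(sha256_hex(local_copy) == sha256_hex(ci_copy))
# A transcoded file has different bytes, so the digests differ
print(sha256_hex(local_copy) == sha256_hex(transcoded_copy))
```

If the digests of the real files match, the input reaches the tool byte-for-byte unchanged, and the difference must arise during or after reading.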

@sylvainhalle
Owner

This may not be related, but I see that the calls to TeXtidote mix the --clean option with the --check option. These two are mutually exclusive: calling clean only cleans the document and exits before performing any other verification.

@ComanderKai77
Contributor

@giulianorasper
Did you find a solution?

@sylvainhalle
Owner

Will close this due to lack of information to fix the issue.
