Introduce MS Windows CI #425

mgautierfr · 2024-08-26T13:10:42Z

No description provided.

codecov · 2024-08-26T13:14:01Z

Codecov Report

Attention: Patch coverage is 39.28571% with 34 lines in your changes missing coverage. Please review.

Project coverage is 26.13%. Comparing base (06b9916) to head (5a8c6df).
Report is 11 commits behind head on main.

Files with missing lines	Patch %	Lines
src/zimcheck/zimcheck.cpp	41.66%	0 Missing and 28 partials ⚠️
src/zimdump.cpp	0.00%	3 Missing ⚠️
src/zimsplit.cpp	0.00%	2 Missing ⚠️
src/metadata.cpp	66.66%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #425      +/-   ##
==========================================
- Coverage   27.01%   26.13%   -0.88%     
==========================================
  Files          26       26              
  Lines        2458     2437      -21     
  Branches     1339     1362      +23     
==========================================
- Hits          664      637      -27     
+ Misses       1305     1292      -13     
- Partials      489      508      +19

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

kelson42 · 2024-08-27T13:53:29Z

Considering that writer is available everywhere in libzim, why do we still have anything "without writer"?

mgautierfr · 2024-08-27T13:59:36Z

This is zimwriterfs. We don't have libmagic on Windows.
The first commit of this PR (so the name) was about deactivating tests associated to zimwriterfs (and needed dependencies) when trying to compile without it (on Windows).

Since then, compilation of zim-tools on Windows was a bit more complex than expected so the PR is also more complex.

mgautierfr · 2024-08-27T17:28:37Z

This Pr need port of zimcheck to docoptcpp to be "finished". (Will do)

@veloman-yunkan Please can you have a look to 5186d50 (#425)
I don't have a damn idea why we have this behavior.

veloman-yunkan · 2024-08-27T20:27:01Z

src/metadata.cpp

- return icu::UnicodeString::fromUTF8(utf8EncodedString).length();
+ // For some unknown reason implicite convertion from std::string to icu::StringPiece
+ // is broken on Windows.
+ // Constructors are definde in stringpiece.h as
+ // ```
+ // StringPiece(const std::string& str)
+ // : ptr_(str.data()), length_(static_cast<int32_t>(str.size())) { }
+ // StringPiece(const char* offset, int32_t len) : ptr_(offset), length_(len) { }
+ // ```
+ // However using the first constructor ends with a corrupted StringPiece (wrong ptr)
+ // and using second one works. Don't ask me why
+ // This is broken
+ // icu::StringPiece stringPiece(utf8EncodedString);
+ // This is not
+ icu::StringPiece stringPiece(utf8EncodedString.data(), static_cast<int32_t>(utf8EncodedString.size()));
+ return icu::UnicodeString::fromUTF8(stringPiece).length();


The only explanation that I could find is because of the following template constructor of StringPiece:

/** * Constructs from some other implementation of a string piece class, from any * C++ record type that has these two methods: * * \code{.cpp} * * struct OtherStringPieceClass { * const char* data(); // or const char8_t* * size_t size(); * }; * * \endcode * * The other string piece class will typically be std::string_view from C++17 * or absl::string_view from Abseil. * * Starting with C++20, data() may also return a const char8_t* pointer, * as from std::u8string_view. * * @param str the other string piece * @stable ICU 65 */ template <typename T, typename = typename std::enable_if< (std::is_same<decltype(T().data()), const char*>::value #if defined(__cpp_char8_t) || std::is_same<decltype(T().data()), const char8_t*>::value #endif ) && std::is_same<decltype(T().size()), size_t>::value>::type> StringPiece(T str) : ptr_(reinterpret_cast<const char*>(str.data())), length_(static_cast<int32_t>(str.size())) {}

If overload resolution for some reason chooses that constructor over StringPiece(const std::string& str) then the string is passed into it by value, and the pointer is bound to the data of a temporary object that is destroyed after the completion of the StringPiece constructor. That explanation is valid for an explicit stringPiece variable from your commented out example, since the temporary object definitely doesn't survive beyond that point. I am less sure if it can also be valid for the original single liner implementation, as I thought that the temporary objects are required to survive until the end of the full expression in the context of which they were created. But it can be a bug in the compiler (on top of the other bug that chooses the wrong overload).

Interesting idea but not conclusive.
I have removed this constructor in the stringpiece header and same error occurs.

I am less sure if it can also be valid for the original single liner implementation, as I thought that the temporary objects are required to survive until the end of the full expression in the context of which they were created. But it can be a bug in the compiler (on top of the other bug that chooses the wrong overload).

I confirm to you that single liner version is not helping.
A single line "fix" is return icu::UnicodeString::fromUTF8(utf8EncodedString.data()).length(); but we may truncate binary data containing \0 so I prefer be explicit about the data size.

mgautierfr · 2024-08-28T13:23:55Z

Failing tests on Windows seems to be related to docopt/docopt.cpp#49
The workaround is to tell docopt to use boost regex instead of std::regex.

But we don't have boost in kiwix-build...

mgautierfr · 2024-08-28T19:10:35Z

kiwix-build now build docoptcpp with boost.regex on Windows.

veloman-yunkan · 2024-08-29T11:06:04Z

src/zimcheck/zimcheck.cpp

+ -a --all run all tests. Default if no flags are given.
+ -0 --empty Empty content
+ -c --checksum Internal CheckSum Test
+ -i --integrity Low-level correctness/integrity checks
+ -m --metadata MetaData Entries
+ -f --favicon Favicon
+ -p --main Main page
+ -r --redundant Redundant data check
+ -u --url_internal URL check - Internal URLs
+ -x --url_external URL check - External URLs
+ -d --details Details of error
+ -b --progress Print progress report
+ -j --json Output in JSON format
+ -h --help Displays Help
+ -v --version Displays software version
+ -l --redirect_loop Checks for the existence of redirect loops
+ -w=<nb_thread> --threads=<nb_thread> count of threads to utilize [default: 1]


Maybe it makes sense to mention in the commit message that short options are now accepted only in lower case (or make the short options UPPER CASE in the usage string, like before, and emphasize that they are accepted only as UPPER CASE).

There is We lost parsing case insensitive options on the way..
But I can be more explicit.

I wonder why we need upper case option ?
Almost all tools I know use lower case option and never uppercase.

There is We lost parsing case insensitive options on the way..

Oops, didn't notice it.

I wonder why we need upper case option ? Almost all tools I know use lower case option and never uppercase.

I agree. But the old usage string documented upper case short options. If zimcheck were a popular tool with usage in a lot of scripts this change won't be welcome.

If zimcheck were a popular tool with usage in a lot of scripts this change won't be welcome.

I agree. But not sure it is such a popular tool :)
@kelson42 @rgaudin is it ok for you ?

May I ask the technical reason why we have such a move in this PR?

So, if this is confirmed that lowercase args would be better from the user perspective and habits AND we have no reasonable technical alternative, then I guess we can move forward like this but we need to make a major release!

And I have just check and it seens to NOT work.
But we can keep the upper case short option only (as it was told in previous usage string). But lower case short option would not work.

At least we can preserve support for upper case short options (because that's how they were documented) instead of switching to lower case options.

Lets keep uppercase short option only then

src/zimcheck/zimcheck.cpp

kelson42 · 2024-08-30T10:31:16Z

@mgautierfr This PR seems to brake packaging for Ubuntu jammy and focal because they are working on main, see https://github.com/openzim/zim-tools/actions/runs/10540978671

mgautierfr · 2024-08-30T12:19:39Z

Last commit fix compilation with older docoptcpp version found in jammy and focal distribution.

veloman-yunkan

Let's squash the last commit ("Move back to upper case short options.") into "Port zimcheck to docoptcpp.") and the PR can be merged.

mgautierfr · 2024-09-02T12:04:39Z

Done

Compilation of zim-tools is broken on Windows. But let's setup the CI to validate the PR.

Meson already handle werror and wall, let's use it.

And anyway the argument is storing the output `std::string::size` which is a `size_t`

As specified in isprint documentation[1], it is undefined behavior if input cannot represented as unsigned char. Let's convert it as suggested in documentation. [1] https://en.cppreference.com/w/cpp/string/byte/isprint

As explained in comment, I don't know the root cause of all of this. If you have an idea you are welcomed !!!!

We don't have getopt on Windows. Let's move command line parsing to docoptcpp as we already use it. We lost parsing case insensitive options on the way.

Older version of docopt doesn't define Options. Let's define it using `using` syntax (as done in recent version of docoptcpp)

mgautierfr force-pushed the fix_no_writer_test branch 2 times, most recently from 51cc0a0 to e8f6b7c Compare August 26, 2024 15:47

mgautierfr marked this pull request as draft August 27, 2024 09:20

mgautierfr force-pushed the fix_no_writer_test branch from e8f6b7c to f38687a Compare August 27, 2024 13:46

mgautierfr changed the title ~~Correctly compile tests if build without writer (on Windows)~~ Correctly compile tests if build without zimwriterfs (on Windows) Aug 27, 2024

mgautierfr changed the title ~~Correctly compile tests if build without zimwriterfs (on Windows)~~ Compile zim-tools on Windows Aug 27, 2024

veloman-yunkan reviewed Aug 27, 2024

View reviewed changes

mgautierfr force-pushed the fix_no_writer_test branch from d2347e8 to 6c9892e Compare August 28, 2024 09:48

mgautierfr force-pushed the fix_no_writer_test branch from 6c9892e to 73eb7c2 Compare August 28, 2024 16:09

mgautierfr mentioned this pull request Aug 28, 2024

Fix dependencies for zim-tools kiwix/kiwix-build#736

Merged

mgautierfr marked this pull request as ready for review August 28, 2024 19:10

mgautierfr requested a review from veloman-yunkan August 28, 2024 19:10

veloman-yunkan reviewed Aug 29, 2024

View reviewed changes

kelson42 added this to the 3.5.0 milestone Aug 29, 2024

kelson42 changed the title ~~Compile zim-tools on Windows~~ Introduce MS Windows CI Aug 29, 2024

mgautierfr mentioned this pull request Sep 1, 2024

Compile zim-tools and kiwix-tools on Windows github CI. kiwix/kiwix-build#728

Merged

mgautierfr requested review from veloman-yunkan and kelson42 September 1, 2024 08:15

veloman-yunkan reviewed Sep 2, 2024

View reviewed changes

mgautierfr force-pushed the fix_no_writer_test branch from 7c80070 to 47f9acc Compare September 2, 2024 12:04

veloman-yunkan approved these changes Sep 2, 2024

View reviewed changes

mgautierfr added 10 commits September 2, 2024 12:18

Add Windows CI.

df5032f

Compilation of zim-tools is broken on Windows. But let's setup the CI to validate the PR.

Correctly compile tests if build without writer (on Windows)

dcf0ca2

Do not set gcc specific option.

d6d67c2

Meson already handle werror and wall, let's use it.

ssize_t is not defined on Windows.

72fabb3

And anyway the argument is storing the output `std::string::size` which is a `size_t`

Remove unused import of dirent.h

37aa9ef

Correctly convert char to unsigned char before calling isprint.

a124217

As specified in isprint documentation[1], it is undefined behavior if input cannot represented as unsigned char. Let's convert it as suggested in documentation. [1] https://en.cppreference.com/w/cpp/string/byte/isprint

Fix getTextLength on Windows

d524aa8

As explained in comment, I don't know the root cause of all of this. If you have an idea you are welcomed !!!!

Fix include of unistd.h on Windows

7745ce7

Port zimcheck to docoptcpp

86315e0

We don't have getopt on Windows. Let's move command line parsing to docoptcpp as we already use it. We lost parsing case insensitive options on the way.

Support older version of docopt.

5a8c6df

Older version of docopt doesn't define Options. Let's define it using `using` syntax (as done in recent version of docoptcpp)

kelson42 force-pushed the fix_no_writer_test branch from 47f9acc to 5a8c6df Compare September 2, 2024 12:18

kelson42 merged commit fd23466 into main Sep 2, 2024
11 checks passed

kelson42 deleted the fix_no_writer_test branch September 2, 2024 12:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce MS Windows CI #425

Introduce MS Windows CI #425

mgautierfr commented Aug 26, 2024

codecov bot commented Aug 26, 2024 •

edited

Loading

kelson42 commented Aug 27, 2024

mgautierfr commented Aug 27, 2024

mgautierfr commented Aug 27, 2024

veloman-yunkan Aug 27, 2024

mgautierfr Aug 28, 2024

mgautierfr commented Aug 28, 2024

mgautierfr commented Aug 28, 2024

veloman-yunkan Aug 29, 2024

mgautierfr Aug 29, 2024

veloman-yunkan Aug 29, 2024 •

edited

Loading

mgautierfr Aug 30, 2024 •

edited

Loading

kelson42 Aug 30, 2024

kelson42 Aug 30, 2024

mgautierfr Aug 30, 2024 •

edited

Loading

veloman-yunkan Aug 30, 2024

kelson42 Aug 31, 2024

mgautierfr Sep 1, 2024

kelson42 commented Aug 30, 2024 •

edited

Loading

mgautierfr commented Aug 30, 2024

veloman-yunkan left a comment

mgautierfr commented Sep 2, 2024

Introduce MS Windows CI #425

Introduce MS Windows CI #425

Conversation

mgautierfr commented Aug 26, 2024

codecov bot commented Aug 26, 2024 • edited Loading

Codecov Report

kelson42 commented Aug 27, 2024

mgautierfr commented Aug 27, 2024

mgautierfr commented Aug 27, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgautierfr commented Aug 28, 2024

mgautierfr commented Aug 28, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

veloman-yunkan Aug 29, 2024 • edited Loading

Choose a reason for hiding this comment

mgautierfr Aug 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgautierfr Aug 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kelson42 commented Aug 30, 2024 • edited Loading

mgautierfr commented Aug 30, 2024

veloman-yunkan left a comment

Choose a reason for hiding this comment

mgautierfr commented Sep 2, 2024

codecov bot commented Aug 26, 2024 •

edited

Loading

veloman-yunkan Aug 29, 2024 •

edited

Loading

mgautierfr Aug 30, 2024 •

edited

Loading

mgautierfr Aug 30, 2024 •

edited

Loading

kelson42 commented Aug 30, 2024 •

edited

Loading