Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce MS Windows CI #425

Merged
merged 10 commits into from
Sep 2, 2024
Merged

Introduce MS Windows CI #425

merged 10 commits into from
Sep 2, 2024

Conversation

mgautierfr
Copy link
Collaborator

No description provided.

Copy link

codecov bot commented Aug 26, 2024

Codecov Report

Attention: Patch coverage is 39.28571% with 34 lines in your changes missing coverage. Please review.

Project coverage is 26.13%. Comparing base (06b9916) to head (5a8c6df).
Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
src/zimcheck/zimcheck.cpp 41.66% 0 Missing and 28 partials ⚠️
src/zimdump.cpp 0.00% 3 Missing ⚠️
src/zimsplit.cpp 0.00% 2 Missing ⚠️
src/metadata.cpp 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #425      +/-   ##
==========================================
- Coverage   27.01%   26.13%   -0.88%     
==========================================
  Files          26       26              
  Lines        2458     2437      -21     
  Branches     1339     1362      +23     
==========================================
- Hits          664      637      -27     
+ Misses       1305     1292      -13     
- Partials      489      508      +19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mgautierfr mgautierfr force-pushed the fix_no_writer_test branch 2 times, most recently from 51cc0a0 to e8f6b7c Compare August 26, 2024 15:47
@mgautierfr mgautierfr marked this pull request as draft August 27, 2024 09:20
@kelson42
Copy link
Contributor

Considering that writer is available everywhere in libzim, why do we still have anything "without writer"?

@mgautierfr mgautierfr changed the title Correctly compile tests if build without writer (on Windows) Correctly compile tests if build without zimwriterfs (on Windows) Aug 27, 2024
@mgautierfr
Copy link
Collaborator Author

This is zimwriterfs. We don't have libmagic on Windows.
The first commit of this PR (so the name) was about deactivating tests associated to zimwriterfs (and needed dependencies) when trying to compile without it (on Windows).

Since then, compilation of zim-tools on Windows was a bit more complex than expected so the PR is also more complex.

@mgautierfr mgautierfr changed the title Correctly compile tests if build without zimwriterfs (on Windows) Compile zim-tools on Windows Aug 27, 2024
@mgautierfr
Copy link
Collaborator Author

This Pr need port of zimcheck to docoptcpp to be "finished". (Will do)

@veloman-yunkan Please can you have a look to 5186d50 (#425)
I don't have a damn idea why we have this behavior.

Comment on lines -51 to +65
return icu::UnicodeString::fromUTF8(utf8EncodedString).length();
// For some unknown reason implicite convertion from std::string to icu::StringPiece
// is broken on Windows.
// Constructors are definde in stringpiece.h as
// ```
// StringPiece(const std::string& str)
// : ptr_(str.data()), length_(static_cast<int32_t>(str.size())) { }
// StringPiece(const char* offset, int32_t len) : ptr_(offset), length_(len) { }
// ```
// However using the first constructor ends with a corrupted StringPiece (wrong ptr)
// and using second one works. Don't ask me why
// This is broken
// icu::StringPiece stringPiece(utf8EncodedString);
// This is not
icu::StringPiece stringPiece(utf8EncodedString.data(), static_cast<int32_t>(utf8EncodedString.size()));
return icu::UnicodeString::fromUTF8(stringPiece).length();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only explanation that I could find is because of the following template constructor of StringPiece:

  /**
   * Constructs from some other implementation of a string piece class, from any
   * C++ record type that has these two methods:
   *
   * \code{.cpp}
   *
   *   struct OtherStringPieceClass {
   *     const char* data();  // or const char8_t*
   *     size_t size();
   *   };
   *
   * \endcode
   *
   * The other string piece class will typically be std::string_view from C++17
   * or absl::string_view from Abseil.
   *
   * Starting with C++20, data() may also return a const char8_t* pointer,
   * as from std::u8string_view.
   *
   * @param str the other string piece
   * @stable ICU 65
   */
  template <typename T,
            typename = typename std::enable_if<
                (std::is_same<decltype(T().data()), const char*>::value
#if defined(__cpp_char8_t)
                    || std::is_same<decltype(T().data()), const char8_t*>::value
#endif
                ) &&
                std::is_same<decltype(T().size()), size_t>::value>::type>
  StringPiece(T str)
      : ptr_(reinterpret_cast<const char*>(str.data())),
        length_(static_cast<int32_t>(str.size())) {}

If overload resolution for some reason chooses that constructor over StringPiece(const std::string& str) then the string is passed into it by value, and the pointer is bound to the data of a temporary object that is destroyed after the completion of the StringPiece constructor. That explanation is valid for an explicit stringPiece variable from your commented out example, since the temporary object definitely doesn't survive beyond that point. I am less sure if it can also be valid for the original single liner implementation, as I thought that the temporary objects are required to survive until the end of the full expression in the context of which they were created. But it can be a bug in the compiler (on top of the other bug that chooses the wrong overload).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea but not conclusive.
I have removed this constructor in the stringpiece header and same error occurs.

I am less sure if it can also be valid for the original single liner implementation, as I thought that the temporary objects are required to survive until the end of the full expression in the context of which they were created. But it can be a bug in the compiler (on top of the other bug that chooses the wrong overload).

I confirm to you that single liner version is not helping.
A single line "fix" is return icu::UnicodeString::fromUTF8(utf8EncodedString.data()).length(); but we may truncate binary data containing \0 so I prefer be explicit about the data size.

@mgautierfr
Copy link
Collaborator Author

Failing tests on Windows seems to be related to docopt/docopt.cpp#49
The workaround is to tell docopt to use boost regex instead of std::regex.

But we don't have boost in kiwix-build...

@mgautierfr
Copy link
Collaborator Author

kiwix-build now build docoptcpp with boost.regex on Windows.

@mgautierfr mgautierfr marked this pull request as ready for review August 28, 2024 19:10
Comment on lines 54 to 70
-a --all run all tests. Default if no flags are given.
-0 --empty Empty content
-c --checksum Internal CheckSum Test
-i --integrity Low-level correctness/integrity checks
-m --metadata MetaData Entries
-f --favicon Favicon
-p --main Main page
-r --redundant Redundant data check
-u --url_internal URL check - Internal URLs
-x --url_external URL check - External URLs
-d --details Details of error
-b --progress Print progress report
-j --json Output in JSON format
-h --help Displays Help
-v --version Displays software version
-l --redirect_loop Checks for the existence of redirect loops
-w=<nb_thread> --threads=<nb_thread> count of threads to utilize [default: 1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it makes sense to mention in the commit message that short options are now accepted only in lower case (or make the short options UPPER CASE in the usage string, like before, and emphasize that they are accepted only as UPPER CASE).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is We lost parsing case insensitive options on the way..
But I can be more explicit.

I wonder why we need upper case option ?
Almost all tools I know use lower case option and never uppercase.

Copy link
Collaborator

@veloman-yunkan veloman-yunkan Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is We lost parsing case insensitive options on the way..

Oops, didn't notice it.

I wonder why we need upper case option ? Almost all tools I know use lower case option and never uppercase.

I agree. But the old usage string documented upper case short options. If zimcheck were a popular tool with usage in a lot of scripts this change won't be welcome.

Copy link
Collaborator Author

@mgautierfr mgautierfr Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If zimcheck were a popular tool with usage in a lot of scripts this change won't be welcome.

I agree. But not sure it is such a popular tool :)
@kelson42 @rgaudin is it ok for you ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask the technical reason why we have such a move in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if this is confirmed that lowercase args would be better from the user perspective and habits AND we have no reasonable technical alternative, then I guess we can move forward like this but we need to make a major release!

Copy link
Collaborator Author

@mgautierfr mgautierfr Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I have just check and it seens to NOT work.
But we can keep the upper case short option only (as it was told in previous usage string). But lower case short option would not work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least we can preserve support for upper case short options (because that's how they were documented) instead of switching to lower case options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets keep uppercase short option only then

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/zimcheck/zimcheck.cpp Show resolved Hide resolved
src/zimcheck/zimcheck.cpp Show resolved Hide resolved
@kelson42 kelson42 added this to the 3.5.0 milestone Aug 29, 2024
@kelson42 kelson42 changed the title Compile zim-tools on Windows Introduce MS Windows CI Aug 29, 2024
@kelson42
Copy link
Contributor

kelson42 commented Aug 30, 2024

@mgautierfr This PR seems to brake packaging for Ubuntu jammy and focal because they are working on main, see https://github.com/openzim/zim-tools/actions/runs/10540978671

@mgautierfr
Copy link
Collaborator Author

Last commit fix compilation with older docoptcpp version found in jammy and focal distribution.

Copy link
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's squash the last commit ("Move back to upper case short options.") into "Port zimcheck to docoptcpp.") and the PR can be merged.

@mgautierfr
Copy link
Collaborator Author

Done

Compilation of zim-tools is broken on Windows.
But let's setup the CI to validate the PR.
Meson already handle werror and wall, let's use it.
And anyway the argument is storing the output `std::string::size` which is a `size_t`
As specified in isprint documentation[1], it is undefined behavior if
input cannot represented as unsigned char.

Let's convert it as suggested in documentation.

[1] https://en.cppreference.com/w/cpp/string/byte/isprint
As explained in comment, I don't know the root cause of all of this.

If you have an idea you are welcomed !!!!
We don't have getopt on Windows.
Let's move command line parsing to docoptcpp as we already use it.

We lost parsing case insensitive options on the way.
Older version of docopt doesn't define Options.

Let's define it using `using` syntax (as done in recent version of docoptcpp)
@kelson42 kelson42 merged commit fd23466 into main Sep 2, 2024
11 checks passed
@kelson42 kelson42 deleted the fix_no_writer_test branch September 2, 2024 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants