Introduce MS Windows CI #425
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #425      +/-   ##
==========================================
- Coverage   27.01%   26.13%   -0.88%
==========================================
  Files          26       26
  Lines        2458     2437      -21
  Branches     1339     1362      +23
==========================================
- Hits          664      637      -27
+ Misses       1305     1292      -13
- Partials      489      508      +19
☔ View full report in Codecov by Sentry.
51cc0a0 to e8f6b7c
e8f6b7c to f38687a
Considering that writer is available everywhere in libzim, why do we still have anything "without writer"?
This is zimwriterfs. We don't have libmagic on Windows. Since then, compiling zim-tools on Windows turned out to be a bit more complex than expected, so the PR is also more complex.
This PR needs zimcheck to be ported to docoptcpp to be "finished". (Will do.) @veloman-yunkan Please can you have a look to
return icu::UnicodeString::fromUTF8(utf8EncodedString).length();
// For some unknown reason, implicit conversion from std::string to icu::StringPiece
// is broken on Windows.
// Constructors are defined in stringpiece.h as
// ```
// StringPiece(const std::string& str)
//   : ptr_(str.data()), length_(static_cast<int32_t>(str.size())) { }
// StringPiece(const char* offset, int32_t len) : ptr_(offset), length_(len) { }
// ```
// However, using the first constructor ends with a corrupted StringPiece (wrong ptr)
// and using the second one works. Don't ask me why.
// This is broken
// icu::StringPiece stringPiece(utf8EncodedString);
// This is not
icu::StringPiece stringPiece(utf8EncodedString.data(), static_cast<int32_t>(utf8EncodedString.size()));
return icu::UnicodeString::fromUTF8(stringPiece).length();
The only explanation that I could find is the following template constructor of StringPiece:
/**
* Constructs from some other implementation of a string piece class, from any
* C++ record type that has these two methods:
*
* \code{.cpp}
*
* struct OtherStringPieceClass {
* const char* data(); // or const char8_t*
* size_t size();
* };
*
* \endcode
*
* The other string piece class will typically be std::string_view from C++17
* or absl::string_view from Abseil.
*
* Starting with C++20, data() may also return a const char8_t* pointer,
* as from std::u8string_view.
*
* @param str the other string piece
* @stable ICU 65
*/
template <typename T,
typename = typename std::enable_if<
(std::is_same<decltype(T().data()), const char*>::value
#if defined(__cpp_char8_t)
|| std::is_same<decltype(T().data()), const char8_t*>::value
#endif
) &&
std::is_same<decltype(T().size()), size_t>::value>::type>
StringPiece(T str)
: ptr_(reinterpret_cast<const char*>(str.data())),
length_(static_cast<int32_t>(str.size())) {}
If overload resolution for some reason chooses that constructor over StringPiece(const std::string& str), then the string is passed into it by value, and the pointer is bound to the data of a temporary object that is destroyed after the completion of the StringPiece constructor. That explanation is valid for the explicit stringPiece variable from your commented-out example, since the temporary object definitely doesn't survive beyond that point. I am less sure whether it can also be valid for the original single-line implementation, as I thought that temporary objects are required to survive until the end of the full expression in the context of which they were created. But it can be a bug in the compiler (on top of the other bug that chooses the wrong overload).
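To make the suspected mechanism concrete, here is a minimal sketch that does not use ICU at all. `ByRefPiece`, `ByValuePiece`, `pointsInto` and `demoDanglingCapture` are hypothetical names invented for this illustration; they only mimic the two constructor shapes discussed above. The captured address is stored as an integer so that it can be inspected later without dereferencing a dangling pointer. The sketch assumes a C++11-conforming std::string (no copy-on-write).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for a string-piece class; these are NOT ICU's StringPiece.
// Each one records the captured address as an integer, so the address can be
// inspected later without dereferencing a possibly dangling pointer.
struct ByRefPiece {
    std::uintptr_t addr;
    std::size_t len;
    // Binds to the caller's string: the captured address stays valid
    // for as long as the caller's string is alive and unmodified.
    explicit ByRefPiece(const std::string& str)
        : addr(reinterpret_cast<std::uintptr_t>(str.data())), len(str.size()) {}
};

struct ByValuePiece {
    std::uintptr_t addr;
    std::size_t len;
    // Takes its argument by value, like the template constructor quoted above:
    // `str` is a copy that is destroyed when the constructor returns.
    template <typename T>
    explicit ByValuePiece(T str)
        : addr(reinterpret_cast<std::uintptr_t>(str.data())), len(str.size()) {}
};

// True if `addr` lies inside the character buffer currently owned by `owner`.
inline bool pointsInto(const std::string& owner, std::uintptr_t addr) {
    const auto begin = reinterpret_cast<std::uintptr_t>(owner.data());
    return addr >= begin && addr <= begin + owner.size();
}

// Small self-check showing the difference between the two constructors.
inline void demoDanglingCapture() {
    const std::string text = "some utf-8 text";
    const ByRefPiece byRef(text);
    const ByValuePiece byValue(text);
    assert(pointsInto(text, byRef.addr));     // refers to the caller's buffer
    assert(!pointsInto(text, byValue.addr));  // referred to the destroyed copy
}
```

If the by-value overload were selected for a std::string argument, any later read through the stored pointer would touch freed or reused storage, which would be consistent with the "wrong ptr" symptom. This only illustrates the hypothesis; it does not establish that this is what happens in the Windows build.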
Interesting idea but not conclusive.
I have removed this constructor from the stringpiece header and the same error occurs.
I am less sure if it can also be valid for the original single liner implementation, as I thought that the temporary objects are required to survive until the end of the full expression in the context of which they were created. But it can be a bug in the compiler (on top of the other bug that chooses the wrong overload).
I can confirm that the single-line version does not help.
A single-line "fix" is return icu::UnicodeString::fromUTF8(utf8EncodedString.data()).length(); but it may truncate binary data containing \0, so I prefer to be explicit about the data size.
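The truncation concern can be shown with the standard library alone. In this sketch, `lengthFromPointerOnly`, `lengthFromPointerAndSize` and `demoEmbeddedNul` are hypothetical helper names; they stand in for an API that receives only a `const char*` versus one that also receives an explicit size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length seen by an API that receives only a `const char*` and therefore
// has to scan for the terminating NUL.
inline std::size_t lengthFromPointerOnly(const std::string& s) {
    return std::strlen(s.data());
}

// Length seen by an API that receives both the pointer and an explicit size.
inline std::size_t lengthFromPointerAndSize(const std::string& s) {
    return s.size();
}

// Small self-check with a string that contains an embedded NUL byte.
inline void demoEmbeddedNul() {
    const std::string binary("ab\0cd", 5);          // 5 bytes, NUL in the middle
    assert(lengthFromPointerOnly(binary) == 2);     // stops at the embedded NUL
    assert(lengthFromPointerAndSize(binary) == 5);  // sees every byte
}
```

Passing `data()` together with `size()` keeps the bytes after an embedded NUL, which is why the two-argument constructor is the safer choice here.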
d2347e8 to 6c9892e
Failing tests on Windows seem to be related to docopt/docopt.cpp#49, but we don't have boost in kiwix-build...
6c9892e to 73eb7c2
kiwix-build now builds docoptcpp with boost.regex on Windows.
src/zimcheck/zimcheck.cpp
Outdated
-a --all                               run all tests. Default if no flags are given.
-0 --empty                             Empty content
-c --checksum                          Internal CheckSum Test
-i --integrity                         Low-level correctness/integrity checks
-m --metadata                          MetaData Entries
-f --favicon                           Favicon
-p --main                              Main page
-r --redundant                         Redundant data check
-u --url_internal                      URL check - Internal URLs
-x --url_external                      URL check - External URLs
-d --details                           Details of error
-b --progress                          Print progress report
-j --json                              Output in JSON format
-h --help                              Displays Help
-v --version                           Displays software version
-l --redirect_loop                     Checks for the existence of redirect loops
-w=<nb_thread> --threads=<nb_thread>   count of threads to utilize [default: 1]
Maybe it makes sense to mention in the commit message that short options are now accepted only in lower case (or make the short options UPPER CASE in the usage string, like before, and emphasize that they are accepted only as UPPER CASE).
There is "We lost parsing case insensitive options on the way." But I can be more explicit.
I wonder why we need upper case options? Almost all tools I know use lower case options and never upper case.
There is "We lost parsing case insensitive options on the way."
Oops, didn't notice it.
I wonder why we need upper case option ? Almost all tools I know use lower case option and never uppercase.
I agree. But the old usage string documented upper case short options. If zimcheck were a popular tool used in a lot of scripts, this change wouldn't be welcome.
May I ask the technical reason why we have such a move in this PR?
So, if it is confirmed that lower case args would be better from the perspective of users and their habits AND we have no reasonable technical alternative, then I guess we can move forward like this, but we need to make a major release!
And I have just checked and it seems NOT to work.
But we can keep the upper case short options only (as documented in the previous usage string). Lower case short options would then not work.
At least we can preserve support for upper case short options (because that's how they were documented) instead of switching to lower case options.
Let's keep upper case short options only, then.
Done
@mgautierfr This PR seems to break packaging for Ubuntu jammy and focal because they are working on
The last commit fixes compilation with the older docoptcpp version found in the jammy and focal distributions.
Let's squash the last commit ("Move back to upper case short options.") into "Port zimcheck to docoptcpp." and the PR can be merged.
7c80070 to 47f9acc
Done
Compilation of zim-tools is broken on Windows. But let's set up the CI to validate the PR.
Meson already handles werror and wall; let's use it.
And anyway the argument is storing the output of `std::string::size`, which is a `size_t`.
As specified in the isprint documentation[1], it is undefined behavior if the input cannot be represented as unsigned char. Let's convert it as suggested in the documentation. [1] https://en.cppreference.com/w/cpp/string/byte/isprint
As explained in the comment, I don't know the root cause of all of this. If you have an idea, you are welcome!
We don't have getopt on Windows. Let's move command line parsing to docoptcpp as we already use it. We lost parsing case insensitive options on the way.
Older versions of docopt don't define Options. Let's define it using the `using` syntax (as done in recent versions of docoptcpp).
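The isprint commit message above refers to the conversion suggested on cppreference. A minimal sketch of that pattern follows; `isPrintable` is a hypothetical wrapper name and is not taken from the zim-tools sources.

```cpp
#include <cctype>

// Hypothetical wrapper: std::isprint takes an int whose value must be
// representable as unsigned char (or be EOF). A plain `char` may be signed,
// so bytes >= 0x80 would otherwise reach isprint as negative values.
inline bool isPrintable(char c) {
    return std::isprint(static_cast<unsigned char>(c)) != 0;
}
```

With the cast in place, the function can be called on any byte of a UTF-8 or binary buffer; which non-ASCII bytes count as printable still depends on the current C locale.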
47f9acc to 5a8c6df