feat: unicode support #804

bindreams · 2022-11-19T15:10:13Z

Fixes #14 as discussed.

Some notes:

Since necessary facilities were already in place, I also added support for wide strings as options.
I had difficulties coming up with a good API for this. It would not be a good idea to conditionally throw away argc and argv in favor of GetCommandLineW, because argv passed in to parse may have been already modified. Instead, I figured out a way to retrieve "original" argv on other systems and made a parallel interface: app.parse(argc, argv) stays the same and app.parse() uses "original as launched" args.
I exposed the following useful functions to the user:
- argc() and argv() on all systems for "original" args.
- widen and narrow in case the user uses a wide-string API with our UTF-8 string.
- to_path on all systems which creates a filesystem::path, widening the string on Windows.
I had to remove a test which checks existence of a file with Unicode in the name because:
1. File needs to exist beforehand and not get created during test - Unicode can get corrupted the same way during creation and during checking, the test will erroneously pass.
2. CI chokes trying to copy a unicode-named file to the build directory.
Some design decisions which I considered, let me know if you prefer these over what's in the PR:
- app.parse(CLI::system_args) instead of app.parse(), where CLI::system_args is value of a tag type CLI::system_args_t (feel free to suggest a better name for the tag)
- Initialize argc() and argv() as a global variable before main. This is already being done for macOS. Would improve syntax from (e.g. from CLI::argv()[1] to CLI::argv[1]). Would remove any performance costs during main but will push them on users who don't care about Unicode or convert Unicode themselves.
Both cpplint and clang-tidy raised a lot of warnings in preexisting code when running locally. Maybe I am doing something wrong? I had to exclude build/include_subdir from cpplint just to catch actual warnings in terminal.

Also, I made some benchmarks to measure the performance impact of supporting Unicode, since this was a concern. Measuring with google benchmark, all benchmarks receive <program-name> --option1 1234 --option2 "hello world" as commandline:

simple_main - create a CLI::App, options, and parse the old way. Just for reference when judging performance.
get_global_args - call CLI::argv() (without caching into a static variable)
parse_normal - app.parse(argc, argv), i.e. old speed
parse_global - app.parse(), i.e. new speed

Windows:

Run on (24 X 3748.57 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x12)
  L1 Instruction 32 KiB (x12)
  L2 Unified 512 KiB (x12)
  L3 Unified 32768 KiB (x2)
----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
simple_main           4557 ns         4541 ns       144516
get_global_args        566 ns          578 ns      1000000
parse_normal           568 ns          578 ns      1000000
parse_global          1262 ns         1256 ns       560000

get_global_args and parse_normal being the same time is a little sus, I suspect that maybe there are some optimizations in the benchmark loop which I failed to prevent.

Linux:

Run on (24 X 3700.04 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x12)
  L1 Instruction 32 KiB (x12)
  L2 Unified 512 KiB (x12)
  L3 Unified 32768 KiB (x1)
Load Average: 10.96, 3.19, 1.10
----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
simple_main           2666 ns         2666 ns       259722
get_global_args       2104 ns         2104 ns       340016
parse_normal           521 ns          521 ns      1024941
parse_global          2818 ns         2818 ns       248291

Overall, for a function called once in the application, I think +2μs of time for Unicode support is acceptable.

codecov · 2022-11-21T01:42:36Z

Codecov Report

Base: 99.45% // Head: 99.45% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (9bda850) compared to base (48624ea).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #804   +/-   ##
=======================================
  Coverage   99.45%   99.45%           
=======================================
  Files          16       18    +2     
  Lines        4019     4069   +50     
=======================================
+ Hits         3997     4047   +50     
  Misses         22       22

Impacted Files	Coverage Δ
include/CLI/App.hpp	`100.00% <ø> (ø)`
include/CLI/Validators.hpp	`99.30% <ø> (ø)`
include/CLI/TypeTools.hpp	`100.00% <100.00%> (ø)`
include/CLI/impl/App_inl.hpp	`99.31% <100.00%> (+<0.01%)`	⬆️
include/CLI/impl/Argv_inl.hpp	`100.00% <100.00%> (ø)`
include/CLI/impl/Encoding_inl.hpp	`100.00% <100.00%> (ø)`
include/CLI/impl/Validators_inl.hpp	`99.50% <100.00%> (ø)`

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

bindreams · 2022-11-21T02:41:16Z

There are Codacy reports which I disagree with but don't have permissions to ignore.

Ready for review.

include/CLI/impl/SlimWindowsH.hpp

phlptp · 2022-11-21T16:52:40Z

for codacy you might be able to to put //NOLINTNEXTLINE on a few depending on the exact tool that is triggering them.

bindreams · 2022-12-10T11:37:53Z

One of the "2 workflows awaiting approval" which someone approved on f7e8901 was a github action "cmake config check", which failed because apparently libidn.so.11 was missing on that particular system. I have no idea why that library was required in the first place (my best guess is the locale-based fallback for older compilers requires it?). Looks like cmake <3.7 is not aware that on ubuntu-latest libidn.so.12 exists instead.

Anyway, since this looks like a cmake bug, I modified github actions to run cmake versions on "appropriate" ubuntu distributions: cmake 3.4~3.10 on ubuntu 18.04, cmake 3.10~3.16 on ubuntu 20.04, and everything newer on ubuntu 22.04. Let me know if this is an ok solution.

I ran cmake config tests on my fork, so they should be good to go. Feel free to approve the remaining workflows again, I'm excited to merge this soon 😅

phlptp · 2022-12-10T13:10:44Z

This was also fixed in #813, so once that is merged and this branch updated it should be good as well.

bindreams · 2022-12-10T14:51:29Z

Looks like coverage failed (even though it succeeded earlier with the same code).
Looking at the report, it seems like codecov decided to stop counting closing braces in unrelated functions. Certainly seems like a bug. Here is what I found: bug report for this in LCOV -> related pull request which was merged 23 days ago. So I guess this can be fixed soon.

In the meantime, I was going to suggest just letting coverage fall, but then I found out that codecov.yml already has informational: true, so I shouldn't ever fail the CI. Not sure what is going on...

phlptp · 2022-12-10T15:26:18Z

I noticed the codecov thing as well, not entirely sure what changed but I think that is resolved in #813 as well. Waiting for @henryiii to review that one, then I think that can be merged and you can rebase this one. This PR is something I want @henryiii to review and approve since it is a pretty big change.

henryiii · 2022-12-10T15:36:44Z

Merged that one. Won't have time to review this till early next week, though.

bindreams · 2023-01-05T20:51:46Z

Re-ran checks after CI had unrelated third-party connection errors.

henryiii · 2023-01-05T21:33:33Z

include/CLI/Config.hpp

@@ -14,7 +14,7 @@
 #include <string>
 #include <utility>
 #include <vector>
-// [CLI11:public_includes:set]
+// [CLI11:public_includes:end]


Wait, why did this work before, actually? It should search to the first "end", which I would have thought would have created a broken single header include. Testing locally, it does look like it just causes #include <cctype> to be missing; I guess this is included through something else so that's why it still works.

After a brief look at the single header script, it seems that the multiline regex for collecting headers needs both "set" and "end" comments. So it just didn't match and by lucky coincidence the necessary headers were included elsewhere.

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>

henryiii · 2023-01-06T15:11:19Z

I wrote a longer reply yesterday but lost it when switching around. TL;DWA (too long; didn't write again): I think the design decisions are reasonable, I like the argument-less form (looks like Python's argparse), and keeping the cost low for projects that don't care about unicode.

bindreams · 2023-01-12T15:41:17Z

Thanks for your feedback. Does that mean this PR is ready to merge now?

henryiii · 2023-01-12T20:03:12Z

Yes, I think we can put it in, and let it sit for a while. I'm hoping this doesn't affect weird/niche systems negatively by including something they don't have, especially if they aren't using it. But maybe the best way to find out is to see if it breaks anyone in main for a bit.

Spixmaster · 2024-03-07T05:56:57Z

@henryiii Support for ensure_utf8() was released but is stated as a feature only in HEAD in the README.md.

bindreams force-pushed the unicode-support branch 18 times, most recently from 8df6899 to 0974481 Compare November 20, 2022 21:41

bindreams marked this pull request as ready for review November 21, 2022 02:41

phlptp reviewed Nov 21, 2022

View reviewed changes

include/CLI/impl/SlimWindowsH.hpp Outdated Show resolved Hide resolved

henryiii reviewed Jan 5, 2023

View reviewed changes

bindreams and others added 18 commits January 5, 2023 18:01

Fix _Pragma not recognized on MSVC

2bb7d1c

Fix bad macro check

7702fc4

Fix include

eac358d

Fix narrow and widen when codecvt is missing

74cee5e

Fix some weird bug in old MSVC

1ae15cf

Add dependent applications to meson-build

7d25218

Fix precompilation

1f0a809

Fix lint

050f9f2

Fix coverage

70886b5

Update README

6182a98

style: pre-commit.ci fixes

97c808a

Fix lint

05d601c

Fix coverage

7661fd8

Fix optional braces offending clang

b36618b

Remove copied comments from Windows.h

354b7f5

Suppress flawfinder detects

32904fc

Fix cmake config tests failing because of a missing lib

3a059d4

chore: update copyright on new files to 2023

4f13fad

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>

henryiii force-pushed the unicode-support branch from 0f53c53 to 4f13fad Compare January 5, 2023 23:03

style: pre-commit.ci fixes

9bda850

bindreams mentioned this pull request Jan 12, 2023

Unicode #14

Closed

henryiii merged commit a227cd1 into CLIUtils:main Jan 12, 2023

github-actions bot added the needs changelog Hasn't been added to the changelog yet label Jan 12, 2023

jmarrec mentioned this pull request May 4, 2023

Unicode support: automatically widen on windows when option is assigned a std::filesystem::path #875

Closed

henryiii mentioned this pull request Jun 28, 2023

fix: tests were broken on Catch3 #896

Merged

Porkepix mentioned this pull request Feb 5, 2024

cli11 2.4.0 Homebrew/homebrew-core#161844

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: unicode support #804

feat: unicode support #804

bindreams commented Nov 19, 2022 •

edited

Loading

codecov bot commented Nov 21, 2022 •

edited

Loading

bindreams commented Nov 21, 2022

phlptp commented Nov 21, 2022

bindreams commented Dec 10, 2022 •

edited

Loading

phlptp commented Dec 10, 2022

bindreams commented Dec 10, 2022

phlptp commented Dec 10, 2022

henryiii commented Dec 10, 2022

bindreams commented Jan 5, 2023

henryiii Jan 5, 2023

bindreams Jan 6, 2023 •

edited

Loading

henryiii commented Jan 6, 2023

bindreams commented Jan 12, 2023

henryiii commented Jan 12, 2023

Spixmaster commented Mar 7, 2024

feat: unicode support #804

feat: unicode support #804

Conversation

bindreams commented Nov 19, 2022 • edited Loading

codecov bot commented Nov 21, 2022 • edited Loading

Codecov Report

bindreams commented Nov 21, 2022

phlptp commented Nov 21, 2022

bindreams commented Dec 10, 2022 • edited Loading

phlptp commented Dec 10, 2022

bindreams commented Dec 10, 2022

phlptp commented Dec 10, 2022

henryiii commented Dec 10, 2022

bindreams commented Jan 5, 2023

henryiii Jan 5, 2023

Choose a reason for hiding this comment

bindreams Jan 6, 2023 • edited Loading

Choose a reason for hiding this comment

henryiii commented Jan 6, 2023

bindreams commented Jan 12, 2023

henryiii commented Jan 12, 2023

Spixmaster commented Mar 7, 2024

bindreams commented Nov 19, 2022 •

edited

Loading

codecov bot commented Nov 21, 2022 •

edited

Loading

bindreams commented Dec 10, 2022 •

edited

Loading

bindreams Jan 6, 2023 •

edited

Loading