OpenCCSharp is a (WIP) .NET 8 library for Chinese text script variant conversion. This project is inspired by BYVoid/OpenCC, and used similar approach in word segmentation. This project also used the same conversion dictionary in OpenCC project.
While the v0.0.1-int.6 of the packages support both .NET 6.0 and .NET 8.0, further versions might drop support for .NET 6.0 as this .NET SDK reaching its EOL.
This library is in its very early INT phase and its API is still subject to drastic changes. There is also much optimizations yet to be employed.
To install the NuGet package with preset Chinese variant converters, use
# Package Management Console
Install-Package CXuesong.OpenCCSharp.Presets -Prerelease
# .NET CLI
dotnet add package CXuesong.OpenCCSharp.Presets --prerelease
For now there is no detailed usage documented here, but you can getting started by playing with OpenCCSharp.Presets.ChineseConversionPresets
.
This project also powers OpenCC# WebApp, a serverless Chinese variant conversion web app, with the help of WASM.
- CXuesong.OpenCCSharp.Conversion |
- Provides object model and data structure for variant conversion.
- CXuesong.OpenCCSharp.Presets |
- Provides pre-configured Chinese variant converters.
With Visual Studio 2022 or VSCode + .NET SDK 6.
Currently, this library looks generally 3x faster than the OpenCC library implemented in native C++ on Ubuntu.
Open CC benchmark output
Compared benchmarks: BM_Convert/x v.s. DupSequence
https://github.com/BYVoid/OpenCC/actions/runs/10229279965/job/28302747342
1: Test command: /home/runner/work/OpenCC/OpenCC/build/perf/src/benchmark/performance
1: Working Directory: /home/runner/work/OpenCC/OpenCC/build/perf/src/benchmark
1: Test timeout computed to be: 1500
1: 2024-08-03T16:08:09+00:00
1: Running /home/runner/work/OpenCC/OpenCC/build/perf/src/benchmark/performance
1: Run on (4 X 3218.97 MHz CPU s)
1: CPU Caches:
1: L1 Data 32 KiB (x2)
1: L1 Instruction 32 KiB (x2)
1: L2 Unified 512 KiB (x2)
1: L3 Unified 32768 KiB (x1)
1: Load Average: 0.92, 0.41, 0.15
1: ------------------------------------------------------------------
1: Benchmark Time CPU Iterations
1: ------------------------------------------------------------------
1: BM_Initialization/hk2s 1.09 ms 1.09 ms 638
1: BM_Initialization/hk2t 0.152 ms 0.151 ms 4618
1: BM_Initialization/jp2t 0.235 ms 0.235 ms 2974
1: BM_Initialization/s2hk 30.2 ms 30.2 ms 23
1: BM_Initialization/s2t 29.6 ms 29.6 ms 24
1: BM_Initialization/s2tw 29.7 ms 29.7 ms 24
1: BM_Initialization/s2twp 30.0 ms 30.0 ms 23
1: BM_Initialization/t2hk 0.072 ms 0.072 ms 9975
1: BM_Initialization/t2jp 0.185 ms 0.184 ms 3795
1: BM_Initialization/t2s 0.959 ms 0.959 ms 731
1: BM_Initialization/tw2s 1.01 ms 1.01 ms 692
1: BM_Initialization/tw2sp 1.24 ms 1.24 ms 563
1: BM_Initialization/tw2t 0.097 ms 0.097 ms 7209
1: BM_Convert2M 385 ms 385 ms 2
1: BM_Convert/100 0.757 ms 0.757 ms 929
1: BM_Convert/1000 7.74 ms 7.74 ms 90
1: BM_Convert/10000 79.3 ms 79.3 ms 9
1: BM_Convert/100000 837 ms 837 ms 1
1/1 Test #1: BenchmarkTest .................... Passed 17.54 sec
100% tests passed, 0 tests failed out of 1
- OpenCCTest: Measuring the time and memory footprint when executing original OpenCC tests (
BM_Initialization/*
), excl. loading preset conversion rules. - BulkConversionTest: Measuring the time and memory footprint when executing original OpenCC tests (
BM_Convert*
), excl. loading preset conversion rules. - PresetLoadTest: Measuring the time and memory footprint when loading the preset conversion rules with
ChineseConversionPresets.GetConverterAsync
.
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Gen 0 : GC Generation 0 collects per 1000 operations
Gen 1 : GC Generation 1 collects per 1000 operations
Gen 2 : GC Generation 2 collects per 1000 operations
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
1 ms : 1 Millisecond (0.001 sec)
BenchmarkDotNet v0.14.0, Windows 10 (10.0.20348.2582) (Hyper-V)
AMD EPYC 7763, 1 CPU, 4 logical and 2 physical cores
.NET SDK 8.0.303
[Host] : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
Method | arguments | Mean | Error | StdDev | Gen0 | Allocated |
---|---|---|---|---|---|---|
OpenCCTest | HK -> Hans [hk2s] | 3.886 μs | 0.0205 μs | 0.0182 μs | 0.0458 | 832 B |
OpenCCTest | HK -> Hant [hk2t] | 1.936 μs | 0.0038 μs | 0.0033 μs | 0.0191 | 344 B |
OpenCCTest | Hans -> HK [s2hk] | 4.298 μs | 0.0220 μs | 0.0184 μs | 0.0610 | 1112 B |
OpenCCTest | Hans -> Hant [s2t] | 32.660 μs | 0.0617 μs | 0.0547 μs | 0.1221 | 2312 B |
OpenCCTest | Hans -> TW [s2twp] | 18.815 μs | 0.0367 μs | 0.0287 μs | 0.0916 | 1752 B |
OpenCCTest | Hant -> HK [t2hk] | 1.228 μs | 0.0029 μs | 0.0026 μs | 0.0267 | 472 B |
OpenCCTest | Hant -> Hans [t2s] | 11.406 μs | 0.0163 μs | 0.0145 μs | 0.0305 | 512 B |
OpenCCTest | Kyujitai -> Shinjitai [t2jp] | 2.961 μs | 0.0116 μs | 0.0103 μs | 0.0381 | 672 B |
OpenCCTest | Shinjitai -> Kyujitai [jp2t] | 5.546 μs | 0.0102 μs | 0.0080 μs | 0.0381 | 680 B |
OpenCCTest | TW -> Hans [tw2sp] | 21.923 μs | 0.0240 μs | 0.0224 μs | 0.0610 | 1512 B |
OpenCCTest | TW -> Hant [tw2t] | 8.032 μs | 0.0214 μs | 0.0189 μs | 0.0305 | 536 B |
Method | arguments | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|---|
BulkConversionTest | DupSequence 100 Hant -> Hans (2700 chars) | 223.3 μs | 0.19 μs | 0.16 μs | 0.2441 | - | - | 5.41 KB |
BulkConversionTest | DupSequence 1000 Hant -> Hans (27000 chars) | 2,226.0 μs | 3.84 μs | 3.40 μs | - | - | - | 52.88 KB |
BulkConversionTest | DupSequence 10000 Hant -> Hans (270000 chars) | 23,196.8 μs | 104.14 μs | 97.41 μs | 93.7500 | 93.7500 | 93.7500 | 528.29 KB |
BulkConversionTest | DupSequence 100000 Hant -> Hans (2700000 chars) | 221,964.1 μs | 1,001.78 μs | 937.07 μs | - | - | - | 5273.91 KB |
BulkConversionTest | Zuozhuan Hans -> Hant (629833 chars) | 132,030.2 μs | 292.90 μs | 273.98 μs | - | - | - | 1230.53 KB |
Method | fromVariant | toVariant | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
PresetLoadTest | Hans | Hant | 23,272.27 μs | 318.163 μs | 297.610 μs | 468.7500 | 281.2500 | 62.5000 | 8437.38 KB |
PresetLoadTest | Hans | HK | 22,383.29 μs | 149.199 μs | 132.261 μs | 468.7500 | 218.7500 | 62.5000 | 8443.69 KB |
PresetLoadTest | Hans | TW | 25,596.93 μs | 165.259 μs | 154.583 μs | 531.2500 | 437.5000 | 62.5000 | 8570.07 KB |
PresetLoadTest | Hant | Hans | 953.79 μs | 4.662 μs | 3.893 μs | 19.5313 | 7.8125 | - | 340.11 KB |
PresetLoadTest | Hant | HK | 49.87 μs | 0.404 μs | 0.338 μs | 0.4883 | - | - | 8.83 KB |
PresetLoadTest | Hant | TW | 328.96 μs | 2.975 μs | 2.323 μs | 7.8125 | 1.9531 | - | 134.37 KB |
PresetLoadTest | Kyujitai | Shinjitai | 110.13 μs | 1.419 μs | 1.258 μs | 1.4648 | - | - | 29.24 KB |
PresetLoadTest | Shinjitai | Kyujitai | 188.91 μs | 2.178 μs | 1.930 μs | 3.9063 | - | - | 64.1 KB |
PresetLoadTest | HK | Hans | 1,068.99 μs | 4.964 μs | 4.145 μs | 15.6250 | - | - | 377.47 KB |
PresetLoadTest | HK | Hant | 142.35 μs | 2.732 μs | 3.455 μs | 2.4414 | - | - | 39.96 KB |
PresetLoadTest | TW | Hans | 1,315.98 μs | 17.470 μs | 14.588 μs | 23.4375 | 7.8125 | - | 491.61 KB |
PresetLoadTest | TW | Hant | 370.78 μs | 7.387 μs | 6.909 μs | 7.8125 | 1.9531 | - | 154.05 KB |
BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 4 logical and 2 physical cores
.NET SDK 8.0.303
[Host] : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX2
Method | arguments | Mean | Error | StdDev | Gen0 | Allocated |
---|---|---|---|---|---|---|
OpenCCTest | HK -> Hans [hk2s] | 4.059 μs | 0.0228 μs | 0.0191 μs | 0.0076 | 832 B |
OpenCCTest | HK -> Hant [hk2t] | 1.917 μs | 0.0113 μs | 0.0094 μs | 0.0038 | 344 B |
OpenCCTest | Hans -> HK [s2hk] | 4.252 μs | 0.0074 μs | 0.0062 μs | 0.0076 | 1112 B |
OpenCCTest | Hans -> Hant [s2t] | 31.786 μs | 0.2386 μs | 0.2232 μs | - | 2312 B |
OpenCCTest | Hans -> TW [s2twp] | 17.514 μs | 0.0852 μs | 0.0755 μs | - | 1752 B |
OpenCCTest | Hant -> HK [t2hk] | 1.188 μs | 0.0060 μs | 0.0056 μs | 0.0038 | 472 B |
OpenCCTest | Hant -> Hans [t2s] | 10.442 μs | 0.0553 μs | 0.0518 μs | - | 512 B |
OpenCCTest | Kyujitai -> Shinjitai [t2jp] | 3.051 μs | 0.0106 μs | 0.0099 μs | 0.0076 | 672 B |
OpenCCTest | Shinjitai -> Kyujitai [jp2t] | 5.683 μs | 0.0255 μs | 0.0239 μs | 0.0076 | 680 B |
OpenCCTest | TW -> Hans [tw2sp] | 20.195 μs | 0.0904 μs | 0.0801 μs | - | 1512 B |
OpenCCTest | TW -> Hant [tw2t] | 7.689 μs | 0.0734 μs | 0.0687 μs | - | 536 B |
Method | arguments | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|---|
BulkConversionTest | DupSequence 100 Hant -> Hans (2700 chars) | 219.0 μs | 0.99 μs | 0.88 μs | - | - | - | 5.41 KB |
BulkConversionTest | DupSequence 1000 Hant -> Hans (27000 chars) | 2,208.0 μs | 9.52 μs | 8.44 μs | - | - | - | 52.88 KB |
BulkConversionTest | DupSequence 10000 Hant -> Hans (270000 chars) | 22,112.3 μs | 74.00 μs | 65.60 μs | 93.7500 | 93.7500 | 93.7500 | 527.85 KB |
BulkConversionTest | DupSequence 100000 Hant -> Hans (2700000 chars) | 222,195.1 μs | 2,005.28 μs | 1,875.74 μs | - | - | - | 5273.91 KB |
BulkConversionTest | Zuozhuan Hans -> Hant (629833 chars) | 127,994.6 μs | 430.43 μs | 402.63 μs | - | - | - | 1230.53 KB |
Method | fromVariant | toVariant | Mean | Error | StdDev | Median | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|
PresetLoadTest | Hans | Hant | 22,507.68 μs | 223.761 μs | 209.306 μs | 22,432.45 μs | 93.7500 | 93.7500 | 62.5000 | 8424.13 KB |
PresetLoadTest | Hans | HK | 22,539.12 μs | 91.660 μs | 81.254 μs | 22,516.46 μs | 93.7500 | 93.7500 | 62.5000 | 8430.44 KB |
PresetLoadTest | Hans | TW | 24,090.32 μs | 68.093 μs | 56.860 μs | 24,080.34 μs | 156.2500 | 62.5000 | 62.5000 | 8556.82 KB |
PresetLoadTest | Hant | Hans | 1,028.78 μs | 18.979 μs | 17.753 μs | 1,027.27 μs | 3.9063 | - | - | 339.55 KB |
PresetLoadTest | Hant | HK | 34.86 μs | 1.558 μs | 4.595 μs | 32.40 μs | - | - | - | 8.8 KB |
PresetLoadTest | Hant | TW | 352.72 μs | 6.252 μs | 13.853 μs | 349.84 μs | - | - | - | 134.09 KB |
PresetLoadTest | Kyujitai | Shinjitai | 126.33 μs | 2.880 μs | 8.493 μs | 127.69 μs | - | - | - | 29.2 KB |
PresetLoadTest | Shinjitai | Kyujitai | 218.44 μs | 4.341 μs | 10.649 μs | 219.17 μs | - | - | - | 63.96 KB |
PresetLoadTest | HK | Hans | 1,149.91 μs | 18.179 μs | 20.935 μs | 1,144.17 μs | 3.9063 | - | - | 376.92 KB |
PresetLoadTest | HK | Hant | 139.83 μs | 2.965 μs | 8.695 μs | 138.59 μs | - | - | - | 39.89 KB |
PresetLoadTest | TW | Hans | 1,373.88 μs | 21.879 μs | 24.318 μs | 1,368.58 μs | 5.8594 | 1.9531 | - | 490.77 KB |
PresetLoadTest | TW | Hant | 373.56 μs | 7.362 μs | 14.531 μs | 367.88 μs | - | - | - | 153.77 KB |