fix(ut): use valid UTF8 sequence instead #4315

Merged
merged 4 commits into from
Mar 17, 2022
61 changes: 55 additions & 6 deletions dbms/src/Functions/tests/gtest_strings_simd_consistency.cpp
@@ -25,6 +25,47 @@

namespace DB
{
+
+char UNIT_TEST_RUSSIAN_ALPHABET[] = u8"ѨѩѬѭѦѧѪѫѠѡѰѱѮѯЅѕѴѵѲѳѢѣІіЯяЮюЭэЬьЫыЪъЩщШшЧчЦцХхФфУуТтСсРрПпОоНнМмЛлКкЙйИиЗзЖжЁёЕеДдГгВвБбАа";
+char UNIT_TEST_ENGLISH_ALPHABET[] = u8"A a B b C c D d E e F f G g H h I i J j K k L l M m N n O o P p Q q R r S s T t U u V v W w X x Y y Z z";
+char UNIT_TEST_GREEK_ALPHABET[] = u8"Α α, Β β, Γ γ, Δ δ, Ε ε, Ζ ζ, Η η, Θ θ, Ι ι, Κ κ, Λ λ, Μ μ, Ν ν, Ξ ξ, Ο ο, Π π, Ρ ρ, Σ σ/ς, Τ τ, Υ υ, Φ φ, Χ χ, Ψ ψ, and Ω ω.";
+char UNIT_TEST_CHINESE_PARAGRAPH[] = u8"PingCAP 成立于 2015 年,是一家企业级开源分布式数据库厂商,提供包括开源分布式数据库产品、解决方案与咨询、技术支持与培训认证服务,致力于为全球行业用户提供稳定高效、安全可靠、开放兼容的新型数据服务平台,"
+    u8"解放企业生产力,加速企业数字化转型升级。在帮助企业释放增长空间的同时,也提供了一份具有高度可参考性的开源建设实践样本。"
+    u8"由 PingCAP 创立的分布式关系型数据库 TiDB,为企业关键业务打造,具备「分布式强一致事务、在线弹性水平扩展、故障自恢复的高可用、跨数据中心多活」等企业级核心特性,帮助企业最大化发挥数据价值,充分释放企业增长空间。";
+char UNIT_TEST_JAPANESE_PARAGRAPH[] = u8"PingCAPは2015年に3人のインフラストラクチャエンジニアによってスタートしました。 3人はインターネット企業のデータベース管理者として、データベースの管理・スケーリング・運用・保守の業務に莫大な手間と時間に日々頭を抱えておりました。"
+    u8"市場に良い解決策がないため、彼らはオープンソースで解決策を構築することに決めました。"
+    u8"PingCAPは、一流のチームと世界中のコントリビューターの協力を得て、オープンソースの分散型NewSQLハイブリッドトランザクションおよび分析処理(HTAP)データベースを構築しています。"
+    u8"メインに開発したプロジェクトTiDBとは、MySQLと互換性のあるクラウドネイティブの分散SQLレイヤーであり、世界で最も人気のあるオープンソースデータベースプロジェクトの1つです。"
+    u8"TiDBの関連プロジェクトTiKVは、クラウドネイティブの分散型Key-Valueストアです。現在CNCFの卒業プロジェクトになります。";
+char UNIT_TEST_ENGLISH_PARAGRAPH[] = u8"PingCAP started in 2015 when three seasoned infrastructure engineers were sick and tired of the way databases were managed, scaled, and maintained while working at leading Internet companies."
+    u8"Seeing no good solution on the market, they decided to build one themselves — the open-source way."
+    u8"With the help of a first-class team, and hundreds of contributors from around the globe, PingCAP is building an open-source distributed NewSQL Hybrid Transactional and Analytical Processing (HTAP)"
+    u8"database. TiDB, our flagship project, is a cloud-native distributed SQL layer with MySQL compatibility, and one of the most popular open-source database projects in the world (don’t take our word for"
+    u8"it, check it out). TiDB’s sister project, TiKV, is a cloud-native distributed Key-Value store. It is now a CNCF Graduated project.";
+
+template <class Gen>
+std::string_view getRandomString(Gen & gen)
+{
+    size_t seed = gen() % 6;
+    switch (seed)
+    {
+    case 0:
+        return UNIT_TEST_RUSSIAN_ALPHABET;
+    case 1:
+        return UNIT_TEST_ENGLISH_ALPHABET;
+    case 2:
+        return UNIT_TEST_GREEK_ALPHABET;
+    case 3:
+        return UNIT_TEST_CHINESE_PARAGRAPH;
+    case 4:
+        return UNIT_TEST_JAPANESE_PARAGRAPH;
+    case 5:
+        return UNIT_TEST_ENGLISH_PARAGRAPH;
+    default:
+        __builtin_unreachable();
+    }
+}
+
template <char not_case_lower_bound, char not_case_upper_bound>
struct LowerUpperImpl
{
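
The title says the test should feed valid UTF-8 rather than arbitrary bytes. As a minimal sketch of what "valid" means here, the checker below accepts well-formed sequences and rejects stray continuation bytes, truncated sequences, overlong forms, surrogates, and values above U+10FFFF. `isValidUtf8` is a hypothetical helper written for illustration; it is not something this PR adds.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (assumption, not part of the PR):
// returns true if `s` is a well-formed UTF-8 byte sequence.
bool isValidUtf8(std::string_view s)
{
    size_t i = 0;
    while (i < s.size())
    {
        const auto b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80)
        {
            ++i; // single-byte ASCII
            continue;
        }

        size_t len = 0;      // total bytes in this sequence
        uint32_t cp = 0;     // decoded code point
        uint32_t min_cp = 0; // smallest code point allowed for this length
        if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false; // continuation byte in lead position, or 0xF8..0xFF

        if (s.size() - i < len)
            return false; // sequence is cut off at the end of the buffer

        for (size_t k = 1; k < len; ++k)
        {
            const auto b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80)
                return false; // expected a 10xxxxxx continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong encodings, UTF-16 surrogates, and out-of-range values.
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```

Under this check, uniformly random bytes would rarely pass: `isValidUtf8("\xFF")` and `isValidUtf8("\x80")` are both false, while `isValidUtf8("\xD0\x96")` (the two-byte encoding of Ж) is true.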
@@ -337,14 +378,22 @@ TEST(StringsLowerUpperUtf8, Random)
     std::vector<UInt8> res_new(limit + 1, 0);
     std::vector<UInt8> res_old(limit + 1, 0);
     std::default_random_engine eng(seed);
-    std::uniform_int_distribution<UInt8> dist(
-        std::numeric_limits<UInt8>::min(),
-        std::numeric_limits<UInt8>::max());
-    for (auto & i : data)
+    std::uniform_int_distribution dist('A', 'z');
+    size_t size = 0;
+    size_t target = data.size() - 1;
+    while (size < target)
     {
-        i = dist(eng);
+        auto t = getRandomString(eng);
+        if (t.size() > target - size)
+        {
+            data[size++] = dist(eng);
+        }
+        else
+        {
+            std::copy(t.begin(), t.end(), data.begin() + size);
+            size += t.size();
+        }
     }
     data.back() = 0;
     {
         {
             auto begin = high_resolution_clock::now();
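
The fill loop in this hunk can be tried in isolation. The sketch below follows the same approach: copy a whole sample while it still fits, otherwise pad with single bytes drawn from `'A'..'z'`, then write a NUL terminator. Because no sample is ever split and the padding is ASCII, the buffer stays valid UTF-8. `makeUtf8Buffer` and `kSamples` are hypothetical stand-ins for the test body and the `UNIT_TEST_*` arrays. The sketch also uses an `int` distribution with a cast, since the standard only specifies `std::uniform_int_distribution` for `short`, `int`, `long`, `long long` and their unsigned variants, not for character types; that is a choice made here, not something the PR does.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <random>
#include <string_view>
#include <vector>

// Stand-in fixtures (assumption: the real test uses the longer UNIT_TEST_* arrays).
constexpr std::array<std::string_view, 3> kSamples = {
    "Aa Bb Cc",                 // ASCII
    "\xD0\x96\xD0\xB6",         // "Жж", two bytes per character
    "\xE6\x88\x90\xE7\xAB\x8B", // "成立", three bytes per character
};

// Builds a buffer of `total` bytes from whole samples plus ASCII padding.
// The last byte is always a NUL terminator.
std::vector<unsigned char> makeUtf8Buffer(size_t total, unsigned seed)
{
    std::vector<unsigned char> data(total, 0);
    if (total == 0)
        return data;

    std::default_random_engine eng(seed);
    std::uniform_int_distribution<size_t> pick(0, kSamples.size() - 1);
    std::uniform_int_distribution<int> ascii('A', 'z');

    size_t size = 0;
    const size_t target = data.size() - 1; // reserve one byte for the terminator
    while (size < target)
    {
        const std::string_view t = kSamples[pick(eng)];
        if (t.size() > target - size)
        {
            // Sample does not fit: add one ASCII byte instead of splitting it.
            data[size++] = static_cast<unsigned char>(ascii(eng));
        }
        else
        {
            std::copy(t.begin(), t.end(), data.begin() + size);
            size += t.size();
        }
    }
    data.back() = 0;
    return data;
}
```

For example, `makeUtf8Buffer(64, 42)` returns 64 bytes whose last byte is 0, and the same seed reproduces the same buffer with a given standard library.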