-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unicode check for crate_name attribute #2463
Conversation
// FIXME: move this to rust-unicode.h? | ||
struct Codepoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this class should be moved to rust-unicode.h.
At least it should not be put in under gcc/rust/lex/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that seems good to me
// FIXME: The next test does not pass as of current implementation | ||
// ASSERT_TRUE (Rust::CompileOptions::validate_crate_name ("惊吓")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this original comment is probably wrong.
The kanji 惊(U+60CA) are not categorized as Alphabetic.
In https://www.unicode.org/Public/14.0.0/ucd/DerivedCoreProperties.txt
...
30FF ; Alphabetic # Lo KATAKANA DIGRAPH KOTO
3105..312F ; Alphabetic # Lo [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN
3131..318E ; Alphabetic # Lo [94] HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE
31A0..31BF ; Alphabetic # Lo [32] BOPOMOFO LETTER BU..BOPOMOFO LETTER AH
31F0..31FF ; Alphabetic # Lo [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
3400..4DBF ; Alphabetic # Lo [6592] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DBF
4E00..A014 ; Alphabetic # Lo [21013] CJK UNIFIED IDEOGRAPH-4E00..YI SYLLABLE E
A015 ; Alphabetic # Lm YI SYLLABLE WU
A016..A48C ; Alphabetic # Lo [1143] YI SYLLABLE BIT..YI SYLLABLE YYR
A4D0..A4F7 ; Alphabetic # Lo [40] LISU LETTER BA..LISU LETTER OE
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for checking haha. did you try it with rustc
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh my god. i just checked and rustc compiles it (@_@)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I overlooked 4E00..A014
containing 0x60CA
haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hahaha no worries it's all good :D
7d8619a
to
3631430
Compare
The new introduced class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you! Good work!
// FIXME: move this to rust-unicode.h? | ||
struct Codepoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that seems good to me
gcc/rust/rust-session-manager.cc
Outdated
error = Error (UNDEF_LOCATION, "crate name is not a valid UTF-8 string"); | ||
return false; | ||
} | ||
if (uchars.value ().empty ()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it would be clearer to add a temporary variable after the first if (!uchars.has_value())
? You keep calling uchars.value()
which maybe hurts readability just a little bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. fixed
gcc/rust/rust-session-manager.cc
Outdated
@@ -113,24 +114,32 @@ infer_crate_name (const std::string &filename) | |||
static bool | |||
validate_crate_name (const std::string &crate_name, Error &error) | |||
{ | |||
if (crate_name.empty ()) | |||
Utf8String utf8_name = {crate_name}; | |||
auto uchars = utf8_name.get_chars (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I would rename this variable to uchars_opt
{ | ||
error = Error (UNDEF_LOCATION, "crate name is not a valid UTF-8 string"); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and then add this
} | |
} | |
auto uchars = uchars_opt.value(); |
// FIXME: The next test does not pass as of current implementation | ||
// ASSERT_TRUE (Rust::CompileOptions::validate_crate_name ("惊吓")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for checking haha. did you try it with rustc
directly?
static bool | ||
validate_crate_name (const std::string &crate_name, Error &error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function would probably benefit from the new tl::expected
type we have, but that's a problem for another PR!
|
||
namespace Rust { | ||
|
||
class Utf8String | ||
{ | ||
private: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to say a nit here to move the private fields to the bottom of the class declaration but then i realised we do it this style in AST and HIR classes.
What is the GCC style? I think having them at the top like this is probably the best now when i think about it.
@dkm @CohenArthur @tschwinge any opinions on the style of putting private fields at the top of the class or bottom?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like having private fields at the bottom of my classes personally. I never gave it much attention haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
gcc/rust/ChangeLog: * lex/rust-codepoint.h: Add comment * lex/rust-lex.h: New method to get decoded characters * rust-session-manager.cc (validate_crate_name): Modify unicode check (rust_crate_name_validation_test): Add testcases * util/rust-unicode.h (RUST_UNICODE_H): New class Utf8String. (class Utf8String): New class. * util/rust-unicode.cc (binary_search_sorted_array): Add comment. (recursive_decomp_cano): Add comment. (recomp): Remove dead code. (dump_string): Removed. gcc/testsuite/ChangeLog: * rust/compile/bad-crate-name.rs: Moved to... * rust/compile/bad-crate-name1.rs: ...here. * rust/compile/bad-crate-name2.rs: New test. Signed-off-by: Raiki Tamura <tamaron1203@gmail.com>
Thank you for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
depends on PR #2425
Addresses #2287
Modify the function
validate_crate_name
to check if the crate name contains only Unicode alphabetic and numeric.See https://doc.rust-lang.org/reference/crates-and-source-files.html#the-crate_name-attribute