Rework string length rules #86
The deprecated attribute message for |
I think you can put |
I attempted to add So, I began searching the internet for solutions and came across proc-macro-warning. This crate inspired me to create a separate module ( P.S. I'm still a newbie in proc macro development, so my code may be far from perfect. |
Seems like a fine approach! I wonder if there's a way to produce a less noisy warning (without the whole |
I think the solution looks like the best solution for warnings in proc macros. However, it is still in nightly, and I don't know if it is workable at all 😂. Okay, I think I could close the deprecation task and start a new task. I'm going to start moving the |
Should we place the Here, I see several approaches: 1. Make all features non-default.Pros:
Cons:
2. Make only the
|
I expect most people to keep using As for the size of the default feature set: My priority is to make
It does have a As you said, people who care about having the smallest set of dependencies have the option of using One approach of supporting third-party types without adding an impl to every library under the sun is something like the adapters described here |
Thanks a lot for your response! I agreed that it is uncomfortable to enable each feature individually, but I recently got an idea which I want to share with you. Okay, we have these points:
And I had such thoughts:
While I was thinking about whether we should remove the
fn main() {
// This sentence in the different languages:
// "The sun sets behind the mountains, casting a warm glow across the tranquil lake"
const MEMORY_LIMIT: usize = 200;
const TEXT_LIMIT: usize = 100;
#[derive(Validate)]
struct Test {
#[garde(length(max = MEMORY_LIMIT), grapheme_count(max = TEXT_LIMIT))]
text: String,
}
// 79 bytes, 79 graphemes
let _a = Test { text: "The sun sets behind the mountains, casting a warm glow across the tranquil lake".into() };
// 58 bytes, 20 graphemes
let _b = Test { text: "太阳落山后,在宁静的湖面上投下温暖的光芒".into() }; // Chinese
// 113 bytes, 62 graphemes
let _c = Test { text: "تغرب الشمس خلف الجبال ، وتلقي توهجا دافئا عبر البحيرة الهادئة.".into() }; // Arabic
// 189(!) bytes, 48 graphemes
// This sentence still conveys the same meaning, but it requires a significant amount of memory.
let _d = Test { text: "পাহাড়ের পিছনে সূর্য অস্ত যায়, প্রশান্ত হ্রদ জুড়ে একটি উষ্ণ আভা ফেলছে".into() }; // Bengali
// 93 bytes, 92(!) graphemes
// This example requires more memory, but it also contains more graphemes.
let _e = Test { text: "Le soleil se couche derrière les montagnes, projetant une lueur chaude sur le lac tranquille".into() }; // French
// Also, I would like to draw your attention to the following. The grapheme count doesn't have
// such a large range of values, unlike bytes:
//
// max_dif_bytes = 189 - 79 = 110
// max_dif_graph = 92 - 48 = 44
// Another sentence: 377 bytes, 9 graphemes
// 1. This example will be rejected by the MEMORY_LIMIT(200).
// 2. For the best results (if you don't want to see these strange things), it will require a custom validator.
// However, our length limiter does its job and can reject these strange texts that could overflow our memory.
let _invalid = Test { text: "l̷̢̢̰̬͇̙͉͕̠̠̥̂̿̋͑̕͝͠ͅͅĩ̴̡̢̛̠̻̫̲͉̤̱̟͍̤̳͔͐̍̔̈́͊̒͂͋̈́̉̔̕̚͜͜͠k̸͍̳̜̗̰̼̦̟̖̳̥̙̗̂́̓̎͌͊͘ȩ̴͔̝̤̳̖̜̓̽̕ ̶̪̺͖̈́̃t̷̪̯̟̳͍̲͔̎͋̿̉̒̑̓̊̾̊̒̚͘ḩ̷̦͂̈́͗͌̏̏̇̔̈́͒̒̆̄̈́̚͠į̸̨̡̛̤͚͓̯͎̘̪̙̟̮͈͔͔̈́͋̉̾̃̎̒̈́́̾͂́ͅś̷̘̙̜̯͖̄͆̿̄̑̄̄͝".into() }; // This example will fail the validation.
// Sometimes we need very short text, for example for nicknames, because
// we have limited space for rendering it. There isn't a simple solution; for more
// information, you could read this article:
// https://tomdebruijn.com/posts/rust-string-length-width-calculations/
//
// The simplest solution is to take the widest grapheme (e.g. `🙂`) and calculate
// how many such graphemes fit in the space available for your string.
const TEXT_LIMIT_2: usize = 10;
#[derive(Validate)]
struct User {
#[garde(length(max = MEMORY_LIMIT), grapheme_count(max = TEXT_LIMIT_2))]
nickname: String,
}
// Every nickname has 10 graphemes:
let _a = User { nickname: "Aaaaaaaaaa".into() };
let _b = User { nickname: "太阳落山后在在宁静的".into() };
let _c = User { nickname: "ةاالهادئة.".into() };
let _d = User { nickname: "পাহাড়েরপাপাহাড়েরপা".into() };
let _e = User { nickname: "Aaaaaaaaaa".into() };
// Emoji have almost the widest width because most emoji are rendered with a display width of two columns.
let _f = User { nickname: "🙂🙂🙂🙂🙂🙂🙂🙂🙂🙂".into() };
// If you instead limit by byte length (e.g. 10), the longest valid nicknames would look like this:
// let _a = User { nickname: "Aaaaaaaaaa".into() };
// let _b = User { nickname: "太阳落".into() };
// let _c = User { nickname: "الهاد".into() };
// let _d = User { nickname: "পাহ".into() };
// let _e = User { nickname: "Aaaaaaaaaa".into() };
// let _f = User { nickname: "🙂🙂".into() };
}
P.S. If my code makes sense, we could add it somewhere, for example in the documentation or examples. Yes, this example mostly repeats known facts, but it led me to this conclusion: often, especially when working with non-Latin languages (!), a developer should use byte length validation together with grapheme count validation. Otherwise, non-Latin text with the same meaning as the English text may not fit within the same byte length. I think it is a serious oversight when developers don't consider non-Latin languages. So, there is the fourth point.
I added because I think it is an important feature and should be listed with the others. Finally, I can also add this point:
So, I would like to share my idea that could meet all the requirements mentioned above: In your last response, you mentioned the |
TL;DR: I agree with your suggestions. We should:
Full context:
Interesting, I hadn't thought of that. It makes a lot of sense: you want to ensure the nicknames display within some maximum width so that your designers are happy, but by only checking byte length you're discriminating against scripts which use more bytes in Unicode. TIL!
I think it would be a great addition to the docs. It could probably be simplified to only use two or three languages, e.g. English and Bengali, just to illustrate the concept.
Side note: I think this crate is sorely missing a "cookbook" of examples, showcasing common patterns, such as how to validate a typical registration form.
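A simplified, two-language version of that idea might look like the sketch below. It is only an illustration under stated assumptions: it uses the standard library alone, so `fits` is a hypothetical helper (not part of garde), and `chars().count()` counts Unicode scalar values as a stand-in for a real grapheme count, which would need a segmentation crate. The Bengali strings are written as escapes so the code point counts are unambiguous.

```rust
/// Hypothetical helper, not a garde API: accept `text` only if it fits
/// both a byte budget and a character budget.
/// ASSUMPTION: `chars().count()` (Unicode scalar values) stands in for a
/// grapheme count, which std cannot compute on its own.
fn fits(text: &str, max_bytes: usize, max_chars: usize) -> bool {
    text.len() <= max_bytes && text.chars().count() <= max_chars
}

fn main() {
    // Ten ASCII letters: one byte each.
    let english = "Aaaaaaaaaa";
    // Four Bengali code points (U+09AA U+09BE U+09B9 U+09AA): three bytes each.
    let bengali = "\u{09AA}\u{09BE}\u{09B9}\u{09AA}";

    for (label, text) in [("English", english), ("Bengali", bengali)] {
        println!(
            "{}: {} bytes, {} scalar values, fits(10 bytes) = {}, fits(200 bytes) = {}",
            label,
            text.len(),
            text.chars().count(),
            // A tight byte budget used as if it were a text-length limit.
            fits(text, 10, 10),
            // A generous byte budget plus a separate character limit.
            fits(text, 200, 10),
        );
    }
}
```

The design point is that the byte limit protects memory while the character limit expresses the intended text length; using a single small byte limit for both rejects short non-Latin text.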
Agreed.
I think I'm also in favor of this now. Initially I wasn't, because I thought it's just a less convenient I will say it's unfortunate that we have to resort to such "hacks" or "workarounds" because Cargo
Btw, what is "gradle" and "gardle"? 😄 |
Also just a reminder: Not everything has to be in one PR. Feel free to just put |
Okay, I have finished implementing
Yeah, it is a good idea. But should we open a new issue before that?
Oops, I have mixed
|
Hey, can this PR also implement Length, ... for |
Yes, of course 🙂 |
Yeah, I have guessed the minimum Rust version 😂. |
Hi! I took some time to get back to this because I wasn't totally satisfied with the solution. There are a few unnecessary/unrelated changes in this PR:
As for the actual implementation:
Because I can't commit here (the base of this PR is your fork's |
See: #84
length to byte length
byte_length with a message that tells people to use length instead.
char_count rule which will exist for backwards compatibility, just in case someone was depending on it counting USVs.
README.md
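To make the difference between counting bytes and counting USVs (Unicode scalar values) concrete, here is a minimal standard-library sketch. The functions `byte_length` and `char_count` below merely borrow the rule names for illustration; they are hypothetical helpers and make no claim about how garde implements those rules.

```rust
/// Byte length of the UTF-8 encoding (what `str::len` returns).
/// Hypothetical helper named after the rule, not garde's implementation.
fn byte_length(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values (USVs), i.e. Rust `char`s.
/// Hypothetical helper named after the rule, not garde's implementation.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let samples = [
        // One code point, encoded as two bytes.
        ("precomposed e-acute", "\u{00E9}"),
        // Two code points that render as a single user-perceived character.
        ("e + combining acute", "e\u{0301}"),
        // One code point outside the BMP, encoded as four bytes.
        ("slightly smiling face", "\u{1F642}"),
    ];

    for (label, text) in samples {
        println!(
            "{}: {} bytes, {} USVs",
            label,
            byte_length(text),
            char_count(text)
        );
    }
}
```

The second sample is why a USV count is not the same as a grapheme count: the two spellings of the accented letter look identical but have different USV counts.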