Closed
Description
The specification explicitly says that we must validate the number of characters (graphemes would arguably be the more accurate term).
Currently scraperlib uses Python's built-in len function, which counts code points rather than graphemes. Graphemes are what we want to validate, because they are what the reader visually perceives; code points are not.
It looks like (according to ChatGPT, let's be honest) we could use the grapheme library. I am not sure this is the right choice, since this library seems barely maintained and not released in a proper manner.
import grapheme
title = "विकी मेड मेडिकल इनसाइक्लोपीडिया हिंदी में"
print(len(title))  # Outputs: 41 (code points) => Wrong
print(grapheme.length(title))  # Outputs: 25 (graphemes) => Correct
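If depending on the grapheme library turns out to be a concern, a rough standard-library approximation might be enough for a length check. The sketch below is not what scraperlib does and is not a full implementation of Unicode grapheme segmentation: it simply counts code points whose general category is not a combining mark (categories starting with "M"). The names approx_grapheme_count and fits_limit, and the max_length parameter, are hypothetical and only used for illustration. This approach will miscount emoji ZWJ sequences, flag sequences and conjoining Hangul jamo, so treat it as a starting point for discussion.

```python
import unicodedata


def approx_grapheme_count(text: str) -> int:
    """Approximate the number of user-perceived characters in text.

    Assumption: every combining mark (Unicode general category Mn, Mc or Me)
    attaches to the preceding base character, so only non-mark code points
    are counted. This is NOT full grapheme cluster segmentation.
    """
    return sum(
        1 for char in text if not unicodedata.category(char).startswith("M")
    )


def fits_limit(text: str, max_length: int) -> bool:
    """Hypothetical validation helper: True if text has at most max_length
    approximate graphemes."""
    return approx_grapheme_count(text) <= max_length


title = "विकी मेड मेडिकल इनसाइक्लोपीडिया हिंदी में"
print(len(title))                    # code points
print(approx_grapheme_count(title))  # approximate graphemes
```

For the Hindi title from this issue, the approximation agrees with the value reported for grapheme.length, but that agreement is not guaranteed for other scripts.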