-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sqids adds too many junk characters when encoding multiple numbers with a min length specified #6
Comments
In the first 2 examples, the min length requirement is met, so there's nothing extra to do. In the last 2 examples, the min length requirement is not met, so the code enters the additional padding section. This is how it works:
So for example: encoding
|
@4kimov I see, but wouldn't it make sense to rethink the logic for adding junk characters so as to avoid this? I also tested with Hashids and it doesn't have this problem (adds only as many characters as needed). This is what it returns for
Sqids's current behavior here is unexpected and also potentially undesirable from the user's PoV, even though it's explicable in the context of the algorithm's details. |
Hashids is designed differently. The previous algorithm doesn't have the same problem because a) it didn't have to handle a blocklist and b) it reserved too many characters initially - some as separators, some as guards. This resulted in more patterns (for example I can see that "F" is a separator in both ids above). Additionally, because it isolated so many characters for internal use, there's less alphabet used for encoding actual data (smaller alphabet = larger IDs), so there's that downside also.
I can't speak for desirable or undesirable, but it should certainly be expected. The parameter is called |
@4kimov Technically true, but the purpose for which this parameter is often used in practice is aesthetic; you typically set a relatively high min length so that your IDs look uniform (e.g. that an entity with the ID of And this is an "expectation" that Sqids currently violates in the aforementioned scenarios.
Regarding this point in particular and as a side-note, Sqids seems to have the recognizable pattern problem as well, just in different ways. For example, another thing I stumbled into is that e.g. I'm not trying to suggest that Sqids should just do whatever Hashids is doing, Sqids is obviously an improvement over Hashids in a lot of ways and was meant as such, I'm just saying that there are a few things that Hashids seems to do better, which Sqids should probably take inspiration from. And this is a good time to talk about that stuff, before Sqids acquires wide adoption, at which point breaking changes would no longer be feasible. The junk character insertion, I would argue, is clearly one of those things. Isn't there a way this specific part of Sqids's algorithm can be reworked to solve this issue? |
I might be missing something but what would be wrong with simply adding a long enough slice of the alphabet to make up for the missing length? See this PR (made the PR just so you can see the code diff), seems to work, the current algorithm is already partly doing this. |
ping @4kimov? |
I'm not sure I have much else to add to this, since I've given the reasons for the technical choices taken + the limitations. I am not sure I see how your PR would address accidentally appending a blocked substring. I also personally don't consider this issue to be that big of a problem. Having said that, if you do find an improvement that has an elegant fix (meaning does not increase complexity of code and has clear benefits), please create a PR in the spec code (with working tests), and people can chime in on whether it's worth asking everyone to pivot to that or not. I'll close this discussion for now. |
@4kimov Don't the next few lines of code literally take care of that?! If it does "accidentally append a blocked substring" then the blocked word logic kicks in immediately afterwards. What's the problem?
I've presented what I believe to be just that. Feel free to correct me if I'm mistaken.
It's little problems like this compounding together that could make Sqids vs. Hashids even a debate. I find it surprising that you're just hand-waving these away. |
Hi there, I just noticed something:
When no min length is specified:
The result:
Now, if you set
minLength
to5
, this will be the result:Why did the last two suddenly jump to 6 characters?! That seems unnecessary given that they were just encoded with 4 characters, why does specifying the
minLength
cause Sqids to add 2 more characters, rather than 1, which would've been sufficient to meet the min-length requirement?This doesn't actually make sense, correct me if I'm wrong but it seems like some sort of a bug.
It can be especially undesirable since usually the point of setting a min length is aesthetic uniformity, which this particular quirk basically defeats.
The text was updated successfully, but these errors were encountered: