-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data skew in low-concurrency scenarios #44
Comments
does this mean this lib is prone to errors? |
Thanks for reporting an issue :) However, I'm struggling to follow what you're trying to explain and how the current method is causing a problem and why it should be changed :( I'm not sure if there's any more explanation you can provide or if there's someone else that might be familiar with the topic you're discussing and can help clarify the issue. Are you saying the library can generate non-unique ID's with low concurrency? Or are you saying that the way the IDs would sort would be a problem? What does "Make the step round by round" mean? "After calculating with mod all of the result of id generated will be less than 10, which means most of the records will be stored into order_tab_00000000-order_tab_00000009, it caused data skew;" You're doing something to truncate the numbers or reduce them? None of the IDs would be less than 10 normally. I'm also not really clear on the importance of the "order_tab_00000000-order_tab_00000009" tables. I'm just all around confused, sorry. :( |
@patchthecode - I'm not sure :(. Though I've not heard any other reports of the library not generating properly unique IDs. If you run into any issues or if you can help explain the above concern I'd be happy to look into it. |
I think I'm putting together what's being asked here, maybe. Are you talking about data sharding via MongoDB (or maybe some other DB)? By "order_tab_00000000-order_tab_00001000" are you saying you have 1001 order tables? Are you taking the modulus of the snowflake ID and 1001 (the number of tables you have) to try and get a random If you're dividing by 1001 I think it should work but if you happen to be using 1000 I could see how that would return 0 since the last four digits of a showflake using 12 stepbits would be 0000 and the entire number would divide into 1000 leaving no remainder. But if you divide by 1001 that shouldn't be an issue. Another option would be to change the step bits to a smaller value. By "round by round" do you mean just allow the step count to continue to count up and only reset to zero once it overflows (hits the max number it could be) ? If we did that the library performance under heavy load would be impacted negatively and since there's other ways to solve the problem I'm assuming you're working on I'd rather not do that. Finally, would just using a rand(1000) method to get a random number from 0 to 1000 to tell the DB what shard to shove the data into work? If I'm interpreting your issue wrong - please let me know! Hopefully this helps some, I know it's a late attempt to help. |
Data skew in low-concurrency scenarios
According to the source code, it will set the step to 0 once the time increased, it will cause a problem when we use the id as the sharding key for sharding strategy.
The code affects
Consider the following circumstance:
Scenario
Impact
Improve
Code May be referenced
The text was updated successfully, but these errors were encountered: