
Feature suggestion: (Option to) keep trying rather than throw SequenceOverflowException #21

Closed
reinux opened this issue Apr 6, 2020 · 9 comments

reinux commented Apr 6, 2020

SequenceOverflowException only occurs under very specific circumstances (hardware speed, unusually high demand, very tight loop etc.) which may not always be easy to test for or anticipate.

It's hard to imagine a use case where a user can simply give up if they can't have an ID, so there's no harm in simply waiting. The default settings allow for 4 million IDs per second, which ought to be fast enough for all intents and purposes, and much preferable to unexpected crashes.

Of course, it's easy to wrap the whole thing in a try/catch block, but my concern isn't usability so much as reliability.

RobThree self-assigned this Apr 7, 2020

RobThree commented Apr 7, 2020

SequenceOverflowException only occurs under very specific circumstances (hardware speed, unusually high demand, very tight loop etc.) which may not always be easy to test for or anticipate.

Well, it's exactly what this test does. Simply create a MaskConfig with only a few bits for the sequence part and make it overflow.

It's hard to imagine a use case where a user can simply give up if they can't have an ID, so there's no harm in simply waiting.

If you're experiencing unusually high demand "simply waiting" might actually worsen the situation, piling up more and more stuff "simply waiting" for an ID, where you actually probably would want to tell the systems upstream requesting ID's to try another host or cluster or... stop requesting ID's or... take a chill-pill or something 😆

Imagine, if you will, a factory that produces internet connected marbles and each marble rolling off the factory line gets its own ID. First, that would have to be an insanely large factory, and you'd probably have hundreds of those around the world - hence why you chose IdGen. Also, that toy would have to be popular in millions of galaxies; can you imagine the profit on that? Now imagine production is ramped up because of high demand; Christmas maybe. And now you're unable to produce ID's fast enough. Would you "skip" assigning marbles ID's altogether and potentially have a lot of unhappy customers because their toy won't work? Or "keep the marbles waiting" (where? the factory floors are finite... so is our solar system) while there are even more coming from the machines producing them? Or do you prefer to signal an operator with a red light? Or maybe tell the machines upstream to slow production down just a little?

I think explicitly leaving handling this exceptional case up to whoever is using the library is exactly what we want. It forces people to think about their code. "Silently slowing down" generating ID's and "solving" the problem may make matters worse. And what better way to notify someone they forgot to handle an overflow than by crashing? 😆 As a library creator I don't want to "take responsibility" for "handling"* such cases; the user knows (or should know) best and should take responsibility.

* Where I'm not sure if 'just put in a delay and wait for a bit' can even be defined as 'handling' the situation...

The default settings allow for 4 million IDs per second, which ought to be fast enough for all intents and purposes

Actually, 4096 (12 bits) for each generator (1024 generators, 10 bits) per millisecond to be precise in the default configuration. That's over 4096 * 1024 * 1000 ≈ 4 billion per second...

Which is exactly why throwing an exception shouldn't be a problem; it'll be an exceptional situation anyway; hence the exception 😉 Let's be real for a second here; most of us will never, ever, even come close to even the tiniest fraction of requiring millions or billions of ID's per second. Even if you do, you should then have at least 2^10 generators (which is 1024 generators!) EACH running at their fullest capacity; else why would you have used a MaskConfig of only 12 sequence bits and 10 generator bits? Why didn't you use, say, only 2 generator bits and 20 sequence bits, which increases each generator's sequence size 256-fold from the default configuration?

Of course, it's easy to wrap the whole thing in a try/catch block, but my concern isn't usability so much as reliability.

Actually catching the exception and handling it accordingly will improve reliability, instead of ignoring the situation at hand and trying to sweep the problem under the rug by just keeping everyone waiting. Exceptions aren't always a Bad Thing™; they signal something exceptional is going on and you might want to act upon it (or catch and swallow, ignoring the problem - it's up to whoever is using the library).

Having said that; it shouldn't be hard to implement a "WaitingIdGen" of some sorts. Just implement the IIdGenerator<long> interface, use a 'normal' IdGenerator internally, catch the SequenceOverflowException, put in a delay of a single tick and then call CreateId() again in the exception handler. Heck, you could even do this in a loop and keep 'trying' until it succeeds eventually; though in reality you'll just be hammering your resources.
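For illustration, a minimal sketch of such a wrapper could look like the following. This is not part of the library; it only assumes IdGenerator.CreateId() and SequenceOverflowException from IdGen, and a real drop-in replacement would additionally implement the IIdGenerator<long> interface mentioned above.

```csharp
using System.Threading;
using IdGen;

// Sketch of the "WaitingIdGen" idea: wrap a normal IdGenerator, catch the
// SequenceOverflowException and retry after a short delay. Not part of IdGen.
public class WaitingIdGenerator
{
    private readonly IdGenerator _inner;

    public WaitingIdGenerator(IdGenerator inner) => _inner = inner;

    public long CreateId()
    {
        while (true)
        {
            try
            {
                return _inner.CreateId();
            }
            catch (SequenceOverflowException)
            {
                // Sequence exhausted for the current tick; wait a moment and retry.
                // You "pay" for (at most) one exception per exhausted tick this way.
                Thread.Sleep(1);
            }
        }
    }
}
```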

Though I currently disagree on the issue at hand, I am open to discussion. Change my mind 😜


reinux commented Apr 7, 2020

Though I currently disagree on the issue at hand, I am open to discussion. Change my mind 😜

Cool, here goes:

The situation that caused this for me was when I was migrating from another database, and generating several million IDs in advance. In a tight loop, 4,096 is easy to hit in a millisecond, even on my 9 year old machine.

Actually, 4096 (12 bits) for each generator (1024 generators, 10 bits) per millisecond to be precise in the default configuration. That's over 4096 * 1024 * 1000 ≈ 4 billion per second...

Presumably, it's unlikely that you're going to max out on your generator count, and you're probably going to be generating a ton from one generator, since those generators are probably going to be different application instances or even entirely different computers. So having 1024 generators doesn't really help in most practical cases.

Also, the generator count could be precious, whereas a millisecond isn't.

And what better way to notify someone they forgot to handle an overflow than by crashing?

Softly, by slowing down. It buys time for developers to react.

I understand that silent failures are bad, but it's also been said that premature optimization is the root of all evil.

It's harder to imagine a situation where generating IDs is the most time-consuming part of any process, or where generating IDs is more time critical than mission critical.

Imagining a very likely scenario: You're developing an image uploader.

During development, you're working on sets of, say, 1000, and then a couple years after release, people start uploading larger and larger sets, and you start hitting 3000, 4000, 5000. A crash would stop business then and there. After hours or potentially days of lost business or a very large customer, you wrap the whole thing in a try/catch. At that point, you can start evaluating other ID options until performance begins to suffer beyond what is acceptable -- you've effectively landed on the same solution as not throwing an exception in the first place.

Years pass, and people start uploading 10,000 images at a time. The upload takes two minutes and a millisecond rather than two minutes exactly.

It's 2080. Miraculously, the earth, along with you and your users, are still alive. Per-core CPU performance is still the same as it is now. People are uploading 100,000 images at a time. The upload takes twenty minutes and ten milliseconds rather than twenty minutes exactly. The time component is about to overflow. Now you need to switch to a new way of doing things -- and not even because it was too slow.

If instead you're doing something that does generate a bajillion IDs all day every day in a tight loop, like, say, a data logger or a user input device driver, you're probably just as likely to encounter an unacceptable slowdown during development as you are to encounter a crash.

I should add that exceptions are also quite slow. This is why you see the TryParse/Try* API all across .NET for time-critical code.


RobThree commented Apr 7, 2020

The situation that caused this for me was when I was migrating from another database, and generating several million IDs in advance. In a tight loop, 4,096 is easy to hit in a millisecond, even on my 9 year old machine.

For that specific scenario you could use a generator-id per thread (for example). Or just chop up the work in, say, 1024 batches and process each batch "as" a specific generator. And, again, these numbers aren't carved in stone, right? You might as well use only up to 64 generators (just 6 bits for the generator part), giving you 4 extra bits for the sequence (16 times the 'sequence space').

Another "trick" you could use (but make sure you understand what's going on / what you're doing) is for the migration only "temporarily" use an IdGenerator with a MaskConfig with 1 bit for the generators and the 'left over 9 bits' assigned to the sequence. That way, whenever your (original) 12 bit sequence (which is now 21 bits) "overflows" it does so into the generators' part. That way it will 'automatically' take a new generator and reset the sequence. After the import just set the MaskConfig back to 10 bits for the generators and 12 bits for the sequence (or whatever values you actually use) et voila! It appears as if the migration was done by multiple generators. I would advice to stop any other processes or nodes and I want to absolutely stress that you should understand what's going on performing this 'trick' to understand the (possible) consequences of doing this. It is, however, the easiest and simplest way (from an ID-generating-during-a-migration-on-a-single-node point-of-view) to solve the problem you ran into causing you to open this issue.

I understand that silent failures are bad, but it's also been said that premature optimization is the root of all evil.

It's not about premature optimization at all. It's about fail fast.

It's harder to imagine a situation where generating IDs is the most time-consuming part of any process, or where generating IDs is more time critical than mission critical.

Again; this isn't about the time taken to generate an ID (it being the "most time consuming part" in a system); it's about not sweeping problems under the rug.

A crash would stop business then and there.

No, it wouldn't / shouldn't. Just possibly one of your 1024 nodes threw an exception for an upload because its sequence overflowed; it may even throw one more for each upload after that for the remainder of the duration of that millisecond and it may even flood some logs if you're not careful. But the next millisecond everything will be (or should be) alright again. In this setup, when the problem doesn't get immediate attention, you'll just start to notice that some percentage of every 4096 * 1024 * 1000 uploads per second will fail, and that that number is rising as your system evolves. Whenever you're in the market for an Id generator like IdGen you won't be using a single process that, when crashing, will bring down everything to a screeching halt. You'll have redundancy and high availability as part of your system - a horizontal architecture. Not a single process somewhere in the back of a broom-closet that is brought to its knees by a single exception. Tens or hundreds, maybe thousands of nodes.

It's 2080. Miraculously, the earth, along with you and your users, are still alive. Per-core CPU performance is still the same as it is now.

Because in 2023 the developers noticed a problem and fixed it by either adding more nodes and lowering individual loads or redesigning the system or...

If instead you're doing something that does generate a bajillion IDs all day every day in a tight loop, like, say, a data logger or a user input device driver, you're probably just as likely to encounter an unacceptable slowdown during development as you are to encounter a crash.

I just can't get my head around it... if you have a (single) system that generates a bajillion IDs all day every day in a tight loop, how does making that system delay intentionally solve your problem? All it does is make matters worse, because the users keep coming and uploading and now you need huge amounts of memory and storage just to be able to keep track of some 'queue' that you're storing requests in while they wait for ID's...

I should add that exceptions are also quite slow. This is why you see the TryParse/Try* API all across .NET for time-critical code.

Who's premature optimizing now? 😉

I am very aware of the "expense" of exceptions; again: they're not meant for regular program flow but for exceptional circumstances which an overflow undoubtedly is. We need the (library) user to think on how to actually handle the situation of a sequence overflow. If they want to wait, then sure, be my guest and put a Thread.Sleep(<some_time>); in the exception handler. It's the exact same thing you're proposing, except this time the (library) user has knowingly and intentionally slowed down the system (and only "paid" for it once for a single exception - exceptions may be slow but as long as you're not throwing thousands of them in the same millisecond the 'cost' is near zero).

On the other hand, if they don't want to wait and slow down, they have the freedom to just put something else in the exception handler. Like, I don't know, spin up another Amazon node to dynamically 'scale up' and spread the load over more nodes. Or tell the frontend to try another cluster to handle the request. Or... maybe even put in a Thread.Sleep(...). But they noticed and did something. Their logs were clear (sequence overflow), their fix (whatever it was) was intentional and documented and chosen and designed by them. If we'd have just 'silently waited' they would've spent hours, maybe days, looking in their logfiles as to why their Octa Intel Xeon hecta-core servers couldn't handle a measly few thousand requests individually and seeing all their request queues quickly fill up. The load on their machines is near zero but they're refusing to handle the requests... Imagine having to troubleshoot that 😅

If the "just wait around a bit..." was built into the library the system would never be aware of the problem and not be able to (automatically) 'scale up'. Also; depending on the MaskConfig a tick is, by default, a millisecond but it could be anything; it could be nanoseconds or centuries. Would you still want to wait a century? Or what about trading systems with high volumes, where every microsecond counts; would you be willing to risk ("silently") waiting a millisecond and possibly risking huge amounts of loss? I strongly disagree that this library should be so 'opinionated'; it should be up to the user on how to handle the issue at hand, not the library making these decisions for them. If they want to wait; sure, no problem, they can. If they prefer another way to handle it: they can too. In the same way: just handle the SequenceOverflowException to your liking.

And to drive my point home even a bit more: what if File.Delete(), instead of failing on a file that's in use, decided to just wait for some time and then try again? Wouldn't that be weird? And you can play that game for just about any exception you can think of (with the possible exception, pun intended, of TimeoutExceptions).

Also, the generator count could be precious

On a (more) positive note: I think that's one thing we agree on. You (anyone) should carefully think about the amount of generators required (now and in the distant future); adjusting this later can (but doesn't have to) be a problem.

And finally; even though we may disagree (and I'm still open to having my mind changed), I would like to stress that it's only a handful of lines of code to implement a WaitingIdGenerator by using an IdGenerator internally and catching the SequenceOverflowException and putting in a delay of some sorts - "paying" only the cost of a single exception per millisecond (or rather, more accurately: per tick). I could consider a TryCreateId() method to avoid throwing an exception and offer a 'cheaper' way out but, for now, I don't see a strong enough case to do so. The SequenceOverflowException shouldn't happen in the first place; if it does you're (most likely) using the library wrong -or- running into an actual scaling problem and you should be ahead of those too.


reinux commented Apr 7, 2020

For that specific scenario you could use a generator-id per thread (for example). Or just chop up the work in, say, 1024 batches and process each batch "as" a specific generator. And, again, these numbers aren't carved in stone, right? You might as well use only up to 64 generators (just 6 bits for the generator part), giving you 4 extra bits for the sequence (16 times the 'sequence space').

Right, but then I'm using more generator bits, and I have to sacrifice the one-to-one correspondence between a generator and whatever broader instance it represents. I also need to keep track of what range of generators are reserved for each. So it's very much not ideal.

It's not about premature optimization at all. It's about fail fast.

I mention silent failures for the sake of argument, because those are actually post-mortem failures, which are the latest of all kinds of failures.

Again; this isn't about the time taken to generate an ID (it being the "most time consuming part" in a system); it's about not sweeping problems under the rug.

Getting back to failing fast: The best place to fail is at the compiler, then deployment time, then execution time, then postmortem. Crashes aren't particularly high on the "fail fast" chart.

Not all failures are equal, either. You're going to need to be doing something with the IDs anyway, and the rest of the program is going to be taking up much more time per ID than a quick function like CreateId().

Whenever you're in the market for an Id generator like IdGen you won't be using a single process that, when crashing, will bring down everything to a screeching halt. You'll have redundancy and high availability as part of your system - a horizontal architecture.

Right, but every single node is going to fail identically on the same or similar input if the input someday involves several thousand IDs.

I just can't get my head around it... if you have a (single) system that generates a bajillion IDs all day every day in a tight loop, how does making that system delay intentionally solve your problem?

That's a different hypothetical scenario. My first scenario is in regards to an image uploader (or literally any application involving thousands of distinct items being uploaded or stored to a server); the second is in regards to hardware drivers or IoT type situations.

Who's premature optimizing now? 😉

I was just pointing out that if performance is a concern, try/catch isn't the right answer.

Like, I don't know, spin up another Amazon node to dynamically 'scale up' and spread the load over more nodes.

That's a good option if your work unit is parallelizable. It may not be desirable or even possible to do so.

If they want to wait, then sure, be my guest and put a Thread.Sleep(<some_time>); in the exception handler.

Maybe not Sleep, since that can potentially take longer than a millisecond (50ms on some OSes), even when it's set to 0.

The load on their machines is near zero but they're refusing to handle the requests... Imagine having to troubleshoot that 😅

That would be another reason to avoid Sleeping.

If the "just wait around a bit..." was built into the library the system would never be aware of the problem and not be able to (automatically) 'scale up'.

Also; depending on the MaskConfig a tick is, by default, a millisecond but it could be anything; it could be nanoseconds or centuries. Would you still want to wait a century?

I'm just talking about practical cases here. Presumably that's why the defaults are set to what they are, because they're practical.

It's also why I think it would be a good option to have, whether or not it's the default behavior.

And to drive my point home even a bit more: what if File.Delete(), instead of failing on a file that's in use, decided to just wait for some time and then try again?

But that can halt on the order of hours even in a normal situation, and the wait time is in no way correlated to the size of the workload.

The SequenceOverflowException shouldn't happen in the first place; if it does you're (most likely) using the library wrong -or- running into an actual scaling problem and you should be ahead of those too.

If you want several thousand items to be served (or even just processed) by the same node, I think it makes perfect sense.


RobThree commented Apr 7, 2020

Right, but then I'm using more generator bits, and I have to sacrifice the one-to-one correspondence between a generator and whatever broader instance it represents.

Only during the migration; time is part of the generated Id's.

I also need to keep track of what range of generators are reserved for each

No you don't? Who cares which generator was used at a particular time, or which node was used to do the migration? And if it's important to you, then why not simply put in a Thread.Sleep() yourself and be done with it? But please understand that a generated id, besides being (roughly) time-ordered, has no other guarantees whatsoever; you're not even supposed to know the specific implementation details or that some bits are reserved for generators etc. An IdGen generated id is supposed to be an opaque blob of bits, closely resembling something usually incrementing-only and non-conflicting when provided by more than one host/process/thread or more than one generator. You're not supposed to be 'decoding' an id, let alone using it to keep track of which generator generated it; it's unimportant and not part of the (intended) contract.

I mention silent failures for the sake of argument, because those are actually post-mortem failures, which are the latest of all kinds of failures.

There's no post-mortem if you actually catch the exception and handle it accordingly.

Crashes aren't particularly high on the "fail fast" chart.

Again; what crashes? Handle 👏 your 👏 exceptions.

You're going to need to be doing something with the IDs anyway

Yes, like signalling the problem and redirecting to a less busy node, for example. Not busy-waiting and thumb-twiddling until some arbitrary time has passed so we 'can' generate a new id.


EDIT: I just realized this is where some of the confusion may come from. I keep/kept saying redirecting to other nodes, spinning up new instances etc. whenever a sequence overflows. Let me be clear: it's not IdGen's job to signal a host is too busy or whatever; there are other means for that (like decent monitoring of your nodes). It's also not up to IdGen to use the "arbitrarily sized" sequence as a means for figuring out what's "too busy" and what isn't. There should be other systems in place for that. By the time your sequence overflows IdGen has no option; it quite literally ran out of possibilities; it can't generate a new id because that would create a conflicting id. One option to solve that is to redirect the work to another node that could handle the job. Another option is to simply fail. Or wait. Again; whichever option you choose, it should be up to the library user, not IdGen deciding for someone that, well, it ran out of options, let's just slow things down.

The exception is there to signal something is wrong: your sequence has overflowed where it shouldn't. The number of bits you reserved for the sequence are too few. How you want to fix it (adjust the MaskConfig, HCF, spin-wait, pray for a miracle, whatever...) is up to you.


Right, but every single node is going to fail identically on the same or similar input if the input someday involves several thousand IDs.

That is assuming all 1024 nodes (or whatever the generator count is) are actually up and the load is evenly distributed over them, so they all fail at the same time. By then you should start to wonder if IdGen is actually for you.

I was just pointing out that if performance is a concern

Again, it isn't. Correctness and reliability are. You're insisting on exceptions being bad but they're not. Their actual intended use is to signal exceptional situations; hence their name.

That's a good option if your work unit is parallelizable. It may not be desirable or even possible to do so.

Again; maybe IdGen then isn't for you. Maybe you should look into plain simple auto-incremented ID's. The specific use-case for IdGen is distributed (e.g. parallelizable) uncoordinated ID generation.

Maybe not Sleep, since that can potentially take longer than a millisecond (50ms on some OSes), even when it's set to 0.

Whether it's Thread.Sleep, spin-locking or some other variation of waiting; it's irrelevant. The entire premise of just waiting around for a tick to pass (e.g. a millisecond in the default configuration) is nuts when you have a system that's so busy that the sequence overflows. It means that the higher the demand, the slower you work and the more work piles up. If your sequence overflows you want more bits for your sequence or more nodes to handle the load.

I'm just talking about practical cases here. Presumably that's why the defaults are set to what they are, because they're practical.

They are for most intended use-cases: distributed, uncoordinated ID generation. Not tight-loop migrations in a single process.

It's also why I think it would be a good option to have, whether or not it's the default behavior.

I still don't see any strong arguments, if any at all, for 'waiting'. I also don't see any solutions on how to handle the work piling up when we start waiting. And finally I still don't see why you wouldn't just write the handful of lines of code to create a WaitingIdGenerator.

But that can halt on the order of hours even in a normal situation

Who's to say the Id-generation won't? However small the per-item workload, the more ID's are being generated, the more requests/uploads/marbles will be queuing up once you start waiting for ticks to expire. During the next tick, whatever its duration, even *more* work will have to be processed (causing new waits, causing more backlog).

If you want several thousand items to be served (or even just processed) by the same node, I think it makes perfect sense.

Then adjust the MaskConfig accordingly? It's not as if you don't have the option? Every bit you add to the sequence part doubles the space (at the cost of halving your generators or time resolution, depending on where you 'take' the bit from). IdGen is actually one of the very few, if not the only, library offering variable-sized timestamp, generator and sequence parts. Most other snowflake(-like) libraries are fixed at 41/10/12 bits respectively.

I'm sorry, I just don't find any convincing or compelling reason to (silently or otherwise) keep waiting around until a tick has expired to generate the next ID in a busy system. Heck, why would you even wait around as you could simply wrap-around the sequence counter and 'artificially' increment the tick; it's what the outcome is going to be anyway; you know this ahead of time. Why keep sitting around, thumb twiddling, for the tick to pass? But by then you're damn near close to plain auto-incrementing ID's. Again, not what IdGen is intended to solve.

Circling back to your migration; that makes me wonder too: Don't you want your migration to happen as fast as possible? If you're having to migrate millions of items, with a low enough workload to actually overflow the sequence, do you really want to add extra time doing nothing? Why don't you take the much easier route(s) of either adjusting the number of bits reserved for the sequence or 'faking' a generator-id for the duration of the migration just to speed things along. Both make much more sense to me than introducing artificial delays just because some counter overflowed.


reinux commented Apr 8, 2020

No you don't? Who cares which generator was used, at a particular time, which node was used to do the migration.

Yes, if it's only a one-time migration, and you can be sure that there's nothing else happening simultaneously.

The image uploader situation I noted earlier would have most users generating IDs in the hundreds (and, crucially, occasionally even in the thousands), kicked off by multiple users simultaneously and frequently. Allocating generators using this tactic then becomes a non-trivial problem.

Again; what crashes? Handle 👏 your 👏 exceptions.

That would be a better argument in Java, where the caller is forced to be aware of every exception.

May I suggest logging via a callback function or firing an event instead of throwing an exception?

Again, it isn't. Correctness and reliability are. You're insisting on exceptions being bad but they're not. Their actual intended use is to signal exceptional situations; hence their name.

Yes, but unexpected situations are unhandled almost by definition, and unhandled exceptions cause crashes, regardless of how severe the unexpected situation is. An exception that occurs early in development or during testing is great. An exception that likely won't occur until some point after release is antithetical to reliability.

it's not IdGen's job to signal a host is too busy or whatever

The exception is there to signal something is wrong: your sequence has overflowed where it shouldn't.

These two concerns seem to contradict.

Again; maybe IdGen then isn't for you. Maybe you should look into plain simple auto-incremented ID's. The specific use-case for IdGen is distributed (e.g. parallelizable) uncoordinated ID generation.

Autoincrements and UUIDs pose problems even for relatively simple SQL applications, which can be solved with a combination of source IDs (generators) and mostly-sortable source-specific IDs (timestamp and ordinal). Cloud-style massively scalable services aren't the only place that IDs with these characteristics make sense.

Whether it's Thread.Sleep, spin-locking or some other variation of waiting; it's irrelevant. The entire premise of just waiting around for a tick to pass (e.g. a millisecond in the default configuration) is nuts when you have a system that's so busy that the sequence overflows.

True, if you assume the ID generation is interspersed with the work. Generating a batch of IDs up front for a batch of work is a common scenario, say, if you want to assign IDs to graph nodes before you link them up.

Why keep sitting around, thumb twiddling, for the tick to pass? But by then you're damn near close to plain auto-incrementing ID's. Again, not what IdGen is intended to solve.

Because the worst case isn't the usual case, and in the usual case, autoincrement isn't always the best solution. In most cases, failing in an unusual case rather than taking a few extra milliseconds doesn't seem like a good tradeoff.

But that can halt on the order of hours even in a normal situation

Who's to say the Id-generation won't?

Benford's law. For numeric/digital representations of countable real-world items, most counts will tend toward the highest digit having the lowest value (i.e. 1), with higher values in lower digits being anomalous. This is so much so that it's used as a signal in detecting fraud.

Assuming you've chosen your mask size (i.e. your groupings) to take into account the most common use cases, most of your data is going to be of smaller counts, until you have an unusual case. The unusual case, the rollover digit beyond 4096, will usually be 1 -- meaning that the wait will almost always be one tick, sometimes two or three, rarely more.

Even then, it'll likely be a fraction of a tick as opposed to a full tick. And if someone is going to generate millions of IDs for a time-critical application, surely, they'll have profiled its performance. And if someone is going to generate millions of IDs for things that aren't used in tight loops, surely they need those IDs for some kind of processing which will invariably take up a lot more time than generating IDs or even waiting to generate IDs.

Then adjust the MaskConfig accordingly? It's not as if you don't have the option?

ID formats, in most applications, are kind of set in stone once they're chosen.

Circling back to your migration; that makes me wonder too: Don't you want your migration to happen as fast as possible? If you're having to migrate millions of items, with a low enough workload to actually overflow the sequence, do you really want to add extra time doing nothing?

In my particular case, I had 22 million items to migrate. 22 million IDs on a single generator takes 5 seconds with the default settings. The rest of the migration takes 10 minutes. You're right that I could safely use another generator (and I admit I hadn't thought of that), but what I'm trying to illustrate with this example is that, given that generating IDs is a very small part of the process, waiting another 5 seconds is perfectly fine.

In normal operation, I expect to be generating hundreds of IDs at a time. It would be unusual, but I can't say I won't be generating thousands. In those cases, I would rather lose a millisecond or two than fail an entire work item or have to parallelize right down my work unit.

And if the performance gets really bad -- hey, I should be monitoring my nodes, right?


In any case, it seems I've failed to change your mind. No prob, it's just a difference in philosophy. I was able to write up my own solution in the meantime. If I put it up on Github, I'll be sure to credit you.

Thanks for entertaining my thoughts!


RobThree commented Apr 8, 2020

Allocating generators using this tactic then becomes a non-trivial problem.

Then you should've solved that problem earlier; again: the (intended) use-case for IdGen is distributed uncoordinated id generation. If you're only going to use a single generator then there's not much use for IdGen. Using multiple generators is the intended scenario and, by extension, allocating generators should've been a solved problem by then. Whether you provision each node with a fixed generator id, use a per-thread generator id, have a server ("generator coordinator") hand out generator id's or ... whatever.

May I suggest logging via a callback function or firing an event instead of throwing an exception?

You may, and that would be a possibility, but I don't think it's a very intuitive way of doing things.

Yes, but unexpected situations are unhandled almost by definition, and unhandled exceptions cause crashes, regardless of how severe the unexpected situation is.

The situation isn't unexpected; the SequenceOverflowException is perfectly documented and it's up to the user to handle it accordingly, as with any other exception any other method in any other class may throw. IdGen is no exception (again, pun intended 😛) in that.

An exception that occurs early in development or during testing is great. An exception that likely won't occur until some point after release is antithetical to reliability.

But, again, if it happens it will be clear what is going on, what the problem is. Instead of silently ignoring things.

Generating a batch of IDs up front for a batch of work is a common scenario, say, if you want to assign IDs to graph nodes before you link them up.

Again, if this is your use-case then IdGen offers many options (some of which may need to have been thought out upfront):

  • Use another tickduration
  • Use more bits for the sequence part
  • Use more generators
  • Handle the SequenceOverflowException to your liking (a sketch of one such handler follows this list):
    • Insert a wait
    • Increment (or pick a random or whatever) the generator id
    • Redirect work to a less busy node
    • Queue the work
    • ...
  • Bring the system to a halt and let it crash
  • ...
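As a hedged sketch of one of the options above ("increment the generator id"), the handler below falls back to a second, pre-reserved generator id instead of waiting. The fallback id (512) and the decision to reserve it are purely illustrative assumptions, not library behaviour:

```csharp
using IdGen;

var primary  = new IdGenerator(0);   // this node's normal generator id
var fallback = new IdGenerator(512); // hypothetical reserved "spill-over" generator id

long NextId()
{
    try
    {
        return primary.CreateId();
    }
    catch (SequenceOverflowException)
    {
        // The primary sequence is exhausted for this tick; the fallback generator
        // still has an empty sequence for the same tick, so the ids won't conflict.
        return fallback.CreateId();
    }
}
```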

In most cases, failing in an unusual case rather than taking a few extra milliseconds doesn't seem like a good tradeoff.

Maybe in your situation(s). But where for you a millisecond is "just a small wait", for others it's an unacceptable price to pay. From a library standpoint, which can be used (and is intended to be used) in any random scenario, throwing in a wait is not always the best option. Also, again, even if we were to wait - a tick may be anything from a millisecond to days, nanoseconds to centuries (theoretically).

Benford's law. For numeric/digital representations of countable real-world items, most counts will tend toward the highest digit having the lowest value (i.e. 1), with higher values in lower digits being anomalous. This is so much so that it's used as a signal in detecting fraud.

Without using Benford's law or any other convoluted methods; there's an easier way to detect a sequence overflow: The SequenceOverflowException will be thrown 😉

The unusual case, the rollover digit beyond 4096, will usually be 1 -- meaning that the wait will almost always be one tick, sometimes two or three, rarely more.

Which, again, may be acceptable in your specific use-case/scenario, but it may not be in others. You really should consider scenarios outside your own in which this library will be used (and is intended to be used) and take that into account when suggesting solutions. To exaggerate a bit: what would happen if developers of OSes, other libraries etc. all would just throw in a random millisecond (or whatever period) wait in code? "Oh, hi! So you want a lock on myobj? Well, not right now buddy, but let me wait a millisecond before I try again... ... ... ... ... -one eternity later- hmmm, yes, that worked (this time)!"

Even then, it'll likely be a fraction of a tick as opposed to a full tick.

That's another assumption I wouldn't want to make; who's to say you didn't overflow the sequence within the first one-thousandth of the tick? I agree it's likely but not that likely.

ID formats, in most applications, are kind of set in stone once they're chosen.

That's why you need to sit down and think about it before and during the design of your application. Having said that, if anything, IdGen allows you to 'cheat' a little; you can 'fake' a generator-id (for example, if you normally only use 1, 2, 3, ... you could use 1023, 1022, 1021... to 'keep track' of when you needed to do this) for these kinds of situations. And because a timestamp is included in the ID you could even keep track of when you cheated and mark ID's between the start- and end-time as 'cheated'.

In my particular case, I had 22 million items to migrate. 22 million IDs on a single generator ...

... again: IdGen is not *intended* to be used in a 'single generator' scenario; it's (very much) ok if you do, but that's not how it's intended to be used. So if you then overflow the sequence you need to either re-think and adjust the MaskConfig or use a second generator ID or wait for the remainder of that millisecond or... do whatever you want. IdGen signalled you very clearly, and as early as possible, what was going on and that there was a problem. Didn't it? 😉

...[...] waiting another 5 seconds is perfectly fine.

... for you. Keep that in mind. For you. Who am I to decide that's OK for everyone using this library?

In those cases, I would rather lose a millisecond or two than fail an entire work item or have to parallelize right down my work unit.

Again; YOU would rather lose a millisecond. But that may not be the case for everyone using this library.

No prob, it's just a difference in philosophy.

Agree to disagree 😅

I was able to write up my own solution in the meantime.

Glad you did 👍

If I put it up on Github, I'll be sure to credit you.

Cool. Maybe link to this issue?

Thanks for entertaining my thoughts!

You're welcome and ditto!

RobThree added a commit that referenced this issue Apr 17, 2020

RobThree commented Apr 17, 2020

So I've let this sink in for some time and decided that, even though I (still) don't see a strong case for implementing a TryCreateId() method, it is easy to implement and it can be done without breaking changes. So it won't hurt adding it either. Best of both worlds.

See b582617 and v2.3.0.
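For reference, this is roughly how the new method can be used, assuming the usual .NET Try* shape (a bool return value and an out parameter) - check commit b582617 for the exact signature:

```csharp
using IdGen;

var generator = new IdGenerator(0);

if (generator.TryCreateId(out long id))
{
    // id was created within the current tick's sequence space.
}
else
{
    // Sequence overflow: decide for yourself what to do (wait, redirect,
    // fall back to another generator, ...) without paying for an exception.
}
```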


RobThree commented Jul 2, 2020

You may also be happy to know that since 2.4.1 (see #24) it is now possible to SpinWait. I still don't recommend it and it still defaults to throwing an exception but...
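A hedged sketch of opting into that behaviour, written against the way the option is exposed in later IdGen releases (SequenceOverflowStrategy on IdGeneratorOptions); the exact surface in 2.4.1 may differ, so treat this as an assumption and check the 2.4.1 release notes and #24:

```csharp
using IdGen;

// Opt into spin-waiting instead of throwing on sequence overflow.
var generator = new IdGenerator(0, new IdGeneratorOptions(
    sequenceOverflowStrategy: SequenceOverflowStrategy.SpinWait));

// CreateId() now waits for the next tick rather than throwing.
long id = generator.CreateId();
```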

