Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Naming and documentation #1

Open
BurningWitness opened this issue Oct 30, 2024 · 3 comments
Open

Naming and documentation #1

BurningWitness opened this issue Oct 30, 2024 · 3 comments

Comments

@BurningWitness
Copy link

  1. Queues are data structures, so I would expect the modules to be called Data.Queue and Data.Queue.Ephemeral respectively;

  2. EphemeralQueue is an extremely unwieldy name. Calling all queues Queue and providing a type synonym for each (RealTimeQueue for the other one) would be more convenient;

  3. Documentation should describe how each queue type operates instead of making "when it doubt" statements.

    Notably:

    While it is certainly common to use a queue ephemerally, it is unusual for a Haskell data structure to
    require ephemeral usage to achieve its stated bounds.

    There aren't all that many data structures implemented in Haskell and pretty much all commonly used ones are spine-strict. Pure queues are inevitably spine-lazy, so we should be able to talk in terms of amortized time complexity instead. I tried to do that with e.g. lazy PATRICIA trees, it'd be nice to share notation across the ecosystem.

  4. I wonder if enqueueFront/enqueue/dequeue should be called cons/snoc/uncons instead. Datatype function definitions are generally expected to clash with Data.List anyway, that's why all such modules are imported qualified.

@mitchellwrosen
Copy link
Member

Hi, could you please clarify what point (3) means? I'm having some trouble understanding.

Regarding (1), (2), and (4), I'm unfortunately not inclined to perform any of these name-changes to satisfy the stated criteria. Thank you for the suggestions.

@BurningWitness
Copy link
Author

BurningWitness commented Nov 6, 2024

For certain data types in the language multiple representations are possible:

  • Spine-lazy. Any operation applied to the data structure is only resolved on demand, propagating through the data structure as needed. The classic example is List/[] from base.

  • Spine-strict. Any operation applied to the datatype creates a fully evaluated altered version on the spot. Everything in containers is spine-strict.

Amortized queues (what this package calls "ephemeral") can be implemented either way:

  • A spine-lazy variant would make it so that any operations on the back of the queue are deferred until it's reversed, while evaluating it is annoying;

  • A spine-strict variant does everything one step at a time and seq evaluates it fully.

Whether or not one would want to use either of these is not a question of performance, the trade-offs are instead:

  • For ephemeral usage, the spine-lazy variant allows faster execution in cases where a lot of elements in the back are never needed. Admittedly this is kinda pointless (enqueue is just a (:), expect to save nothing on average), but anyone who doesn't like that can just use a spine-strict amortized queue instead;

  • For persistent usage, if the user wants constant-time element access, they want spine-strict real-time queues; otherwise they're fine with intermittent lag spikes and a spine-strict amortized queue does just fine.


For the record I rewrote this package to have both of these variants, plus spine-strict real-time deques, so if you're on board with upgrading this package to queues-2 I can make a PR in a few days.

@mitchellwrosen
Copy link
Member

mitchellwrosen commented Nov 6, 2024

Ok. I'm afraid I may have to see some code in order to fully grasp what changes you are proposing we make here.

For now I will say regarding this point:

Amortized queues (what this package calls "ephemeral")

The "ephemeral" queue provided in this package achieves its stated amortized bounds only under ephemeral usage. That is why it is named such. This is distinct from, say, the Banker's queue, which achieves its stated amortized bounds even under persistent usage.

The reason this package only provides an ephemeral "dumb" / "naive" two-list queue and a real-time queue, but not an amortized Banker's queue, is that in my benchmarks the amortized Banker's queue does not out-perform the real-time queue.

The trade-off presented here, then, is between fast EphemeralQueue (if persistence is not needed) and slow Queue (if persistence is needed).

Hope that helps, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants