Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement optional flag for rotating index #37

Merged
merged 7 commits into from
Sep 8, 2024
Merged

Conversation

shayanhabibi
Copy link
Contributor

@shayanhabibi shayanhabibi commented Sep 6, 2024

-d:loonyRotate

This flag indicates that slots are not to be read in sequential order, but rather in a incremental cyclic pattern, stepping over cache lines.

Motivation

Potential reduction in cache coherency overhead from MESI protocol invalidations, false sharing etc

@shayanhabibi shayanhabibi added the enhancement New feature or request label Sep 6, 2024
@shayanhabibi
Copy link
Contributor Author

Is there some benchmark we can challenge this with? There is every potential for this to make almost negligible difference and be a waste of cpu ops on calculating the rotated idx.

@shayanhabibi
Copy link
Contributor Author

Potentially could also chuck #24 in there to see if combined there is a significant performance benefit.

There is a cost to #24 in that the memory consumption of the Queue increases from 3 pointers (typically 24 bytes) to 3 cachelines (typically 192 bytes). Not a big deal considering the number of instances of the Queue we would actually have.

@shayanhabibi shayanhabibi marked this pull request as ready for review September 6, 2024 14:08
Copy link
Contributor

@disruptek disruptek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not export loonyNodeAlignment also? No reason not to merge this.

loony/spec.nim Outdated Show resolved Hide resolved
@disruptek
Copy link
Contributor

disruptek commented Sep 6, 2024

Is there some benchmark we can challenge this with? There is every potential for this to make almost negligible difference and be a waste of cpu ops on calculating the rotated idx.

You can benchmark it, after a fashion, against the insideout tests -- rather stressful but probably not utilizing greater than processorCount() cores.

@shayanhabibi
Copy link
Contributor Author

Rotating flag is now default.

LoonyQueue is now padded to cachelines.

Updated readme.

@shayanhabibi shayanhabibi enabled auto-merge (squash) September 7, 2024 02:49
@shayanhabibi shayanhabibi enabled auto-merge (squash) September 7, 2024 02:57
@shayanhabibi shayanhabibi merged commit cc65bd5 into main Sep 8, 2024
4 checks passed
@shayanhabibi shayanhabibi deleted the rotatedIndex branch September 8, 2024 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants