Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch semaphore, causality tracking #151

Merged
merged 4 commits into from
Jul 24, 2024

Conversation

Aurel300
Copy link
Contributor

@Aurel300 Aurel300 commented Jul 10, 2024

This adds batch semaphores into Shuttle (not originally implemented by me) as the core primitive to be used for other synchronisation primitives. Vector clocks were added to the batch semaphore to track causality across acquire, release, and try_acquire calls.

Implementing other primitives using the batch semaphore will thus also cause causality to be tracked. The eventual goal is for primitives to only rely on the API of batch semaphore, without calls to e.g. ExecutionState. As a result, batch semaphore and the runtime internals can be pulled out of Shuttle into a separate shuttle-core crate, while both the stdlib and tokio primitives can be placed into new sibling crates, shuttle-stdlib and shuttle-tokio, respectively.

An open question is how to proceed with the crate restructuring. One option is for the crate called shuttle now to "become" shuttle-stdlib, as it will contain the same primitives, but this would require various re-exports. We could also publish a wrapper crate called shuttle (intended to be deprecated?) which will re-export items from both shuttle-core and shuttle-stdlib.

Planned for this PR still:

  • Test cases.
    • Test case to show where causality tracking is not precise.
  • Option for the semaphore to not be strictly fair, by waking all feasible waiters at once, then letting the scheduler resolve the race.

@sarsko
Copy link
Contributor

sarsko commented Jul 11, 2024

Sweet.

[...] while both the stdlib and tokio primitives can be placed into new sibling crates, shuttle-stdlib and shuttle-tokio, respectively.

An open question is how to proceed with the crate restructuring. One option is for the crate called shuttle now to "become" shuttle-stdlib, as it will contain the same primitives, but this would require various re-exports. We could also publish a wrapper crate called shuttle (intended to be deprecated?) which will re-export items from both shuttle-core and shuttle-stdlib.

There's more.

Shuttle currently:

  1. "Core": all the functionality to run the program under test, ie starting up, pausing and resuming tasks, shutting down, etc.
  2. The schedulers (scheduler).
  3. std-lib mirrors (sync,thread.rs, hint.rs, thread_local! in lib.rs).
  4. future, ie a mix of futures::executor and tokio::task::JoinHandle.
  5. rand mirror.
  6. lazy_static mirror.
  7. The "external" interface: lib.rs (minus thread_local! and lazy_static!) and current.rs

My opinion is that all of these should be its own crate.

The question then becomes whether something which is called Shuttle should stick around. I think the thing called Shuttle will be the crate corresponding to bullet number 7. There is a question of whether it should also provide reexports of some of the other crates. I have not thought about that enough to have a definitive opinion, but I am generally of the opinion that the correct approach is to use the wrapper crates over what is provided by Shuttle directly. I am also of the opinion that future should be deprecated (ie it should not be provided by Shuttle, but may be provided by some crate).

@Aurel300 Aurel300 force-pushed the feature/branch-semaphore branch from a8f1676 to e335688 Compare July 11, 2024 04:51
src/future/batch_semaphore.rs Outdated Show resolved Hide resolved
src/future/batch_semaphore.rs Outdated Show resolved Hide resolved
src/future/batch_semaphore.rs Outdated Show resolved Hide resolved
src/future/batch_semaphore.rs Outdated Show resolved Hide resolved
src/future/batch_semaphore.rs Outdated Show resolved Hide resolved
src/future/batch_semaphore.rs Outdated Show resolved Hide resolved
src/future/batch_semaphore.rs Outdated Show resolved Hide resolved
@@ -57,17 +226,26 @@ struct BatchSemaphoreState {
//
// (4) closed ==> waiters.is_empty()
waiters: VecDeque<Arc<Waiter>>,
permits_available: usize,
permits_available: PermitsAvailable,
// TODO: should there be a clock for the close event?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, think we need to add a causal dependency from the close event to subsequent Closed error returns. But we can do that in a followup.

Copy link
Contributor Author

@Aurel300 Aurel300 Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I tried to implement this, but ran into issues with getting a clock in the close method, because it may happen as part of the cleanup process/drop handlers. We talked with @sarsko a bit about re-organising how drops happen, which would unblock this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we track this in an issue?

src/future/batch_semaphore.rs Outdated Show resolved Hide resolved
@Aurel300 Aurel300 force-pushed the feature/branch-semaphore branch 7 times, most recently from 3d6275d to 9b041d4 Compare July 18, 2024 23:02
src/future/batch_semaphore.rs Outdated Show resolved Hide resolved
tests/future/batch_semaphore.rs Show resolved Hide resolved
tests/future/batch_semaphore.rs Outdated Show resolved Hide resolved
tests/future/batch_semaphore.rs Outdated Show resolved Hide resolved
tests/future/batch_semaphore.rs Outdated Show resolved Hide resolved
@Aurel300 Aurel300 force-pushed the feature/branch-semaphore branch from 9b041d4 to 835cbce Compare July 22, 2024 22:57
@Aurel300 Aurel300 force-pushed the feature/branch-semaphore branch from 835cbce to 5a4df57 Compare July 23, 2024 00:26
@Aurel300 Aurel300 force-pushed the feature/branch-semaphore branch from 5a4df57 to 0cb9d32 Compare July 23, 2024 00:35
use test_log::test;

#[test]
fn batch_semaphore_basic() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do we need the batch_semaphore prefix for all of the tests — they are in the batch_semaphore.rs file after all.

/// repeats. Neither thread can be blocked in this situation, and so neither
/// thread should causally depend on the other, but currently they do.
#[test]
#[should_panic(expected = "doesn't satisfy predicate")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be should_panic, but ignore no ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this specifies how the semaphore actually behaves, i.e. it will panic this way. If it ever starts failing (and the check succeeds), then the behaviour has changed and the test should be updated by removing the should_panic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Why do we want to waste CPU cycles to confirm that we have not improved Shuttle? I like the mechanical-ness of "oh hey btw here's a test which now works", but that could also be gotten by running this test (and optionally other tests) in CI.

Not blocking for me, I'll leave it for others to decide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I asked Aurel to add this, so we have a scenario showing that our causality tracking is imprecise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I asked Aurel to add this, so we have a scenario showing that our causality tracking is imprecise.

?

I don't disagree on the test but whether it should be ignore (and not should_panic) or should_panic (and not ignore)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My two cents: with the explanation in the comment, the risk of someone mistaking this test as desirable behavior is pretty low. I think it's better to run the test than not to run it. If the test starts to break without addressing the issue, we'd like to know about it, instead of hoping that it still does what we want when running it manually.

Aurel300 and others added 4 commits July 22, 2024 18:17
Co-authored-by: Rajeev Joshi <jorajeev@amazon.com>
Co-authored-by: Sarek Høverstad Skotåm <44001885+sarsko@users.noreply.github.com>
@Aurel300 Aurel300 force-pushed the feature/branch-semaphore branch from 0cb9d32 to 2b9cc6a Compare July 23, 2024 01:18
@Aurel300 Aurel300 marked this pull request as ready for review July 23, 2024 01:20
@jorajeev jorajeev merged commit a1a88b3 into awslabs:main Jul 24, 2024
5 checks passed
@Aurel300 Aurel300 deleted the feature/branch-semaphore branch July 24, 2024 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants