Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement MatchingEngineStateManager #1041

Conversation

adam-singer
Copy link
Contributor

@adam-singer adam-singer commented Jun 20, 2024

Description

Implement MatchingEngineStateManager

Implement MatchingEngineStateManager for StateManager. This hides implementation
logic for do_try_match() behind MatchingEngineStateManager on filter_operations and
update_operation. tasks_or_workers_change_notify has been moved into the StateManager.

Fixes #1043

Type of change

Please delete options that aren't relevant.

  • Refactor

How Has This Been Tested?

bazel test //...

Checklist

  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

@adam-singer adam-singer force-pushed the adams/impelment-matching-engine-trait branch 4 times, most recently from 69aee08 to f7f3e9d Compare June 21, 2024 23:22
@adam-singer adam-singer changed the title [WIP] Implement MatchingEngineStateManager Implement MatchingEngineStateManager Jun 21, 2024
@adam-singer adam-singer marked this pull request as ready for review June 21, 2024 23:24
@adam-singer adam-singer requested review from allada and zbirenbaum June 21, 2024 23:25
Copy link
Contributor

@zbirenbaum zbirenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, macos-13, windows-2022 / stable, and 4 discussions need to be resolved


nativelink-scheduler/src/simple_scheduler.rs line 257 at r1 (raw file):

            &self.state_manager,
            OperationFilter {
                stages: OperationStageFlags::Queued,

Why only return Queued actions? Shouldn't this be any stage or passed as a parameter to the function?


nativelink-scheduler/src/scheduler_state/matching_engine_action_state_result.rs line 43 at r1 (raw file):

    }

    async fn as_action_info(&self) -> Result<Arc<ActionInfo>, Error> {

Do we still need this


nativelink-scheduler/src/scheduler_state/state_manager.rs line 328 at r1 (raw file):

    }

    async fn update_operation(

Lets see if we can break this up into some helper functions so it's a bit easier to document and break down what's happening here


nativelink-scheduler/src/scheduler_state/state_manager.rs line 340 at r1 (raw file):

        {
            // worker_id related updates
            // We do not have the infra to handle this code path currently

probably don't need this comment

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and 12 discussions need to be resolved


nativelink-scheduler/src/simple_scheduler.rs line 288 at r1 (raw file):

                    // TODO(adams): error handling here.
                    let action_info: Arc<ActionInfo> =
                        action_state_result.as_action_info().await.unwrap();

don't use unwrap.


nativelink-scheduler/src/simple_scheduler.rs line 290 at r1 (raw file):

                        action_state_result.as_action_info().await.unwrap();

                    // TODO(adams): AwaitedAction has action_info: Arc<ActionInfo> and current_state: Arc<ActionState>

note: comment.


nativelink-scheduler/src/simple_scheduler.rs line 307 at r1 (raw file):

                    };

                    // TODO(adams): What happens when we have two concurrent OperationsId

nit: comment.


nativelink-scheduler/src/scheduler_state/state_manager.rs line 96 at r1 (raw file):

                let send_result = if awaited_action.attempts >= self.inner.max_job_retries {
                    self.inner.metrics.retry_action_max_attempts_reached.inc();
                    Arc::make_mut(&mut awaited_action.current_state).stage = ActionStage::Completed(ActionResult {

nit: Use new function introduced in other PR?


nativelink-scheduler/src/scheduler_state/state_manager.rs line 114 at r1 (raw file):

                } else {
                    self.inner.metrics.retry_action.inc();
                    Arc::make_mut(&mut awaited_action.current_state).stage = ActionStage::Queued;

nit: ditto.


nativelink-scheduler/src/scheduler_state/state_manager.rs line 343 at r1 (raw file):

            if let Some(worker_id) = worker_id {
                // Here we would remove the worker and assign it to a new one

nit: comment.


nativelink-scheduler/src/scheduler_state/state_manager.rs line 388 at r1 (raw file):

                match action_stage {
                    Ok(action_stage) => {
                        Arc::make_mut(&mut awaited_action.current_state).stage = action_stage;

nit: ditto.


nativelink-scheduler/src/scheduler_state/workers.rs line 117 at r1 (raw file):

            }),
        };
        workers_iter.map(|(_, w)| &w.id).copied()

I don't think this counts as "touch"ing the item. To touch it we need to call get_mut I believe.

Copy link
Contributor

@zbirenbaum zbirenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 12 discussions need to be resolved


nativelink-scheduler/src/scheduler_state/workers.rs line 117 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

I don't think this counts as "touch"ing the item. To touch it we need to call get_mut I believe.

It's enough to just call get it delegates to get_mut under the hood.


    /// Returns a reference to the value of the key in the cache or `None` if it is not
    /// present in the cache. Moves the key to the head of the LRU list if it exists.
    ///
    /// # Example
    ///
    /// ```
    /// use lru::LruCache;
    /// use std::num::NonZeroUsize;
    /// let mut cache = LruCache::new(NonZeroUsize::new(2).unwrap());
    ///
    /// cache.put(1, "a");
    /// cache.put(2, "b");
    /// cache.put(2, "c");
    /// cache.put(3, "d");
    ///
    /// assert_eq!(cache.get(&1), None);
    /// assert_eq!(cache.get(&2), Some(&"c"));
    /// assert_eq!(cache.get(&3), Some(&"d"));
    /// ```
    pub fn get<'a, Q>(&'a mut self, k: &Q) -> Option<&'a V>
    where
        K: Borrow<Q>,
        Q: Hash + Eq + ?Sized,
    {
        if let Some(node) = self.map.get_mut(KeyWrapper::from_ref(k)) {
            let node_ptr: *mut LruEntry<K, V> = node.as_ptr();

            self.detach(node_ptr);
            self.attach(node_ptr);

            Some(unsafe { &*(*node_ptr).val.as_ptr() })
        } else {
            None
        }
    }

Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 12 discussions need to be resolved


nativelink-scheduler/src/simple_scheduler.rs line 257 at r1 (raw file):

Previously, zbirenbaum (Zach Birenbaum) wrote…

Why only return Queued actions? Shouldn't this be any stage or passed as a parameter to the function?

I assume we are only running matching engine on actions that are queued, tbh, I don't think the implementation honors this atm and directly checks queued actions


nativelink-scheduler/src/scheduler_state/matching_engine_action_state_result.rs line 43 at r1 (raw file):

Previously, zbirenbaum (Zach Birenbaum) wrote…

Do we still need this

@zbirenbaum can you suggest what should be done otherwise? I'll revisit some of the call sites, iirc we didn't or don't have a way to go from ActionState -> ActionInfo. ah.. ActionState.id: OperationId means we can look up the action info in one of the queued_actions_set: HashSet<Arc<ActionInfo>>, queued_actions: BTreeMap<Arc<ActionInfo>, AwaitedAction>, active_actions: HashMap<Arc<ActionInfo>, AwaitedAction> ?


nativelink-scheduler/src/scheduler_state/state_manager.rs line 96 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: Use new function introduced in other PR?

I'll move this over once the other is landed

Copy link
Contributor

@zbirenbaum zbirenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 12 discussions need to be resolved


nativelink-scheduler/src/scheduler_state/matching_engine_action_state_result.rs line 43 at r1 (raw file):

Previously, adam-singer (Adam Singer) wrote…

@zbirenbaum can you suggest what should be done otherwise? I'll revisit some of the call sites, iirc we didn't or don't have a way to go from ActionState -> ActionInfo. ah.. ActionState.id: OperationId means we can look up the action info in one of the queued_actions_set: HashSet<Arc<ActionInfo>>, queued_actions: BTreeMap<Arc<ActionInfo>, AwaitedAction>, active_actions: HashMap<Arc<ActionInfo>, AwaitedAction> ?

Exactly, given the OperationId you have the unique_qualifier so you can look up in any of those. Ideally you would do the lookup in queued_actions_set and active_actions since those are optimized for it, and then update the BTreeMap if need to mutate queued_actions_set

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 12 discussions need to be resolved


nativelink-scheduler/src/simple_scheduler.rs line 257 at r1 (raw file):

Previously, adam-singer (Adam Singer) wrote…

I assume we are only running matching engine on actions that are queued, tbh, I don't think the implementation honors this atm and directly checks queued actions

nit: Can we rename this get_queued_operations()? The current name is not clear what it is going to return.

Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 11 discussions need to be resolved


nativelink-scheduler/src/scheduler_state/matching_engine_action_state_result.rs line 43 at r1 (raw file):

Previously, zbirenbaum (Zach Birenbaum) wrote…

Exactly, given the OperationId you have the unique_qualifier so you can look up in any of those. Ideally you would do the lookup in queued_actions_set and active_actions since those are optimized for it, and then update the BTreeMap if need to mutate queued_actions_set

Discussed offline with @zbirenbaum , no change for now

Copy link
Contributor

@zbirenbaum zbirenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 11 discussions need to be resolved


nativelink-scheduler/src/scheduler_state/workers.rs line 117 at r1 (raw file):

Previously, zbirenbaum (Zach Birenbaum) wrote…

It's enough to just call get it delegates to get_mut under the hood.


    /// Returns a reference to the value of the key in the cache or `None` if it is not
    /// present in the cache. Moves the key to the head of the LRU list if it exists.
    ///
    /// # Example
    ///
    /// ```
    /// use lru::LruCache;
    /// use std::num::NonZeroUsize;
    /// let mut cache = LruCache::new(NonZeroUsize::new(2).unwrap());
    ///
    /// cache.put(1, "a");
    /// cache.put(2, "b");
    /// cache.put(2, "c");
    /// cache.put(3, "d");
    ///
    /// assert_eq!(cache.get(&1), None);
    /// assert_eq!(cache.get(&2), Some(&"c"));
    /// assert_eq!(cache.get(&3), Some(&"d"));
    /// ```
    pub fn get<'a, Q>(&'a mut self, k: &Q) -> Option<&'a V>
    where
        K: Borrow<Q>,
        Q: Hash + Eq + ?Sized,
    {
        if let Some(node) = self.map.get_mut(KeyWrapper::from_ref(k)) {
            let node_ptr: *mut LruEntry<K, V> = node.as_ptr();

            self.detach(node_ptr);
            self.attach(node_ptr);

            Some(unsafe { &*(*node_ptr).val.as_ptr() })
        } else {
            None
        }
    }

@allada I think this is a better way of handling find_worker_for_action returning an ID being a essentially peek operation and then updating the LRU when the worker is actually used/sent an update makes the most sense and causes the least borrow checker issues. The LRU cache is still updated inside of update_action when the update is sent here

async fn update_operation(
    &mut self,
    operation_id: OperationId,
    worker_id: Option<WorkerId>,
    action_stage: Result<ActionStage, Error>,
) -> Result<(), Error> {
if let Some(action_info) = self
    .inner
    .queued_actions_set
    .get(&operation_id.unique_qualifier)
{
    // worker_id related updates
    // We do not have the infra to handle this code path currently
    if let Some(worker_id) = worker_id {
        // Here we would remove the worker and assign it to a new one

        let worker = self.inner.workers.workers.get_mut(&worker_id);

        let action_info = action_info.clone();

        if let Some(worker) = worker {
            let notify_worker_result =
                worker.notify_update(WorkerUpdate::RunAction(action_info.clone()));

@adam-singer adam-singer force-pushed the adams/impelment-matching-engine-trait branch from f7f3e9d to 8162c16 Compare June 24, 2024 21:31
Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 6 discussions need to be resolved


nativelink-scheduler/src/simple_scheduler.rs line 257 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: Can we rename this get_queued_operations()? The current name is not clear what it is going to return.

Done.


nativelink-scheduler/src/simple_scheduler.rs line 288 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

don't use unwrap.

Done.


nativelink-scheduler/src/simple_scheduler.rs line 290 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

note: comment.

Done.


nativelink-scheduler/src/simple_scheduler.rs line 307 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: comment.

@allada is there a way we can handle this currently? My understanding is we planned to defer this or not able to handle it for some reasons, tbh I forgot the reasons for that.


nativelink-scheduler/src/scheduler_state/state_manager.rs line 96 at r1 (raw file):

Previously, adam-singer (Adam Singer) wrote…

I'll move this over once the other is landed

Done.


nativelink-scheduler/src/scheduler_state/state_manager.rs line 114 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: ditto.

Done.


nativelink-scheduler/src/scheduler_state/state_manager.rs line 340 at r1 (raw file):

Previously, zbirenbaum (Zach Birenbaum) wrote…

probably don't need this comment

Done.


nativelink-scheduler/src/scheduler_state/state_manager.rs line 343 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: comment.

Done.


nativelink-scheduler/src/scheduler_state/state_manager.rs line 388 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: ditto.

Done.


nativelink-scheduler/src/scheduler_state/workers.rs line 117 at r1 (raw file):

Previously, zbirenbaum (Zach Birenbaum) wrote…

@allada I think this is a better way of handling find_worker_for_action returning an ID being a essentially peek operation and then updating the LRU when the worker is actually used/sent an update makes the most sense and causes the least borrow checker issues. The LRU cache is still updated inside of update_action when the update is sent here

async fn update_operation(
    &mut self,
    operation_id: OperationId,
    worker_id: Option<WorkerId>,
    action_stage: Result<ActionStage, Error>,
) -> Result<(), Error> {
if let Some(action_info) = self
    .inner
    .queued_actions_set
    .get(&operation_id.unique_qualifier)
{
    // worker_id related updates
    // We do not have the infra to handle this code path currently
    if let Some(worker_id) = worker_id {
        // Here we would remove the worker and assign it to a new one

        let worker = self.inner.workers.workers.get_mut(&worker_id);

        let action_info = action_info.clone();

        if let Some(worker) = worker {
            let notify_worker_result =
                worker.notify_update(WorkerUpdate::RunAction(action_info.clone()));

@allada any other concerns related to that?

@adam-singer adam-singer force-pushed the adams/impelment-matching-engine-trait branch from 8162c16 to 007177a Compare June 24, 2024 23:28
Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), integration-tests (20.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable, and 5 discussions need to be resolved


nativelink-scheduler/src/scheduler_state/state_manager.rs line 328 at r1 (raw file):

Previously, zbirenbaum (Zach Birenbaum) wrote…

Lets see if we can break this up into some helper functions so it's a bit easier to document and break down what's happening here

Done.

@adam-singer adam-singer requested review from allada and zbirenbaum June 24, 2024 23:30
Copy link
Contributor

@zbirenbaum zbirenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: Awesome work! Looks great

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained, and 4 discussions need to be resolved


nativelink-scheduler/src/simple_scheduler.rs line 307 at r1 (raw file):

Previously, adam-singer (Adam Singer) wrote…

@allada is there a way we can handle this currently? My understanding is we planned to defer this or not able to handle it for some reasons, tbh I forgot the reasons for that.

There can't be two OperationIds with the same unique qualifier for the in memory version of the state manager so it doesn't need to be handled here. For the distributed version we can just return the first detected OperationId containing the corresponding unique qualifier.

You shouldn't have to handle this edge case here, the state manager can abstract it away for now when it's relevant.


nativelink-scheduler/src/scheduler_state/state_manager.rs line 340 at r1 (raw file):

Previously, adam-singer (Adam Singer) wrote…

Done.

So much cleaner, awesome change!

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r2, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained, and 3 discussions need to be resolved


nativelink-scheduler/src/simple_scheduler.rs line 282 at r2 (raw file):

                while let Some(action_state_result) = stream.next().await {
                    let action_info_result = action_state_result.as_action_info().await;
                    let action_info = match action_info_result {

nit, rust has new better syntax:

let Ok(action_info) = action_state_result.as_action_info().await else {
    event!(
        Level::ERROR,
        ?err,
        "Failed to get action_info from action_state_results stream"
    );
    continue;
};

nativelink-scheduler/src/simple_scheduler.rs line 445 at r2 (raw file):

                self.state_manager
                    .inner
                    .tasks_or_workers_change_notify

nit: Lets put this in update_operation bindly.


nativelink-scheduler/src/scheduler_state/workers.rs line 117 at r1 (raw file):

Previously, adam-singer (Adam Singer) wrote…

@allada any other concerns related to that?

All good.

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 2 of 1 LGTMs obtained, and 3 discussions need to be resolved

@adam-singer adam-singer force-pushed the adams/impelment-matching-engine-trait branch from 007177a to 287b104 Compare June 25, 2024 20:32
Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 1 LGTMs obtained, and pending CI: Local / ubuntu-22.04, Vercel, docker-compose-compiles-nativelink (20.04), pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, vale, and 2 discussions need to be resolved


nativelink-scheduler/src/simple_scheduler.rs line 445 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: Lets put this in update_operation bindly.

Done.


nativelink-scheduler/src/scheduler_state/workers.rs line 117 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

All good.

Done.

Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 2 discussions need to be resolved


nativelink-scheduler/src/simple_scheduler.rs line 282 at r2 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit, rust has new better syntax:

let Ok(action_info) = action_state_result.as_action_info().await else {
    event!(
        Level::ERROR,
        ?err,
        "Failed to get action_info from action_state_results stream"
    );
    continue;
};

In the else case how do you extract the err for error message? One alternative that does provide a handle to the err within the Result and using the let type matching for Ok without directly unpacking err was

                    let action_state_result = action_state_result.as_action_info().await;
                    let Ok(action_info) = action_state_result else {
                        let _ = action_state_result.inspect_err(|err| {
                            event!(
                                Level::ERROR,
                                ?err,
                                "Failed to get action_info from action_state_results stream"
                            );
                        });
                        continue;
                    };

I think the purpose of the code is "on error run this side effecting function of logging", slight indication we are moving forward without halting of an error in this case, which is maybe more of a warning this this case?

@adam-singer adam-singer force-pushed the adams/impelment-matching-engine-trait branch from 287b104 to 6cd0687 Compare June 25, 2024 20:51
Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Bazel Dev / ubuntu-22.04, Local / ubuntu-22.04, Vercel, asan / ubuntu-22.04, pre-commit-checks, ubuntu-22.04, vale, and 1 discussions need to be resolved


nativelink-scheduler/src/simple_scheduler.rs line 282 at r2 (raw file):

Previously, adam-singer (Adam Singer) wrote…

In the else case how do you extract the err for error message? One alternative that does provide a handle to the err within the Result and using the let type matching for Ok without directly unpacking err was

                    let action_state_result = action_state_result.as_action_info().await;
                    let Ok(action_info) = action_state_result else {
                        let _ = action_state_result.inspect_err(|err| {
                            event!(
                                Level::ERROR,
                                ?err,
                                "Failed to get action_info from action_state_results stream"
                            );
                        });
                        continue;
                    };

I think the purpose of the code is "on error run this side effecting function of logging", slight indication we are moving forward without halting of an error in this case, which is maybe more of a warning this this case?

Done.

Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Bazel Dev / ubuntu-22.04, Local / ubuntu-22.04, Vercel, asan / ubuntu-22.04, pre-commit-checks, ubuntu-22.04, vale, and 1 discussions need to be resolved


nativelink-scheduler/src/simple_scheduler.rs line 307 at r1 (raw file):

Previously, zbirenbaum (Zach Birenbaum) wrote…

There can't be two OperationIds with the same unique qualifier for the in memory version of the state manager so it doesn't need to be handled here. For the distributed version we can just return the first detected OperationId containing the corresponding unique qualifier.

You shouldn't have to handle this edge case here, the state manager can abstract it away for now when it's relevant.

Closing out (after comment update) assuming @zbirenbaum has answered/resolved concerns here?

Implement `MatchingEngineStateManager` for `StateManager`. This hides implementation
logic for `do_try_match()` behind `MatchingEngineStateManager` on `filter_operations` and
`update_operation`. `tasks_or_workers_change_notify` has been moved into the `StateManager`.
@adam-singer adam-singer force-pushed the adams/impelment-matching-engine-trait branch from 6cd0687 to 12ad6b3 Compare June 25, 2024 20:53
Copy link
Contributor Author

@adam-singer adam-singer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Vercel, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable


nativelink-scheduler/src/simple_scheduler.rs line 307 at r1 (raw file):

Previously, adam-singer (Adam Singer) wrote…

Closing out (after comment update) assuming @zbirenbaum has answered/resolved concerns here?

Done.

@adam-singer adam-singer merged commit 684dbc1 into TraceMachina:main Jun 25, 2024
29 checks passed
@adam-singer adam-singer deleted the adams/impelment-matching-engine-trait branch June 25, 2024 21:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement MatchingEngineStateManager
3 participants