Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace trees computation tasks with a worker #1303

Merged
merged 3 commits into from
Aug 9, 2024

Conversation

OlivierHecart
Copy link
Contributor

No description provided.

@OlivierHecart OlivierHecart requested a review from Mallets August 8, 2024 15:46
@YuanYuYuan
Copy link
Contributor

Hi @OlivierHecart, I summarize my comments as below.

Recap

  1. TerminatableTask is designed not only for our ZRuntime but also for termination safety.
  2. TerminatableTask will terminate itself once it was dropped.
  3. If we want to spawn a background task, we can achieve this via ZRuntime::Net::spawn(...).
  4. Background tasks spreaded everywhere will make us hard to monitor their status. That's why we adopt a worker pattern in this PR.
  5. Repeatedly spawning a task causes some overhead compared to one single worker that is spawned at most once.

Review

  1. Maybe we can explicitly use a Worker as below to clarify the content?
  2. We can use spawn_abortable if we don't care about the cancellation token.
  3. Regarding the TREES_COMPUTATION_DELAY_MS delay, I guess we needed this since we want to slow down the new tasks scheduled on the JoinHandle.
    However, with the new worker pattern, we make sure it would only take the next job (TablesLock) once it finished the computation.
struct HatTables {
    peer_subs: HashSet<Arc<Resource>>,
    peer_qabls: HashSet<Arc<Resource>>,
    peers_net: Option<Network>,
    worker: Worker,
}

struct Worker {
    // NOTE: Unless we need to have a pulic API, it's unecessary to implement Worker::terminate to terminate the task since this is
    // automatically called when Worker is dropped
    _task: TerminatableTask,
    tx: flume::Sender<Arc<TablesLock>>,
}

impl Worker {
    fn new() -> Self {
        let (tx, rx) = flume::bounded::<Arc<TablesLock>>(1);
        // NOTE: it seems that we don't care about the cancellation token, so we can use `spawn_abortable` here
        let task = TerminatableTask::spawn_abortable(zenoh_runtime::ZRuntime::Net, async move {
            loop {
                // NOTE: Why do we need this delay?
                tokio::time::sleep(std::time::Duration::from_millis(
                    *TREES_COMPUTATION_DELAY_MS,
                ))
                .await;

                if let Ok(tables_ref) = rx.recv_async().await {
                    // NOTE: Blocking codes omitted
                }
            }
        });
        Self { _task: task, tx }
    }
}

impl HatTables {
    fn new() -> Self {
        Self {
            peer_subs: HashSet::new(),
            peer_qabls: HashSet::new(),
            peers_net: None,
            worker: Worker::new(),
        }
    }

    fn schedule_compute_trees(&mut self, tables_ref: Arc<TablesLock>) {
        tracing::trace!("Schedule trees computation");
        // NOTE: What if this try_send fails?
        let _ = self.worker.tx.try_send(tables_ref);
    }
}

@OlivierHecart
Copy link
Contributor Author

OlivierHecart commented Aug 9, 2024

About the TREES_COMPUTATION_DELAY_MS:
Routers receive events (LinkState msgs) that modify the network graph and require a tree recomputation. As those computations are expensive we'd like to perform a single computation for multiple events. The TREES_COMPUTATION_DELAY_MS (both in old and new pattern) allows to give a chance for other events to arrive before performing the computation. (Note those events tend to arrive in groups as a change in the graph will induce a LinkState msg from all impacted nodes).

This is why we use a bounded queue of length 1 and try_send method. If a computation is already scheduled, no need to schedule again.

@YuanYuYuan
Copy link
Contributor

LGTM

@OlivierHecart OlivierHecart merged commit fef38dc into main Aug 9, 2024
23 checks passed
@OlivierHecart OlivierHecart deleted the bug/trees_computation_task_fix branch August 9, 2024 14:29
OlivierHecart added a commit that referenced this pull request Aug 9, 2024
* Replace trees computation tasks with a worker

* Address review comments

* Remove review comments
gmartin82 pushed a commit to ZettaScaleLabs/zenoh that referenced this pull request Aug 19, 2024
* Replace trees computation tasks with a worker

* Address review comments

* Remove review comments
milyin pushed a commit that referenced this pull request Aug 27, 2024
* Replace trees computation tasks with a worker

* Address review comments

* Remove review comments
Mallets added a commit that referenced this pull request Aug 27, 2024
* Add NOTE for LowLatency transport. (#1088)

Signed-off-by: ChenYing Kuo <evshary@gmail.com>

* fix: Improve debug messages in `zenoh-transport` (#1090)

* fix: Improve debug messages for failing RX/TX tasks

* fix: Improve debug message for `accept_link` timeout

* chore: Fix `clippy::redundant_pattern_matching` error

* Improve pipeline backoff (#1097)

* Yield task for backoff

* Improve comments and error handling in backoff

* Simplify pipeline pull

* Consider backoff configuration

* Add typos check to CI (#1065)

* Fix typos

* Add typos check to CI

* Start link tx_task before notifying router (#1098)

* Fix typos (#1110)

* bump quinn & rustls (#1086)

* bump quinn & rustls

* fix ci windows check

* add comments

* Fix interface name scanning when listening on IP unspecified for TCP/TLS/QUIC/WS (#1123)

Co-authored-by: Julien Enoch <julien.e@zettascale.tech>

* Enable releasing from any branch (#1136)

* Fix cargo clippy (#1145)

* Release tables locks before propagating subscribers and queryables declarations to void dead locks (#1150)

* Send simple sub and qabl declarations using a given function

* Send simple sub and qabl declarations after releasing tables lock

* Send simple sub and qabl declarations after releasing tables lock (missing places)

* feat: make `TerminatableTask` terminate itself when dropped (#1151)

* Fix bug in keyexpr::includes leading to call get_unchecked on empty array UB (#1208)

* REST plugin uses unbounded flume channels for queries (#1213)

* fix: typo in selector.rs (#1228)

* fix: zenohd --cfg (#1263)

* fix: zenohd --cfg

* ci: trigger

* Update zenohd/src/main.rs

---------

Co-authored-by: Luca Cominardi <luca.cominardi@gmail.com>

* Fix failover brokering bug reacting to linkstate changes (#1272)

* Change missleading log

* Fix failover brokering bug reacting to linkstate changes

* Retrigger CI

---------

Co-authored-by: Luca Cominardi <luca.cominardi@gmail.com>

* Code format

* Fix clippy warnings

* Code format

* Fix Clippy errors from Rust 1.80 (#1273)

* Allow unexpected `doc_auto_cfg` flag

* Keep never-constructed logger interceptor

* Ignore interior mutability of `Resource`

* Fix typo

* Resolve `clippy::doc-lazy-continuation` errors

* Upgrade `time@0.3.28` to `time@0.3.36`

See time-rs/time#693

* Update Cargo.toml (#1277)

Updated description to be aligned with what we use everywhere else

* fix: typos (#1297)

* Replace trees computation tasks with a worker (#1303)

* Replace trees computation tasks with a worker

* Address review comments

* Remove review comments

* zenohd-default config error #1292 (#1298)

* Zenohd panic when tring load file

When zenohd trying load file, if it have a problem it crash cause another treat was "unwrap", and it return to a type config. So, it crash and cause painic.

* zenohd default config error #1292

When tring load config file defined by -c option. With haver any problema "unwrap" has been to Config type.

I treat it return a Default Config whe it happen

* If file fail when try load configs

If file fail when try load configs

* Update main.rs

* Resolve typos at comment

Resolve typos at comment

* fix: typos (#1297)

* zenohd-default config error #1292 (#1298)

* Zenohd panic when tring load file

When zenohd trying load file, if it have a problem it crash cause another treat was "unwrap", and it return to a type config. So, it crash and cause painic.

* zenohd default config error #1292

When tring load config file defined by -c option. With haver any problema "unwrap" has been to Config type.

I treat it return a Default Config whe it happen

* If file fail when try load configs

If file fail when try load configs

* Update main.rs

* Resolve typos at comment

Resolve typos at comment

* Replace trees computation tasks with a worker (#1303)

* Replace trees computation tasks with a worker

* Address review comments

* Remove review comments

* revering fix #1298

---------

Signed-off-by: ChenYing Kuo <evshary@gmail.com>
Co-authored-by: ChenYing Kuo (CY) <evshary@gmail.com>
Co-authored-by: Mahmoud Mazouz <mazouz.mahmoud@outlook.com>
Co-authored-by: Luca Cominardi <luca.cominardi@gmail.com>
Co-authored-by: Tavo Annus <tavo.annus@gmail.com>
Co-authored-by: JLer <jlerxky@live.com>
Co-authored-by: Julien Enoch <julien.e@zettascale.tech>
Co-authored-by: OlivierHecart <olivier@zettascale.tech>
Co-authored-by: Yuyuan Yuan <az6980522@gmail.com>
Co-authored-by: Diogo Matsubara <diogo.matsubara@zettascale.tech>
Co-authored-by: OlivierHecart <olivier.hecart@adlinktech.com>
Co-authored-by: kydos <kydos@protonmail.com>
Co-authored-by: brianPA <80439594+brian049@users.noreply.github.com>
Co-authored-by: Tiago Neves <32251249+anhaabaete@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants