
tweak threading model for streamers #206

Closed
scottlamb opened this issue Mar 18, 2022 · 3 comments
Labels
performance Performance—throughput and latency rust Rust backend work required

Comments

scottlamb (Owner) commented Mar 18, 2022

In 307a388 I tweaked the threading model for the streamers: before, when using Retina, the RTSP code ran on the main tokio runtime and sent frames over a channel to the per-stream writer threads, which wrote them to disk. Now each stream does everything on its per-stream writer thread.
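As a rough std-only sketch of the two models (plain threads and `std::sync::mpsc` stand in for the real tokio/retina machinery, and the `Frame` type is invented for illustration):

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical simplified frame type; the real ones come from retina.
struct Frame {
    data: Vec<u8>,
}

// Model before 307a388: the RTSP code (running elsewhere) hands each frame
// over a channel to a per-stream writer thread, which writes it to disk.
fn channel_model(frames: Vec<Frame>) -> usize {
    let (tx, rx) = mpsc::channel::<Frame>();
    let writer = thread::spawn(move || {
        // In the real code this writes to disk; here we just count bytes.
        rx.iter().map(|f| f.data.len()).sum::<usize>()
    });
    for f in frames {
        tx.send(f).unwrap(); // one hand-off per frame
    }
    drop(tx); // close the channel so the writer loop ends
    writer.join().unwrap()
}

// Model after 307a388: the per-stream thread does everything itself —
// fetch *and* write — so there is no per-frame hand-off at all.
fn all_on_writer_thread(frames: Vec<Frame>) -> usize {
    thread::spawn(move || frames.into_iter().map(|f| f.data.len()).sum::<usize>())
        .join()
        .unwrap()
}
```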

I expected this to modestly improve CPU usage (at least as much as previous experiments to reduce the multi-thread runtime to 1 thread or to use the current-thread runtime) because it has fewer thread hand-offs per frame and because it avoids the multi-thread runtime's spinning that I previously mentioned here.

Instead, it's a bit slower. On one setup, instead of CPU falling from ~13–14% (currently using a 1-thread multi-thread runtime) to ≤12%, it rose to ~16%. Not huge, but still worse when I was expecting better. I guess spreading the common-case work of each recvfrom syscall over more threads is worse than paying the extra hand-offs per frame.

Ideally I'd be doing writes once per key frame (mentioned previously at #34 and #116). Then it probably makes more sense to use the channel, and the extra buffering there can additionally keep the RTSP stream from stalling even if disk writes momentarily have poor latency.
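A minimal sketch of the once-per-key-frame idea (the `Frame` type, its fields, and the function name are made up for illustration):

```rust
// Hypothetical frame type for illustration.
struct Frame {
    is_key: bool,
    data: Vec<u8>,
}

/// Buffers `frame` and, when a key frame arrives with earlier frames pending,
/// returns the completed batch so the writer does one write per key frame.
fn push_frame(pending: &mut Vec<Frame>, frame: Frame) -> Option<Vec<Frame>> {
    let batch = if frame.is_key && !pending.is_empty() {
        // A new group of pictures starts; hand off everything buffered so far.
        Some(std::mem::take(pending))
    } else {
        None
    };
    pending.push(frame);
    batch
}
```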

I probably won't look at this until after getting rid of moonfire-nvr config, which complicates things because it isn't inherently async.

@scottlamb scottlamb added rust Rust backend work required performance Performance—throughput and latency labels Mar 18, 2022
@scottlamb scottlamb added the bug label Apr 2, 2022

scottlamb commented Apr 2, 2022

Oh, there's an actual bug in the new model. On shutdown, we can get panics like the following:

panic at '/home/slamb/git/retina/src/client/mod.rs:219:22': teardown Sender shouldn't be dropped: RecvError(())
I20220401 16:49:34.810 main moonfire_nvr::cmds::run] Waiting for TEARDOWN requests to complete.
E20220401 16:49:37.682 main moonfire_nvr] panic at '/home/slamb/git/retina/src/client/mod.rs:219:22': teardown Sender shouldn't be dropped: RecvError(())

Backtrace:
   0: failure::backtrace::internal::InternalBacktrace::new
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/backtrace/internal.rs:46:44
   1: failure::backtrace::Backtrace::new
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/backtrace/mod.rs:121:35
   2: moonfire_nvr::panic_hook
             at src/main.rs:93:13
   3: core::ops::function::Fn::call
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/ops/function.rs:70:5
   4: core::ops::function::impls::<impl core::ops::function::Fn<A> for &F>::call
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/ops/function.rs:237:13
   5: std::panicking::rust_panic_with_hook
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:610:17
   6: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:502:13
   7: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys_common/backtrace.rs:139:18
   8: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
   9: core::panicking::panic_fmt
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:116:14
  10: core::result::unwrap_failed
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/result.rs:1690:5
  11: core::result::Result<T,E>::expect
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/result.rs:975:23
  12: retina::client::SessionGroup::await_teardown::{{closure}}
             at /home/slamb/git/retina/src/client/mod.rs:217:17
  13: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/future/mod.rs:84:19
  14: moonfire_nvr::cmds::run::inner::{{closure}}
             at src/cmds/run/mod.rs:425:43
  15: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/future/mod.rs:84:19
  16: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/future/future.rs:123:9
  17: <&mut F as core::future::future::Future>::poll
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/future/future.rs:111:9
  18: moonfire_nvr::cmds::run::async_run::{{closure}}::{{closure}}
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/macros/select.rs:505:49
  19: <tokio::future::poll_fn::PollFn<F> as core::future::future::Future>::poll
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/future/poll_fn.rs:38:9
  20: moonfire_nvr::cmds::run::async_run::{{closure}}
             at src/cmds/run/mod.rs:182:5
  21: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/future/mod.rs:84:19
  22: tokio::park::thread::CachedParkThread::block_on::{{closure}}
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/park/thread.rs:263:54
  23: tokio::coop::with_budget::{{closure}}
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/coop.rs:102:9
  24: std::thread::local::LocalKey<T>::try_with
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/thread/local.rs:412:16
  25: std::thread::local::LocalKey<T>::with
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/thread/local.rs:388:9
  26: tokio::coop::with_budget
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/coop.rs:95:5
     tokio::coop::budget
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/coop.rs:72:5
      tokio::park::thread::CachedParkThread::block_on
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/park/thread.rs:263:31
  27: tokio::runtime::enter::Enter::block_on
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/runtime/enter.rs:151:13
  28: tokio::runtime::thread_pool::ThreadPool::block_on
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/runtime/thread_pool/mod.rs:73:9
  29: tokio::runtime::Runtime::block_on
             at /home/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.17.0/src/runtime/mod.rs:477:43
  30: moonfire_nvr::cmds::run::run
             at src/cmds/run/mod.rs:147:13
  31: moonfire_nvr::Args::run
             at src/main.rs:68:29
  32: moonfire_nvr::main
             at src/main.rs:143:9

This happens because the code assumes the tokio runtime that was used to create the session is still around, when it has actually been shut down without waiting for its tasks to complete.
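The failure mode can be reproduced in miniature with `std::sync::mpsc` (retina's teardown channel is tokio-based, but the semantics are analogous): if whatever holds the `Sender` goes away before sending, the receiver gets a `RecvError`, and an `.expect()` on that result panics.

```rust
use std::sync::mpsc;

// Stand-in for the teardown task: it owns the Sender and would normally
// send a result once TEARDOWN completes.
fn teardown_result() -> Result<(), mpsc::RecvError> {
    let (tx, rx) = mpsc::channel::<()>();
    // The runtime is shut down without the task ever running, so the
    // Sender is simply dropped...
    drop(tx);
    // ...and the waiting side sees RecvError. await_teardown's
    // `.expect("teardown Sender shouldn't be dropped")` turns this
    // into the panic above.
    rx.recv()
}
```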

@scottlamb scottlamb changed the title tweak threading model for streamers tweak threading model for streamers, fix panic at '/home/slamb/git/retina/src/client/mod.rs:219:22': teardown Sender shouldn't be dropped: RecvError(()) Apr 2, 2022
@scottlamb scottlamb changed the title tweak threading model for streamers, fix panic at '/home/slamb/git/retina/src/client/mod.rs:219:22': teardown Sender shouldn't be dropped: RecvError(()) tweak threading model for streamers, fix teardown Sender shouldn't be dropped: RecvError(()) Apr 2, 2022

scottlamb commented Apr 13, 2022

Hmm, the panic on shutdown should be fixed with 7b0a489, but the performance moved further in the wrong direction. With a 1-thread "multi-threaded" tokio reactor on my main setup:

| commit | CPU | flamegraph | notes |
| --- | --- | --- | --- |
| ffmpeg (unsure of exact commit) | ~16% | flamegraph-moonfire-with-ffmpeg | old test results, flamegraph; see here |
| retina (unsure of exact commit) | ~14% | flamegraph-moonfire-with-retina | same commit as above, different command-line parameters |
| v0.7.2-1-g307a388 (I think) | ~16% | (don't have handy) | old test results, flamegraph; regression with move to separate per-stream single-thread reactor |
| v0.7.3-1-g5e7d558 | ~15% | flamegraph-v0.7.3-1-g5e7d558 | as of today; likely improved by new Rust and tokio versions, including this one that promises significant performance improvements |
| v0.7.3-2-g7b0a489 | ~19% | flamegraph-v0.7.3-2-g7b0a489 | as of today; another regression |
| v0.7.3-3-g967834c | ~16% | flamegraph-v0.7.3-3-g967834c | clawed back some performance via a tokio::task::spawn per frame |

Looks like having the block_on calls on one thread outside the reactor is worse than using the channel for one hand-off per frame. It might be incurring a bunch of handoffs for futures it launches or something.

So I'm going to keep this issue open for now. I could go back to the more performant model in which the stream threads pull from a channel, but that has the downside I mentioned in the 7b0a489 commit description: the previous session could still be open when we start another. And I don't think it's the absolute best model either. I think that's having the streamers started from moonfire_nvr::cmds::run::run be async and push data once per key frame to a single thread per sample file directory. That requires more work, but as I've mentioned elsewhere, batching frames will be helpful for audio support anyway.
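The per-directory model described above might look roughly like this (std threads and a byte-vector payload stand in for the real async streamers and sample-file types, which are assumptions here):

```rust
use std::sync::mpsc;
use std::thread;

// One writer thread per sample file directory; many streamers clone the
// Sender and push one batch per key frame. The writer thread returns the
// total bytes it "wrote" (here it just counts them).
fn spawn_dir_writer() -> (mpsc::Sender<Vec<u8>>, thread::JoinHandle<usize>) {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let handle = thread::spawn(move || rx.iter().map(|batch| batch.len()).sum::<usize>());
    (tx, handle)
}
```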

Update: 967834c now does a tokio::task::spawn per frame, and we're basically back where we were with ffmpeg. Good enough for now.

@scottlamb scottlamb reopened this Apr 13, 2022
@scottlamb scottlamb changed the title tweak threading model for streamers, fix teardown Sender shouldn't be dropped: RecvError(()) tweak threading model for streamers Apr 13, 2022
@scottlamb scottlamb removed the bug label Apr 13, 2022
scottlamb added a commit that referenced this issue Apr 14, 2022