
Regression: RedisStorage in v0.6.0 #466

Closed
AzHicham opened this issue Nov 28, 2024 · 25 comments · Fixed by #472
@AzHicham (Contributor) commented Nov 28, 2024

Hello,

I started to test the latest release 0.6.0 with Redis and found a regression.

I noticed that all tasks present in the redis queue NAMESPACE:active are fetched right away by a worker even though I'm using the following config:

let config = Config::default()
        .set_namespace(T::BROKER_NAMESPACE_JOB)
        // Buffer size of 1: fetch at most one task, leaving the rest for other workers
        .set_buffer_size(1)
        .set_poll_interval(Duration::from_millis(1000))
        .set_max_retries(1)
        .set_keep_alive(Duration::from_secs(120));

let product_worker_name = format!("Worker-{}-{}", T::PRODUCT_ID, random::<u32>());

let product_worker = WorkerBuilder::new(product_worker_name)
        .layer(ConcurrencyLimitLayer::new(1)) // Allow running 1 job per worker at a time
        .data(backend.clone())
        .chain(|svc| svc.check_service_clone::<Identity>())
        .backend(RedisStorage::new_with_config(
            redis_connection.clone(),
            config,
        ))
        .build_fn(handle_task::<T, HttpDispatcherClient, HttpSlideClient>);

I think the regression comes from this line:

let res = self.fetch_next(worker.id()).await;

IMO, we should have a safety check here so we don't fetch from Redis when the buffer is full. WDYT?

EDIT: To do that, I think we need a Sender/Receiver that provides methods like is_full, or at least len and capacity. async_channel could be a good candidate?
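The proposed guard can be modeled with a plain bounded channel from the standard library (the names here are hypothetical, not apalis or async_channel API): hand a fetched task to the buffer, and skip the next Redis fetch when the buffer reports full.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

// Hypothetical model of the proposed guard: the worker's buffer is a bounded
// channel of capacity `buffer_size`; a Redis fetch is only attempted when the
// buffer still has room.
fn try_buffer_task<T>(buffer: &SyncSender<T>, task: T) -> bool {
    match buffer.try_send(task) {
        Ok(()) => true,                              // room available: fetch was allowed
        Err(TrySendError::Full(_)) => false,         // buffer full: skip fetching from Redis
        Err(TrySendError::Disconnected(_)) => false, // receiver gone: stop fetching
    }
}

fn main() {
    let (tx, _rx) = sync_channel::<u32>(1); // buffer_size = 1
    assert!(try_buffer_task(&tx, 1));  // first task fits in the buffer
    assert!(!try_buffer_task(&tx, 2)); // second would overflow: don't fetch it
}
```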

@geofmureithi (Owner):

Hmm, we might not need async_channel. We could just compare the worker's task count with the buffer size; that would be a quick fix too. If you can open a quick PR, that would be nice.
Another fix might be adding an atomic bool for is_ready in the Worker context. Both would be straightforward. Let me know your thoughts.

@AzHicham (Contributor, Author):

I'll try that ;)

@AzHicham (Contributor, Author):

TBH I don't know how to connect the task_count to the buffer_size.

I can set a condition like worker.task_count() == 0, then fetch from Redis. That works in my case because I have a buffer_size of 1 and I use ConcurrencyLimitLayer::new(1). But with ConcurrencyLimitLayer::new(10), for instance, I wouldn't want the task_count to reach 0 before fetching new tasks from Redis.
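The condition being discussed can be sketched as a small predicate (hypothetical names, not apalis API): fetch whenever the in-flight count is below the concurrency limit, so it degenerates to `task_count == 0` only when the limit is 1.

```rust
// Hypothetical gating predicate: fetch new tasks only while the number of
// in-flight tasks is below the concurrency limit, rather than waiting for
// the count to hit zero.
fn should_fetch(task_count: usize, concurrency_limit: usize) -> bool {
    task_count < concurrency_limit
}

fn main() {
    // concurrency = 1: fetch only when the worker is idle
    assert!(should_fetch(0, 1));
    assert!(!should_fetch(1, 1));
    // concurrency = 10: keep fetching until 10 tasks are in flight
    assert!(should_fetch(9, 10));
    assert!(!should_fetch(10, 10));
}
```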

@geofmureithi (Owner):

ConcurrencyLimitLayer::new(10) just means the worker cannot execute more than 10 tasks at a time.
Our goal here is not to fetch when the service is not ready. This was the previous behavior. But currently we are using tower::CallAll, which is probably why everything gets consumed: the stream is consumed to the end.
In essence, we should use the concurrency layer rather than buffer size. Buffer size is specific to Redis and controls how many jobs to fetch at a time. I have some ideas on how to get this behavior back; let me write something tonight and we can discuss it tomorrow.
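The eager-vs-gated distinction described above can be modeled with a plain iterator standing in for the job stream (a toy sketch, not the tower API):

```rust
// An iterator stands in for the job stream. `drain_eagerly` mirrors what
// tower::CallAll effectively did (consume the stream to the end), while
// `pull_while_ready` only takes as many jobs as there are ready slots.
fn drain_eagerly<I: Iterator<Item = u32>>(jobs: I) -> Vec<u32> {
    jobs.collect()
}

fn pull_while_ready<I: Iterator<Item = u32>>(jobs: I, ready_slots: usize) -> Vec<u32> {
    jobs.take(ready_slots).collect()
}

fn main() {
    let active_queue = vec![1, 2, 3];
    // Eager consumption empties the whole queue at once (the regression).
    assert_eq!(drain_eagerly(active_queue.clone().into_iter()), vec![1, 2, 3]);
    // Gated consumption fetches one task per ready slot (the expected behavior).
    assert_eq!(pull_while_ready(active_queue.into_iter(), 1), vec![1]);
}
```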

@AzHicham (Contributor, Author):

OK thanks for your help !!!

@AzHicham (Contributor, Author) commented Nov 28, 2024

The thing I don't clearly understand is the definition of is_ready. For me it's: is the worker able to start a new task? This may depend on ConcurrencyLimitLayer.

Indeed, if instead of tower::CallAll we can use another mechanism that does not start a task until the worker is_ready, i.e. able to start a new task, then it should solve this issue.

EDIT: IMO buffer_size should not be specific to Redis. It's a parameter that can influence latency and performance. To achieve something like 1M tasks/sec, for instance, one must fetch tasks in batches of 1K rather than one by one. This should be true for Redis, RMQ & SQL DBs alike.
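The throughput argument above is just round-trip arithmetic; a quick sketch (the helper name is made up for illustration):

```rust
// Round trips to the backend needed to drain `total` tasks, fetching
// `batch_size` tasks per trip.
fn round_trips(total: u64, batch_size: u64) -> u64 {
    (total + batch_size - 1) / batch_size // ceiling division
}

fn main() {
    // One by one: a million fetches for a million tasks...
    assert_eq!(round_trips(1_000_000, 1), 1_000_000);
    // ...while batches of 1K cut that to a thousand fetches.
    assert_eq!(round_trips(1_000_000, 1_000), 1_000);
}
```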

@geofmureithi (Owner):

Meanwhile, could you check something for me?
Modify the tower::CallAllUnordered to tower::CallAll in Worker::poll_jobs in apalis-core and see if it works.

@geofmureithi (Owner):

> EDIT: IMO buffer_size should not be specific to Redis.

I should have said it's backend-specific, since some backends, like message queues, already handle this themselves.
Workers just take a Stream and consume it. To control this from the worker's perspective, you need to use the concurrency layer.

@AzHicham (Contributor, Author):

> Meanwhile, could you check something for me?
> Modify the tower::CallAllUnordered to tower::CallAll in Worker::poll_jobs in apalis-core and see if it works

Same behavior with tower::CallAll. All tasks are fetched from active and put into inflight :/

@geofmureithi (Owner):

Ok, I already have something in mind. Hopefully it's a single change :). Give me a few hrs.

@geofmureithi (Owner):

Hey, I added a small layer to the worker that checks if the worker is ready, in the branch fix/poll-when-ready.
Could you apply the check on Redis before fetch_next, combine it with concurrency, and see if it works?

@AzHicham (Contributor, Author) commented Nov 29, 2024

Hello,

Still not working, but I think you forgot to push some changes? On this branch I only see the layer, but not how the AtomicBool is updated.

Edit: my bad, I see it!!

@geofmureithi (Owner):

I need you to do something like:

if worker.is_ready() {
  fetch_next().await
}

in the Redis storage. Are you sure it's not working?

@AzHicham (Contributor, Author) commented Nov 29, 2024

It's getting better, with:

is_ready: Arc::new(AtomicBool::new(true)),

and

if worker.is_ready() {
  fetch_next().await
}

Fetch occurs only when the worker is not busy.
But still, with a buffer_size of 1, a task is fetched while the worker is running.
E.g. I push 3 tasks:
1 task is launched (concurrency still limited to 1).
1 task is, I think, in the buffer.
1 is in the Redis active queue.

@geofmureithi (Owner):

Good! I might need to use load(Acquire); this may fix it. But we are getting there.
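For reference, the Release/Acquire pairing being suggested looks like this (a minimal sketch, not the apalis code):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// The worker side publishes readiness with a Release store, so any writes
// it made before becoming ready are visible to the fetch side.
fn publish_ready(flag: &AtomicBool) {
    flag.store(true, Ordering::Release);
}

// The fetch side observes the flag with an Acquire load before fetching.
fn is_worker_ready(flag: &AtomicBool) -> bool {
    flag.load(Ordering::Acquire)
}

fn main() {
    let is_ready = Arc::new(AtomicBool::new(false));
    assert!(!is_worker_ready(&is_ready));

    let flag = Arc::clone(&is_ready);
    let worker = thread::spawn(move || publish_ready(&flag));
    worker.join().unwrap();

    assert!(is_worker_ready(&is_ready)); // fetch_next may now proceed
}
```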

@AzHicham (Contributor, Author) commented Nov 29, 2024

Another weird behaviour (maybe I'll open a dedicated issue for it) is when I send a SIGTERM/SIGINT while using this code:

...
monitor.run_with_signal(shutdown_signal()).await?;

// select! and info! are presumably tokio's and tracing's macros;
// sigterm()/sigint() look like helpers around tokio::signal::unix::signal.
pub(crate) async fn shutdown_signal() -> Result<(), io::Error> {
    let mut sigterm = sigterm();
    let mut sigint = sigint();
    select! {
        biased;
        _ = sigterm.recv() => info!("SIGTERM signal. Exit now !!"),
        _ = sigint.recv() => info!("SIGINT signal. Exit now !!"),
    }
    Ok(())
}

Then the worker finishes handling all tasks before exiting, even the tasks still sitting in the Redis active queue. (Following my previous comment, all 3 tasks end up done.)

Note: this code works as expected with apalis v0.5.5.
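The behavior being asked for can be modeled roughly like this (a toy model, not the apalis shutdown path): after the signal lands, the worker completes only what is already in flight and leaves the rest queued.

```rust
use std::collections::VecDeque;

// Toy model: the worker keeps fetching until the shutdown signal lands.
// Whatever is already in flight runs to completion; everything else stays
// in the (simulated) Redis active queue.
fn run_until_shutdown(queue: &mut VecDeque<u32>, tasks_before_shutdown: usize) -> Vec<u32> {
    let mut done = Vec::new();
    while done.len() < tasks_before_shutdown {
        match queue.pop_front() {
            Some(task) => done.push(task), // in-flight task finishes
            None => break,                 // queue drained before shutdown
        }
    }
    done
}

fn main() {
    let mut queue: VecDeque<u32> = (1..=3).collect();
    // SIGTERM arrives while the first task is running.
    let done = run_until_shutdown(&mut queue, 1);
    assert_eq!(done, vec![1]);  // only the in-flight task completed
    assert_eq!(queue.len(), 2); // the other two remain in Redis
}
```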

@geofmureithi (Owner) commented Nov 29, 2024 via email

@geofmureithi (Owner):

Hey, I think this has been resolved with the current push.

@geofmureithi (Owner):

Also, I have created tower-rs/tower#801, which also relates to this.

@geofmureithi geofmureithi linked a pull request Dec 1, 2024 that will close this issue
@AzHicham (Contributor, Author) commented Dec 1, 2024

Hello,
Thanks a lot for your work. Unfortunately I have two issues (I checked out commit 408fc4f):

  • Only one of my two workers gets registered in the Redis consumers queue; no such issue on master.
  • I removed the worker I don't use for my test, but tasks are not getting handled by the remaining one. I flushed the entire Redis db just to be sure.
    I also put a dbg!() into the loop { select! }, but nothing shows up.

@geofmureithi (Owner) commented Dec 2, 2024

Lol, look at what I did:

    /// Start running the worker
    pub fn start(&self) {
        self.state.running.store(false, Ordering::Relaxed);
        self.state.is_ready.store(false, Ordering::Relaxed);
        self.emit(Event::Start);
    }

start() sets them to false 😢
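Assuming the fix is simply flipping those stores to `true`, a corrected sketch looks like this (the `State` struct here is hypothetical, standing in for the worker's internal state):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

struct State {
    running: AtomicBool,
    is_ready: AtomicBool,
}

impl State {
    /// Start running the worker: both flags must flip to `true`,
    /// not `false` as in the buggy version quoted above.
    fn start(&self) {
        self.running.store(true, Ordering::Relaxed);
        self.is_ready.store(true, Ordering::Relaxed);
    }
}

fn main() {
    let state = State {
        running: AtomicBool::new(false),
        is_ready: AtomicBool::new(false),
    };
    state.start();
    assert!(state.running.load(Ordering::Relaxed));
    assert!(state.is_ready.load(Ordering::Relaxed));
}
```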

@AzHicham (Contributor, Author) commented Dec 2, 2024

haha thx I'll test against the fix ;)

@AzHicham (Contributor, Author) commented Dec 2, 2024

Everything works as expected!!!

My test:

  • Push 3 tasks onto the Redis queue.
  • Start a worker -> 1 job is launched against the 1st task.
  • Before it finishes, I send a SIGTERM.
  • The job finishes, then the worker shuts down.
  • In Redis I still have 2 tasks remaining.
  • I re-launch a worker and another job is launched against the 2nd task.

Nothing to say except good job!!!!

Thanks for your help!!

@AzHicham (Contributor, Author) commented Dec 9, 2024

Hello @geofmureithi

Sorry to bother you, I just wanted to know when you're going to release a patch version?

Thank you

@geofmureithi (Owner):

@AzHicham a new version has been released
