
Conversation

@arturaz
Collaborator

@arturaz arturaz commented Dec 19, 2024

If the stream has no data for a long time, Lettuce will throw `RedisCommandTimeoutException`.

In that case we don't want to fail the stream; instead we should restart the read.

Alternatively, we could use the Lettuce reactive API, but that requires quite a large refactoring.
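
Roughly, the fix looks like this (a minimal sketch, not the PR's actual code; names are illustrative):

```scala
import fs2.{RaiseThrowable, Stream}
import io.lettuce.core.RedisCommandTimeoutException

// Sketch: when the blocking read times out because the stream is silent,
// restart the read instead of failing the whole fs2 stream.
def restartOnTimeout[F[_]: RaiseThrowable, A](read: Stream[F, A]): Stream[F, A] =
  read.handleErrorWith {
    case _: RedisCommandTimeoutException => restartOnTimeout(read)
    case other                           => Stream.raiseError[F](other)
  }
```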

@arturaz arturaz changed the title from "bugfix: reading from a silent stream should not fail with RedisCommndTimeoutException" to "bugfix: reading from a silent stream should not fail with RedisCommandTimeoutException" Dec 19, 2024
arturaz added a commit to arturaz/packages that referenced this pull request Dec 19, 2024
yisraelU previously approved these changes Dec 19, 2024
Collaborator

@yisraelU yisraelU left a comment


TY

@yisraelU
Collaborator

@arturaz I think I was a bit too hasty; I have had more time to look at this in detail.
Can you clarify, in your scenario:

  • Whether the BLOCK option was used.
  • If used, what timeout value was specified.

If BLOCK was not used, then the timeout exception most likely occurred due to underlying network latency issues (as you can see from the myriad of issues raised here).

In that case, perhaps this should be viewed as a feature and require opt-in by the user.

I would also expect that in the added test scenario, where one explicitly sets a timeout in the client options, the stream would be shut down.

@yisraelU yisraelU self-requested a review December 19, 2024 21:34
@yisraelU yisraelU dismissed their stale review December 19, 2024 21:35

updated above

@arturaz
Collaborator Author

arturaz commented Dec 20, 2024

Whether the BLOCK option was used.

My case uses the default option, which is `Some(Duration.Zero)`, meaning block indefinitely. But yeah, if there is no block, the command should return immediately, and suppressing the exception would be bad.

I will take a look at rewriting this using the reactive Redis API instead of the async one to see if that helps.

@arturaz
Collaborator Author

arturaz commented Dec 20, 2024

I migrated streaming to the reactive Lettuce API, which, unfortunately, still raises the timeout exception if the stream is silent.

However, I still think it's a better option than repeatedly polling using the async API. The simple use case (without restarts) is very straightforward now.

I also added `restartOnTimeout: Option[FiniteDuration => Boolean] = None` as an option to `read`.

@yisraelU
Collaborator

Putting aside the switch to the reactive API: as you correctly noted, it won't solve this problem.
The problem is inherent to how Lettuce works. Lettuce has two separate timeout configs:

  • Connection timeouts
  • Command timeouts

Using the default configs for a client results in a command timeout of 60 seconds, as it is inherited by default from the connection config, which itself defaults to 60 seconds.
The timeout options are set in Lettuce here.

The command timeout mechanism is ignorant of the API choice (sync/async/reactive) and will time out regardless.
One must either change the command timeout duration, configure it to ignore certain commands, or not use a blocking command.
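
For illustration, here is roughly what the last two remedies look like via Lettuce's `TimeoutOptions` (a sketch against Lettuce's Java API; which commands to exempt and the 60-second value are assumptions):

```scala
import java.util.concurrent.TimeUnit
import io.lettuce.core.{ClientOptions, TimeoutOptions}
import io.lettuce.core.protocol.{CommandType, RedisCommand}

// Sketch: exempt blocking commands from the command timeout while keeping
// a fixed timeout for everything else.
val timeoutSource = new TimeoutOptions.TimeoutSource {
  override def getTimeout(command: RedisCommand[_, _, _]): Long =
    command.getType match {
      case CommandType.XREAD | CommandType.BLPOP => -1 // non-positive: no per-command timeout
      case _                                     => 60000
    }
  override def getTimeUnit: TimeUnit = TimeUnit.MILLISECONDS
}

val clientOptions = ClientOptions
  .builder()
  .timeoutOptions(
    TimeoutOptions.builder().timeoutCommands(true).timeoutSource(timeoutSource).build()
  )
  .build()
```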

See this for discussion around the command timeout for blocking commands.

In fact, in the history of Redis4Cats there is discussion about whether setting block to 0 as the default was a wise decision.

@arturaz
Collaborator Author

arturaz commented Dec 22, 2024

So is there a problem with the added feature?

@yisraelU
Collaborator

@arturaz apologies for the delay.
I am getting back to this now, and I think we should go back to your original commit, with handling/ignoring the `RedisCommandTimeoutException`.
This is effectively what Lettuce recommends for these scenarios (except they suggest the burden be handed to the caller, who must define it in the `TimeoutOptions`).
By handling the command timeout, we provide a better UX, and I believe this is what a user would expect when utilizing an fs2 `Stream`.

@arturaz
Collaborator Author

arturaz commented Jan 30, 2025

I think we should go back to your original commit

Why? The reactive API is better suited to this scenario than polling Redis via the async API and maintaining queues.

Moreover, I think the whole design of the streaming API is broken due to the use of the same `F[_]` for both streams and IO operations. See #962

@yisraelU
Collaborator

I would like to discuss the pros/cons of reactive in addition to the other issue you mentioned; I just think it should be a separate discussion and PR from this bug/UX fix, as it really is unrelated and will be better served that way.

@arturaz
Collaborator Author

arturaz commented Feb 2, 2025

OK, we can move reactivity to another PR, but can we please first discuss #963, as it is a dealbreaker for the otel4s-redis4cats integration?

@yisraelU
Collaborator

yisraelU commented Feb 2, 2025

yes, I'll give it a review. TY

@arturaz
Collaborator Author

arturaz commented Feb 5, 2025

I removed the reactivity. It turns out that when converting a push-based `Publisher` to a pull-based `Stream`, you can't really do any buffering, so you end up with a chunk size of 1, which is inefficient; thus the current approach is better.
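
For reference, the conversion looked roughly like this (a sketch using fs2's reactive-streams interop; the actual failed attempt is linked below):

```scala
import cats.effect.Async
import fs2.Stream
import fs2.interop.reactivestreams._
import org.reactivestreams.Publisher

// Sketch (assumes the fs2-reactive-streams module): converting a push-based
// Publisher into a pull-based Stream. Per the observation above, demand is
// signalled upstream element by element, so in practice the stream emits
// single-element chunks regardless of the buffer size hint.
def toStream[F[_]: Async, A](publisher: Publisher[A]): Stream[F, A] =
  fromPublisher[F, A](publisher, 1)
```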

I left the (failed) attempt with the comment in https://github.com/arturaz/redis4cats/commit/484a28de18982d7396a80aafde91c8301d804f23

arturaz added a commit to arturaz/packages that referenced this pull request Feb 5, 2025
@yisraelU
Collaborator

yisraelU commented Feb 5, 2025

Looks good.
As this is breaking binary compatibility, let's create a new branch for the next major version. I am guessing the other PR also breaks binary compatibility, so once all the work is complete, we can merge and deploy a new version.

@arturaz
Collaborator Author

arturaz commented Feb 5, 2025

Yes. In theory this could avoid breaking bincompat, but that means adding an overload with no default values (due to Scala's limitation that two overloads with default values cannot share the same name), which is... weird?
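
For reference, the workaround would look something like this (illustrative names, not the actual API):

```scala
object Example {
  // Adding defaults to both overloads is rejected by scalac:
  // "multiple overloaded alternatives of method read define default arguments"
  def read(count: Option[Long] = None): Unit = ()

  // Bincompat-preserving overload with NO default values: callers must pass
  // every argument explicitly, which is the weird part.
  def read(count: Option[Long], restartOnTimeout: Boolean): Unit = ()
}
```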

Can you review these PRs and give me an LGTM, so I can prepare the joint version for you to review? On my local build there were a couple of minor merge conflicts.

@arturaz
Collaborator Author

arturaz commented Feb 5, 2025

Also, what do you think of having a default of `Some(_ => true)` for `reconnectOnTimeout`?

@arturaz
Collaborator Author

arturaz commented Feb 5, 2025

Argh... the same problem surfaces in `subscribe` and `psubscribe`. I need to add fixes there as well.

@arturaz
Collaborator Author

arturaz commented Feb 5, 2025

That's weird. They kind of do not, but in my codebase I had this:

2025-02-05T18:12:38.424118962Z     io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s)
2025-02-05T18:12:38.424125403Z          at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:63)
2025-02-05T18:12:38.424131998Z          at io.lettuce.core.protocol.CommandExpiryWriter.lambda$null$0(CommandExpiryWriter.java:183)
2025-02-05T18:12:38.424154572Z          at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:96)
2025-02-05T18:12:38.424160815Z          at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
2025-02-05T18:12:38.424166935Z          at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
2025-02-05T18:12:38.424197963Z          at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66)
2025-02-05T18:12:38.424205232Z          at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
2025-02-05T18:12:38.424211358Z          at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
2025-02-05T18:12:38.424217671Z          at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2025-02-05T18:12:38.424223891Z          at java.lang.Thread.run(Thread.java:1583)
2025-02-05T18:12:38.424230013Z          at delay @ dev.profunktor.redis4cats.effect.FutureLift$$anon$1.lift(FutureLift.scala:50)
2025-02-05T18:12:38.424236203Z          at fromCompletionStage @ dev.profunktor.redis4cats.effect.FutureLift$$anon$1.lift(FutureLift.scala:50)
2025-02-05T18:12:38.424242658Z          at fromCompletionStage @ dev.profunktor.redis4cats.effect.FutureLift$$anon$1.lift(FutureLift.scala:50)

The error message is kind of useless. Thus I have added `.enhanceTimeoutException(s"xread(offsets=$offsets, block=$block, count=$count)")`, which injects the command name into the exception, making it more useful.
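
A sketch of what such a combinator can look like (illustrative, not the exact PR code):

```scala
import cats.MonadThrow
import cats.syntax.all._
import io.lettuce.core.RedisCommandTimeoutException

object syntax {
  implicit final class EnhanceTimeoutOps[F[_], A](private val fa: F[A]) extends AnyVal {
    // Rewrap the timeout exception so the message names the command that timed out.
    def enhanceTimeoutException(description: String)(implicit F: MonadThrow[F]): F[A] =
      fa.adaptError { case e: RedisCommandTimeoutException =>
        new RedisCommandTimeoutException(s"$description: ${e.getMessage}")
      }
  }
}
```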

@yisraelU
Collaborator

yisraelU commented Feb 5, 2025

Also, what do you think of having a default of `Some(_ => true)` for `reconnectOnTimeout`?

Yes, I think we should. Otherwise it requires the user to be exposed to the inner workings of Lettuce and to understand what the command timeout is.
I worry that could be confusing and perhaps get mixed up with network timeouts.

    block: Option[Duration] = Some(Duration.Zero),
    count: Option[Long] = None,
-   restartOnTimeout: Option[FiniteDuration => Boolean] = None
+   restartOnTimeout: RestartOnTimeout = RestartOnTimeout.never
Collaborator


I think this should be `.always`.
Keep in mind a `RedisCommandTimeoutException` is unrelated to network issues; rather, it's simply about how long a command takes to run. When running a blocking command that waits for data, it will wait until there is data.
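
For context, the shape of the type under discussion might be something like this (a sketch; the real definition lives in the PR):

```scala
import scala.concurrent.duration.FiniteDuration

// Sketch: a predicate deciding, given the elapsed time, whether a command
// timeout should restart the read or fail the stream.
trait RestartOnTimeout {
  def apply(elapsed: FiniteDuration): Boolean
}

object RestartOnTimeout {
  val always: RestartOnTimeout = _ => true
  val never: RestartOnTimeout = _ => false
}
```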

Collaborator Author


But I think it is related to network issues. If the network goes dark without any traffic on the line, you can hang there indefinitely, IIRC. Timeouts resolve that situation.

Maybe we should not provide a default here at all and force the user to choose an option.

Collaborator


There is another exception that is thrown for connection issues. This exception is not about network issues.

This either needs to be on all of the operations, or none of them.

Users of the library can add a similar thing themselves.
@arturaz
Collaborator Author

arturaz commented Feb 7, 2025

I have changed the default and removed `enhanceTimeoutException`. I think we can merge this now?

@yisraelU
Collaborator

Go for it

@arturaz arturaz merged commit 63532d6 into profunktor:series/1.x Feb 11, 2025
2 checks passed
@arturaz arturaz deleted the bug/subscribing-to-silent-stream-raises-timeout-exception branch February 11, 2025 12:36