Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A very fast retry strategy hangs blocking commands forever #1718

Closed
manast opened this issue Feb 3, 2023 · 8 comments · Fixed by #1720 · 4 remaining pull requests
Closed

A very fast retry strategy hangs blocking commands forever #1718

manast opened this issue Feb 3, 2023 · 8 comments · Fixed by #1720 · 4 remaining pull requests

Comments

@manast
Copy link

manast commented Feb 3, 2023

I was debugging an issue reported to BullMQ related to reconnections and I found something really strange that somehow seems to also affect blocking connections (need to investigate this more though).

How to reproduce

Run this simple code:

const Redis = require("ioredis");

const connection = new Redis({
  maxRetriesPerRequest: null,
  retryStrategy: function (times) {
    return 10;
  },
});
connection.on("connect", async () => {
    console.log("Connected to Redis!")
    const result = await connection.set("test", "test");   
    console.log("OK, Command issued", result);
});
connection.on("error", (err) => console.error("Redis error", err.message));
connection.on("reconnecting", () => console.log("Reconnecting to Redis!"));

You will get this output:

Connected to Redis!
OK, Command issued OK

Leave the code running and just close your Redis instance, in my case since I am running docker:

docker stop some-redis

Wait a few seconds and then start it again:

docker start some-redis

The result would be:

Redis error connect ECONNREFUSED 127.0.0.1:6379
Reconnecting to Redis!
Redis error connect ECONNREFUSED 127.0.0.1:6379
Reconnecting to Redis!
Redis error connect ECONNREFUSED 127.0.0.1:6379
Reconnecting to Redis!
Connected to Redis!
Reconnecting to Redis!
Connected to Redis!
OK, Command issued OK
OK, Command issued OK
OK, Command issued OK
OK, Command issued OK

Notice that the output of 4 "OK, Command issued OK". Somehow the callback to the connect event was called 4 times instead of one. Now if we modify the code to retry once per second instead:

const Redis = require("ioredis");

const connection = new Redis({
  maxRetriesPerRequest: null,
  retryStrategy: function (times) {
    return 1000;
  },
});
connection.on("connect", async () => {
    console.log("Connected to Redis!")
    const result = await connection.set("test", "test");   
    console.log("OK, Command issued", result);
});
connection.on("error", (err) => console.error("Redis error", err.message));
connection.on("reconnecting", () => console.log("Reconnecting to Redis!"));

And we do the same as we did above then we get the correct output, i.e. one single output of "OK, Command issued OK".

It seems like there is a small hazard that happens when the retries happen very close together. Not sure if this is completely harmless or not, since I am not done debugging my particular case, but I think this should be investigated and resolved if possible.

@manast
Copy link
Author

manast commented Feb 3, 2023

My suspictions turned out to be correct. If you have issued a blocking command, then for some reason it gets stuck in that command ONLY when the retry is very fast. So adding this code to the original issue:

async function run() {
  while (true) {
    console.log("Waiting for message...");
    const result = await connection.brpop("whatever", 1);
    console.log("Message received", result);
  }
}

run();

It will try to do a blocking pop with a 1-second timeout. While it is connected it will loop forever with 1-second timeout. If you stop Redis and you have a very fast retry delay, and then start redis again it will not continue looping anymore...

@manast manast changed the title A very fast retry strategy gives a strange behaviour A very fast retry strategy hangs blocking commands forever Feb 3, 2023
@manast
Copy link
Author

manast commented Feb 10, 2023

@luin it would be great if you could peek at this issue, it seems like it is snowballing 🤔 anything I can do to help?

@luin
Copy link
Collaborator

luin commented Feb 11, 2023

@manast 👋

I was not able to reproduce the issue:

CleanShot.2023-02-11.at.15.48.56.mp4

No matter if I changed retryStrategy to 10 or 1, I always got one "Command issued". Do I miss anything?

@manast
Copy link
Author

manast commented Feb 11, 2023

Is the Redis instance running locally or inside a docker container? (in my case I run it in a docker container, but the test itself in my local machine).

@manast
Copy link
Author

manast commented Feb 11, 2023

Also, did you try replacing the set command by brpop as in the second comment?

@luin
Copy link
Collaborator

luin commented Feb 11, 2023

Thanks! I finally reproduced this with docker. This happens when reconnecting successfully but the ready check fails. In our code, what we do at different events:

  1. On close, we move the unfulfilled commands from commandQueue to prevCommandQueue.
  2. On connect, we reset commandQueue.
  3. On ready, we resent commands in prevCommandQueue.

When the issue happens, the events are emitted in the order close -> connect -> close -> connect -> ready, so brpop is lost at the second close.

@manast
Copy link
Author

manast commented Feb 11, 2023

Yes, that makes a lot of sense. I have had similar issues in the past as it is difficult to guarantee the order of asynchronous events that happens very tight and that can potentially interleave.

github-actions bot pushed a commit that referenced this issue Feb 12, 2023
## [5.3.1](v5.3.0...v5.3.1) (2023-02-12)

### Bug Fixes

* Fix commands not resend on reconnect in edge cases ([#1720](#1720)) ([fe52ff1](fe52ff1)), closes [#1718](#1718)
* Fix db parameter not working with auto pipelining ([#1721](#1721)) ([d9b1bf1](d9b1bf1))
@github-actions
Copy link

🎉 This issue has been resolved in version 5.3.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

luin pushed a commit that referenced this issue Apr 15, 2023
## [5.3.1](v5.3.0...v5.3.1) (2023-02-12)

### Bug Fixes

* Fix commands not resend on reconnect in edge cases ([#1720](#1720)) ([fe52ff1](fe52ff1)), closes [#1718](#1718)
* Fix db parameter not working with auto pipelining ([#1721](#1721)) ([d9b1bf1](d9b1bf1))
janus-dev87 added a commit to janus-dev87/ioredis-work that referenced this issue Mar 1, 2024
## [5.3.1](redis/ioredis@v5.3.0...v5.3.1) (2023-02-12)

### Bug Fixes

* Fix commands not resend on reconnect in edge cases ([#1720](redis/ioredis#1720)) ([fe52ff1](redis/ioredis@fe52ff1)), closes [#1718](redis/ioredis#1718)
* Fix db parameter not working with auto pipelining ([#1721](redis/ioredis#1721)) ([d9b1bf1](redis/ioredis@d9b1bf1))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment