Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in JedisClusterTopologyProvider #2986

Closed
velna opened this issue Sep 9, 2024 · 0 comments
Closed

Race condition in JedisClusterTopologyProvider #2986

velna opened this issue Sep 9, 2024 · 0 comments
Assignees
Labels
type: bug A general bug

Comments

@velna
Copy link

velna commented Sep 9, 2024

JedisClusterTopologyProvider use a cache object to optimize performace on frequent cluster requests, but the time value is updated before the cache object, there is a race condition that the old cache including the invalid cluster topology might get returned, which will result in a ClusterCommandExecutionFailureException: Could not get a resource from th pool.

	public ClusterTopology getTopology() {

		if (cached != null && shouldUseCachedValue()) {
			return cached;
		}

		Map<String, Exception> errors = new LinkedHashMap<>();

		List<Entry<String, ConnectionPool>> list = new ArrayList<>(cluster.getClusterNodes().entrySet());

		Collections.shuffle(list);

		for (Entry<String, ConnectionPool> entry : list) {

			try (Connection connection = entry.getValue().getResource()) {

				time = System.currentTimeMillis();  // time value is updated before cached object

				Set<RedisClusterNode> nodes = Converters.toSetOfRedisClusterNodes(new Jedis(connection).clusterNodes());

				cached = new ClusterTopology(nodes);

				return cached;

			} catch (Exception ex) {
				errors.put(entry.getKey(), ex);
			}
		}

		StringBuilder stringBuilder = new StringBuilder();

		for (Entry<String, Exception> entry : errors.entrySet()) {
			stringBuilder.append(String.format("\r\n\t- %s failed: %s", entry.getKey(), entry.getValue().getMessage()));
		}

		throw new ClusterStateFailureException(
				"Could not retrieve cluster information; CLUSTER NODES returned with error" + stringBuilder);
	}
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Sep 9, 2024
@mp911de mp911de self-assigned this Sep 9, 2024
@mp911de mp911de added type: bug A general bug and removed status: waiting-for-triage An issue we've not yet triaged labels Sep 9, 2024
@mp911de mp911de changed the title Race condition in org.springframework.data.redis.connection.jedis.JedisClusterConnection.JedisClusterTopologyProvider Race condition in JedisClusterTopologyProvider Sep 11, 2024
mp911de added a commit that referenced this issue Sep 11, 2024
We now use a value object for caching the topology to avoid races in updating the cache timestamp.

Also, we set the cache timestamp after obtaining the topology to avoid that I/O latency expires the topology cache.

Closes #2986
@mp911de mp911de added this to the 3.3.4 (2024.0.4) milestone Sep 11, 2024
christophstrobl pushed a commit that referenced this issue Sep 11, 2024
We now use a value object for caching the topology to avoid races in updating the cache timestamp.

Also, we set the cache timestamp after obtaining the topology to avoid that I/O latency expires the topology cache.

Closes: #2986
Original Pull Request: #2989
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A general bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants