Production environment: occasional "Could not get a resource from the pool" #66

Open
Force-King opened this issue Oct 14, 2019 · 8 comments

Comments

Force-King commented Oct 14, 2019

The errors are as follows:

2019-10-14 at 13:05:26.633 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
2019-10-14 at 13:05:27.387 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.350 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.304 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.199 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.219 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
2019-10-14 at 13:05:27.161 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.092 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]
2019-10-14 at 13:05:27.071 CST ERROR com.br.hermes.web.cache.CodisService get-codis get exception, key =***. Exception: redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
	at io.codis.jodis.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:218) ~[jodis-0.5.1.jar:?]

Force-King commented Oct 14, 2019

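Additional details: in production, after the service had been running for a while, it started reporting timeouts. On the node reporting the errors we saw heavy swap activity; after we disabled swap, the errors went away.
After running for a while longer, it now occasionally reports the errors above again and cannot obtain a connection. We checked the Codis, proxy, and ZooKeeper logs and found no abnormal entries.

Could anyone help explain this?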

Codis client connection code:

@Bean
public JedisResourcePool getPool() {
    JedisPoolConfig poolConfig = new JedisPoolConfig();
    poolConfig.setMaxIdle(max_idle);
    poolConfig.setMaxTotal(max_active);
    poolConfig.setTestOnBorrow(true);
    poolConfig.setTestOnReturn(true);
    poolConfig.setMaxWaitMillis(max_wait);
    poolConfig.setBlockWhenExhausted(false);

    JedisResourcePool pool = RoundRobinJedisPool.create().poolConfig(poolConfig)
            .curatorClient(zkAddr, timeout).zkProxyDir(zkProxyDir).build();
    return pool;
}
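
A side note on the settings above: in commons-pool2, which backs JedisPoolConfig, maxWaitMillis is only consulted when blockWhenExhausted is true. With setBlockWhenExhausted(false), a borrow from an exhausted pool fails immediately instead of waiting, which may be one way the error in this thread shows up under bursts. The sketch below illustrates that difference using only the JDK. BorrowPolicyDemo and its methods are hypothetical names for illustration; they are not the Jedis or commons-pool2 API.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** JDK-only stand-in for a bounded pool; hypothetical, not the Jedis or commons-pool2 API. */
public class BorrowPolicyDemo {
    private final Semaphore permits;
    private final boolean blockWhenExhausted;
    private final long maxWaitMillis;

    public BorrowPolicyDemo(int maxTotal, boolean blockWhenExhausted, long maxWaitMillis) {
        this.permits = new Semaphore(maxTotal);
        this.blockWhenExhausted = blockWhenExhausted;
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Returns true if a slot was obtained, false if the pool was exhausted. */
    public boolean borrow() {
        if (!blockWhenExhausted) {
            // Fail fast: maxWaitMillis is never consulted on this path.
            return permits.tryAcquire();
        }
        try {
            // Wait up to maxWaitMillis for another caller to give a slot back.
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public void giveBack() {
        permits.release();
    }

    public static void main(String[] args) throws InterruptedException {
        BorrowPolicyDemo failFast = new BorrowPolicyDemo(1, false, 500);
        failFast.borrow(); // takes the only slot
        System.out.println("fail-fast second borrow: " + failFast.borrow());

        BorrowPolicyDemo waiting = new BorrowPolicyDemo(1, true, 500);
        waiting.borrow(); // takes the only slot
        Thread releaser = new Thread(() -> {
            try {
                Thread.sleep(100L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            waiting.giveBack();
        });
        releaser.start();
        System.out.println("waiting second borrow: " + waiting.borrow());
        releaser.join();
    }
}
```

Whether blocking with a short maxWait is preferable to failing fast depends on the caller's latency budget. Neither setting would help if slots are never returned to the pool.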

Codis operations class:

@Autowired
private JedisResourcePool jedisPool;

/**
 * Get a cached value.
 *
 * @param key the cache key
 * @return the cached value, or null if the lookup fails
 */
public String get(String key) {
    try (Jedis jedis = jedisPool.getResource()) {
        return jedis.get(key);
    } catch (Exception e) {
        logger.error("codis get exception, key ={}. Exception:", key, e);
        return null;
    }
}

@etansens

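I'm hitting the same problem.
It seems to be fine under low concurrency. We have 10 production machines writing to Codis with fairly smooth traffic, and they have run for 2 years without issues.
We recently launched a query endpoint that peaks at 4k QPS. It reliably goes down by the next day and cannot recover on its own.
The endpoint logs report "Could not get a resource from the pool",
but the TCP connection count is far below the configured maximum number of connections.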

@Force-King

@etansens Did you find the root cause? Would adding machines solve it? We are currently at 2K QPS and already seeing this error.

etansens commented Nov 5, 2019

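@Force-King I ran some tests, and it looks related to multithreading. A single thread looping forever works fine.
With multiple threads, after the threads exit, the connections in the pool stay in the ALLOCATED state and never return to IDLE.
I then wrapped the getResource method in synchronized, which did not fix it either.
Next I plan to read the source implementation closely.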
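
To illustrate the symptom described above (slots stuck in ALLOCATED while few connections are actually in use), here is a minimal JDK-only sketch of a counting pool in which some borrowers never give their slot back. It is a generic illustration of a pool leak, not a reproduction of Jedis internals; LeakDemo and all of its methods are hypothetical names.

```java
/** Toy pool that only counts slot states; hypothetical, not Jedis internals. */
public class LeakDemo {
    private final int maxTotal;
    private int allocated = 0;

    public LeakDemo(int maxTotal) {
        this.maxTotal = maxTotal;
    }

    /** Marks one slot ALLOCATED; returns false when every slot is already allocated. */
    public synchronized boolean borrow() {
        if (allocated >= maxTotal) {
            return false;
        }
        allocated++;
        return true;
    }

    /** Moves one slot back to IDLE. */
    public synchronized void giveBack() {
        if (allocated > 0) {
            allocated--;
        }
    }

    public synchronized int allocated() {
        return allocated;
    }

    /**
     * Starts `workers` threads that each borrow once. Only the first
     * `returning` workers give the slot back; the rest exit while holding it.
     */
    public static LeakDemo run(int maxTotal, int workers, int returning) {
        LeakDemo pool = new LeakDemo(maxTotal);
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            final boolean returnsSlot = i < returning;
            threads[i] = new Thread(() -> {
                if (pool.borrow() && returnsSlot) {
                    pool.giveBack();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return pool;
    }

    public static void main(String[] args) {
        LeakDemo pool = run(4, 4, 0); // no worker returns its slot
        System.out.println("allocated after workers exit: " + pool.allocated());
        System.out.println("new borrow succeeds: " + pool.borrow());
    }
}
```

In this toy model the worker threads have all exited, yet the pool still counts their slots as allocated, so later borrows fail even though nothing is using a connection.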

etansens commented Nov 6, 2019

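@Force-King I traced through the code last night and found that this is a Jedis bug, and it is already fixed in newer Jedis versions. Pinning the latest Jedis dependency resolves it.
@Apache9 This issue can be closed.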

etansens commented Nov 6, 2019

Test code attached:

@Test
public void poolTest() throws InterruptedException {
    RedisFactory factory = new RedisFactory();
    // count = 5 > 4 worker threads, so the main thread waits forever; convenient for testing
    CountDownLatch latch = new CountDownLatch(5);
    AtomicLong curr = new AtomicLong(0); // counts acquire/release cycles to measure throughput
    AtomicLong prev = new AtomicLong(0);
    for (int i = 0; i < 4; i++) { // start 4 threads that acquire connections in an endless loop to expose the problem
        new Thread() {
            @Override
            public void run() {
                try {
                    while (true) {
                        try (Jedis jedis = factory.getRedisClient()) {
                            curr.incrementAndGet();
                        }
                    }
                } catch (Exception e) { // on exception, leave the loop and end the thread
                    System.err.println("can not get conn, loop out: ");
                    e.printStackTrace();
                } finally {
                    System.out.println("runner count down");
                    latch.countDown();
                }
            }
        }.start();
    }
    new Thread() { // start 1 thread that acquires a connection periodically, to test whether the pool recovers after a failure
        @Override
        public void run() {
            while (true) { // keep acquiring connections; print a message on exception
                try (Jedis jedis = factory.getRedisClient()) {
                    Thread.sleep(1000L);
                    long rate = curr.incrementAndGet() - prev.longValue();
                    prev.set(curr.longValue());
                    System.out.println("curr conn: " + jedis + ", rate: " + rate);
                } catch (Exception e) {
                    System.err.println("can not get conn: " + e.getMessage());
                }
            }
        }
    }.start();
    latch.await();
    System.out.println(factory);
}


Force-King commented Nov 6, 2019

@etansens I am currently on Jedis 2.9.0. So upgrading to the latest version, 3.1.0, fixes it?

@luolifeng

jedis-2.9.3.jar already fixes this problem.
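
Since the fix depends on which Jedis version is actually loaded, and a transitive dependency can pull in an older jar than expected, it may help to check at runtime. The sketch below uses only JDK reflection. LoadedVersionCheck and describe are hypothetical names, and the jar manifest may not carry a version at all, in which case the jar location printed alongside it is usually the more informative part.

```java
import java.security.CodeSource;

/** Reports the version and jar location of a class if it can be loaded; hypothetical helper. */
public class LoadedVersionCheck {

    public static String describe(String className) {
        try {
            // Load without running static initializers.
            Class<?> c = Class.forName(className, false, LoadedVersionCheck.class.getClassLoader());
            Package p = c.getPackage();
            // Implementation-Version comes from the jar manifest and may be absent.
            String version = (p == null) ? null : p.getImplementationVersion();
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "unknown location"
                    : src.getLocation().toString();
            return className + " version=" + (version == null ? "unknown" : version)
                    + " from " + location;
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack traces in this thread.
        System.out.println(describe("redis.clients.jedis.Jedis"));
    }
}
```

Run it with the same classpath as the affected service; the jar file name in the printed location normally includes the Jedis version.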
