Support IOBuf Profiler #2497

Merged
merged 4 commits into from
Apr 8, 2024

chenBright (Contributor) commented Jan 4, 2024

What problem does this PR solve?

Issue Number:

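Problem Summary: We previously ran into an IOBuf leak. The heap profiler attributed the leaked memory to the place where Socket reads packets, but the real cause was that IOBuf's TLS block caching and reuse mechanism masked an IOBuf leak in the business code.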
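A much-simplified sketch of why the heap profiler pointed at the read path: a heap profiler attributes live memory to the call stack that allocated it, not to whoever still holds a reference. ToyBlock, ReadFromSocket, LeakyHandler and SimulateLeak are hypothetical names used only for illustration; the sketch ignores the TLS cache itself and only shows the attribution effect.

```cpp
#include <cassert>
#include <string>

// Toy refcounted block that records where it was allocated, the way a heap
// profiler attributes live memory to the allocating call stack.
struct ToyBlock {
    int refs = 1;
    std::string alloc_site;
};

// Hypothetical stand-in for the socket read path, which allocates blocks.
ToyBlock* ReadFromSocket() {
    ToyBlock* b = new ToyBlock;
    b->alloc_site = "socket_read";
    return b;
}

void AddRef(ToyBlock* b) { ++b->refs; }
void Release(ToyBlock* b) {
    if (--b->refs == 0) delete b;
}

// Hypothetical business code that keeps a reference and never releases it.
void LeakyHandler(ToyBlock* b) { AddRef(b); }

struct LeakReport {
    int refs;                 // references still held after the socket is done
    std::string blamed_site;  // what an allocation-site profiler would report
};

LeakReport SimulateLeak() {
    ToyBlock* b = ReadFromSocket();
    LeakyHandler(b);
    Release(b);  // the socket path drops its own reference correctly
    // The block is intentionally leaked here to mirror the bug: it stays
    // alive only because LeakyHandler kept a reference, yet the only site
    // recorded for it is the read path.
    return {b->refs, b->alloc_site};
}
```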

What is changed and the side effects?

Changed:

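Inspired by the Contention Profiler, this PR adds an IOBuf Profiler. For IOBuf::Block objects that have not been released, it records the call stack of every reference-count operation together with the number of operations (a positive number means references were acquired, a negative number means references were released) and renders them as a call graph. This visualization makes it possible to quickly locate modules that may be leaking and to narrow the scope of the investigation.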
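A minimal sketch of the bookkeeping idea described above, assuming a hypothetical RefOpProfiler class (not brpc's implementation) that receives each reference-count operation with a made-up stack label and the block's count after the operation. Blocks whose count drops to zero are forgotten; the per-stack totals of the remaining blocks are what would be drawn as a call graph.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch: for every live block, remember how many times each
// call stack acquired (+1) or released (-1) a reference.
class RefOpProfiler {
public:
    // Called on each reference-count operation; delta is +1 or -1 and
    // refs_after is the block's reference count after the operation.
    void Record(uintptr_t block, const std::string& stack, int delta,
                int refs_after) {
        if (refs_after == 0) {
            // Fully released, so it cannot be leaking: drop its history.
            live_.erase(block);
            return;
        }
        live_[block][stack] += delta;
    }

    // Sums the per-stack counts over all still-live blocks.
    std::map<std::string, int> Aggregate() const {
        std::map<std::string, int> out;
        for (const auto& entry : live_) {
            for (const auto& [stack, count] : entry.second) out[stack] += count;
        }
        return out;
    }

private:
    std::map<uintptr_t, std::map<std::string, int>> live_;
};
```

In this toy model, a block that is shared with a handler that never releases it leaves that handler's stack with a net positive count, while stacks that acquire and release symmetrically net to zero.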

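Because the reported data must preserve a certain ordering, this PR does not use the Contention Profiler's reporting mechanism (Collector); instead, it reports data through the lock-free queue wrapped in #2492.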
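Why ordering matters can be seen in a small sketch under assumptions similar to the previous one (hypothetical types, made-up stack labels). A block address can be reused after the block is freed, and the profiler drops a block's history once its count reaches zero. If a release from the old lifetime is processed after an acquire from the new lifetime, the count never reaches zero and stale stacks remain in the graph. RefEvent and Replay are illustrative names, not the API of the queue from #2492.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct RefEvent {
    uintptr_t block;    // block address; may be reused after the block is freed
    std::string stack;  // call stack label of the operation (made up here)
    int delta;          // +1 acquire, -1 release
};

// Replays events in the given order. When a block's total count reaches zero
// it is considered freed and its per-stack history is dropped, so a later
// event on the same address starts a fresh lifetime.
std::map<std::string, int> Replay(const std::vector<RefEvent>& events) {
    std::map<uintptr_t, int> totals;
    std::map<uintptr_t, std::map<std::string, int>> history;
    for (const RefEvent& e : events) {
        history[e.block][e.stack] += e.delta;
        if ((totals[e.block] += e.delta) == 0) {
            totals.erase(e.block);
            history.erase(e.block);
        }
    }
    // Aggregate the per-stack counts of blocks that are still live.
    std::map<std::string, int> graph;
    for (const auto& entry : history) {
        for (const auto& [stack, n] : entry.second) graph[stack] += n;
    }
    return graph;
}
```

With the events {socket_read +1, handler -1, leaky +1} on the same address, only leaky remains in the result; swapping the last two events keeps the count above zero, so socket_read and handler are wrongly left in the graph as well.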

The tool can be extended to other reference-counted data structures.

Side effects:

  • Performance effects:

  • Breaking backward compatibility:


Check List:

  • Please make sure your changes are compilable.
  • When providing us with a new feature, it is best to add related tests.
  • Please follow the Contributor Covenant Code of Conduct.

chenBright force-pushed the iobuf_profiler branch 2 times, most recently from 9466369 to b5317ea on January 4, 2024 17:21

sinomiko (Contributor) commented Jan 9, 2024

A question, if I may:


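For IO blocks that were created by a socket read and are then shared through TLS with other code, such as compression/decompression, is there any way to force them to be reclaimed?

If the service receives very large packets, the blocks get attached to TLS and shared, and compression/decompression may also use blocks of this IOBuf.
As long as the socket stays connected, this IOBuf stays around.

Could we proactively call the following code to force the socket to return its blocks?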
    m->_read_buf.return_cached_blocks();

    m->AddInputMessages(1);
    // Calculate average size of messages
    const size_t cur_size = m->_read_buf.length();
    if (cur_size == 0) {
        // _read_buf is consumed, it's good timing to return blocks
        // cached internally back to TLS, otherwise the memory is not
        // reused until next message arrives which is quite uncertain
        // in situations that most connections are idle.
        m->_read_buf.return_cached_blocks();
    }

chenBright (Contributor, Author) commented Jan 9, 2024

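The socket shouldn't hold on to the blocks it allocates indefinitely. When each request is parsed, one packet's worth of data is cut from _read_buf. Once the socket has finished parsing the data in _read_buf, it calls m->_read_buf.return_cached_blocks(); to return the blocks cached by _read_buf (preferentially to the TLS cache).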

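To make subsequent block allocations faster, each thread's TLS cache keeps up to 8 blocks (a soft limit). I don't think forced reclamation is necessary, is it?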
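To make the caching described above concrete, here is a minimal sketch of a per-thread block cache. It assumes a toy Block type and hypothetical AcquireBlock, ReturnBlock and ReturnCachedBlocks helpers (not brpc's API), and it simplifies the soft limit of 8 to a hard cap: blocks returned beyond the cap are freed instead of cached.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy block; the payload is omitted in this sketch.
struct Block {};

// Per-thread cap, mirroring the limit of 8 mentioned above (hard cap here).
constexpr std::size_t kMaxCachedPerThread = 8;

// Each thread owns its own free list, so reuse needs no locking.
struct TlsBlockCache {
    std::vector<Block*> blocks;
    ~TlsBlockCache() {
        for (Block* b : blocks) delete b;  // free cached blocks at thread exit
    }
};
thread_local TlsBlockCache tls_cache;

// Returns a block to the calling thread's cache, or frees it if the cache is full.
void ReturnBlock(Block* b) {
    if (tls_cache.blocks.size() < kMaxCachedPerThread) {
        tls_cache.blocks.push_back(b);
    } else {
        delete b;
    }
}

// Reuses a cached block when possible, otherwise allocates a new one.
Block* AcquireBlock() {
    if (!tls_cache.blocks.empty()) {
        Block* b = tls_cache.blocks.back();
        tls_cache.blocks.pop_back();
        return b;
    }
    return new Block;
}

// Hands every block a buffer cached internally back to the TLS cache,
// e.g. once the read buffer has been fully consumed.
void ReturnCachedBlocks(std::vector<Block*>& cached) {
    for (Block* b : cached) ReturnBlock(b);
    cached.clear();
}
```

Because each thread keeps at most a handful of blocks, the memory retained by the cache stays bounded, which is the basis for the view that forced reclamation is unnecessary.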

wwbmmm (Contributor) commented Jan 9, 2024

Could we develop a non-sharing mode for IOBuf and then use the memory profiler to locate the problem?

chenBright (Contributor, Author) commented:

> Could we develop a non-sharing mode for IOBuf and then use the memory profiler to locate the problem?

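That should be possible.
However, the IOBuf leak we hit at the time could not be reproduced in the test environment and could only be investigated on the production service. This (copy-mode) approach would likely require reworking Block, IOBuf, all of IOBuf's derived classes, and related functionality. It is highly intrusive, requires a large amount of change, and carries significant risk, so we did not adopt it in the end.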
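For comparison, a toy sketch of the non-sharing (copy) mode idea. ToyBuf and its copy_mode flag are hypothetical, and std::shared_ptr stands in for the block reference count. In shared mode, appending another buffer only adds a reference to the existing block, so the memory stays attributed to the original allocator; in copy mode, the appending buffer allocates its own block, so a leak would show up at its own allocation site.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct ToyBlock {
    std::string data;
};

class ToyBuf {
public:
    explicit ToyBuf(bool copy_mode) : copy_mode_(copy_mode) {}

    // Allocates a new block owned by this buffer.
    void Append(const std::string& s) {
        blocks_.push_back(std::make_shared<ToyBlock>(ToyBlock{s}));
    }

    // Appends another buffer's contents, sharing or copying its blocks.
    void AppendBuf(const ToyBuf& other) {
        for (const auto& b : other.blocks_) {
            if (copy_mode_) {
                blocks_.push_back(std::make_shared<ToyBlock>(*b));  // deep copy
            } else {
                blocks_.push_back(b);  // share the block: refcount + 1
            }
        }
    }

    // Number of owners of this buffer's first block (including this buffer).
    long FirstBlockOwners() const { return blocks_.front().use_count(); }

private:
    bool copy_mode_;
    std::vector<std::shared_ptr<ToyBlock>> blocks_;
};
```

Every code path that shares blocks would need the same branch, which hints at why such a mode was considered intrusive.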

chenBright force-pushed the iobuf_profiler branch 2 times, most recently from c3d6ce4 to b6775e1 on February 8, 2024 14:48

wwbmmm (Contributor) commented Mar 18, 2024

LGTM

wwbmmm merged commit 498c3e1 into apache:master on Apr 8, 2024
18 checks passed
chenBright deleted the iobuf_profiler branch on April 8, 2024 03:16
3 participants