[Bug] Doris 2.1.7 FE CPU usage stays above 60% #44121

Open
2 of 3 tasks
jameswangcnbj opened this issue Nov 18, 2024 · 4 comments

@jameswangcnbj

Search before asking

  • I had searched in the issues and found no similar issues.

Version

2.1.7

What's Wrong?

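A strange symptom: on 2.1.7, the FE uses more than 60% CPU even when there are no live client connections, as shown in the screenshot below.
https://ask.selectdb.com/uploads/post/5jc8xaLDYY9.png
Output of top -H -p <FE process ID>:
https://ask.selectdb.com/uploads/post/5jc8D2nnYeW.png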

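FE configuration:
# Metadata storage location
meta_dir = /data/doris-meta

# Network segment to use on multi-NIC hosts
priority_networks = 172.16.0.0/24

# FE memory: the default is 8g; total memory is currently 16G, so left unchanged for now
# Doris table-name case-sensitivity parameter
lower_case_table_names = 1
enable_outfile_to_local=true

# Parameters intended to address high memory usage
wait_timeout = 300
set global enable_auto_analyze = false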

What You Expected?

With no external connections, CPU usage should stay below 20%.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@ixzc
Contributor

ixzc commented Nov 18, 2024

How many cores does your FE have? Which version did you use before?

@jameswangcnbj
Author

How many cores does your FE have? Which version did you use before?

The machine is an Alibaba Cloud Elastic Compute Service (ECS) instance.

cat /proc/cpuinfo |grep "name" |cut -f2 -d: |uniq -c
4 Intel(R) Xeon(R) Platinum

@jameswangcnbj
Author

jstack output:
jstack 31657 |grep -A 50 7bee
"replayer" #92 daemon prio=5 os_prio=0 tid=0x00007fa65c005000 nid=0x7bee runnable [0x00007fa60affa000]
java.lang.Thread.State: RUNNABLE
at com.sleepycat.je.dbi.DiskOrderedScanner.processBINInternal(DiskOrderedScanner.java:1945)
at com.sleepycat.je.dbi.DiskOrderedScanner.accumulateBINs(DiskOrderedScanner.java:1169)
at com.sleepycat.je.dbi.DiskOrderedScanner.scanSerial(DiskOrderedScanner.java:758)
at com.sleepycat.je.dbi.DiskOrderedScanner.scan(DiskOrderedScanner.java:708)
at com.sleepycat.je.dbi.DatabaseImpl.count(DatabaseImpl.java:1510)
at com.sleepycat.je.Database.count(Database.java:2042)
at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalIdInternal(BDBJEJournal.java:414)
at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:379)
at org.apache.doris.persist.EditLog.getMaxJournalId(EditLog.java:136)
at org.apache.doris.catalog.Env.getMaxJournalId(Env.java:4257)
at org.apache.doris.catalog.Env.replayJournal(Env.java:2821)
- locked <0x00000005c5801ca0> (a org.apache.doris.catalog.Env)
at org.apache.doris.catalog.Env$4.runOneCycle(Env.java:2622)
at org.apache.doris.common.util.Daemon.run(Daemon.java:119)

"Thread-38" #50 daemon prio=5 os_prio=0 tid=0x00007fa6c0b53800 nid=0x7bed waiting on condition [0x00007fa60b3fb000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.doris.common.util.Daemon.run(Daemon.java:125)

"Automatic Analyzer" #43 daemon prio=5 os_prio=0 tid=0x00007fa6c0b52800 nid=0x7bec waiting on condition [0x00007fa60b7fc000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.doris.common.util.Daemon.run(Daemon.java:125)

"Statistics Table Cleaner" #41 daemon prio=5 os_prio=0 tid=0x00007fa6c1cf1000 nid=0x7beb waiting on condition [0x00007fa60bbfd000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.doris.common.util.Daemon.run(Daemon.java:125)

"stateListener" #90 daemon prio=5 os_prio=0 tid=0x00007fa6c1cf0000 nid=0x7bea waiting on condition [0x00007fa60bffe000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000005c5b11c08> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
at org.apache.doris.catalog.Env$5.runOneCycle(Env.java:2715)
- locked <0x00000005c3203ff0> (a org.apache.doris.catalog.Env$5)
at org.apache.doris.common.util.Daemon.run(Daemon.java:119)

"ReplayThread" #87 daemon prio=5 os_prio=0 tid=0x00007fa648825000 nid=0x7be9 waiting on condition [0x00007fa614df2000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000005c3204328> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
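
For anyone repeating this diagnosis: the `nid` in a jstack thread header is the hexadecimal form of the Linux thread ID that `top -H` prints in decimal, which is why the dump above is filtered with `grep ... 7bee`. The following is a minimal Python sketch of that lookup, not part of the original report. The helper names `tid_to_nid` and `find_thread` are hypothetical, and the decimal value 31726 is simply 0x7bee converted to decimal; it is not read from the screenshot.

```python
import re

def tid_to_nid(tid: int) -> str:
    """Convert a decimal thread ID (as shown by `top -H`) to jstack's nid format."""
    return hex(tid)

def find_thread(jstack_text: str, nid: str):
    """Return (thread_name, state) for the thread whose header carries `nid`, else None."""
    lines = jstack_text.splitlines()
    for i, line in enumerate(lines):
        header = re.match(r'"([^"]+)".*\bnid=(0x[0-9a-f]+)', line)
        if header and header.group(2) == nid:
            state = None
            # The thread state, when present, is on the line right after the header.
            if i + 1 < len(lines):
                found = re.search(r'java\.lang\.Thread\.State: (\w+)', lines[i + 1])
                if found:
                    state = found.group(1)
            return header.group(1), state
    return None

# Two thread headers copied from the dump above, used as sample input.
SAMPLE = '''"replayer" #92 daemon prio=5 os_prio=0 tid=0x00007fa65c005000 nid=0x7bee runnable [0x00007fa60affa000]
java.lang.Thread.State: RUNNABLE
"Thread-38" #50 daemon prio=5 os_prio=0 tid=0x00007fa6c0b53800 nid=0x7bed waiting on condition [0x00007fa60b3fb000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
'''

if __name__ == "__main__":
    nid = tid_to_nid(31726)          # decimal form of 0x7bee (assumption, see above)
    print(nid, find_thread(SAMPLE, nid))
```

In the dump above, this lookup lands on the "replayer" thread, which is RUNNABLE inside `BDBJEJournal.getMaxJournalId`, while the neighbouring daemon threads are sleeping or parked.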

@jameswangcnbj
Author

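I resolved the problem myself.
I had previously closed the default port 22, so SSH connections to the other machines no longer worked. After I reopened port 22, the machine's CPU stabilized; FE CPU usage is now between 7% and 10%, which is completely normal.
My guess at the cause is that FE metadata synchronization could not proceed normally, so the FE kept retrying. I have not found any documentation explaining the details, so I am recording this here for reference.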
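
Since the fix here came down to a closed port, a quick reachability check between cluster hosts may help rule this out on similar setups. The sketch below is only an illustration under assumptions: `tcp_port_open` is a hypothetical helper, the host addresses are placeholders on the `priority_networks` segment from the report, and the thread does not establish why the FE depended on port 22.

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

if __name__ == "__main__":
    # Placeholder addresses; replace with the real FE/BE hosts.
    for host in ["172.16.0.11", "172.16.0.12"]:
        status = "open" if tcp_port_open(host, 22) else "closed or unreachable"
        print(f"{host}:22 is {status}")
```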
