Finding the problem
Run the top command:
6355 root 20 0 3624776 931128 7544 S 198.0 24.9 4643:34 java
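The java process with PID 6355 is using 198% CPU and 24.9% of memory. top reports a process's %CPU relative to a single core, so 198% means the process is keeping roughly two cores fully busy.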
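When this check runs from a script rather than interactively, top can run in batch mode: `top -b -n 1` prints one snapshot and exits. The minimal sketch below filters such a snapshot for java processes and lists the highest %CPU first. The function name `java_cpu` is hypothetical. The sketch also assumes procps-ng top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND), as in the line above. A customized top layout would shift the field numbers.

```shell
# Hypothetical helper: read `top -b -n 1` output on stdin and print
# "PID %CPU %MEM" for every java process, highest %CPU first.
# Assumes procps-ng top's default columns:
#   PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
java_cpu() {
  awk '$12 == "java" { print $1, $9, $10 }' | LC_ALL=C sort -k2,2 -rn
}

# Usage: take one batch-mode snapshot and rank the java processes.
#   top -b -n 1 | java_cpu
```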
Troubleshooting
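Inspect the threads of this process, sorted by CPU usage from highest to lowest:
ps -mp 6355 -o THREAD,tid,time | sort -k2 -rn
The options work as follows:
-m lists the threads after the process.
-p selects PID 6355.
-o THREAD,tid,time prints the THREAD column set followed by each thread's TID and cumulative CPU time.
%CPU is the second column of this output, so sort -k2 -rn orders the rows by it. A bare sort -rn would instead try to parse a number from the start of each line, which is the USER column.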
The output shows that threads 6613 through 6619 all have high CPU usage.
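For the next step, the hottest TIDs can be pulled out of the ps output directly. The minimal sketch below assumes the procps column order USER %CPU PRI SCNT WCHAN USER SYSTEM TID TIME for `-o THREAD,tid,time`. `hot_tids` is a hypothetical helper name. Rows whose TID column is not a number are skipped: the header and the process summary line.

```shell
# Hypothetical helper: read `ps -mp PID -o THREAD,tid,time` output on stdin
# and print the N thread IDs with the highest %CPU (default 5).
# Assumes %CPU is field 2 and TID is field 8 of each row.
hot_tids() {
  local n="${1:-5}"
  awk 'NR > 1 && $8 ~ /^[0-9]+$/ { print $2, $8 }' |
    LC_ALL=C sort -k1,1 -rn |
    head -n "$n" |
    awk '{ print $2 }'
}

# Usage:
#   ps -mp 6355 -o THREAD,tid,time | hot_tids 7
```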
Inspect the stack of a problem thread
Take the thread with TID 6613 as an example:
1. Convert its thread ID to hexadecimal.
2. Print the thread stack with jstack.
Convert the TID to hexadecimal
Command:
printf "%x\n" 6613
This prints 19d5.
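jstack prints each thread's native ID in the thread header as nid=0x<hex>. For TID 6613, that is nid=0x19d5. The sketch below converts several TIDs at once, for example the 6613 to 6619 range found above. `tid_to_nid` is a hypothetical name.

```shell
# Hypothetical helper: print the jstack "nid" marker for each decimal TID given.
tid_to_nid() {
  local tid
  for tid in "$@"; do
    printf 'nid=0x%x\n' "$tid"
  done
}

# Usage: convert the whole hot range in one call.
#   tid_to_nid $(seq 6613 6619)
```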
Print the thread stack with jstack
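Command: jstack pid | grep tid -A 30
Here pid is the process ID and tid is the thread ID in hexadecimal. jstack shows this value as the thread's nid (native thread ID).
In our case the command is: jstack 6355 | grep 19d5 -A 30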
"org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1" #250 prio=5 os_prio=0 tid=0x00007f106dbf0800 nid=0x19d5 runnable [0x00007f10146e4000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000c9c22c18> (a sun.nio.ch.Util$3)
- locked <0x00000000c9c22c08> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c9c1a748> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
at org.apache.kafka.common.network.Selector.select(Selector.java:845)
at org.apache.kafka.common.network.Selector.poll(Selector.java:469)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:549)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1308)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1248)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doPoll(KafkaMessageListenerContainer.java:1091)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1047)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:972)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
"ThreadPoolTaskScheduler-1" #249 prio=5 os_prio=0 tid=0x00007f106dadb800 nid=0x19d4 waiting on condition [0x00007f10147e5000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c9c41688> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
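This shows that the high CPU usage comes from the Kafka consumer thread. The Spring Kafka listener container thread is RUNNABLE inside KafkaConsumer.poll. The last lines of the output belong to a different thread, ThreadPoolTaskScheduler-1 (nid=0x19d4). grep -A 30 printed them only because that thread follows the target thread in the dump.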
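A fixed -A 30 window can cut off a deep stack or run past the end of the thread. Matching on the full nid=0x19d5 marker, rather than the bare 19d5, also avoids hits on addresses that happen to contain those digits. The sketch below prints exactly one thread's block. It assumes thread blocks are separated by blank lines, as in HotSpot's jstack output. `thread_stack` is a hypothetical name.

```shell
# Hypothetical helper: read a jstack dump on stdin and print only the block
# of the thread whose decimal TID is given as $1.
# awk's paragraph mode (RS empty) treats each blank-line-separated block as one record.
thread_stack() {
  local pat
  # Trailing space stops nid=0x19d5 from also matching nid=0x19d50.
  pat="$(printf 'nid=0x%x ' "$1")"
  awk -v RS= -v pat="$pat" 'index($0, pat) { print; exit }'
}

# Usage:
#   jstack 6355 | thread_stack 6613
```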
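Export the full thread dump to a file. jstack writes every thread, not just the problem one, and -l adds lock information.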
jstack -l 6355 >> /tmp/6355.dump
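A single dump is only a snapshot. A common practice is to take a few dumps several seconds apart and check whether the same thread stays RUNNABLE in the same frames. The sketch below does that and writes one file per dump instead of appending everything to one file with >>. `take_dumps` and the JSTACK override are hypothetical. The override exists only so the function can be exercised without a running JVM.

```shell
# Hypothetical helper: take COUNT dumps of PID, INTERVAL seconds apart,
# writing DIR/PID.<n>.dump. JSTACK overrides the command (default: jstack).
take_dumps() {
  local pid="$1" count="${2:-3}" interval="${3:-5}" dir="${4:-/tmp}"
  local i=1
  while [ "$i" -le "$count" ]; do
    "${JSTACK:-jstack}" -l "$pid" > "$dir/$pid.$i.dump" || return 1
    # No sleep after the last dump.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Usage: three dumps, five seconds apart.
#   take_dumps 6355 3 5 /tmp
```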
Resolution
Since the problem comes from the Kafka consumer, the next step is to analyze how the project uses Kafka.
About the jstack command
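jstack prints the thread stacks of a Java process. The help text below is the JDK 8-style usage. In JDK 9 and later, the -F and -m modes, core-file analysis and remote attachment are provided by jhsdb jstack instead.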
[root@iZ8vb698vy6k1v365g0tioZ publish]# jstack -help
Usage:
jstack [-l] <pid>
(to connect to running process)
jstack -F [-m] [-l] <pid>
(to connect to a hung process)
jstack [-m] [-l] <executable> <core>
(to connect to a core file)
jstack [-m] [-l] [server_id@]<remote server IP or hostname>
(to connect to a remote debug server)
Options:
-F to force a thread dump. Use when jstack <pid> does not respond (process is hung)
-m to print both java and native frames (mixed mode)
-l long listing. Prints additional information about locks
-h or -help to print this help message