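- An Alibaba Cloud (阿里云) alarm fired. I first assumed the connection count had spiked, but checking turned up nothing.
- Next I suspected that TIME_WAIT was lasting too long when TCP connections closed, so I ran: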
sysctl -w net.ipv4.tcp_keepalive_time=1800
sysctl -w net.ipv4.tcp_keepalive_probes=3
sysctl -w net.ipv4.tcp_keepalive_intvl=15
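Still no improvement. (These keepalive settings control probing of idle connections and do not shorten TIME_WAIT.)
Next, run ss -s: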
# ss -s
Total: 65957 (kernel 66625)
TCP: 65599 (estab 36, closed 65534, orphaned 0, synrecv 0, timewait 3/0), ports 0
Transport Total IP IPv6
* 66625 - -
RAW 0 0 0
UDP 24 17 7
TCP 65 57 8
INET 89 74 15
FRAG 0 0 0
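To watch this number over time, the closed counter can be pulled out of the summary line. This is only a minimal sketch: the function name closed_count is made up for illustration, and it assumes the summary line keeps the "TCP: ... closed N," layout shown above.

```shell
# Extract the "closed" counter from `ss -s` style output read on stdin.
# Only the summary line starting with "TCP:" matches; the table row
# "TCP 65 57 8" has no colon and is ignored.
closed_count() {
    sed -n 's/^TCP:.*closed \([0-9][0-9]*\).*/\1/p'
}

# Usage on a live system (assumes ss from iproute2 is installed):
#   ss -s | closed_count
```

A value that keeps climbing while estab and timewait stay small, as in the output above, points toward sockets that were opened or closed but never released by a process.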
Why are there so many closed sockets? After some searching, I found this post:
http://mdba.cn/2015/03/10/tcp-socket%E6%96%87%E4%BB%B6%E5%8F%A5%E6%9F%84%E6%B3%84%E6%BC%8F/
Following the post, check the system's sock file handles with lsof | grep sock:
java 6646 root *786u sock 0,7 0t0 1431969496 protocol: TCP
java 6646 root *787u sock 0,7 0t0 1431971854 protocol: TCP
java 6646 root *788u sock 0,7 0t0 1431970488 protocol: TCP
java 6646 root *789u sock 0,7 0t0 1431969497 protocol: TCP
java 6646 root *790u sock 0,7 0t0 1431960142 protocol: TCP
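With tens of thousands of such lines, it can help to count them per process instead of reading them one by one. The following is a rough sketch, not part of the original steps: sock_by_pid is a hypothetical helper name, and it assumes the lsof column layout shown above, where field 1 is the command, field 2 the PID, and field 5 the type.

```shell
# Count lsof entries of type "sock" per command and PID, highest first.
# Reads lsof output on stdin. Assumes TYPE is the 5th field, as in the
# lines above; lsof runs that add a TID column would shift the fields.
sock_by_pid() {
    awk '$5 == "sock" { n[$1 " " $2]++ }
         END { for (k in n) print n[k], k }' | sort -rn
}

# Usage on a live system:
#   lsof | sock_by_pid | head
```

The process at the top of the list is the first candidate for a socket handle leak.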
Look up process 6646: ps -ef|grep 6646
After handling it, run ss -s again. The closed count has dropped:
# ss -s
Total: 401 (kernel 1725)
TCP: 90 (estab 36, closed 28, orphaned 1, synrecv 0, timewait 3/0), ports 0
Transport Total IP IPv6
* 1725 - -
RAW 0 0 0
UDP 25 17 8
TCP 62 56 6
INET 87 73 14
FRAG 0 0 0