Problem:
On one machine, the nscd service (Name Service Cache Daemon, used here as a DNS cache) was running at a sustained 100% CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3323 root 20 0 187m 3828 2368 R 99 0.0 20231:33 nscd
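Restarting nscd did not help. The machine is a MySQL server that was in the middle of a data import, and the visible symptom was very slow inserts (effectively the server had one CPU core fewer).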
Investigation:
Attach strace to the process:
strace -p 3323
The following lines scrolled by continuously:
gettimeofday({1433475756, 150449}, NULL) = 0
accept(9, 0, NULL) = -1 EMFILE (Too many open files)
epoll_wait(10, {{EPOLLRDNORM, {u32=9, u64=9}}}, 100, 29988) = 1
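The process has clearly hit its open-file limit. accept() on the listening socket (fd 9) fails with EMFILE, so the pending connection stays queued. epoll_wait then reports fd 9 as readable again immediately instead of waiting out its roughly 30-second timeout, and the process spins in this accept/epoll_wait retry loop, burning a full CPU.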
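Before raising the limit, one way to confirm the diagnosis is to compare the number of descriptors the process has open with its soft "Max open files" limit. A minimal sketch, assuming a Linux /proc filesystem; the function name fd_usage is hypothetical and not part of the original troubleshooting:

```shell
#!/bin/sh
# fd_usage PID: print how many file descriptors PID has open
# and its soft "Max open files" limit (hypothetical helper).
fd_usage() {
    pid=$1
    # Each entry under /proc/PID/fd is one open descriptor.
    # Inspecting another user's process (e.g. nscd) needs root.
    open=$(ls /proc/"$pid"/fd | wc -l | tr -d ' ')
    # In /proc/PID/limits, field 4 of the "Max open files" line
    # is the soft limit (field 5 is the hard limit).
    limit=$(awk '/^Max open files/ {print $4}' /proc/"$pid"/limits)
    echo "pid=$pid open=$open limit=$limit"
}
```

For example, run as root: fd_usage "$(pidof nscd)". An open count at or close to the limit is consistent with the EMFILE errors in the strace output above.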
Solution:
Raise the maximum number of open files (it was 1024):
ulimit -n 10240
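Then restart nscd from the same shell, so that it inherits the new limit. The problem went away.
Appendix: for comparison, here is the strace output of a healthy nscd process, where accept() succeeds and returns a new descriptor (11) that is added to the epoll set and later removed: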
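ulimit -n changes the limit only for the current shell and the processes it starts afterwards, which is why nscd has to be restarted from that shell. A minimal sketch demonstrating this inheritance; the function name child_limit is hypothetical, and it lowers the soft limit inside a subshell so it runs without root and leaves the calling shell unchanged:

```shell
#!/bin/sh
# child_limit N: set the soft open-files limit to N in a subshell,
# then print the limit as seen by a freshly started child process.
child_limit() {
    (
        # Lowering the soft limit never needs privileges; raising it
        # above the current hard limit (e.g. to 10240 when the hard
        # limit is lower) requires root.
        ulimit -S -n "$1" || exit 1
        # The child process inherits the subshell's limit and reports it.
        sh -c 'ulimit -n'
    )
}
```

For example, child_limit 64 prints 64, while ulimit -n in the calling shell still reports its original value.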
Process 32043 attached - interrupt to quit
gettimeofday({1433475911, 593946}, NULL) = 0
epoll_wait(10, {{EPOLLRDNORM, {u32=9, u64=9}}}, 100, 29999) = 1
gettimeofday({1433475913, 232384}, NULL) = 0
accept(9, 0, NULL) = 11
epoll_ctl(10, EPOLL_CTL_ADD, 11, {EPOLLRDNORM, {u32=11, u64=11}}) = 0
epoll_wait(10, {{EPOLLRDNORM, {u32=11, u64=11}}}, 100, 29998) = 1
gettimeofday({1433475913, 232502}, NULL) = 0
epoll_ctl(10, EPOLL_CTL_DEL, 11, NULL) = 0
futex(0x7f04ff5cc524, 0x5 /* FUTEX_??? */, 1) = 1
epoll_wait(10, {{EPOLLRDNORM, {u32=9, u64=9}}}, 100, 29999) = 1