Case study: using strace to track down high CPU usage by the nscd service

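Problem:

On one machine, the nscd service (the name service cache daemon, which caches DNS lookups) was running at a sustained 100% CPU, as shown below: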

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                           
 3323 root      20   0  187m 3828 2368 R   99  0.0  20231:33 nscd 

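Restarting nscd did not help. The machine is a MySQL server in the middle of a data import, and the visible symptom was a very slow import (effectively one CPU core was being lost to nscd).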


Investigation:

Attach strace to the running service:

strace -p 3323  
The following output scrolls by continuously:
gettimeofday({1433475756, 150449}, NULL) = 0  
accept(9, 0, NULL)                      = -1 EMFILE (Too many open files)  
epoll_wait(10, {{EPOLLRDNORM, {u32=9, u64=9}}}, 100, 29988) = 1  

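The process has clearly run out of file descriptors. accept() on the listening socket (fd 9) fails with EMFILE, but the pending connection stays queued, so epoll_wait() immediately reports fd 9 as readable again instead of waiting for its timeout. The process retries in a tight loop and burns a full CPU.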
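The spin can be reproduced outside nscd. The following is a minimal Python sketch (Linux-only, not nscd's actual code; the function name demo_emfile_busy_loop is hypothetical). It lowers its own soft RLIMIT_NOFILE, uses up every free descriptor, and leaves one connection queued on a listening socket registered with epoll in the default level-triggered mode. Each accept() then fails with EMFILE while the connection stays queued, so epoll_wait() reports the socket readable again at once, the same accept/epoll_wait pattern as in the trace above.

```python
import errno
import os
import resource
import select
import socket
import time


def demo_emfile_busy_loop(iterations=5):
    """Reproduce the accept() -> EMFILE -> epoll_wait() spin from the trace.

    Returns (wakeups, emfile_errors, elapsed_seconds).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(16)
    listener.setblocking(False)
    # A client whose connection stays queued because we can never accept it.
    client = socket.create_connection(listener.getsockname())
    ep = select.epoll()
    # Same event flag as in the trace; epoll is level-triggered by default.
    ep.register(listener.fileno(), select.EPOLLRDNORM)

    fillers = []
    wakeups = emfile_errors = 0
    try:
        # Lower the soft limit just above our highest fd, then fill every free slot.
        top = max(listener.fileno(), client.fileno(), ep.fileno())
        resource.setrlimit(resource.RLIMIT_NOFILE, (top + 4, hard))
        while True:
            try:
                fillers.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                break

        start = time.monotonic()
        for _ in range(iterations):
            # Returns at once: the queued connection keeps the listener readable.
            events = ep.poll(1.0)
            if any(fd == listener.fileno() for fd, _ in events):
                wakeups += 1
            try:
                conn, _ = listener.accept()
                conn.close()  # not expected: would mean the limit was not reached
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                emfile_errors += 1
        elapsed = time.monotonic() - start
    finally:
        for fd in fillers:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
        ep.close()
        client.close()
        listener.close()
    return wakeups, emfile_errors, elapsed
```

Every iteration should produce one wakeup and one EMFILE failure. With a 1 s timeout per poll, the loop finishing in well under a second shows that epoll_wait() never actually waits.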


Solution:

Raise the maximum number of open files (it was 1024):

ulimit -n 10240  
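To see how close a process is to its limit, and later to confirm that the restarted nscd actually runs with the new value, compare its open descriptors with the "Max open files" line in /proc/<pid>/limits. A minimal Python sketch, assuming Linux; nofile_usage is a hypothetical helper name, and reading another process's /proc/<pid>/fd requires root (nscd runs as root here):

```python
import os


def _limit_value(text):
    # /proc/<pid>/limits prints "unlimited" instead of a number when there is no limit.
    return None if text == "unlimited" else int(text)


def nofile_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for `pid`, read from /proc.

    accept() fails with EMFILE once no descriptor number below the soft limit is free.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # e.g. "Max open files   1024   4096   files"
                fields = line.split()
                return open_fds, _limit_value(fields[3]), _limit_value(fields[4])
    raise ValueError(f"no 'Max open files' entry for pid {pid}")
```

Called with nscd's PID (for example from `pidof nscd`), an open_fds value close to the soft limit confirms the diagnosis; after a restart that inherited the new limit, the soft value should read 10240 rather than 1024.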

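Then restart nscd and the problem is gone. (ulimit -n only affects the current shell and the processes started from it, so nscd has to be started from that shell to inherit the new limit.)

Appendix: strace output of a healthy nscd process: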

Process 32043 attached - interrupt to quit  
gettimeofday({1433475911, 593946}, NULL) = 0  
epoll_wait(10, {{EPOLLRDNORM, {u32=9, u64=9}}}, 100, 29999) = 1  
gettimeofday({1433475913, 232384}, NULL) = 0  
accept(9, 0, NULL)                      = 11  
epoll_ctl(10, EPOLL_CTL_ADD, 11, {EPOLLRDNORM, {u32=11, u64=11}}) = 0  
epoll_wait(10, {{EPOLLRDNORM, {u32=11, u64=11}}}, 100, 29998) = 1  
gettimeofday({1433475913, 232502}, NULL) = 0  
epoll_ctl(10, EPOLL_CTL_DEL, 11, NULL)  = 0  
futex(0x7f04ff5cc524, 0x5 /* FUTEX_??? */, 1) = 1  
epoll_wait(10, {{EPOLLRDNORM, {u32=9, u64=9}}}, 100, 29999) = 1
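The key difference between the two traces is what accept() returns: -1 EMFILE in the broken case, a new descriptor (11) that is then added to epoll in the healthy case. For long traces, a small filter can pull out just those results. A minimal Python sketch that parses the single-line strace format shown above (the regex and function names are hypothetical; it ignores "<unfinished ...>" lines and PID prefixes from strace -f):

```python
import re

# Matches lines like: accept(9, 0, NULL)   = -1 EMFILE (Too many open files)
STRACE_LINE = re.compile(
    r"^(?P<call>\w+)\(.*\)\s+=\s+(?P<ret>-?\d+)(?:\s+(?P<err>E[A-Z0-9]+))?"
)


def accept_results(trace):
    """Return (return_value, errno_name_or_None) for every accept() line."""
    results = []
    for line in trace.splitlines():
        m = STRACE_LINE.match(line.strip())
        if m and m.group("call") == "accept":
            results.append((int(m.group("ret")), m.group("err")))
    return results


def spinning_on_emfile(trace):
    """True if the trace contains accept() calls and every one failed with EMFILE."""
    results = accept_results(trace)
    return bool(results) and all(err == "EMFILE" for _, err in results)
```

A trace can be saved with `strace -p <pid> -o trace.txt` and the file contents passed to accept_results().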