Symptom Description
- Starting around 11:00 on 2022-08-08, several business systems began throwing errors one after another; a screenshot of the error from one of them is shown below:
- UidService is a service used for distributed ID generation. Opening the uid-service logs showed large numbers of "Too many open files" errors starting at 2022-08-08 23:05:32. Anyone familiar with Linux should recognize this error: it usually means a single process has opened too many File Descriptors (fd for short). (Note: in Linux everything is a file; reading or writing a file, or opening a network connection, consumes an fd. For safety the OS limits the number of fds a process may hold; the CentOS default is 1024 per process.)
2022-08-08 23:05:32.614 WARN 30239 --- [nio-8777-exec-8] o.s.r.s.RemoteInvocationTraceInterceptor : Processing of HessianServiceExporter remote call resulted in fatal exception: com.ccjt.eip.web.service.IUidService.nextProjectNo
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:53) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:226) ~[jedis-2.9.0.jar!/:na]
at cn.hutool.db.nosql.redis.RedisDS.getJedis(RedisDS.java:136) ~[hutool-all-4.6.8.jar!/:na]
at com.ccjt.eip.web.service.impl.UidServiceImpl.connectRedis(UidServiceImpl.java:45) ~[classes!/:0.0.1-SNAPSHOT]
at com.ccjt.eip.web.service.impl.UidServiceImpl.nextProjectNo(UidServiceImpl.java:146) ~[classes!/:0.0.1-SNAPSHOT]
at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_171]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_171]
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344) ~[spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198) [spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) [spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.remoting.support.RemoteInvocationTraceInterceptor.invoke(RemoteInvocationTraceInterceptor.java:78) ~[spring-context-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) [spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212) [spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at com.sun.proxy.$Proxy69.nextProjectNo(Unknown Source) [na:na]
at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_171]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_171]
at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:302) [hessian-4.0.60.jar!/:3.1.0]
at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:217) [hessian-4.0.60.jar!/:3.1.0]
at org.springframework.remoting.caucho.HessianExporter.doInvoke(HessianExporter.java:228) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.remoting.caucho.HessianExporter.invoke(HessianExporter.java:144) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.remoting.caucho.HessianServiceExporter.handleRequest(HessianServiceExporter.java:64) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.web.servlet.mvc.HttpRequestHandlerAdapter.handle(HttpRequestHandlerAdapter.java:53) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1040) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:943) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:909) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:652) [tomcat-embed-core-9.0.37.jar!/:4.0.FR]
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:733) [tomcat-embed-core-9.0.37.jar!/:4.0.FR]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) [tomcat-embed-websocket-9.0.37.jar!/:9.0.37]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:373) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1589) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_171]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: 打开的文件过多
at redis.clients.jedis.Connection.connect(Connection.java:207) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:93) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1767) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:106) ~[jedis-2.9.0.jar!/:na]
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:899) ~[commons-pool2-2.8.0.jar!/:2.8.0]
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:429) ~[commons-pool2-2.8.0.jar!/:2.8.0]
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:354) ~[commons-pool2-2.8.0.jar!/:2.8.0]
at redis.clients.util.Pool.getResource(Pool.java:49) ~[jedis-2.9.0.jar!/:na]
... 63 common frames omitted
Caused by: java.net.SocketException: 打开的文件过多
at java.net.Socket.createImpl(Socket.java:460) ~[na:1.8.0_171]
at java.net.Socket.getImpl(Socket.java:520) ~[na:1.8.0_171]
at java.net.Socket.setReuseAddress(Socket.java:1449) ~[na:1.8.0_171]
at redis.clients.jedis.Connection.connect(Connection.java:174) ~[jedis-2.9.0.jar!/:na]
... 70 common frames omitted
Emergency Handling
- This limit is imposed by the operating system on processes started by a user (the configuration is per user; view it with ulimit -a). The default is 1024.
Checking the Configuration
- Check with ulimit -a, together with lsof -p $(jps -l | grep 'uid' | awk '{print $1}') | wc -l to count the fds currently in use.
- cat /proc/$(jps -l | grep 'uid' | awk '{print $1}')/limits, where uid is the keyword used to locate the service process:
[root@ejy-kf2-jyxt2 ~]# cat /proc/$(jps -l | grep uid | awk '{print $1}')/limits
Limit Soft Limit Hard Limit Units
...
Max open files 1024 1024 files
...
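Besides the shell commands above, a process can inspect its own fd usage from inside the JVM. A minimal Linux-only sketch (assumption: /proc is mounted; on other platforms the method below returns -1):

```java
import java.io.File;

public class FdCount {
    // Count this process's currently open file descriptors by listing
    // /proc/self/fd (each entry is one open fd). Linux-only.
    static int openFdCount() {
        File[] fds = new File("/proc/self/fd").listFiles();
        return fds == null ? -1 : fds.length;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFdCount());
    }
}
```

Logging this number periodically would have exposed the slow fd growth long before the 1024 limit was hit.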
Remediation
- Restart the application, prioritizing service availability (appropriate when fds have accumulated over a long uptime and finally hit the limit, which applies here).
- If the application genuinely needs a large number of fds (the classic example is Elasticsearch, which operates on many disk files concurrently to achieve high IOPS), raise the limit in /etc/security/limits.conf. This file is read by PAM at session start, so the change takes effect for newly opened login sessions (and the processes started from them), or after a reboot; sysctl -p does not apply to it.
[root@demo ~]# vi /etc/security/limits.conf
# Append the following at the end of the file; * applies to all users, nofile is the max number of fds
...
* soft nofile 204800
* hard nofile 204800
...
Root Cause Analysis
- After the emergency handling above, the service was temporarily usable again, but the root cause was still unknown. Discussion with colleagues revealed that this service had been running normally for about a year and a half, without a single restart in between. Given how fds are created, and that this service does nothing but generate global IDs, it should not perform large numbers of file or network operations; if so, it should never have come close to the system's default limit of 1024 fds.
Locating the Code
- The frame at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:106) ~[jedis-2.9.0.jar!/:na] in the exception stack pointed to the method that obtains a Redis connection.
public class UidServiceImpl implements IUidService {
    // Key code ...start...
    private static Jedis jedis = null;
    private static String confname = null;

    static {
        confname = ResourceBundle.getBundle("conf").getString("confname");
        Setting setting = new Setting("config/redis-" + Lists.newArrayList(Splitter.on("_").split(confname)).get(1) + ".setting");
        jedis = RedisDS.create(setting, null).getJedis();
        System.out.println("Connected to redis");
    }

    private void connectRedis() {
        confname = ResourceBundle.getBundle("conf").getString("confname");
        Setting setting = new Setting("config/redis-" + Lists.newArrayList(Splitter.on("_").split(confname)).get(1) + ".setting");
        jedis = RedisDS.create(setting, null).getJedis();
        System.out.println("Connected to redis");
    }

    @Override
    public synchronized Integer nextNoticeId() {
        String noticeIdKey = confname + "_" + IdKeyEnum.NOTICE.getkey();
        String idStr = null;
        try {
            idStr = jedis.get(noticeIdKey);
        } catch (Exception e) {
            connectRedis();
            idStr = jedis.get(noticeIdKey);
        }
        // No cached value: re-initialize
        if (idStr == null) {
            try {
                // Fetch the current max number, store it in the cache, increment once, then return
                idStr = Convert.toStr(initServiceImpl.initNoticeIdStartValue());
                jedis.set(noticeIdKey, idStr);
                jedis.incr(noticeIdKey);
                System.out.println(LocalDateTime.now().toString() + "noticeidt" + jedis.get(noticeIdKey));
                return Convert.toInt(idStr) + 1;
            } catch (Exception e) {
                e.printStackTrace();
                return null;
            }
        } else {
            jedis.incr(noticeIdKey);
            System.out.println(LocalDateTime.now().toString() + "noticeidt" + jedis.get(noticeIdKey));
            return Convert.toInt(idStr) + 1;
        }
    }
    // Key code ...end...
}
try {
    idStr = jedis.get(noticeIdKey);
} catch (Exception e) {
    // What connectRedis() does: on any exception it creates a brand-new
    // connection pool and takes one connection from it,
    // without ever closing the previously created pool
    connectRedis();
    idStr = jedis.get(noticeIdKey);
}
Reproducing the Problem
- Deploy a test build of the project in the test environment, and trigger the exception in jedis.get by restarting the Redis service, so that execution falls into connectRedis.
- Restart the Redis service:
[root@VM-4-3-centos ~]# docker start myredis
myredis
[root@VM-4-3-centos ~]# docker restart myredis
myredis
[root@VM-4-3-centos ~]# docker restart myredis
myredis
[root@VM-4-3-centos ~]# docker restart myredis
myredis
[root@VM-4-3-centos ~]# docker restart myredis
myredis
- Call the service's /uid/notice/nextid endpoint (which triggers nextNoticeId) and watch the fd count (25932 is the corresponding PID):
[root@ejy-kf2-jyxt2 ~]# jps -l | grep uid
25932 uid-distribute-0.0.1-SNAPSHOT.jar
# count the number of fds
[root@ejy-kf2-jyxt2 ~]# lsof -p 25932 | wc -l
81
[root@ejy-kf2-jyxt2 ~]# lsof -p 25932 | wc -l
83
[root@ejy-kf2-jyxt2 ~]# lsof -p 25932 | wc -l
84
[root@ejy-kf2-jyxt2 ~]# lsof -p 25932 | wc -l
85
[root@ejy-kf2-jyxt2 ~]# lsof -p 25932 | wc -l
86
- Inspecting the individual fds with lsof -p $(jps -l | grep uid | awk '{print $1}') showed that the fds created by the unreleased connection pools were still present as sock entries.
Code Fix
- Before obtaining a new connection, close the old Jedis connection and the old RedisDS.
public class UidServiceImpl implements IUidService {
    private static RedisDS redisDS = null;
    private static Jedis jedis = null;
    private static String confname = null;

    static {
        confname = ResourceBundle.getBundle("conf").getString("confname");
        Setting setting = new Setting("config/redis-" + Lists.newArrayList(Splitter.on("_").split(confname)).get(1) + ".setting");
        redisDS = RedisDS.create(setting, null); // keep the reference so it can be closed later
        jedis = redisDS.getJedis();
        System.out.println("Connected to redis");
    }

    private void connectRedis() {
        // Release resources; testing showed that only jedis.close() actually frees the fd
        if (null != jedis) {
            System.out.println("Closing jedis connection");
            jedis.close();
        }
        if (null != redisDS) {
            try {
                System.out.println("Closing connection pool");
                redisDS.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
        confname = ResourceBundle.getBundle("conf").getString("confname");
        Setting setting = new Setting("config/redis-" + Lists.newArrayList(Splitter.on("_").split(confname)).get(1) + ".setting");
        redisDS = RedisDS.create(setting, null);
        jedis = redisDS.getJedis();
        System.out.println("Connected to redis");
    }
}
[root@ejy-kf2-jyxt2 ~]# lsof -p $(jps -l | grep uid | awk '{print $1}') | wc -l
77
[root@ejy-kf2-jyxt2 ~]# lsof -p $(jps -l | grep uid | awk '{print $1}') | wc -l
77
[root@ejy-kf2-jyxt2 ~]# lsof -p $(jps -l | grep uid | awk '{print $1}') | wc -l
77
- Inspecting the individual fds again, the highest fd stays at 35u and no new entries appear.
Summary
- This is a real-world case of a file-descriptor leak caused by connection resources never being closed.
- The code could still be improved: it initializes a Redis connection pool but never actually uses the pooling, holding a single long-lived connection instead.
- Worth pondering: why did the pool not detect the broken connection and close it automatically?
- I did not fully reproduce the Too many open files condition here; interested readers can lower ulimit and see whether the production failure scenario can be reproduced end to end.
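The pooling point above can be sketched as follows: create one pool for the lifetime of the process, and borrow/return a connection per call with try-with-resources. FakePool/FakeConn are hypothetical stand-ins for JedisPool/Jedis (the real code would call pool.getResource() on a redis.clients.jedis.JedisPool):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PooledUid {
    // Stand-in for one pooled connection; close() returns it to the pool.
    static class FakeConn implements AutoCloseable {
        private final FakePool owner;
        FakeConn(FakePool owner) { this.owner = owner; }
        String get(String key) { return "41"; } // pretend Redis GET
        @Override public void close() { owner.giveBack(this); }
    }

    // Stand-in for the pool: hands out idle connections, creates on demand.
    static class FakePool {
        private final Deque<FakeConn> idle = new ArrayDeque<>();
        FakeConn getResource() { return idle.isEmpty() ? new FakeConn(this) : idle.pop(); }
        void giveBack(FakeConn c) { idle.push(c); }
        int idleCount() { return idle.size(); }
    }

    // ONE pool for the whole process; never recreated on errors.
    static final FakePool POOL = new FakePool();

    static int nextId(String key) {
        // Borrowed here, returned automatically at the end of the block,
        // even when an exception is thrown mid-call.
        try (FakeConn conn = POOL.getResource()) {
            return Integer.parseInt(conn.get(key)) + 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(nextId("demo_notice")); // prints: 42
        System.out.println(POOL.idleCount());      // prints: 1
    }
}
```

With this shape there is nothing for connectRedis to leak: a dead connection is simply discarded by the pool and a fresh one is created on the next borrow.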