Rancher: Enabling the Flannel Network Component Causes a Service Outage with Disk Inodes at 100%


Symptoms

Docker fails to start on the Rancher master node, and df -i shows inode usage at 100%.

[root@rancher ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2022-10-13 12:31:51 CST; 6h ago
     Docs: https://docs.docker.com
  Process: 1492 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
 Main PID: 1492 (code=exited, status=1/FAILURE)

10月 13 12:31:51 rancher systemd[1]: docker.service: Service RestartSec=2s expired, scheduling restart.
10月 13 12:31:51 rancher systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
10月 13 12:31:51 rancher systemd[1]: Stopped Docker Application Container Engine.
10月 13 12:31:51 rancher systemd[1]: docker.service: Start request repeated too quickly.
10月 13 12:31:51 rancher systemd[1]: docker.service: Failed with result 'exit-code'.
10月 13 12:31:51 rancher systemd[1]: Failed to start Docker Application Container Engine.
10月 13 12:32:18 rancher systemd[1]: docker.service: Start request repeated too quickly.
10月 13 12:32:18 rancher systemd[1]: docker.service: Failed with result 'exit-code'.
10月 13 12:32:18 rancher systemd[1]: Failed to start Docker Application Container Engine.
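
The inode check mentioned above can be scripted. The following is a minimal sketch, assuming GNU df and a bash shell; the function names and the threshold of 90 are illustrative and do not come from the original setup.

```shell
#!/usr/bin/env bash
# Extract the IUse% value from `df -iP` output read on stdin.
# Assumes the POSIX-style layout where column 5 of line 2 is IUse%.
parse_iuse() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Print the inode usage percentage of the filesystem that holds a path.
inode_use_percent() {
    df -iP "${1:-/}" | parse_iuse
}

# Return non-zero when inode usage reaches the threshold.
# The default of 90 is an arbitrary example value.
inodes_ok() {
    local used threshold="${2:-90}"
    used="$(inode_use_percent "${1:-/}")"
    # Some filesystems report "-" instead of a number; treat that as unknown.
    case "$used" in
        ''|*[!0-9]*) echo "inode usage unknown for ${1:-/}" >&2; return 2 ;;
    esac
    [ "$used" -lt "$threshold" ]
}
```

For example, running inodes_ok /var/lib/docker 90 from a cron job would return a non-zero status once the filesystem holding the Docker data directory approaches exhaustion, which leaves time to act before dockerd stops starting.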

Investigation showed that a large number of invalid files had accumulated under /var/lib/docker/volumes/{volume ID}/_data/flannel and exhausted the inodes.

[root@rancher flannel]# ls
ls: 无法访问'fb74d7857ffd3874f122b49cdd722eb3b14afef4881f52b8a523e2c88e516032': No such file or directory
ls: 无法访问'78733ae687765a9195bc97f95b893443358dacfc8cdbc5de2423302938132d3b': No such file or directory
ls: 无法访问'808c8c8b992937c7794c97fc6206eabb86fdb48439c20e0d90f8af99391781b0': No such file or directory
ls: 无法访问'029bf0519848760fb195765b66b52d049887dd9088e5760e7709cc0d2607b429': No such file or directory
ls: 无法访问'f31d65c532490f8c06acf3fa8615d760047112f2f145ca33678d4f10cec28ff8': No such file or directory
ls: 无法访问'06229177d5feb29cefc9cb7a2f4b6eb12ba581160350104b7ff753f6f613a644': No such file or directory
ls: 无法访问'daf0e871beac371e5853814b64c4baf9a6ddab2a9742caa56ace690b911361a9': No such file or directory
ls: 无法访问'ce3bb50306b1a3924fedbf6872a5137b108dad661383c420bda6ccbfaf52b317': No such file or directory
ls: 无法访问'e951895976c1f41c48fea7cc241bf6908312243226b2c8666fdf054454b41225': No such file or directory
ls: 无法访问'7b2b2c6359d1cdefbcff0864ddcd73305512315bac9177fe6735bdb54d38caa2': No such file or directory
ls: 无法访问'bc35b39d39f2eeccd9b62236abbd335568a7a488893c537a6488e34241fca79c': No such file or directory
ls: 无法访问'133224f8e7beb526e3251104b657cc05367ffafcc1be7dacbd2fee0b11520792': No such file or directory
ls: 无法访问'4973e7a18735e66eab9357a337a9bb09fd3cedd88282bf3cfe9adf9614a03840': No such file or directory
ls: 无法访问'5a467f6593da3265478d8d6a4b98502b286ff0983d52aa4aaf10f9a46eaec551': No such file or directory
ls: 无法访问'fe86f6e99ee0c24d9ae402d4782f2b776d39df178f31d1e2b3323f90206a74f5': No such file or directory
ls: 无法访问'789d526beae7bcf706e93ed2c235cb459cadd1f3bbe7cb1c5eb469582a29ce21': No such file or directory
ls: 无法访问'7ca54151cdd855742875c0b7698dfd1573c47741ab1186934d20a7f0801a758c': No such file or directory
ls: 无法访问'83d3895795da5821dbccd55b9e123df7e02d4eaec441e5cb7e7a233a2bb126c5': No such file or directory
ls: 无法访问'5ad82fb3f349e95a52ec65fe92ef6a8fa1f6111b0d20e2fd23b74259944ca9ab': No such file or directory
ls: 无法访问'45520197384ae843b66b930d85da82ae5b7b3e5c51bbdef515c7cf756b8037b1': No such file or directory
ls: 无法访问'230a98d2e529a9a38ac7c1d5e6f85f7c43f43603d7ce83171eea08313637e16d': No such file or directory
ls: 无法访问'1d81d646d7626e4dd5130b47d232409b411ad1b1ab7fed4ddbebca2d7b9b2500': No such file or directory
ls: 无法访问'd19357daf7e24af3dcb8d7363282294ff6eb8b0f8ac7e4e8e70356cfecef9e80': No such file or directory
ls: 无法访问'67a5dd643b08c061c976bf70009411a8c6d8ca79daa924fddc6275d0877562a9': No such file or directory
ls: 无法访问'c6af0691e70041fffaf0c98e40208ea5343d7f829c61ee7b6b68c90d688f9760': No such file or directory
ls: 无法访问'a8839d339ea55134dfd9c7f0362ee701a4e06b8d3c30c8d09f594383e203dbdd': No such file or directory
ls: 无法访问'3104b34f59e8f1a6b171af7a6899cbf4daf07936cf72586124558f0cc978113a': No such file or directory
ls: 无法访问'bb24b482c5378c21821efad68308dca6a38298ec39fdbcb7b6ee1dac84cd44f0': No such file or directory
ls: 无法访问'606fac13550fb4930a0d78e446f5abffecd49b63bde3c7fc8e2cfbb46ebe3411': No such file or directory
ls: 无法访问'4196ccb04361a21c5cc667fe07285b0f6020748259a8188028a3eff83d12a7c2': No such file or directory
ls: 无法访问'2cf28f10ae0d459fbedb27991de1a882c0107dd7df1ef94cc9e158bf8f6c66af': No such file or directory
ls: 无法访问'15301cedcaf4eec156c17a419de0d0e2b96607653027ccfce7d87af1f2a1d277': No such file or directory
ls: 无法访问'0cec7f448761437ca086987adaabc9025f7049b22f74d7cdef9e27ef1f6bdf3c': No such file or directory
ls: 无法访问'd20bf98b6df172fadbb6561396d6eeb7d89e468654f51188daac6c218900ce59': No such file or directory
ls: 无法访问'850fdcac6d76ca897184ab3ef31bd6e10d484902bfd82be03e559d940a294be8': No such file or directory
ls: 无法访问'bb262ddae13def22857ee4a37c0c4cabe2f68a5f307054a2fbd1d5ca117ac632': No such file or directory
ls: 无法访问'94a07e2f9ae56eb54a2ebfedb00f1c2674be19388032fb0e770156700cc91cbb': No such file or directory
ls: 无法访问'16c356010cf9fbb2aa845990adb950ee7d64c80377b3484e3e7ccbdea98bc761': No such file or directory
ls: 无法访问'7443a6983a9e4892d99898313a8b144d373b3200135f13ae6229829f6b4f8c4f': No such file or directory
ls: 无法访问'b7895325169bb9831eafb79ba407b08094b8bd3d979967cf275b8bf46408a654': No such file or directory
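
In the listing above, 无法访问 is the localized form of the ls message "cannot access". To find which directory is consuming the inodes in the first place, one option is to count the entries under each subdirectory. The sketch below assumes GNU find and bash; count_inodes_by_dir is a hypothetical helper name, not a tool used in the original investigation. Entries that cannot be examined, like the ones shown above, may be skipped by find, so treat the counts as a lower bound.

```shell
#!/usr/bin/env bash
# Count filesystem entries under each immediate subdirectory of a root,
# largest first. Roughly one inode is used per file, directory, or symlink
# (hard links share an inode, so this is an approximation).
# Hidden subdirectories are not matched by the glob and are not listed.
count_inodes_by_dir() {
    local root="${1:-.}" dir count
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # -xdev keeps find on a single filesystem; errors are discarded.
        count="$(find "$dir" -xdev 2>/dev/null | wc -l | tr -d ' ')"
        printf '%s\t%s\n' "$count" "${dir%/}"
    done | sort -rn
}
```

Starting at /var/lib/docker and descending into whichever subdirectory reports the highest count usually narrows the search down in a few steps.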

Solution

  1. Temporary fix

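Stop Docker, delete the invalid files created by Flannel, and recreate the Docker containers. This is only a temporary fix: as the system keeps running, the inodes will eventually fill up again.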

  2. Permanent fix (not yet verified)
    Try replacing the Flannel network component, for example with Calico.
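
The temporary fix described above could look roughly like the following. This is a sketch under assumptions: it relies on GNU find, purge_dir_contents is a hypothetical helper, and {volume ID} remains a placeholder for the actual volume directory. Whether find can remove entries that ls cannot access was not tested here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete everything inside a directory while keeping
# the directory itself. Using find avoids expanding a huge glob on the
# command line when the directory holds a very large number of entries.
purge_dir_contents() {
    local dir="$1"
    # Refuse obviously dangerous or invalid targets.
    if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
        echo "refusing to purge: '$dir'" >&2
        return 1
    fi
    find "$dir" -mindepth 1 -delete
}

# Intended order of operations (left as comments so nothing runs by accident):
#   systemctl stop docker
#   purge_dir_contents "/var/lib/docker/volumes/{volume ID}/_data/flannel"
#   systemctl start docker
#   df -i    # confirm that inode usage has dropped
```

After Docker is back up, the containers still have to be recreated as noted in the temporary fix, and the files are expected to accumulate again over time.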

Notice: 初心 | All rights reserved | Original content unless otherwise noted | This site is licensed under BY-NC-SA
