[Problem description]: On anolis23, running the LTP mm test group runs out of memory when it reaches ksm05. The system is briefly disrupted (the ssh session disconnects, LTP stops, and so on) and the run cannot continue.

messages log:
2023-03-03T11:06:09.471894+08:00 qibo-anck014-an23-g6r-1 systemd-logind[790]: Removed session 9.
2023-03-03T11:06:09.471251+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Consumed 12min 23.395s CPU time.
2023-03-03T11:06:09.469958+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Failed with result 'oom-kill'.
2023-03-03T11:06:09.377890+08:00 qibo-anck014-an23-g6r-1 systemd-logind[790]: Session 9 logged out. Waiting for processes to exit.
2023-03-03T11:06:09.377867+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 1644914 (node) with signal SIGKILL.
2023-03-03T11:06:09.373685+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 2112707 (oom01) with signal SIGKILL.
2023-03-03T11:06:09.373656+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 2111537 (cpuUsage.sh) with signal SIGKILL.
2023-03-03T11:06:09.373635+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 1880907 (sleep) with signal SIGKILL.
2023-03-03T11:06:09.373611+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 1645418 (ltp-pan) with signal SIGKILL.
2023-03-03T11:06:09.373588+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 1645208 (runltp) with signal SIGKILL.
2023-03-03T11:06:09.373564+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 1645186 (run_test.sh) with signal SIGKILL.
2023-03-03T11:06:09.373542+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 1645150 (python) with signal SIGKILL.
2023-03-03T11:06:09.373517+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 1644967 (bash) with signal SIGKILL.
2023-03-03T11:06:09.373494+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 1644901 (node) with signal SIGKILL.
2023-03-03T11:06:09.373402+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 1644689 (sh) with signal SIGKILL.
2023-03-03T11:06:09.373378+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 1644649 (bash) with signal SIGKILL.
2023-03-03T11:06:09.366658+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: Killing process 1644633 (sshd) with signal SIGKILL.
2023-03-03T11:06:09.358146+08:00 qibo-anck014-an23-g6r-1 systemd[1]: session-9.scope: A process of this unit has been killed by the OOM killer.
2023-03-03T11:06:09.356904+08:00 qibo-anck014-an23-g6r-1 kernel: Out of memory: Killed process 2112715 (oom01) total-vm:44100812kB, anon-rss:29070676kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:56996kB oom_score_adj:0
2023-03-03T11:06:09.356889+08:00 qibo-anck014-an23-g6r-1 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/session-9.scope,task=oom01,pid=2112715,uid=0

[Environment]:
Machine type: ECS
Kernel:
[root@qibo-anck014-an23-g6r-1 2]# uname -r
5.10.134-14_rc1.an23.aarch64
Operating system:
[root@qibo-anck014-an23-g6r-1 2]# uname -r
5.10.134-14_rc1.an23.aarch64
[root@qibo-anck014-an23-g6r-1 2]# cat /etc/os-release
NAME="Anolis OS"
VERSION="23"
ID="anolis"
VERSION_ID="23"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"

[Reproducibility]: Always reproducible

[Steps to reproduce]:
1. Install and deploy the LTP environment.
2. Run the LTP mm test group: ./runltp -f mm -s oom01
The problem occurs on Anck-5.10-14 and on other versions as well.
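Code of this test case:

    static void verify_oom(void)
    {
    #if __WORDSIZE == 32
        tst_brk(TCONF, "test is not designed for 32-bit system.");
    #endif

        /* we expect mmap to fail before OOM is hit */
        set_sys_tune("overcommit_memory", 2, 1);
        oom(NORMAL, 0, ENOMEM, 0);

        /* with overcommit_memory set to 0 or 1 there's no
         * guarantee that mmap fails before OOM */
        set_sys_tune("overcommit_memory", 0, 1);
        oom(NORMAL, 0, ENOMEM, 1);

        set_sys_tune("overcommit_memory", 1, 1);
        testoom(0, 0, ENOMEM, 1);
    }

The intent is to test how the system reacts to OOM with overcommit_memory set to 0, 1, and 2 in turn. The three values mean:

    #define OVERCOMMIT_GUESS 0
    #define OVERCOMMIT_ALWAYS 1
    #define OVERCOMMIT_NEVER 2

The oom function consumes 3G of memory at a time until the OOM killer is triggered. When overcommit_memory is set to 1 and a 3G request cannot be satisfied from the remaining memory, the kernel keeps killing processes on the system until 3G is available. So the disconnection caused by this case should be normal behavior.

I discussed this with Haiyang: after the OOM, the machine "seems to drop everything for a moment, and ssh disconnects briefly". No other anomaly was observed, so this is normal behavior.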
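When looking at a machine after such a run, it can help to confirm which overcommit policy is currently in effect. The following is a minimal sketch, not part of LTP: the helper names overcommit_mode_name and read_overcommit_mode are hypothetical, and the only things taken from the report are the three mode values.

```c
#include <stdio.h>

/* Mode values as listed in the report. */
#define OVERCOMMIT_GUESS  0
#define OVERCOMMIT_ALWAYS 1
#define OVERCOMMIT_NEVER  2

/* Hypothetical helper: map a mode value to a readable name.
 * Values outside 0..2 yield "UNKNOWN". */
static const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case OVERCOMMIT_GUESS:  return "OVERCOMMIT_GUESS";
    case OVERCOMMIT_ALWAYS: return "OVERCOMMIT_ALWAYS";
    case OVERCOMMIT_NEVER:  return "OVERCOMMIT_NEVER";
    default:                return "UNKNOWN";
    }
}

/* Hypothetical helper: read an integer mode from a sysctl-style file.
 * Returns -1 if the file cannot be opened or holds no integer. */
static int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

Calling overcommit_mode_name(read_overcommit_mode("/proc/sys/vm/overcommit_memory")) before and after the test shows the policy in effect at each point, which makes it easier to tell which of the three phases of verify_oom the machine was in when the session was killed.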