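Created attachment 84 [details]
System log

[Environment]
Hardware: Phytium FT-2500 server (aarch64)
CPU: Phytium,FT-2500/128
Memory: 128 GB
Disk: 2 TB
Software: AnolisOS7.9_rc1_aarch64 system ISO

[Preconditions]
The system installed successfully and runs normally.

[Steps]
1. Configure, build, and install ltp-full-20160510.
2. In a terminal, run: ./ltpstress.sh -d /home/sardata -p -l /home/ltpstress.log -I /home/iofile -i 5 -t 168 -S -n

[Actual result]
After running for 4 days, the machine hung. The keyboard and mouse stopped responding (unplugging and re-plugging them did not help), and the machine could not be pinged.

[Expected result]
ltp-full runs to completion without the system hanging or crashing, the system keeps running normally, and the pass rate exceeds 97%.

See the attached logs for details.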
Created attachment 85 [details] Tool log
Created attachment 86 [details] tmp log
Created attachment 133 [details] Retest: ltpstress.log
Created attachment 134 [details] Retest: varlog
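See the screenshot for the start time and the time the system hung. The machine hung, could not be operated, and could not be pinged. This retest was run on a GUI installation.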
Created attachment 135 [details] Start time and system hang time
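I went through the system logs and found no kernel crash messages, and no other useful clues. In addition, the build under test is the one in which kdump was still misconfigured:
'''
Jan 19 04:50:16 localhost systemd: Started Job spooling tools.
Jan 19 04:50:17 localhost kdumpctl: No memory reserved for crash kernel
Jan 19 04:50:17 localhost kdumpctl: Starting kdump: [FAILED]
Jan 19 04:50:17 localhost systemd: Started Command Scheduler.
Jan 19 04:50:17 localhost systemd: kdump.service: main process exited, code=exited, status=1/FAILURE
'''
The kdump service did not start. It would be best to install from the latest ISO and try to reproduce the problem again, to see whether a vmcore can be captured.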
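Before the next reproduction attempt, it may be worth confirming that crash-kernel memory is actually reserved, since without it no vmcore can be captured. The following is a minimal Python sketch, not part of kdump itself. It assumes, based on our understanding of kdumpctl, that the "No memory reserved for crash kernel" message corresponds to /sys/kernel/kexec_crash_size reading 0, and that the reservation is requested through a crashkernel= kernel parameter. The helper names crashkernel_param and kdump_ready are hypothetical.

```python
"""Sketch: check whether crash-kernel memory is reserved for kdump.

Assumptions: runs on the Linux system under test; /sys/kernel/kexec_crash_size
reports the reserved size in bytes (0 means nothing was reserved).
Helper names are hypothetical, not part of kexec-tools.
"""
import re
from pathlib import Path


def crashkernel_param(cmdline: str):
    """Return the value of crashkernel= from a kernel command line, or None."""
    match = re.search(r"(?:^|\s)crashkernel=(\S+)", cmdline)
    return match.group(1) if match else None


def kdump_ready(cmdline: str, crash_size_bytes: int) -> bool:
    """A capture kernel can only be loaded if memory was actually reserved."""
    return crashkernel_param(cmdline) is not None and crash_size_bytes > 0


def main() -> None:
    cmdline_file = Path("/proc/cmdline")
    size_file = Path("/sys/kernel/kexec_crash_size")
    # Fall back to empty/zero values so the sketch also runs off-target.
    cmdline = cmdline_file.read_text() if cmdline_file.exists() else ""
    crash_size = int(size_file.read_text()) if size_file.exists() else 0
    print("crashkernel=", crashkernel_param(cmdline))
    print("reserved bytes:", crash_size)
    print("kdump can load a capture kernel:", kdump_ready(cmdline, crash_size))


if __name__ == "__main__":
    main()
```

If the parameter is present but the reserved size is still 0, the reservation requested on the command line was not honored at boot, which matches the kdumpctl failure quoted above.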
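Physical machine + GUI base environment install: retested twice (7 days), no problem;
Virtual machine + minimal install (VM image): retested twice (7 days), no problem;
Virtual machine + Virtualization Host base environment install: retested twice, failed both times.
In summary, the problem has now been narrowed down to a virtual machine with the Virtualization Host base environment installed.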
Feedback from the tester: ltpstress testing started on the 9th at 17:33:46; the anomaly occurred on March 14, 2022 at 7:57; total running time was 110 hours 24 minutes.

Analysis of the test environment:
1. The kernel did not crash, and the kdump service was working normally.
2. At the time of the problem, the tuned process was killed by a global OOM. After that, nothing more was written to the system log until the VM was rebooted. At the same time, the ltpstress processes also stopped writing to their file (iofile).

The system log contains multiple OOM events, and the current hypothesis is that OOM made the system unresponsive. However, the specific trigger could not be determined.

Conclusion: the problem is not caused by a kernel crash, and it can only be reproduced on a virtual machine with the Virtualization Host base environment installed. This configuration currently has no use case; virtual machines are currently all minimal installs. Based on the above, I do not recommend tracking or analyzing this problem any further.

The logs at the time of the anomaly follow. Note: the OS time was not set in the test environment, so the real time is the OS log time minus 8 hours:
'''
Mar 14 07:57:28 localhost kernel: [ 1025] 0 1025 37980 3 94208 135 0 gssproxy
Mar 14 07:57:28 localhost kernel: [ 1030] 0 1030 31253 10 94208 211 0 abrt-watch-log
Mar 14 07:57:28 localhost kernel: [ 1038] 0 1038 4204 5 69632 205 0 rngd
Mar 14 07:57:28 localhost kernel: [ 1058] 0 1058 26848 81 53248 21 0 ksmtuned
Mar 14 07:57:28 localhost kernel: [ 1091] 0 1091 88145 211 135168 857 0 NetworkManager
Mar 14 07:57:28 localhost kernel: [ 1351] 0 1351 4450 3 73728 255 -1000 sshd
Mar 14 07:57:28 localhost kernel: [ 1352] 0 1352 108377 847 163840 2094 0 tuned
Mar 14 07:57:28 localhost kernel: [ 1357] 0 1357 109821 2446 360448 113 0 rsyslogd
Mar 14 07:57:28 localhost kernel: [ 1363] 0 1363 26925 24 53248 20 0 rhsmcertd
Mar 14 07:57:28 localhost kernel: [ 1369] 0 1369 121906 44 225280 1215 0 libvirtd
Mar 14 07:57:28 localhost kernel: [ 1390] 0 1390 986 7 45056 45 0 atd
Mar 14 07:57:28 localhost kernel: [ 1412] 0 1412 4432 2 73728 238 0 login
Mar 14 07:57:28 localhost kernel: [ 1413] 0 1413 30441 2 94208 243 0 login
Mar 14 07:57:28 localhost kernel: [ 1690] 0 1690 5004 22 77824 237 0 master
Mar 14 07:57:28 localhost kernel: [ 1697] 89 1697 5077 19 73728 235 0 qmgr
Mar 14 07:57:28 localhost kernel: [ 1735] 99 1735 2038 3 53248 98 0 dnsmasq
Mar 14 07:57:28 localhost kernel: [ 1736] 0 1736 2031 1 53248 95 0 dnsmasq
Mar 14 07:57:28 localhost kernel: [ 1891] 0 1891 27194 3 69632 438 0 bash
Mar 14 07:57:28 localhost kernel: [ 1973] 0 1973 27247 49 61440 425 0 bash
Mar 14 07:57:28 localhost kernel: [ 7460] 0 7460 27008 26 69632 126 0 crond
Mar 14 07:57:28 localhost kernel: [ 29557] 0 29557 1099 0 45056 55 0 xinetd
Mar 14 07:57:28 localhost kernel: [ 29565] 0 29565 850 0 45056 48 0 rpc.idmapd
Mar 14 07:57:28 localhost kernel: [ 29567] 29 29567 1741 3 53248 208 0 rpc.statd
Mar 14 07:57:28 localhost kernel: [ 29580] 0 29580 1744 1 57344 179 0 rpc.mountd
Mar 14 07:57:28 localhost kernel: [ 29709] 0 29709 26776 12 61440 47 0 ltpstress.sh
Mar 14 07:57:28 localhost kernel: [ 29748] 0 29748 26444 24 57344 8 0 sar
Mar 14 07:57:28 localhost kernel: [ 29750] 0 29750 26653 38 57344 0 0 sadc
Mar 14 07:57:28 localhost kernel: [ 29753] 0 29753 26991 286 45056 17 0 ltpstress.sh
Mar 14 07:57:28 localhost kernel: [ 29759] 0 29759 730 23 40960 14 0 ltp-pan
Mar 14 07:57:28 localhost kernel: [ 29760] 0 29760 730 17 45056 12 0 ltp-pan
Mar 14 07:57:28 localhost kernel: [ 29761] 0 29761 730 38 40960 13 0 ltp-pan
Mar 14 07:57:28 localhost kernel: [ 29762] 0 29762 26427 0 53248 21 0 sleep
Mar 14 07:57:28 localhost kernel: [ 5990] 0 5990 5614 72 81920 300 0 sshd
Mar 14 07:57:28 localhost kernel: [ 6009] 0 6009 5615 6 86016 367 0 sshd
Mar 14 07:57:28 localhost kernel: [ 6018] 0 6018 27173 6 57344 418 0 bash
Mar 14 07:57:28 localhost kernel: [ 6032] 0 6032 3180 3 69632 183 0 sftp-server
Mar 14 07:57:28 localhost kernel: [ 26683] 89 26683 5030 248 81920 0 0 pickup
Mar 14 07:57:28 localhost kernel: [ 27207] 0 27207 1512 96 49152 0 0 growfiles
Mar 14 07:57:28 localhost kernel: [ 27245] 0 27245 26427 19 57344 0 0 sleep
Mar 14 07:57:28 localhost kernel: [ 27435] 0 27435 1459 72 45056 0 0 rename14
Mar 14 07:57:28 localhost kernel: [ 27436] 0 27436 1459 73 45056 0 0 rename14
Mar 14 07:57:28 localhost kernel: [ 27437] 0 27437 1459 73 45056 0 0 rename14
Mar 14 07:57:28 localhost kernel: [ 27445] 0 27445 26427 18 57344 0 0 sleep
Mar 14 07:57:28 localhost kernel: [ 27532] 0 27532 583568 554 761856 0 0 float_exp_log
Mar 14 07:57:28 localhost kernel: Out of memory: Kill process 1352 (tuned) score 0 or sacrifice child
Mar 14 07:57:28 localhost kernel: Killed process 1352 (tuned) total-vm:433508kB, anon-rss:3388kB, file-rss:0kB, shmem-rss:0kB
Mar 14 07:57:28 localhost kernel: oom_reaper: reaped process 1352 (tuned), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Mar 14 07:57:28 localhost systemd: tuned.service: main process exited, code=killed, status=9/KILL
Mar 14 18:01:13 localhost kernel: Booting Linux on physical CPU 0x0000000000 [0x481fd010]
Mar 14 18:01:13 localhost kernel: Linux version 4.19.91-25.2.an7.aarch64 (mockbuild@6493ae2847964ef88ec8b03132eeb3bb) (gcc version 9.1.1 20190605 (Red Hat 9.1.1-2) (GCC)) #1 SMP Wed Jan 5 16:35:31 CST 2022
Mar 14 18:01:13 localhost kernel: Machine model: linux,dummy-virt
Mar 14 18:01:13 localhost kernel: efi: Getting EFI parameters from FDT:
Mar 14 18:01:13 localhost kernel: efi: EFI v2.70 by EDK II
Mar 14 18:01:13 localhost kernel: efi: SMBIOS 3.0=0x43f4e0000 ACPI 2.0=0x43bb80000 MEMATTR=0x43dff1518 MEMRESERVE=0x43c103198
'''
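The OOM task dump above is easier to read once page counts are converted to kB and the tasks are sorted. The following is a minimal Python sketch, not an existing tool. It assumes the 4.19 dump_tasks column order (pid, uid, tgid, total_vm, rss, pgtables_bytes, swapents, oom_score_adj, name), with total_vm, rss and swapents in pages. The 4 KiB page size is inferred from the kill line: tuned's total_vm of 108377 pages matches total-vm:433508kB (108377 × 4 = 433508). The 8-hour offset comes from the note above. The year 2022 is an assumption, because syslog lines carry no year. SAMPLE holds a few lines copied from the excerpt; parse_tasks and real_time are hypothetical helper names.

```python
"""Sketch: rank tasks from an OOM-killer task dump and fix up log timestamps.

Assumptions: 4.19 dump_tasks column order; PAGE_KB inferred from the kill
line; LOG_OFFSET is the 8-hour clock skew reported for this test VM;
YEAR is assumed because syslog timestamps omit it.
"""
import re
from datetime import datetime, timedelta
from typing import NamedTuple

PAGE_KB = 4
LOG_OFFSET = timedelta(hours=8)
YEAR = 2022

TASK_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"\[ *(?P<pid>\d+)\] +(?P<uid>\d+) +(?P<tgid>\d+) +(?P<total_vm>\d+) +"
    r"(?P<rss>\d+) +(?P<pgtables>\d+) +(?P<swapents>\d+) +(?P<adj>-?\d+) +(?P<name>\S+)$"
)


class Task(NamedTuple):
    pid: int
    name: str
    total_vm_kb: int
    rss_kb: int
    swap_kb: int
    oom_score_adj: int


def parse_tasks(lines):
    """Yield a Task for every task-dump line; all other lines are skipped."""
    for line in lines:
        m = TASK_RE.match(line.strip())
        if m:
            yield Task(
                pid=int(m["pid"]),
                name=m["name"],
                total_vm_kb=int(m["total_vm"]) * PAGE_KB,
                rss_kb=int(m["rss"]) * PAGE_KB,
                swap_kb=int(m["swapents"]) * PAGE_KB,
                oom_score_adj=int(m["adj"]),
            )


def real_time(syslog_ts: str) -> datetime:
    """Convert a syslog timestamp such as 'Mar 14 07:57:28' to real time."""
    logged = datetime.strptime(f"{YEAR} {syslog_ts}", "%Y %b %d %H:%M:%S")
    return logged - LOG_OFFSET


SAMPLE = """\
Mar 14 07:57:28 localhost kernel: [ 1352] 0 1352 108377 847 163840 2094 0 tuned
Mar 14 07:57:28 localhost kernel: [ 1357] 0 1357 109821 2446 360448 113 0 rsyslogd
Mar 14 07:57:28 localhost kernel: [ 1369] 0 1369 121906 44 225280 1215 0 libvirtd
Mar 14 07:57:28 localhost kernel: [ 27532] 0 27532 583568 554 761856 0 0 float_exp_log
Mar 14 07:57:28 localhost kernel: Out of memory: Kill process 1352 (tuned) score 0 or sacrifice child
""".splitlines()

if __name__ == "__main__":
    tasks = list(parse_tasks(SAMPLE))
    # Largest resident set first.
    for t in sorted(tasks, key=lambda t: t.rss_kb, reverse=True):
        print(f"{t.name:<16} rss={t.rss_kb:>7} kB  vm={t.total_vm_kb:>8} kB  swap={t.swap_kb:>6} kB")
    print("real time of OOM kill:", real_time("Mar 14 07:57:28"))
```

With these assumptions, tuned's row converts to rss 3388 kB, which agrees with the anon-rss:3388kB in the kill line. Among the listed tasks, the largest resident set is rsyslogd at 2446 pages (9784 kB).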
Created attachment 172 [details] 20220314_test log