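Bug 277 - [AnolisOS7.9][rc1][aarch64][ltpstress stability] Phytium 2500 server hangs after running the ltpstress stability test for 4 days: keyboard and mouse are unresponsive even after re-plugging, and the machine cannot be pinged.
Summary: [AnolisOS7.9][rc1][aarch64][ltpstress stability] Phytium 2500 server running the ltpstress stability test; after 4 days of running, the machine...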
Status: CONFIRMED
Alias: None
Product: Anolis OS 7
Classification: Anolis OS
Component: Images&Installations
Version: 7.9
Hardware: aarch64 Linux
Importance: P2-High S2-major
Target Milestone: ---
Assignee: yunqi-zwt
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-05 15:02 UTC by chenjianglei
Modified: 2022-03-14 16:26 UTC
CC List: 3 users

See Also:


Attachments
System log (3.50 MB, application/gzip)
2022-01-05 15:02 UTC, chenjianglei
Tool log (22.57 MB, text/x-log)
2022-01-05 15:03 UTC, chenjianglei
tmp log (5.30 KB, application/gzip)
2022-01-05 15:03 UTC, chenjianglei
Retest ltpstres.log (8.92 MB, text/x-log)
2022-01-24 14:28 UTC, chenjianglei
Retest varlog (2.03 MB, application/gzip)
2022-01-24 14:28 UTC, chenjianglei
Start time and system hang time (291.00 KB, image/jpeg)
2022-01-24 14:36 UTC, chenjianglei
20220314_test log (366.98 KB, application/x-gzip)
2022-03-14 16:26 UTC, wanqian

Description chenjianglei uniontech_group 2022-01-05 15:02:25 UTC
Created attachment 84
System log

[Environment]

Hardware: Phytium 2500 server (aarch64 architecture)

cpu: Phytium,FT-2500/128

Memory: 128G

Disk: 2T

Software: AnolisOS7.9_rc1_aarch64 system ISO

[Preconditions]

The system is installed successfully and runs normally.



[Steps]

1. Configure, compile, and install ltp-full-20160510.

2. In a terminal, run: ./ltpstress.sh -d /home/sardata -p -l /home/ltpstress.log -I /home/iofile -i 5 -t 168 -S -n


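[Actual result]

After 4 days of running, the machine hung: the keyboard and mouse were unresponsive (re-plugging them did not help), and the machine could not be pinged.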


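[Expected result]

The ltp-full run completes without the system hanging or crashing, the system keeps running normally, and the pass rate is above 97%.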



See the attached logs for details.
Comment 1 chenjianglei uniontech_group 2022-01-05 15:03:00 UTC
Created attachment 85
Tool log
Comment 2 chenjianglei uniontech_group 2022-01-05 15:03:38 UTC
Created attachment 86
tmp log
Comment 3 chenjianglei uniontech_group 2022-01-24 14:28:06 UTC
Created attachment 133
Retest ltpstres.log
Comment 4 chenjianglei uniontech_group 2022-01-24 14:28:47 UTC
Created attachment 134
Retest varlog
Comment 5 chenjianglei uniontech_group 2022-01-24 14:35:23 UTC
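See the screenshot for the start time of the run and the time the system hung. The machine hung, could not be operated, and could not be pinged. This retest was run on a GUI installation.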
Comment 6 chenjianglei uniontech_group 2022-01-24 14:36:24 UTC
Created attachment 135
Start time and system hang time
Comment 7 yunqi-zwt alibaba_cloud_group 2022-02-21 20:12:56 UTC
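I analyzed the system log and found no messages related to a kernel crash, nor any other useful clues. In addition, the build under test is still the one in which kdump is configured incorrectly.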

Jan 19 04:50:16 localhost systemd: Started Job spooling tools.
Jan 19 04:50:17 localhost kdumpctl: No memory reserved for crash kernel
Jan 19 04:50:17 localhost kdumpctl: Starting kdump: [FAILED]
Jan 19 04:50:17 localhost systemd: Started Command Scheduler.
Jan 19 04:50:17 localhost systemd: kdump.service: main process exited, code=exited, status=1/FAILURE


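The kdump service did not start.

It would be best to install from the latest ISO and try to reproduce the problem again, to see whether a vmcore can be captured.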
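The `kdumpctl: No memory reserved for crash kernel` line above indicates that no memory was set aside for the crash kernel at boot, so no vmcore could be saved. Before a retest, the reservation can be checked with a minimal sketch like the following. It assumes Python 3 and the standard Linux interfaces `/proc/cmdline` and `/sys/kernel/kexec_crash_size`; the helper names are hypothetical.

```python
from pathlib import Path


def has_crashkernel_param(cmdline: str) -> bool:
    """Return True if a kernel command line contains a crashkernel= option."""
    return any(tok.startswith("crashkernel=") for tok in cmdline.split())


def crash_reservation_bytes(path: str = "/sys/kernel/kexec_crash_size") -> int:
    """Return the bytes reserved for the crash kernel (0 if none or unreadable).

    A value of 0 means kdump has no memory to load the capture kernel into.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return 0
```

On the test machine, `has_crashkernel_param(Path("/proc/cmdline").read_text())` should return True and `crash_reservation_bytes()` should be non-zero before ltpstress is started; if not, a `crashkernel=` option would need to be added to the kernel command line and the machine rebooted.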
Comment 8 杨晓旋 uniontech_group 2022-03-11 14:12:26 UTC
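Physical machine + GUI base environment installation: retested twice (7-day runs), no problem;
VM + minimal installation (VM image): retested twice (7-day runs), no problem;
VM + Virtualization Host base environment installation: retested twice, both runs failed.
In summary, the problem has now been narrowed down to the VM + Virtualization Host base environment installation.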
Comment 9 yunqi-zwt alibaba_cloud_group 2022-03-14 15:48:18 UTC
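Feedback from the tester:
The ltpstress test started on the 9th at 17:33:46 in the afternoon.
The anomaly occurred at 7:57 on March 14, 2022.
Total run time: 110 hours 24 minutes.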
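The reported total can be cross-checked with a few lines of Python. This sketch assumes both times are March 2022 local wall-clock times and uses minute precision (17:33 rather than 17:33:46); with the seconds included, the difference is 110 h 23 min 14 s.

```python
from datetime import datetime

# Start and end of the run as reported by the tester (minute precision assumed).
start = datetime(2022, 3, 9, 17, 33)
end = datetime(2022, 3, 14, 7, 57)

elapsed = end - start
hours, rem = divmod(int(elapsed.total_seconds()), 3600)
minutes = rem // 60
print(f"{hours} h {minutes} min")  # 110 h 24 min
```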
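Analysis of the test environment shows:
1. The kernel did not crash, and the kdump service was working normally.
2. The problem started when the tuned process was killed by a global OOM; after that, no more system log messages were written until the VM was rebooted. At the same time, the ltpstress process also stopped writing to its output file (iofile).

The system log shows multiple OOM events, so the current guess is that OOM made the system unresponsive. However, the specific trigger cannot be determined from the available data.

Conclusion:
The cause of this problem is not a kernel crash, and it can only be reproduced with the VM + Virtualization Host base environment installation, which currently has no real-world use case (VMs are currently all deployed with a minimal installation).
Based on the above, I do not recommend continuing to track and analyze this issue.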

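The log at the time of the anomaly is shown below. Note: because the OS time was not set in the test environment, subtract 8 hours from the timestamps in the OS log to get the real time: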

'''
Mar 14 07:57:28 localhost kernel: [   1025]     0  1025    37980        3    94208      135             0 gssproxy
Mar 14 07:57:28 localhost kernel: [   1030]     0  1030    31253       10    94208      211             0 abrt-watch-log
Mar 14 07:57:28 localhost kernel: [   1038]     0  1038     4204        5    69632      205             0 rngd
Mar 14 07:57:28 localhost kernel: [   1058]     0  1058    26848       81    53248       21             0 ksmtuned
Mar 14 07:57:28 localhost kernel: [   1091]     0  1091    88145      211   135168      857             0 NetworkManager
Mar 14 07:57:28 localhost kernel: [   1351]     0  1351     4450        3    73728      255         -1000 sshd
Mar 14 07:57:28 localhost kernel: [   1352]     0  1352   108377      847   163840     2094             0 tuned
Mar 14 07:57:28 localhost kernel: [   1357]     0  1357   109821     2446   360448      113             0 rsyslogd
Mar 14 07:57:28 localhost kernel: [   1363]     0  1363    26925       24    53248       20             0 rhsmcertd
Mar 14 07:57:28 localhost kernel: [   1369]     0  1369   121906       44   225280     1215             0 libvirtd
Mar 14 07:57:28 localhost kernel: [   1390]     0  1390      986        7    45056       45             0 atd
Mar 14 07:57:28 localhost kernel: [   1412]     0  1412     4432        2    73728      238             0 login
Mar 14 07:57:28 localhost kernel: [   1413]     0  1413    30441        2    94208      243             0 login
Mar 14 07:57:28 localhost kernel: [   1690]     0  1690     5004       22    77824      237             0 master
Mar 14 07:57:28 localhost kernel: [   1697]    89  1697     5077       19    73728      235             0 qmgr
Mar 14 07:57:28 localhost kernel: [   1735]    99  1735     2038        3    53248       98             0 dnsmasq
Mar 14 07:57:28 localhost kernel: [   1736]     0  1736     2031        1    53248       95             0 dnsmasq
Mar 14 07:57:28 localhost kernel: [   1891]     0  1891    27194        3    69632      438             0 bash
Mar 14 07:57:28 localhost kernel: [   1973]     0  1973    27247       49    61440      425             0 bash
Mar 14 07:57:28 localhost kernel: [   7460]     0  7460    27008       26    69632      126             0 crond
Mar 14 07:57:28 localhost kernel: [  29557]     0 29557     1099        0    45056       55             0 xinetd
Mar 14 07:57:28 localhost kernel: [  29565]     0 29565      850        0    45056       48             0 rpc.idmapd
Mar 14 07:57:28 localhost kernel: [  29567]    29 29567     1741        3    53248      208             0 rpc.statd
Mar 14 07:57:28 localhost kernel: [  29580]     0 29580     1744        1    57344      179             0 rpc.mountd
Mar 14 07:57:28 localhost kernel: [  29709]     0 29709    26776       12    61440       47             0 ltpstress.sh
Mar 14 07:57:28 localhost kernel: [  29748]     0 29748    26444       24    57344        8             0 sar
Mar 14 07:57:28 localhost kernel: [  29750]     0 29750    26653       38    57344        0             0 sadc
Mar 14 07:57:28 localhost kernel: [  29753]     0 29753    26991      286    45056       17             0 ltpstress.sh
Mar 14 07:57:28 localhost kernel: [  29759]     0 29759      730       23    40960       14             0 ltp-pan
Mar 14 07:57:28 localhost kernel: [  29760]     0 29760      730       17    45056       12             0 ltp-pan
Mar 14 07:57:28 localhost kernel: [  29761]     0 29761      730       38    40960       13             0 ltp-pan
Mar 14 07:57:28 localhost kernel: [  29762]     0 29762    26427        0    53248       21             0 sleep
Mar 14 07:57:28 localhost kernel: [   5990]     0  5990     5614       72    81920      300             0 sshd
Mar 14 07:57:28 localhost kernel: [   6009]     0  6009     5615        6    86016      367             0 sshd
Mar 14 07:57:28 localhost kernel: [   6018]     0  6018    27173        6    57344      418             0 bash
Mar 14 07:57:28 localhost kernel: [   6032]     0  6032     3180        3    69632      183             0 sftp-server
Mar 14 07:57:28 localhost kernel: [  26683]    89 26683     5030      248    81920        0             0 pickup
Mar 14 07:57:28 localhost kernel: [  27207]     0 27207     1512       96    49152        0             0 growfiles
Mar 14 07:57:28 localhost kernel: [  27245]     0 27245    26427       19    57344        0             0 sleep
Mar 14 07:57:28 localhost kernel: [  27435]     0 27435     1459       72    45056        0             0 rename14
Mar 14 07:57:28 localhost kernel: [  27436]     0 27436     1459       73    45056        0             0 rename14
Mar 14 07:57:28 localhost kernel: [  27437]     0 27437     1459       73    45056        0             0 rename14
Mar 14 07:57:28 localhost kernel: [  27445]     0 27445    26427       18    57344        0             0 sleep
Mar 14 07:57:28 localhost kernel: [  27532]     0 27532   583568      554   761856        0             0 float_exp_log
Mar 14 07:57:28 localhost kernel: Out of memory: Kill process 1352 (tuned) score 0 or sacrifice child
Mar 14 07:57:28 localhost kernel: Killed process 1352 (tuned) total-vm:433508kB, anon-rss:3388kB, file-rss:0kB, shmem-rss:0kB
Mar 14 07:57:28 localhost kernel: oom_reaper: reaped process 1352 (tuned), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Mar 14 07:57:28 localhost systemd: tuned.service: main process exited, code=killed, status=9/KILL
Mar 14 18:01:13 localhost kernel: Booting Linux on physical CPU 0x0000000000 [0x481fd010]
Mar 14 18:01:13 localhost kernel: Linux version 4.19.91-25.2.an7.aarch64 (mockbuild@6493ae2847964ef88ec8b03132eeb3bb) (gcc version 9.1.1 20190605 (Red Hat 9.1.1-2) (GCC)) #1 SMP Wed Jan 5 16:35:31 CST 2022
Mar 14 18:01:13 localhost kernel: Machine model: linux,dummy-virt
Mar 14 18:01:13 localhost kernel: efi: Getting EFI parameters from FDT:
Mar 14 18:01:13 localhost kernel: efi: EFI v2.70 by EDK II
Mar 14 18:01:13 localhost kernel: efi:  SMBIOS 3.0=0x43f4e0000  ACPI 2.0=0x43bb80000  MEMATTR=0x43dff1518  MEMRESERVE=0x43c103198
'''
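To see why the OOM killer picked tuned, the task dump above can be ranked by an approximation of the kernel's badness score. This is a minimal sketch, assuming the Linux 4.19 column order (pid, uid, tgid, total_vm, rss, pgtables_bytes, swapents, oom_score_adj, name) and 4 KiB pages, which agrees with the log: tuned's 108377 pages of total_vm equal the 433508 kB reported as total-vm. The score used here (rss + swap entries + page-table pages, with oom_score_adj -1000 treated as exempt) mirrors `oom_badness()` only for tasks with oom_score_adj 0; the function names are hypothetical.

```python
import re

# One row of the OOM-killer task dump, for example:
# "[   1352]     0  1352   108377      847   163840     2094             0 tuned"
ROW = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)\s*$"
)
PAGE_SIZE = 4096  # assumed 4 KiB pages, consistent with the total-vm figure for tuned


def parse_rows(lines):
    """Extract task rows from kernel OOM dump lines; other lines are skipped."""
    rows = []
    for line in lines:
        m = ROW.search(line)
        if m:
            pid, _uid, _tgid, _total_vm, rss, pgt, swap, adj = map(int, m.groups()[:8])
            rows.append({"pid": pid, "name": m.group(9), "rss": rss,
                         "pgtables_bytes": pgt, "swapents": swap,
                         "oom_score_adj": adj})
    return rows


def badness(row):
    """Approximate badness in pages; nonzero oom_score_adj other than -1000 is ignored."""
    if row["oom_score_adj"] == -1000:
        return 0  # task is exempt from the OOM killer
    return row["rss"] + row["swapents"] + row["pgtables_bytes"] // PAGE_SIZE
```

Among the rows in this excerpt, `max(parse_rows(lines), key=badness)` selects tuned (847 + 2094 + 40 = 2981 pages) ahead of rsyslogd (2647 pages), which is consistent with the kernel choosing tuned. The beginning of the task dump is not included above, so this does not rule out other candidates.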
Comment 10 wanqian alibaba_cloud_group 2022-03-14 16:26:19 UTC
Created attachment 172
20220314_test log