Bug 3950 - [Anolis8.8][RHCK][x86_64][AMD] mcelog.service fails to start
Summary: [Anolis8.8][RHCK][x86_64][AMD] mcelog.service fails to start
Status: RESOLVED WONTFIX
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: Images&Installations
Version: 8.8
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: maqiao
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-06 16:23 UTC by anolislw
Modified: 2023-04-27 10:21 UTC
CC List: 1 user

See Also:


Attachments

Description anolislw alibaba_cloud_group 2023-02-06 16:23:33 UTC
Description of problem:
On Anolis 8.8 RHCK x86_64 in an AMD environment, mcelog.service fails to start.

Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. systemctl status mcelog.service

Actual results:
[root@localhost anuser]# systemctl status mcelog.service
● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2023-02-06 16:18:05 CST; 17s ago
  Process: 6331 ExecStart=/usr/sbin/mcelog --ignorenodev --daemon --foreground (code=exited, status=1/FAILURE)
 Main PID: 6331 (code=exited, status=1/FAILURE)

Feb 06 16:18:05 localhost.localdomain systemd[1]: Started Machine Check Exception Logging Daemon.
Feb 06 16:18:05 localhost.localdomain mcelog[6331]: mcelog: ERROR: AMD Processor family 25: mcelog does not support this proce>
Feb 06 16:18:05 localhost.localdomain mcelog[6331]: CPU is unsupported
Feb 06 16:18:05 localhost.localdomain systemd[1]: mcelog.service: Main process exited, code=exited, status=1/FAILURE
Feb 06 16:18:05 localhost.localdomain systemd[1]: mcelog.service: Failed with result 'exit-code'.

[root@localhost anuser]# arch
x86_64
[root@localhost anuser]# uname -r
4.18.0-425.10.1.an8.x86_64
[root@localhost anuser]# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.8"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.8"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.8"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"


Expected results:
mcelog[6331]: mcelog: ERROR: AMD Processor family 25: mcelog does not support this proce
mcelog[6331]: CPU is unsupported

Additional info:
[root@localhost anuser]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Alibaba Cloud
CPU family:          25
Model:               1
Model name:          AMD EPYC 7T83 64-Core Processor
BIOS Model name:     pc-i440fx-2.1
Stepping:            1
CPU MHz:             2545.218
BogoMIPS:            5090.43
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            32768K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext invpcid_single vmmcall tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat vaes vpclmulqdq rdpid fsrm
[root@localhost anuser]# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/boot/vmlinuz-4.18.0-425.10.1.an8.x86_64 root=UUID=9af0b2b8-8abf-4cd8-af7b-33ad8c2c91cc ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api rhgb quiet biosdevname=0 net.ifnames=0 console=tty0 console=ttyS0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295
[root@localhost anuser]# arch
x86_64
[root@localhost anuser]# cat /etc/image-id
image_name="Anolis OS 8.8 RHCK 64 bit"
image_id="anolisos_8_8_x64_20G_rhck_community_alibase_20230203.vhd"
release_date="20230203154040"
[root@localhost anuser]#
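
The failure signature in the journal output above can be recognized programmatically, for example when triaging many hosts. The following is a minimal Python sketch, not part of the original report. The function name `unsupported_amd_family` is hypothetical, and the pattern is built only from the message text quoted in this bug (which is truncated at "proce>" in the paste), so it may need adjusting for other mcelog versions.

```python
import re

# Pattern taken from the mcelog error line quoted in this report.
# The tail of the message is truncated in the paste, so only the
# visible prefix is matched.
UNSUPPORTED_RE = re.compile(
    r"mcelog: ERROR: AMD Processor family (\d+): mcelog does not support this"
)


def unsupported_amd_family(journal_text):
    """Return the AMD family number reported by mcelog, or None.

    `journal_text` is any text containing journal lines, such as the
    output of `systemctl status mcelog.service`.
    """
    match = UNSUPPORTED_RE.search(journal_text)
    return int(match.group(1)) if match else None


# Sample line copied verbatim from the report (including the truncation).
sample = (
    "Feb 06 16:18:05 localhost.localdomain mcelog[6331]: mcelog: ERROR: "
    "AMD Processor family 25: mcelog does not support this proce>"
)
print(unsupported_amd_family(sample))
```

A return value of None means the text does not contain this particular error, so the service may have failed for a different reason.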
Comment 1 anolislw alibaba_cloud_group 2023-02-06 16:27:01 UTC
Similar ticket: https://bugzilla.openanolis.cn/show_bug.cgi?id=1677
Comment 2 maqiao alibaba_cloud_group 2023-02-06 16:47:34 UTC
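1. This issue is unrelated to the kernel; it is a problem in the mcelog component.
2. According to bug 1677, it is caused by mcelog not supporting this type of AMD CPU, so it will not be fixed.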
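
To illustrate the diagnosis above, here is a minimal Python sketch (not from the original thread) that reads the vendor and CPU family out of `lscpu` output and flags hosts where mcelog is likely to refuse to start. The helper names and the family threshold are assumptions: this bug only confirms that AMD family 25 is rejected, and the cutoff of 16 reflects the editor's recollection of upstream mcelog behavior rather than anything stated here.

```python
import re

# ASSUMPTION: mcelog rejects AMD processors from this family upward.
# Only family 25 is confirmed by the log in this bug report.
ASSUMED_AMD_MIN_UNSUPPORTED_FAMILY = 16


def parse_lscpu(text):
    """Return (vendor_id, cpu_family) parsed from `lscpu` output text."""
    # MULTILINE makes ^ match at each line start, so the
    # "BIOS Vendor ID:" line is not mistaken for "Vendor ID:".
    vendor = re.search(r"^Vendor ID:\s*(\S+)", text, re.MULTILINE)
    family = re.search(r"^CPU family:\s*(\d+)", text, re.MULTILINE)
    if vendor is None or family is None:
        raise ValueError("lscpu output lacks 'Vendor ID' or 'CPU family'")
    return vendor.group(1), int(family.group(1))


def mcelog_likely_unsupported(vendor, family):
    """Guess whether mcelog would exit with 'CPU is unsupported'."""
    return (
        vendor == "AuthenticAMD"
        and family >= ASSUMED_AMD_MIN_UNSUPPORTED_FAMILY
    )


# Excerpt of the lscpu output posted in the description.
sample = """Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Alibaba Cloud
CPU family:          25
Model:               1
"""
vendor, family = parse_lscpu(sample)
print(vendor, family, mcelog_likely_unsupported(vendor, family))
```

On a live system the input could come from `subprocess.run(["lscpu"], capture_output=True, text=True).stdout`. The check is only a heuristic; the mcelog journal message remains the authoritative signal.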
Comment 3 wdy_d_zuec 2023-04-27 10:21:54 UTC
I am seeing a similar situation here:

Linux unity3d 4.18.0-425.13.1.0.1.an8.x86_64 #1 SMP Thu Feb 23 10:06:51 CST 2023 x86_64 x86_64 x86_64 GNU/Linux


In the early hours of the following day the system stopped responding to input. The clock on the screen was frozen at 6:45.

Apr 26 16:49:30 unity3d systemd[1]: Starting man-db-cache-update.service...
Apr 26 16:49:32 unity3d systemd[1]: man-db-cache-update.service: Succeeded.
Apr 26 16:49:32 unity3d systemd[1]: Started man-db-cache-update.service.
Apr 26 16:49:32 unity3d systemd[1]: run-r68f31fe19a654205b51c9e18cd60ea66.service: Succeeded.
Apr 26 17:20:45 unity3d systemd-logind[1217]: New session 35 of user root.
Apr 26 17:20:45 unity3d systemd[1]: Started Session 35 of user root.
Apr 26 17:21:06 unity3d systemd-logind[1217]: Session 35 logged out. Waiting for processes to exit.
Apr 26 17:21:06 unity3d systemd[1]: session-35.scope: Succeeded.
Apr 26 17:21:06 unity3d systemd-logind[1217]: Removed session 35.
Apr 27 00:00:15 unity3d systemd[1]: Starting update of the root trust anchor for DNSSEC validation in unbound...
Apr 27 00:00:16 unity3d systemd[1]: unbound-anchor.service: Succeeded.
Apr 27 00:00:16 unity3d systemd[1]: Started update of the root trust anchor for DNSSEC validation in unbound.
Apr 27 09:18:53 unity3d kernel: Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-425.13.1.0.1.an8.x86_64 root=/dev/mapper/ao-root ro crashkernel=256M resume=/dev/mapper/ao-swap rd.lvm.lv=ao/root rd.lvm.lv=ao/swap rhgb quiet
Apr 27 09:18:53 unity3d kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Apr 27 09:18:53 unity3d kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Apr 27 09:18:53 unity3d kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Apr 27 09:18:53 unity3d kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Apr 27 09:18:53 unity3d kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
Apr 27 09:18:53 unity3d kernel: signal: max sigframe size: 1776
Apr 27 09:18:53 unity3d kernel: BIOS-provided physical RAM map:
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000003ffffff] usable
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x0000000004000000-0x0000000004009fff] ACPI NVS
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x000000000400a000-0x0000000009d7ffff] usable
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x0000000009d80000-0x0000000009ffffff] reserved
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x000000000a000000-0x000000000affffff] usable
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x000000000b000000-0x000000000b01ffff] reserved
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x000000000b020000-0x00000000dd5c0fff] usable
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x00000000dd5c1000-0x00000000dd700fff] reserved
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x00000000dd701000-0x00000000ddaf5fff] usable
Apr 27 09:18:53 unity3d kernel: BIOS-e820: [mem 0x00000000ddaf6000-0x00000000ddbccfff] ACPI NVS