Description of problem:
On Anolis OS 8.4 (ANCK/RHCK), mcelog.service fails to start on multiple instances because the CPU is not supported. Taking ecs.g6a.large as an example:

# systemctl status mcelog.service
● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2022-07-13 10:49:05 CST; 1 day 23h ago
  Process: 28887 ExecStart=/usr/sbin/mcelog --ignorenodev --daemon --foreground (code=exited, status=1/FAILURE)
 Main PID: 28887 (code=exited, status=1/FAILURE)

Jul 13 10:49:05 iZ2zehqvie1pc2r43idzwxZ systemd[1]: Started Machine Check Exception Logging Daemon.
Jul 13 10:49:05 iZ2zehqvie1pc2r43idzwxZ systemd[1]: mcelog.service: Main process exited, code=exited, status=1/FAILURE
Jul 13 10:49:05 iZ2zehqvie1pc2r43idzwxZ mcelog[28887]: mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor. P>
Jul 13 10:49:05 iZ2zehqvie1pc2r43idzwxZ mcelog[28887]: CPU is unsupported
Jul 13 10:49:05 iZ2zehqvie1pc2r43idzwxZ systemd[1]: mcelog.service: Failed with result 'exit-code'.

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Alibaba Cloud
CPU family:          23
Model:               49
Model name:          AMD EPYC 7H12 64-Core Processor
BIOS Model name:     pc-i440fx-2.1
Stepping:            0
CPU MHz:             2595.124
BogoMIPS:            5190.24
Virtualization:      AMD-V
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            16384K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap sha_ni xsaveopt xsavec xgetbv1 arat npt nrip_save

Version-Release number of selected component (if applicable):
# cat /etc/image-id
image_name="Anolis OS 8.4 ANCK 64 bit"
image_id="anolisos_8_4_x64_20G_anck_alibase_20220704.vhd"
release_date="20220704131731"

# uname -a
Linux iZ2zehqvie1pc2r43idzwxZ 4.19.91-26.an8.x86_64 #1 SMP Tue May 24 13:10:09 CST 2022 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Always

Steps to Reproduce:
1. Start the instance
2. systemctl status mcelog.service
3.

Actual results:
The mcelog service fails to start.

Expected results:
The mcelog service starts normally.

Additional info:
Supplementary information:

# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7H12 64-Core Processor
stepping : 0
microcode : 0x1000065
cpu MHz : 2595.124
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap sha_ni xsaveopt xsavec xgetbv1 arat npt nrip_save
bugs : fxsave_leak sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips : 5190.24
TLB size : 3072 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7H12 64-Core Processor
stepping : 0
microcode : 0x1000065
cpu MHz : 2595.124
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap sha_ni xsaveopt xsavec xgetbv1 arat npt nrip_save
bugs : fxsave_leak sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips : 5190.24
TLB size : 3072 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:
Judging from the systemd log, the CPU does not support the MCE-related feature. @GuanJun, could you please help confirm?
In mcelog.c of the mcelog source code:

    if (seen == ALL) {
        if (!strcmp(vendor,"AuthenticAMD")) {
            if (family == 15) {
                cputype = CPU_K8;
            } else if (family >= 16) {
                Eprintf("ERROR: AMD Processor family %d: mcelog does not support this processor. Please use the edac_mce_amd module instead.\n", family);
                return 0;
            }

As the excerpt shows, for vendor AuthenticAMD every CPU family greater than or equal to 16 is unsupported. The instance above reports CPU family 23, so mcelog exits with this error.
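The family check in the excerpt can be mirrored in a few lines of shell to predict, before starting the service, whether mcelog will reject a given CPU. This is only a sketch of the AuthenticAMD branch quoted above: the helper names `mcelog_cpu_verdict` and `cpuinfo_vendor_family` are hypothetical and not part of mcelog, and anything the excerpt does not show (other vendors, AMD families below 15) is reported as `unknown` rather than guessed.

```shell
# Sketch only: mirrors the AuthenticAMD branch of the quoted mcelog.c excerpt.
# Both function names are hypothetical helpers, not part of mcelog.

# Usage: mcelog_cpu_verdict VENDOR FAMILY
# Prints one of: supported | unsupported | unknown
mcelog_cpu_verdict() {
    local vendor="$1" family="$2"
    if [ "$vendor" = "AuthenticAMD" ]; then
        if [ "$family" -eq 15 ]; then
            echo "supported"      # family 15 is mapped to CPU_K8 in the excerpt
        elif [ "$family" -ge 16 ]; then
            echo "unsupported"    # excerpt prints the error and returns 0
        else
            echo "unknown"        # families below 15 are not covered by the excerpt
        fi
    else
        echo "unknown"            # other vendors are outside the quoted excerpt
    fi
}

# Reads /proc/cpuinfo-format text on stdin and prints "VENDOR FAMILY"
# for the first processor entry.
cpuinfo_vendor_family() {
    awk -F'[ \t]*:[ \t]*' '
        /^vendor_id/  && vendor == "" { vendor = $2 }
        /^cpu family/ && family == "" { family = $2 }
        END { print vendor, family }
    '
}
```

On an affected instance, `mcelog_cpu_verdict $(cpuinfo_vendor_family < /proc/cpuinfo)` would be expected to print `unsupported`, since the cpuinfo in this report shows vendor AuthenticAMD and family 23.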
Confirmed with the relevant people: this issue does not block the release.
In the July image test, the image anolisos_8_6_x64_20G_anck_alibase_20220727.vhd still has this problem.
Affected instance types: ecs.g7a.large, ecs.g7a.32xlarge, ecs.g6a.large, ecs.g6a.32xlarge
Taking ecs.g7a.32xlarge as an example:

# uname -a
Linux iZbp1h8z0fafm11yk1sia6Z 4.19.91-26.an8.x86_64 #1 SMP Tue May 24 13:10:09 CST 2022 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/image-id
image_name="Anolis OS 8.6 ANCK 64 bit"
image_id="anolisos_8_6_x64_20G_anck_alibase_20220727.vhd"
release_date="20220727152503"

# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.6"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.6"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.6"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

# systemctl status mcelog.service
● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2022-08-04 10:02:13 CST; 1h 15min ago
 Main PID: 1491 (code=exited, status=1/FAILURE)

Aug 04 10:02:13 iZbp1h4fzft73vyolidr2uZ systemd[1]: Started Machine Check Exception Logging Daemon.
Aug 04 10:02:13 iZbp1h4fzft73vyolidr2uZ mcelog[1491]: mcelog: ERROR: AMD Processor family 25: mcelog does not support this processor. Pl>
Aug 04 10:02:13 iZbp1h4fzft73vyolidr2uZ mcelog[1491]: CPU is unsupported
Aug 04 10:02:13 iZbp1h4fzft73vyolidr2uZ systemd[1]: mcelog.service: Main process exited, code=exited, status=1/FAILURE
Aug 04 10:02:13 iZbp1h4fzft73vyolidr2uZ systemd[1]: mcelog.service: Failed with result 'exit-code'.
The September image test shows the same problem.
The 8.9 RC1 image shows the same problem:

[root@localhost ~]# systemctl start mcelog
[root@localhost ~]# systemctl status mcelog.service
● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2024-02-20 13:32:55 CST; 5s ago
  Process: 7923 ExecStart=/usr/sbin/mcelog --ignorenodev --daemon --foreground (code=exited, status=1/FAILURE)
 Main PID: 7923 (code=exited, status=1/FAILURE)

Feb 20 13:32:55 localhost.localdomain systemd[1]: Started Machine Check Exception Logging Daemon.
Feb 20 13:32:55 localhost.localdomain mcelog[7923]: mcelog: ERROR: Hygon Processor family 24: mcelog does not support this processor. Please use the edac_mce_amd module instead.
Feb 20 13:32:55 localhost.localdomain mcelog[7923]: CPU is unsupported
Feb 20 13:32:55 localhost.localdomain systemd[1]: mcelog.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 13:32:55 localhost.localdomain systemd[1]: mcelog.service: Failed with result 'exit-code'.
[root@localhost ~]# systemctl start mcelog.service
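
Since the same failure recurs across several images and instance types, it may help to pull the rejected vendor and family out of the logs when triaging many machines. The following is a minimal sketch, assuming the error line keeps the wording seen in this report ("ERROR: <vendor> Processor family <N>: mcelog does not support this processor"); the function name `mcelog_unsupported_cpus` is a hypothetical helper, not an mcelog or systemd command.

```shell
# Sketch only: extract "VENDOR FAMILY" pairs from mcelog's
# "does not support this processor" error lines read on stdin.
# mcelog_unsupported_cpus is a hypothetical helper name.
mcelog_unsupported_cpus() {
    # Keep only matching lines, reduce each to "VENDOR FAMILY",
    # then de-duplicate so each combination is listed once.
    sed -n 's/.*mcelog: ERROR: \([A-Za-z]*\) Processor family \([0-9]*\): mcelog does not support this processor.*/\1 \2/p' \
        | sort -u
}
```

A possible invocation is `journalctl -u mcelog.service --no-pager | mcelog_unsupported_cpus`. Lines that the pager truncated (ending in `P>` or `Pl>` as in the outputs above) still match, because the pattern only needs the text up to "this processor".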