Bug 1677 - [anolis8][anck][x86_64]mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor, mcelog.service is in the failed state
Summary: [anolis8][anck][x86_64]mcelog: ERROR: AMD Processor family 23: mcelog does no...
Status: CONFIRMED
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: kernel - anck-4.19
Version: 8.4
Hardware: x86_64 Linux
Priority / Severity: P3-Medium S3-normal
Target Milestone: ---
Assignee: Bixuan Cui
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-15 10:39 UTC by liuyaqing
Modified: 2024-02-20 13:35 UTC
4 users

See Also:


Attachments

Description liuyaqing alibaba_cloud_group 2022-07-15 10:39:33 UTC
Description of problem:
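On Anolis OS 8.4, with both the ANCK and RHCK kernels, mcelog.service fails to start on several instance types because the CPU is reported as unsupported. Taking ecs.g6a.large as an example: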


# systemctl status mcelog.service
● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2022-07-13 10:49:05 CST; 1 day 23h ago
  Process: 28887 ExecStart=/usr/sbin/mcelog --ignorenodev --daemon --foreground (code=exited, status=1/FAILURE)
 Main PID: 28887 (code=exited, status=1/FAILURE)

Jul 13 10:49:05 iZ2zehqvie1pc2r43idzwxZ systemd[1]: Started Machine Check Exception Logging Daemon.
Jul 13 10:49:05 iZ2zehqvie1pc2r43idzwxZ systemd[1]: mcelog.service: Main process exited, code=exited, status=1/FAILURE
Jul 13 10:49:05 iZ2zehqvie1pc2r43idzwxZ mcelog[28887]: mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor.  P>
Jul 13 10:49:05 iZ2zehqvie1pc2r43idzwxZ mcelog[28887]: CPU is unsupported
Jul 13 10:49:05 iZ2zehqvie1pc2r43idzwxZ systemd[1]: mcelog.service: Failed with result 'exit-code'.



# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Alibaba Cloud
CPU family:          23
Model:               49
Model name:          AMD EPYC 7H12 64-Core Processor
BIOS Model name:     pc-i440fx-2.1
Stepping:            0
CPU MHz:             2595.124
BogoMIPS:            5190.24
Virtualization:      AMD-V
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            16384K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap sha_ni xsaveopt xsavec xgetbv1 arat npt nrip_save

Version-Release number of selected component (if applicable):
# cat /etc/image-id
image_name="Anolis OS 8.4 ANCK 64 bit"
image_id="anolisos_8_4_x64_20G_anck_alibase_20220704.vhd"
release_date="20220704131731"

# uname -a
Linux iZ2zehqvie1pc2r43idzwxZ 4.19.91-26.an8.x86_64 #1 SMP Tue May 24 13:10:09 CST 2022 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
Always

Steps to Reproduce:
1. Start the instance.
2. Run: systemctl status mcelog.service
3.

Actual results:
The mcelog service fails to start.

Expected results:
The mcelog service starts normally.

Additional info:
Comment 1 liuyaqing alibaba_cloud_group 2022-07-15 11:19:43 UTC
Additional information:

# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 49
model name      : AMD EPYC 7H12 64-Core Processor
stepping        : 0
microcode       : 0x1000065
cpu MHz         : 2595.124
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap sha_ni xsaveopt xsavec xgetbv1 arat npt nrip_save
bugs            : fxsave_leak sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 5190.24
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 49
model name      : AMD EPYC 7H12 64-Core Processor
stepping        : 0
microcode       : 0x1000065
cpu MHz         : 2595.124
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 rdseed adx smap sha_ni xsaveopt xsavec xgetbv1 arat npt nrip_save
bugs            : fxsave_leak sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 5190.24
TLB size        : 3072 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management:
Comment 2 maqiao_mq alibaba_cloud_group 2022-07-15 11:26:38 UTC
Judging from the systemd log, the CPU does not support an MCE-related feature. @GuanJun, could you please help confirm?
Comment 3 Bixuan Cui 2022-07-15 11:57:31 UTC
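In the mcelog source file mcelog.c:

    if (seen == ALL) {
        if (!strcmp(vendor, "AuthenticAMD")) {
            if (family == 15) {
                /* Family 0Fh (K8) is still handled by mcelog itself */
                cputype = CPU_K8;
            } else if (family >= 16) {
                /* Family 10h and newer are rejected; decoding is left to the kernel EDAC driver */
                Eprintf("ERROR: AMD Processor family %d: mcelog does not support this processor.  Please use the edac_mce_amd module instead.\n", family);
                /* Returning 0 here is what leads to "CPU is unsupported" in the logs */
                return 0;
            }
            /* ... rest of the function not quoted ... */
        }
        /* ... */
    }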

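As the code shows, for AuthenticAMD processors every CPU family greater than or equal to 16 (10h) is unsupported, so family 23 (17h) on this instance always takes the error path.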
Comment 4 gaochang alibaba_cloud_group 2022-07-15 12:05:42 UTC
Confirmed with the relevant people: this does not block the release.
Comment 5 liuyaqing alibaba_cloud_group 2022-08-04 11:21:10 UTC
In the July image testing, the issue still exists in the anolisos_8_6_x64_20G_anck_alibase_20220727.vhd image.
Affected instance types: ecs.g7a.large, ecs.g7a.32xlarge, ecs.g6a.large, ecs.g6a.32xlarge

Taking ecs.g7a.32xlarge as an example:
# uname -a
Linux iZbp1h8z0fafm11yk1sia6Z 4.19.91-26.an8.x86_64 #1 SMP Tue May 24 13:10:09 CST 2022 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/image-id
image_name="Anolis OS 8.6 ANCK 64 bit"
image_id="anolisos_8_6_x64_20G_anck_alibase_20220727.vhd"
release_date="20220727152503"

# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.6"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.6"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.6"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"


# systemctl status mcelog.service
● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2022-08-04 10:02:13 CST; 1h 15min ago
 Main PID: 1491 (code=exited, status=1/FAILURE)

Aug 04 10:02:13 iZbp1h4fzft73vyolidr2uZ systemd[1]: Started Machine Check Exception Logging Daemon.
Aug 04 10:02:13 iZbp1h4fzft73vyolidr2uZ mcelog[1491]: mcelog: ERROR: AMD Processor family 25: mcelog does not support this processor.  Pl>
Aug 04 10:02:13 iZbp1h4fzft73vyolidr2uZ mcelog[1491]: CPU is unsupported
Aug 04 10:02:13 iZbp1h4fzft73vyolidr2uZ systemd[1]: mcelog.service: Main process exited, code=exited, status=1/FAILURE
Aug 04 10:02:13 iZbp1h4fzft73vyolidr2uZ systemd[1]: mcelog.service: Failed with result 'exit-code'.
Comment 6 liuyaqing alibaba_cloud_group 2022-09-28 14:42:21 UTC
The September image testing shows the same issue.
Comment 7 sunqingwei uniontech_group 2024-02-20 13:35:11 UTC
The 8.9 RC1 image shows the same issue:
[root@localhost ~]# systemctl start mcelog
[root@localhost ~]# systemctl status mcelog.service
● mcelog.service - Machine Check Exception Logging Daemon
   Loaded: loaded (/usr/lib/systemd/system/mcelog.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2024-02-20 13:32:55 CST; 5s ago
  Process: 7923 ExecStart=/usr/sbin/mcelog --ignorenodev --daemon --foreground (code=exited, status=1/FAILURE)
 Main PID: 7923 (code=exited, status=1/FAILURE)

Feb 20 13:32:55 localhost.localdomain systemd[1]: Started Machine Check Exception Logging Daemon.
Feb 20 13:32:55 localhost.localdomain mcelog[7923]: mcelog: ERROR: Hygon Processor family 24: mcelog does not support this processor.  Please use the edac_mce_amd module instead.
Feb 20 13:32:55 localhost.localdomain mcelog[7923]: CPU is unsupported
Feb 20 13:32:55 localhost.localdomain systemd[1]: mcelog.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 13:32:55 localhost.localdomain systemd[1]: mcelog.service: Failed with result 'exit-code'.
[root@localhost ~]# systemctl start mcelog.service