[Defect description]:
In perf-sanity-tests, the "Parsing of PMU event table metrics" test case fails.

Test log:
# perf test "Parsing of PMU event table metrics"
10: PMU events :
10.3: Parsing of PMU event table metrics : FAILED!
10.4: Parsing of PMU event table metrics with fake PMUs : Ok

[Environment]:
perf version:
# perf -v
perf version 5.10.134-320.git.04d8c84896c6.an8.x86_64

Kernel:
# uname -r
5.10.134-320.git.04d8c84896c6.an8.x86_64

Operating system:
# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.8"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.8"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.8"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

CPU:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
BIOS Vendor ID: Alibaba Cloud
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz
BIOS Model name: pc-i440fx-2.1
Stepping: 6
CPU MHz: 2699.998
BogoMIPS: 5399.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 49152K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm arch_capabilities

Memory:
# free -h
       total  used   free  shared  buff/cache  available
Mem:   15Gi   283Mi  13Gi  1.0Mi   1.3Gi       14Gi
Swap:  0B     0B     0B

[Expected result]: The test case passes.
[Actual result]: The test case fails.
[Reproducibility]: Always
[Steps to reproduce]:
1. Install the latest perf and python3-perf packages that match the running kernel.
2. perf test -v "Parsing of PMU event table metrics"
[Root cause analysis]:
1. The test case first failed in the nightly run on the evening of February 25 and has failed consistently in the days since.
2. The test case fails only on ECS; it passes on physical machines.
3. It may be related to this commit, which the developers merged on the 26th: https://gitee.com/anolis/cloud-kernel/pulls/1285
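Step 1 of the reproduction depends on perf matching the running kernel. A minimal sketch of that check follows. It assumes that "matching" means the release string printed by `perf -v` is identical to the one printed by `uname -r`, as it is in the environment above. The function name `perf_matches_kernel` is hypothetical and not part of perf.

```shell
# Hypothetical helper (not part of perf): succeeds only when the release
# string from `perf -v` is identical to the one from `uname -r`.
# Assumption: a matching perf package carries exactly the kernel's release.
perf_matches_kernel() {
    perf_line=$1       # e.g. "perf version 5.10.134-320.git.04d8c84896c6.an8.x86_64"
    kernel_release=$2  # e.g. "5.10.134-320.git.04d8c84896c6.an8.x86_64"

    # Drop the leading "perf version " prefix to leave the bare release.
    perf_release=${perf_line#"perf version "}

    [ "$perf_release" = "$kernel_release" ]
}

# Usage on the machine under test (left commented out so the sketch
# has no side effects when sourced):
# perf_matches_kernel "$(perf -v)" "$(uname -r)" || echo "perf/kernel mismatch"
```

If the check fails, the installed perf was built from a different tree than the running kernel, and the result of the test case may not reflect the kernel under test.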
This test case also fails on an an8 5.10-134-14 rc1 AMD Genoa ECS instance.
# uname -r
5.10.134-14_rc1.an8.x86_64

Test failure log:
Parsing-of-PMU-event-table-metrics: Fail
Parsing-of-PMU-event-table-metrics-with-fake-PMUs: Pass
Created attachment 662 [details] Parsing_of_PMU_event_table_metrics_log
(In reply to shanxifanshi from comment #2)
> Created attachment 662 [details]
> Parsing_of_PMU_event_table_metrics_log

The attachment contains the detailed test log, which includes a number of parse-error messages:

Parse event failed metric 'l3_read_miss_latency' id 'xi_ccx_sdp_req1' expr '(xi_sys_fill_latency * 16) / xi_ccx_sdp_req1'
Error string 'parser error' help '(null)'
Parse event failed metric 'l3_read_miss_latency' id 'xi_sys_fill_latency' expr '(xi_sys_fill_latency * 16) / xi_ccx_sdp_req1'
Error string 'parser error' help '(null)'
Found metric 'macro_ops_dispatched'
Parse event failed metric 'macro_ops_dispatched' id 'de_dis_cops_from_decoder.disp_op_type.any_fp_dispatch' expr 'de_dis_cops_from_decoder.disp_op_type.any_integer_dispatch + de_dis_cops_from_decoder.disp_op_type.any_fp_dispatch'
Error string 'parser error' help '(null)'
Parse event failed metric 'macro_ops_dispatched' id 'de_dis_cops_from_decoder.disp_op_type.any_integer_dispatch' expr 'de_dis_cops_from_decoder.disp_op_type.any_integer_dispatch + de_dis_cops_from_decoder.disp_op_type.any_fp_dispatch'
Error string 'parser error' help '(null)'
Found metric 'nps1_die_to_dram'
Parse event failed metric 'nps1_die_to_dram' id 'dram_channel_data_controller_4' expr 'dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7'
Error string 'parser error' help '(null)'
Parse event failed metric 'nps1_die_to_dram' id 'dram_channel_data_controller_1' expr 'dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7'
Error string 'parser error' help '(null)'
Parse event failed metric 'nps1_die_to_dram' id 'dram_channel_data_controller_6' expr 'dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7'
Error string 'parser error' help '(null)'
Parse event failed metric 'nps1_die_to_dram' id 'dram_channel_data_controller_3' expr 'dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7'
Error string 'parser error' help '(null)'
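The verbose log repeats the same metric once per failing event id, so a per-metric count is easier to scan than the raw output. The following is a rough sketch of such a summary. It assumes the message format shown in the excerpt above and a grep that supports `-o` (GNU and BSD grep do). The function name `summarize_failed_metrics` is hypothetical.

```shell
# Hypothetical helper: read a verbose perf test log on stdin and print,
# for each metric that hit "Parse event failed", the number of event ids
# that failed to parse, one "metric: count" pair per line.
summarize_failed_metrics() {
    # -o prints each match on its own line, so this also copes with a log
    # whose messages were flattened onto a single line.
    grep -o "Parse event failed metric '[^']*'" \
        | cut -d "'" -f 2 \
        | sort \
        | uniq -c \
        | awk '{ print $2 ": " $1 }'
}

# Usage (the log file name is only an example):
# summarize_failed_metrics < Parsing_of_PMU_event_table_metrics_log
```

Metrics that appear only in "Found metric" lines are not listed, since those lines do not indicate a failure.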
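@yongfeng, please help confirm whether the test case passes after rolling back perf. This looks similar to the bug below and may have the same cause.
https://bugzilla.openanolis.cn/show_bug.cgi?id=4247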
(In reply to yunmeng365524 from comment #4)
> @yongfeng, please help confirm whether the test case passes after rolling back perf. This looks similar to the bug below and may have the same cause.
> https://bugzilla.openanolis.cn/show_bug.cgi?id=4247

I first ran the test twice with the latest perf, and it failed both times. On the same machine, after switching back to the perf built on the evening of the 25th, the test case passes. This still looks like a problem introduced by a kernel code change.

Result with the latest perf (test case fails):
# perf test "Parsing of PMU event table metrics"
10: PMU events :
10.3: Parsing of PMU event table metrics : FAILED!
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
# perf test "Parsing of PMU event table metrics"
10: PMU events :
10.3: Parsing of PMU event table metrics : FAILED!
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
# perf -v
perf version 5.10.134-327.git.2ed1510fd4be.an8.x86_64

Result with the perf built on the evening of the 25th (test case passes):
# rpm -ivh --force http://172.16.0.24/kernel/Anolis8/ANCK-5.10/x86_64/20230225213803_318/perf-5.10.134-318.git.cf524d74aa27.an8.x86_64.rpm
Retrieving http://172.16.0.24/kernel/Anolis8/ANCK-5.10/x86_64/20230225213803_318/perf-5.10.134-318.git.cf524d74aa27.an8.x86_64.rpm
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:perf-5.10.134-318.git.cf524d74aa2################################# [100%]
# perf -v
perf version 5.10.134-318.git.cf524d74aa27.an8.x86_64
# perf test "Parsing of PMU event table metrics"
10: PMU events :
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
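When the same comparison has to be repeated across several perf builds, the failing entries can be pulled out of the `perf test` output mechanically. This is a sketch only. It assumes the `number: name : status` line layout shown above and a test name that contains no colon. The function name `failed_subtests` is hypothetical.

```shell
# Hypothetical helper: read `perf test` output on stdin and print the
# number and name of every entry whose status is FAILED!.
failed_subtests() {
    awk -F ':' '/FAILED!/ {
        id = $1
        name = $2
        # Trim the padding that perf uses to align the columns.
        gsub(/^[ \t]+|[ \t]+$/, "", id)
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        print id " " name
    }'
}

# Usage: run once per installed perf build and compare the results.
# perf test "Parsing of PMU event table metrics" 2>&1 | failed_subtests
```

An empty result for one build and a non-empty result for another, on the same machine, points at the change between those two builds.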
Fixed. PR: https://e.gitee.com/openanolis/repos/anolis/cloud-kernel/pulls/1399
Fixed.
This test case failed only on ECS. Verified on a community ECS instance with the latest nightly kernel and perf packages installed: the test case passes. The problem is resolved; closing the bug.
# perf test 'Parsing of PMU event table metrics'
10: PMU events :
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
# perf -v
perf version 5.10.134-331.git.78c79f18363d.an8.x86_64
# uname -r
5.10.134-331.git.78c79f18363d.an8.x86_64