Bug 4255 - [Anck 5.10 nightly/ANCK-5.10-14-rc1][Anolis8][x86_64][ECS] "Parsing of PMU event table metrics" case under perf-sanity-tests fails
Status: CLOSED FIXED
Alias: None
Product: Antest
Classification: Infrastructures
Component: Test cases
Version: unspecified
Hardware: x86_64 Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: zhangjing
Reported: 2023-02-28 13:59 UTC by shanxifanshi
Modified: 2023-07-25 15:18 UTC (History)

Attachments
Parsing_of_PMU_event_table_metrics_log (176.49 KB, text/plain)
2023-03-02 15:04 UTC, shanxifanshi

Description shanxifanshi alibaba_cloud_group 2023-02-28 13:59:06 UTC
[Defect description]:
The "Parsing of PMU event table metrics" case under perf-sanity-tests fails.

Test log:

# perf test "Parsing of PMU event table metrics"
10: PMU events                                                      :
10.3: Parsing of PMU event table metrics                            : FAILED!
10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

[Environment]:
perf version:
# perf -v
perf version 5.10.134-320.git.04d8c84896c6.an8.x86_64

Kernel:
# uname -r
5.10.134-320.git.04d8c84896c6.an8.x86_64

Operating system:
# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.8"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.8"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.8"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

CPU:
# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Alibaba Cloud
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz
BIOS Model name:     pc-i440fx-2.1
Stepping:            6
CPU MHz:             2699.998
BogoMIPS:            5399.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           48K
L1i cache:           32K
L2 cache:            1280K
L3 cache:            49152K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm arch_capabilities


Memory:
# free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       283Mi        13Gi       1.0Mi       1.3Gi        14Gi
Swap:            0B          0B          0B

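[Expected result]:
The case passes.

[Actual result]:
The case fails.

[Reproducibility]: always

[Steps to reproduce]:
1. Install the latest perf and python3-perf packages that match the running kernel.
2. perf test -v "Parsing of PMU event table metrics"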
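
Step 1 depends on perf matching the running kernel. A minimal sketch of that check, assuming `perf -v` prints `perf version <release>` in the same form as the environment output above. `perf_matches_kernel` is a hypothetical helper name, not part of perf:

```shell
# Hypothetical helper (not part of perf): succeed only when the release
# string printed by `perf -v` equals the kernel release from `uname -r`.
perf_matches_kernel() {
    # $1: full output of `perf -v`, e.g. "perf version 5.10.134-320...."
    # $2: output of `uname -r`
    # Strip the assumed "perf version " prefix and compare the remainder.
    [ "${1#perf version }" = "$2" ]
}

# Usage on a live system (requires perf to be installed):
#   perf_matches_kernel "$(perf -v)" "$(uname -r)" \
#       && perf test -v "Parsing of PMU event table metrics"
```

The comparison is a plain string match, so it treats any difference in build number or git hash as a mismatch.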

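[Root cause analysis]:
1. The case first failed in the nightly build on the evening of February 25 and has failed in every run observed over the following days.
2. The case fails only on ECS; it passes on physical machines.
3. It may be related to this commit, which the developers merged on the 26th:
https://gitee.com/anolis/cloud-kernel/pulls/1285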
Comment 1 shanxifanshi alibaba_cloud_group 2023-03-02 15:02:48 UTC
This case also fails on an AMD Genoa ECS instance running an8 5.10-134-14 rc1.

# uname -r
5.10.134-14_rc1.an8.x86_64

Test failure log:
Parsing-of-PMU-event-table-metrics: Fail
Parsing-of-PMU-event-table-metrics-with-fake-PMUs: Pass
Comment 2 shanxifanshi alibaba_cloud_group 2023-03-02 15:04:22 UTC
Created attachment 662 [details]
Parsing_of_PMU_event_table_metrics_log
Comment 3 shanxifanshi alibaba_cloud_group 2023-03-02 15:05:35 UTC
(In reply to shanxifanshi from comment #2)
> Created attachment 662 [details]
> Parsing_of_PMU_event_table_metrics_log

The attachment contains the detailed test log, which includes a number of parse-error messages:
Parse event failed metric 'l3_read_miss_latency' id 'xi_ccx_sdp_req1' expr '(xi_sys_fill_latency * 16) / xi_ccx_sdp_req1'
Error string 'parser error' help '(null)'
Parse event failed metric 'l3_read_miss_latency' id 'xi_sys_fill_latency' expr '(xi_sys_fill_latency * 16) / xi_ccx_sdp_req1'
Error string 'parser error' help '(null)'
Found metric 'macro_ops_dispatched'
Parse event failed metric 'macro_ops_dispatched' id 'de_dis_cops_from_decoder.disp_op_type.any_fp_dispatch' expr 'de_dis_cops_from_decoder.disp_op_type.any_integer_dispatch + de_dis_cops_from_decoder.disp_op_type.any_fp_dispatch'
Error string 'parser error' help '(null)'
Parse event failed metric 'macro_ops_dispatched' id 'de_dis_cops_from_decoder.disp_op_type.any_integer_dispatch' expr 'de_dis_cops_from_decoder.disp_op_type.any_integer_dispatch + de_dis_cops_from_decoder.disp_op_type.any_fp_dispatch'
Error string 'parser error' help '(null)'
Found metric 'nps1_die_to_dram'
Parse event failed metric 'nps1_die_to_dram' id 'dram_channel_data_controller_4' expr 'dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7'
Error string 'parser error' help '(null)'
Parse event failed metric 'nps1_die_to_dram' id 'dram_channel_data_controller_1' expr 'dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7'
Error string 'parser error' help '(null)'
Parse event failed metric 'nps1_die_to_dram' id 'dram_channel_data_controller_6' expr 'dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7'
Error string 'parser error' help '(null)'
Parse event failed metric 'nps1_die_to_dram' id 'dram_channel_data_controller_3' expr 'dram_channel_data_controller_0 + dram_channel_data_controller_1 + dram_channel_data_controller_2 + dram_channel_data_controller_3 + dram_channel_data_controller_4 + dram_channel_data_controller_5 + dram_channel_data_controller_6 + dram_channel_data_controller_7'
Error string 'parser error' help '(null)'
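
To see which metrics are affected at a glance, the "Parse event failed" lines can be counted per metric. This is only a sketch, assuming the log keeps the line format quoted above; `summarize_parse_failures` is a hypothetical helper name, and the attachment file name in the usage line is taken from this bug:

```shell
# Hypothetical helper: count "Parse event failed" lines per metric name.
# Reads the named log files, or stdin when no file is given.
summarize_parse_failures() {
    # Split each line on single quotes, so field 2 is the metric name in
    #   Parse event failed metric '<name>' id '<id>' expr '<expr>'
    awk -F"'" '/^Parse event failed metric/ { n[$2]++ }
               END { for (m in n) print n[m], m }' "$@" | sort -k2
}

# Usage:
#   summarize_parse_failures Parsing_of_PMU_event_table_metrics_log
```

Each output line is a count followed by the metric name, sorted by name.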
Comment 4 yunmeng365524 2023-03-04 21:42:46 UTC
@yongfeng, please check whether the case passes after rolling perf back. This looks similar to the bug below and may have the same cause.
https://bugzilla.openanolis.cn/show_bug.cgi?id=4247
Comment 5 shanxifanshi alibaba_cloud_group 2023-03-06 09:50:41 UTC
(In reply to yunmeng365524 from comment #4)
> @yongfeng, please check whether the case passes after rolling perf back. This looks similar to the bug below and may have the same cause.
> https://bugzilla.openanolis.cn/show_bug.cgi?id=4247

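With the latest perf, I ran the case twice and it failed both times. On the same machine, after switching back to the perf built on the evening of the 25th, the case passes. It still looks like the problem was introduced by a kernel code change.

Result with the latest perf (case fails):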
# perf test "Parsing of PMU event table metrics"
10: PMU events                                                      :
10.3: Parsing of PMU event table metrics                            : FAILED!
10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

# perf test "Parsing of PMU event table metrics"
10: PMU events                                                      :
10.3: Parsing of PMU event table metrics                            : FAILED!
10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

# perf -v
perf version 5.10.134-327.git.2ed1510fd4be.an8.x86_64

Result with the perf built on the evening of the 25th (case passes):
# rpm -ivh --force http://172.16.0.24/kernel/Anolis8/ANCK-5.10/x86_64/20230225213803_318/perf-5.10.134-318.git.cf524d74aa27.an8.x86_64.rpm
Retrieving http://172.16.0.24/kernel/Anolis8/ANCK-5.10/x86_64/20230225213803_318/perf-5.10.134-318.git.cf524d74aa27.an8.x86_64.rpm
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:perf-5.10.134-318.git.cf524d74aa2################################# [100%]

# perf -v
perf version 5.10.134-318.git.cf524d74aa27.an8.x86_64

# perf test "Parsing of PMU event table metrics"
10: PMU events                                                      :
10.3: Parsing of PMU event table metrics                            : Ok
10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
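
The comparison above (same machine, two perf builds) can be made less error-prone by extracting the subtest status from the `perf test` output instead of reading it by eye. A rough sketch, assuming the `N: name : status` layout shown in the transcripts above; `subtest_status` is a hypothetical helper name:

```shell
# Hypothetical helper: read `perf test` output on stdin and print the
# status column ("Ok", "FAILED!", ...) of the subtest whose name is $1.
subtest_status() {
    # Fields are separated by a colon with optional surrounding spaces,
    # so $2 is the test name and $3 is its status. The exact name match
    # keeps the "... with fake PMUs" subtest from being picked up.
    awk -F' *: *' -v name="$1" '$2 == name { print $3 }'
}

# Usage, once per installed perf build:
#   perf test "Parsing of PMU event table metrics" 2>&1 \
#       | subtest_status "Parsing of PMU event table metrics"
```

Running this after each `rpm -ivh --force` of a different perf build gives one status word per build to compare.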
Comment 6 zhangjing alibaba_cloud_group 2023-03-10 11:47:17 UTC
Fixed. PR: https://e.gitee.com/openanolis/repos/anolis/cloud-kernel/pulls/1399
Comment 7 zhangjing alibaba_cloud_group 2023-03-10 12:32:00 UTC
Fixed.
Comment 8 shanxifanshi alibaba_cloud_group 2023-03-10 14:06:52 UTC
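The case failed only on ECS. Verified on a community ECS instance with the latest nightly kernel and perf packages installed: the case passes. The issue is resolved; closing the bug.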

# perf test 'Parsing of PMU event table metrics'
10: PMU events                                                      :
10.3: Parsing of PMU event table metrics                            : Ok
10.4: Parsing of PMU event table metrics with fake PMUs             : Ok

# perf -v
perf version 5.10.134-331.git.78c79f18363d.an8.x86_64

# uname -r
5.10.134-331.git.78c79f18363d.an8.x86_64