Bug 4435 - [Anck 5.10 nightly][Anolis8][x86_64]kernel-selftests: devlink_port_split.py in the net directory fails with IndexError: list index out of range
Summary: [Anck 5.10 nightly][Anolis8][x86_64]kernel-selftests: running net/devlink_port_spl...
Status: NEW
Alias: None
Product: Antest
Classification: Infrastructures
Component: Test cases
Version: unspecified
Hardware: x86_64 Linux
Importance: P4-Low S4-trivial
Target Milestone: ---
Assignee: Jacob
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 4426
Depends on:
Blocks:
 
Reported: 2023-03-08 14:03 UTC by shanxifanshi
Modified: 2023-03-16 19:08 UTC
CC List: 10 users

See Also:


Description shanxifanshi alibaba_cloud_group 2023-03-08 14:03:10 UTC
[Defect description]:
kernel-selftests: running devlink_port_split.py in the net directory fails with IndexError: list index out of range

Test log:
# ./devlink_port_split.py
Traceback (most recent call last):
  File "./devlink_port_split.py", line 277, in <module>
    main()
  File "./devlink_port_split.py", line 242, in main
    dev = list(devs.keys())[0]
IndexError: list index out of range

[Environment]:
Reproduction environment:
anck 5.10, x86: ECS instances and physical machines with non-Mellanox NICs

Reproducibility:
Always

Kernel:
# uname -r
5.10.134-329.git.a83585bed7a8.an8.x86_64

操作系统信息:
# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.8"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.8"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.8"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

CPU:
# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Alibaba Cloud
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz
BIOS Model name:     pc-i440fx-2.1
Stepping:            6
CPU MHz:             2699.998
BogoMIPS:            5399.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           48K
L1i cache:           32K
L2 cache:            1280K
L3 cache:            49152K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm arch_capabilities

Memory:
# free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       228Mi        13Gi       1.0Mi       1.5Gi        14Gi
Swap:            0B          0B          0B

Packages:
# rpm -qf /usr/sbin/devlink
iproute-5.15.0-4.0.2.an8.1.x86_64

# rpm -q iproute
iproute-5.15.0-4.0.2.an8.1.x86_64

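[Steps to reproduce]:
Download the kernel source package that matches the running kernel, then:
rpm -ivh xxx.src.rpm   # installs into /root by default
yum-builddep -y rpmbuild/SPECS/kernel.spec   # installs the build dependencies automatically; requires yum-utils
rpmbuild -bp ./rpmbuild/SPECS/kernel.spec   # applies the patches, unpacks the tarball, and creates the BUILD directory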
cd rpmbuild/BUILD/kernel-xxx/linux-xxx/  
cd tools/testing/selftests/net
make

Run the test case:
./devlink_port_split.py

[Expected result]:
The test case passes.

[Actual result]:
The test case fails.

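[Root cause analysis]:
1. The test case fails because `devlink -j dev show` returns no NIC information, so the Python script raises an exception.
# devlink -j dev show   # returns an empty result; normally it lists the NIC's PCI address
{"dev":{}}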

2. The following commands also fail to retrieve NIC/port information:
devlink -j port show
devlink dev show
devlink dev param show
devlink dev info

3. Investigation so far shows that devlink cannot retrieve information for virtio_net, ixgbe, and similar NICs; it appears to support only Mellanox NICs.
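Item 1 can be reproduced without any devlink hardware. The sketch below parses the empty `{"dev":{}}` output the same way line 242 of `devlink_port_split.py` does according to the traceback (`list(devs.keys())[0]`), showing that the empty key list is what triggers the `IndexError`. The helper name `first_devlink_dev` is hypothetical and not part of the selftest; the non-empty JSON used for contrast is an assumed example of devlink's `bus/device` handle format.

```python
import json


def first_devlink_dev(devlink_json: str) -> str:
    """Hypothetical helper mimicking line 242 of devlink_port_split.py:
    take the first device handle from `devlink -j dev show` output."""
    devs = json.loads(devlink_json)["dev"]
    # With no devlink-capable device, devs is {}, the key list is empty,
    # and indexing [0] raises IndexError: list index out of range.
    return list(devs.keys())[0]


if __name__ == "__main__":
    # The output observed on the failing machine.
    try:
        first_devlink_dev('{"dev":{}}')
    except IndexError as exc:
        print("IndexError:", exc)
    # Assumed shape of the output when a PCI device is registered with devlink.
    print(first_devlink_dev('{"dev":{"pci/0000:00:05.0":{}}}'))
```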
Comment 1 yunmeng365524 2023-03-09 15:46:52 UTC
*** Bug 4426 has been marked as a duplicate of this bug. ***
Comment 2 shanxifanshi alibaba_cloud_group 2023-03-16 18:29:37 UTC
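Comparing the nightly runs from the days below and checking the machine information under /sysinfo/post, I found the following:
1. When devlink_port_split.py passed, the machine had an extra netdevsim-type NIC, eth1 (possibly left over from another test case).
2. When the test case failed, eth1 did not exist on the machine.
3. Verification confirmed that devlink can retrieve information for netdevsim NICs.

In summary, this problem has most likely always existed and is not a regression; it went unnoticed only because of interference between test cases.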

March 5 nightly run (test case passed):
https://tone.openanolis.cn/ws/jfupduzb/test_result/54622

March 7 nightly run (test case failed):
https://tone.openanolis.cn/ws/jfupduzb/test_result/55189

March 15 nightly run (test case passed):
https://tone.openanolis.cn/ws/jfupduzb/test_result/56643
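These findings mean the outcome depends on whether a devlink-capable device happens to exist when the test runs. One way to make that precondition explicit, rather than relying on a leftover device, is sketched below; it was not part of the verification above. It assumes root access and the netdevsim sysfs interface in 5.10, where writing `"ID PORT_COUNT"` to `/sys/bus/netdevsim/new_device` creates a device. The helper names and the ID/port values are hypothetical, illustrative choices.

```python
import json
import subprocess
from typing import List

NEW_DEVICE = "/sys/bus/netdevsim/new_device"  # netdevsim bus control file


def parse_devlink_devs(devlink_json: str) -> List[str]:
    """Return the device handles in `devlink -j dev show` output."""
    return list(json.loads(devlink_json).get("dev", {}).keys())


def has_netdevsim(handles: List[str]) -> bool:
    """True if any handle belongs to the netdevsim bus."""
    return any(h.startswith("netdevsim/") for h in handles)


def devlink_devices() -> List[str]:
    """Run `devlink -j dev show` and return the reported handles."""
    out = subprocess.run(["devlink", "-j", "dev", "show"],
                         stdout=subprocess.PIPE, universal_newlines=True,
                         check=True).stdout
    return parse_devlink_devs(out)


def ensure_netdevsim(dev_id: int = 10, port_count: int = 1) -> None:
    """Create a netdevsim device unless one already exists (needs root).
    dev_id and port_count are arbitrary illustrative values."""
    if has_netdevsim(devlink_devices()):
        return
    subprocess.run(["modprobe", "netdevsim"], check=True)
    with open(NEW_DEVICE, "w") as f:
        f.write("{} {}".format(dev_id, port_count))
```

After `ensure_netdevsim()` runs, `devlink -j dev show` should report a `netdevsim/netdevsim10` handle, so the result no longer hinges on what an earlier test case left behind.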
Comment 3 Jacob alibaba_cloud_group 2023-03-16 19:08:23 UTC
This looks like a problem in the test case itself.
https://github.com/torvalds/linux/commit/25173dd4093a24e977e2af9cd5654c205bf13547

After applying the fix from this patch and retesting, the test case is skipped when no devlink device is present.
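The skip behavior described above can be sketched as follows. This is a minimal illustration of the approach, not the exact upstream diff: the function names `pick_devlink_dev` and `main` are hypothetical here, and the sketch takes the JSON text as an argument instead of invoking `devlink` itself. `KSFT_SKIP = 4` is the kselftest exit status for a skipped test.

```python
import json
import sys
from typing import Optional

KSFT_SKIP = 4  # kselftest exit status meaning "test skipped"


def pick_devlink_dev(devlink_json: str) -> Optional[str]:
    """Return the first devlink device handle, or None if there is none."""
    devs = json.loads(devlink_json)["dev"]
    return next(iter(devs), None)


def main(devlink_json: str) -> None:
    dev = pick_devlink_dev(devlink_json)
    if dev is None:
        # Skip instead of crashing with IndexError when devlink reports nothing.
        print("no devlink device was found, test skipped")
        sys.exit(KSFT_SKIP)
    print("testing devlink device", dev)
```

Called with the `{"dev":{}}` output from the description, `main()` prints the skip message and exits with status 4 instead of raising `IndexError`.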