Bug 5452 - [ANCK-5.10-15][Anolis8][x86_64][nightly]kernel-selftests: simult_flows.sh in the mptcp directory fails intermittently: balanced bwidth with unbalanced delay - reverse direction 15367 max 15353 [ fail ]
Summary: [ANCK-5.10-15][Anolis8][x86_64][nightly]kernel-selftests: running mptcp simult_flo...
Status: CLOSED FIXED
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: net
Version: 5.10.y-15
Hardware: x86_64 Linux
Importance: P3-Medium S3-normal
Target Milestone: rc
Assignee: 宁畅
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-07 18:17 UTC by shanxifanshi
Modified: 2023-07-10 14:33 UTC
CC List: 12 users

See Also:


Description shanxifanshi alibaba_cloud_group 2023-06-07 18:17:10 UTC
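[Defect description]:
kernel-selftests: running simult_flows.sh in the mptcp directory fails intermittently with the error "balanced bwidth with unbalanced delay - reverse direction 15367 max 15353  [ fail ]". It reproduces fairly easily on a physical machine, where roughly 1 run in 10 fails; on a virtual machine, about 30 runs all passed.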

Test log:
# ./simult_flows.sh
balanced bwidth                                     4579 max 5005 [ OK ]
balanced bwidth - reverse direction                 4687 max 5005 [ OK ]
balanced bwidth with unbalanced delay               4580 max 5005 [ OK ]
balanced bwidth with unbalanced delay - reverse direction  5226 max 5005  [ fail ]
client exit code 0, server 0

netns ns3-0-N4ekuv socket stat for 10004:
State            Recv-Q            Send-Q                       Local Address:Port                       Peer Address:Port            Process

netns ns1-0-N4ekuv socket stat for 10004:
State            Recv-Q        Send-Q                   Local Address:Port                  Peer Address:Port         Process
TIME-WAIT        0             0                     10.0.2.1%ns1eth2:54659                     10.0.3.3:10004         timer:(timewait,59sec,0)

TIME-WAIT        0             0                             10.0.1.1:49214                     10.0.3.3:10004         timer:(timewait,59sec,0)

-rw------- 1 root root 81920 Jun  7 11:24 /tmp/tmp.baAFBz398m
-rw------- 1 root root 81920 Jun  7 11:25 /tmp/tmp.QyPkOVOtNN
-rw------- 1 root root 8388608 Jun  7 11:25 /tmp/tmp.8wVu5PXJYS
-rw------- 1 root root 8388608 Jun  7 11:24 /tmp/tmp.EwM9tHXWSj
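
Each result line in the log above appears to report a measured value followed by the allowed maximum (5226 against 5005 in the failing case). As a minimal sketch, assuming that "<N> max <M>" layout, the helper below extracts both numbers from a saved result line and prints the overshoot. The function name `overshoot` is hypothetical and is not part of the selftests.

```shell
# Hypothetical helper (not part of the kernel selftests).
# Prints "<measured> <max> <measured - max>" for one simult_flows.sh result line.
# Assumes the "<N> max <M>" layout seen in the log; returns 1 if it is absent.
overshoot() {
    # Take the last "<digits> max <digits>" pair; the leading [^0-9] also
    # handles lines where the label is fused to the number ("direction15367").
    nums=$(printf '%s\n' "$1" |
        sed -n 's/.*[^0-9]\([0-9][0-9]*\) max \([0-9][0-9]*\).*/\1 \2/p')
    [ -n "$nums" ] || return 1
    # Split the two numbers into $1 and $2 (function-local positionals).
    set -- $nums
    echo "$1 $2 $(($1 - $2))"
}
```

For the failing line in this log, `overshoot` prints `5226 5005 221`, that is, the run exceeded the limit by 221; a passing line yields a negative third field.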

[Environment]:
Reproduction environment:
ANCK 5.10, x86 physical machine

Reproduction rate:
Always reproducible on a physical machine given enough runs (roughly 1 failure in 10)

Kernel information:
# uname -r
5.10.134-8.git.21e574ab8.an8.x86_64

Operating system information:
# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.8"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.8"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.8"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

CPU information:
# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              24
On-line CPU(s) list: 0-23
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Intel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
BIOS Model name:     Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Stepping:            2
CPU MHz:             2292.150
CPU max MHz:         2500.0000
CPU min MHz:         1200.0000
BogoMIPS:            4988.74
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            30720K
NUMA node0 CPU(s):   0-23
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts md_clear flush_l1d

Memory information:
# free -h
              total        used        free      shared  buff/cache   available
Mem:           62Gi       2.3Gi        57Gi       186Mi       3.4Gi        59Gi
Swap:         2.0Gi        88Mi       1.9Gi

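[Steps to reproduce]:
Download the kernel source package that matches the running kernel
rpm -ivh xxx.src.rpm   # installs under /root by default
yum-builddep -y rpmbuild/SPECS/kernel.spec   # installs the build dependencies automatically; requires yum-utils
rpmbuild -bp ./rpmbuild/SPECS/kernel.spec   # applies the patches, unpacks the tarball, and creates the BUILD directory
cd rpmbuild/BUILD/kernel-xxx/linux-xxx/
cd tools/testing/selftests/net/mptcp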
make

Run the test case
./simult_flows.sh
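
Because the failure is intermittent, a single run is not enough to reproduce or rule it out. The following is a minimal sketch of a repeat-run wrapper; the function name `count_failures` and the `LOG_DIR` variable are hypothetical, and the sketch assumes the script exits with a non-zero status when any sub-test fails.

```shell
# Hypothetical helper: run a command N times and report how many runs failed.
# Assumes the command signals failure through a non-zero exit status.
# Per-run output is saved to $LOG_DIR/run_<i>.log (current directory by default).
count_failures() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Keep each run's output so a failing run can be inspected later.
        "$@" >"${LOG_DIR:-.}/run_$i.log" 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}

# Example, from the mptcp selftests directory:
# count_failures 30 ./simult_flows.sh
```

The saved per-run logs make it possible to see which sub-test failed in a given run without rerunning the whole series.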

[Expected result]:
The test case passes

[Actual result]:
The test case fails

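[Root cause analysis]:
After some investigation, the following patterns stand out:
1. The test case fails intermittently: roughly 1 failure in 10 runs.

2. It fails only on the physical machine; 30 runs on a virtual machine produced no failures. A passing run takes about 1 minute on the physical machine, whereas one run on the virtual machine finishes in under 20 s.

3. A similar issue exists upstream
https://github.com/multipath-tcp/mptcp_net-next/issues/137

4. Upstream has a code fix; with the upstream patch merged, local verification showed no failures in 30 runs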
https://github.com/0day-ci/linux/commit/36a818c55249e2eef25c77ee237d7845b1cc90ed#diff-91a4329f2f1ce6d7b24fc23d4a622018149edcfce0dc745ac50a1a5caa4881a8
Comment 1 yunmeng365524 2023-06-19 21:43:45 UTC
Similar to 5468
Comment 2 小龙 admin 2023-06-25 11:38:55 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/1782
Comment 3 宁畅 alibaba_cloud_group 2023-06-25 16:54:56 UTC
Fixed. PR: https://gitee.com/anolis/cloud-kernel/pulls/1782
Comment 4 shanxifanshi alibaba_cloud_group 2023-07-10 14:33:33 UTC
Verified on a physical machine running the rc3 build: all 30 runs passed. The issue is resolved; closing the bug.

# uname -r
5.10.134-15_rc3.an8.x86_64

###########Begin 30 test###############
balanced bwidth                                   14627 max 15353 [ OK ]
balanced bwidth - reverse direction               14633 max 15353 [ OK ]
balanced bwidth with unbalanced delay             14587 max 15353 [ OK ]
balanced bwidth with unbalanced delay - reverse direction14693 max 15353 [ OK ]
###########End 30 test#################