Bug 5468 - [ANCK-5.10-15][Anolis8][x86_64][nightly] kernel-selftests: mptcp_connect.sh in the mptcp directory fails intermittently with "MPTCP copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)"
Status: CLOSED FIXED
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: net
Version: 5.10.y-15
Hardware: x86_64 Linux
Importance: P3-Medium S3-normal
Target Milestone: rc
Assignee: 宁畅
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-08 17:52 UTC by shanxifanshi
Modified: 2023-07-10 13:49 UTC
CC List: 14 users

See Also:


Attachments

Description shanxifanshi alibaba_cloud_group 2023-06-08 17:52:00 UTC
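[Defect description]:
kernel-selftests: running the mptcp_connect.sh test case in the mptcp directory fails intermittently with the error "MPTCP   copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)". The failure reproduces fairly easily on a physical machine: about 1 to 3 of every 10 runs fail. On a virtual machine, roughly 30 runs all pass.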

Test log:
# ./mptcp_connect.sh
INFO: set ns3-6481858f-idnrtx dev ns3eth2: ethtool -K tso off gso off gro off
INFO: set ns4-6481858f-idnrtx dev ns4eth3: ethtool -K tso off gso off
Created /tmp/tmp.5N0g6MS91X (size 1898524       /tmp/tmp.5N0g6MS91X) containing data sent by client
Created /tmp/tmp.2tH752FxIc (size 4223004       /tmp/tmp.2tH752FxIc) containing data sent by server
New MPTCP socket can be blocked via sysctl              [ OK ]
setsockopt(..., TCP_ULP, "mptcp", ...) blocked  [ OK ]
INFO: validating network environment with pings
INFO: Using loss of 0.07% delay 22 ms reorder 93% 31% with delay 5ms on ns3eth4
ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP   (duration    23ms) [ OK ]
ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP     (duration    22ms) [ OK ]
ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP   (duration    21ms) [ OK ]
ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP   (duration    23ms) [ OK ]
ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP     (duration    21ms) [ OK ]
ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP   (duration    22ms) [ OK ]
ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP   (duration    26ms) [ OK ]
ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP   (duration    44ms) [ OK ]
ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP   (duration    28ms) [ OK ]
ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP   (duration    29ms) [ OK ]
ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP   (duration   300ms) [ OK ]
ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP   (duration   255ms) [ OK ]
ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP   (duration   275ms) [ OK ]
ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP   (duration   263ms) [ OK ]
ns1 MPTCP -> ns4 (10.0.3.1:10014      ) MPTCP   (duration   263ms) [ OK ]
ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP   (duration   251ms) [ OK ]
ns2 MPTCP -> ns1 (10.0.1.1:10016      ) MPTCP   (duration    46ms) [ OK ]
ns2 MPTCP -> ns1 (dead:beef:1::1:10017) MPTCP   (duration    29ms) [ OK ]
ns2 MPTCP -> ns3 (10.0.2.2:10018      ) MPTCP   (duration   257ms) [ OK ]
ns2 MPTCP -> ns3 (dead:beef:2::2:10019) MPTCP   (duration   241ms) [ OK ]
ns2 MPTCP -> ns3 (10.0.3.2:10020      ) MPTCP   (duration   245ms) [ OK ]
ns2 MPTCP -> ns3 (dead:beef:3::2:10021) MPTCP   (duration   258ms) [ OK ]
ns2 MPTCP -> ns4 (10.0.3.1:10022      ) MPTCP   (duration   268ms) [ OK ]
ns2 MPTCP -> ns4 (dead:beef:3::1:10023) MPTCP   (duration   276ms) [ OK ]
ns3 MPTCP -> ns1 (10.0.1.1:10024      ) MPTCP   (duration   454ms) [ OK ]
ns3 MPTCP -> ns1 (dead:beef:1::1:10025) MPTCP   (duration   258ms) [ OK ]
ns3 MPTCP -> ns2 (10.0.1.2:10026      ) MPTCP   (duration   280ms) [ OK ]
ns3 MPTCP -> ns2 (dead:beef:1::2:10027) MPTCP   (duration   242ms) [ OK ]
ns3 MPTCP -> ns2 (10.0.2.1:10028      ) MPTCP   (duration   519ms) [ OK ]
ns3 MPTCP -> ns2 (dead:beef:2::1:10029) MPTCP   (duration   253ms) [ OK ]
ns3 MPTCP -> ns4 (10.0.3.1:10030      ) MPTCP   (duration    65ms) [ OK ]
ns3 MPTCP -> ns4 (dead:beef:3::1:10031) MPTCP   (duration    42ms) [ OK ]
ns4 MPTCP -> ns1 (10.0.1.1:10032      ) MPTCP   (duration   474ms) [ OK ]
ns4 MPTCP -> ns1 (dead:beef:1::1:10033) MPTCP   (duration   360ms) [ OK ]
ns4 MPTCP -> ns2 (10.0.1.2:10034      ) MPTCP   (duration   768ms) [ OK ]
ns4 MPTCP -> ns2 (dead:beef:1::2:10035) MPTCP   (duration   440ms) [ OK ]
ns4 MPTCP -> ns2 (10.0.2.1:10036      ) MPTCP   copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)
(duration 30311ms) [ FAIL ] client exit code 2, server 0

netns ns2-6481858f-idnrtx socket stat for 10036:
State               Recv-Q           Send-Q                     Local Address:Port                        Peer Address:Port            Process
TIME-WAIT           0                0                               10.0.2.1:10036                           10.0.3.1:34106            timer:(timewait,59sec,0)


netns ns4-6481858f-idnrtx socket stat for 10036:
State              Recv-Q           Send-Q                      Local Address:Port                        Peer Address:Port            Process
LAST-ACK           0                1                                10.0.3.1:34106                           10.0.2.1:10036            timer:(on,205ms,0)
         ts sack cubic wscale:7,7 rto:223 rtt:22.232/0.048 ato:40 mss:1448 pmtu:1500 rcvmss:1420 advmss:1448 cwnd:1385 bytes_sent:1898524 bytes_acked:1898525 bytes_received:4223005 segs_out:1812 segs_in:3747 data_segs_out:1392 data_segs_in:3116 send 721655272bps lastsnd:30122 lastrcv:30049 lastack:30049 pacing_rate 1443294312bps delivery_rate 135796920bps delivered:1393 busy:218ms unacked:1 reordering:4 reord_seen:3 rcv_rtt:22.88 rcv_space:14480 rcv_ssthresh:3143668 minrtt:22.017 tcp-ulp-mptcp flags:Mmec token:0000(id:0)/822a9d0c(id:0) seq:a080289ed129332 sfseq:406001 ssnoff:bdf081e2 maplen:101c
ns4 MPTCP -> ns2 (dead:beef:2::1:10037) MPTCP   (duration   992ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.2.2:10038      ) MPTCP   (duration    45ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:2::2:10039) MPTCP   (duration    32ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10040      ) MPTCP   (duration    46ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10041) MPTCP   (duration    56ms) [ OK ]
Time: 46 seconds
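
The failing flow stands out by its duration: it ran into the poll timeout and took 30311ms, while every passing flow in this log finished in under a second. A small filter can pull such lines out of a saved log. This is only a sketch, assuming the log keeps the `duration <N>ms` format shown above; `slow_flows` is a hypothetical helper name, not part of kernel-selftests.

```shell
# Hypothetical helper: print every log line on stdin whose reported
# duration exceeds the given limit in milliseconds.
slow_flows() {
    awk -v limit="$1" '
        match($0, /duration +[0-9]+ms/) {
            d = substr($0, RSTART, RLENGTH)   # e.g. "duration 30311ms"
            gsub(/[^0-9]/, "", d)             # keep only the digits
            if (d + 0 > limit) print
        }'
}

# Example (assumed log file name):
#   slow_flows 10000 < mptcp_connect.log
```

For a failed flow the script prints the flow name on the line before the duration, so the preceding line in the log identifies which connection timed out.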

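[Environment]:
Reproduction environment:
ANCK 5.10, x86 physical machine

Reproduction rate:
Always reproducible (intermittent per run; see the description)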

Kernel:
# uname -r
5.10.134-9.git.05dbf5c52.an8.x86_64

Operating system:
# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.8"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.8"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.8"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

CPU:
# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              24
On-line CPU(s) list: 0-23
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Intel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
BIOS Model name:     Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Stepping:            2
CPU MHz:             2292.150
CPU max MHz:         2500.0000
CPU min MHz:         1200.0000
BogoMIPS:            4988.74
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            30720K
NUMA node0 CPU(s):   0-23
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts md_clear flush_l1d

Memory:
# free -h
              total        used        free      shared  buff/cache   available
Mem:           62Gi       2.3Gi        57Gi       186Mi       3.4Gi        59Gi
Swap:         2.0Gi        88Mi       1.9Gi

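[Steps to reproduce]:
Download the kernel source package that matches the running kernel
rpm -ivh xxx.src.rpm   # installs under /root by default
yum-builddep -y rpmbuild/SPECS/kernel.spec   # installs the build dependencies automatically; requires yum-utils
rpmbuild -bp ./rpmbuild/SPECS/kernel.spec   # applies the patches, unpacks the tarball, and creates the BUILD directory
cd rpmbuild/BUILD/kernel-xxx/linux-xxx/
cd tools/testing/selftests/net/mptcp
make

Run the test case
./mptcp_connect.sh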
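
Because the failure is intermittent, a single run says little. A minimal sketch of a repeat-and-count wrapper follows. `run_repeat` is a hypothetical helper name, not part of kernel-selftests, and the sketch assumes the test script signals failure through a nonzero exit status.

```shell
# Hypothetical helper: run a command N times and report how many runs
# passed (exit status 0) and how many failed (nonzero exit status).
# The body runs in a subshell so its variables do not leak to the caller.
run_repeat() (
    runs=$1
    shift
    pass=0
    fail=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        if "$@" >/dev/null 2>&1; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "pass=${pass} fail=${fail}"
)

# Example, from tools/testing/selftests/net/mptcp:
#   run_repeat 30 ./mptcp_connect.sh
```

The wrapper discards the test output to keep the summary short. To inspect a failing run, redirect the output to a per-run log file instead.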

[Expected result]:
The test case passes

[Actual result]:
The test case fails

[Root cause analysis]:

A similar issue also exists upstream:
https://github.com/multipath-tcp/mptcp_net-next/issues/230
Comment 1 shanxifanshi alibaba_cloud_group 2023-06-09 09:35:10 UTC
After applying the following patch locally, all 30 runs on the same physical machine passed:

https://patchwork.kernel.org/project/mptcp/patch/20221219075048.255811-6-imagedong@tencent.com/
Comment 2 yunmeng365524 2023-06-19 21:42:25 UTC
The problem is clear and a similar patch has already been found. Could the developers please confirm?
Comment 3 小龙 admin 2023-06-25 10:34:09 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/1781
Comment 4 宁畅 alibaba_cloud_group 2023-06-25 16:54:38 UTC
Fixed. PR: https://gitee.com/anolis/cloud-kernel/pulls/1781
Comment 5 shanxifanshi alibaba_cloud_group 2023-07-10 13:48:43 UTC
On a physical machine running the rc3 kernel, all 30 runs passed. The problem is resolved; closing the bug.

# uname -r
5.10.134-15_rc3.an8.x86_64

###########Begin 30 test###############
INFO: set ns3-64ab824a-xuw1HF dev ns3eth2: ethtool -K tso off gso off gro off
INFO: set ns4-64ab824a-xuw1HF dev ns4eth3: ethtool -K  gso off gro off
Created /tmp/tmp.6Ft6UUKr5s (size 5508124       /tmp/tmp.6Ft6UUKr5s) containing data sent by client
Created /tmp/tmp.ygsMekJDZv (size 8003612       /tmp/tmp.ygsMekJDZv) containing data sent by server
New MPTCP socket can be blocked via sysctl              [ OK ]
setsockopt(..., TCP_ULP, "mptcp", ...) blocked  [ OK ]
INFO: validating network environment with pings
INFO: Using loss of 0.32% delay 13 ms reorder 95% 58% with delay 3ms on ns3eth4
ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP   (duration    45ms) [ OK ]
ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP     (duration    48ms) [ OK ]
ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP   (duration    41ms) [ OK ]
ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP   (duration    40ms) [ OK ]
ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP     (duration    53ms) [ OK ]
ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP   (duration    63ms) [ OK ]
ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP   (duration    38ms) [ OK ]
ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP   (duration    57ms) [ OK ]
ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP   (duration    67ms) [ OK ]
ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP   (duration    55ms) [ OK ]
ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP   (duration   240ms) [ OK ]
ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP   (duration   372ms) [ OK ]
ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP   (duration   482ms) [ OK ]
ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP   (duration   226ms) [ OK ]
ns1 MPTCP -> ns4 (10.0.3.1:10014      ) MPTCP   (duration   309ms) [ OK ]
ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP   (duration  1945ms) [ OK ]
ns2 MPTCP -> ns1 (10.0.1.1:10016      ) MPTCP   (duration    43ms) [ OK ]
ns2 MPTCP -> ns1 (dead:beef:1::1:10017) MPTCP   (duration    43ms) [ OK ]
ns2 MPTCP -> ns3 (10.0.2.2:10018      ) MPTCP   (duration  1398ms) [ OK ]
ns2 MPTCP -> ns3 (dead:beef:2::2:10019) MPTCP   (duration  1714ms) [ OK ]
ns2 MPTCP -> ns3 (10.0.3.2:10020      ) MPTCP   (duration   234ms) [ OK ]
ns2 MPTCP -> ns3 (dead:beef:3::2:10021) MPTCP   (duration   287ms) [ OK ]
ns2 MPTCP -> ns4 (10.0.3.1:10022      ) MPTCP   (duration   209ms) [ OK ]
ns2 MPTCP -> ns4 (dead:beef:3::1:10023) MPTCP   (duration   224ms) [ OK ]
ns3 MPTCP -> ns1 (10.0.1.1:10024      ) MPTCP   (duration   269ms) [ OK ]
ns3 MPTCP -> ns1 (dead:beef:1::1:10025) MPTCP   (duration  1009ms) [ OK ]
ns3 MPTCP -> ns2 (10.0.1.2:10026      ) MPTCP   (duration   219ms) [ OK ]
ns3 MPTCP -> ns2 (dead:beef:1::2:10027) MPTCP   (duration   446ms) [ OK ]
ns3 MPTCP -> ns2 (10.0.2.1:10028      ) MPTCP   (duration   751ms) [ OK ]
ns3 MPTCP -> ns2 (dead:beef:2::1:10029) MPTCP   (duration  2206ms) [ OK ]
ns3 MPTCP -> ns4 (10.0.3.1:10030      ) MPTCP   (duration    44ms) [ OK ]
ns3 MPTCP -> ns4 (dead:beef:3::1:10031) MPTCP   (duration    44ms) [ OK ]
ns4 MPTCP -> ns1 (10.0.1.1:10032      ) MPTCP   (duration   189ms) [ OK ]
ns4 MPTCP -> ns1 (dead:beef:1::1:10033) MPTCP   (duration   367ms) [ OK ]
ns4 MPTCP -> ns2 (10.0.1.2:10034      ) MPTCP   (duration  2534ms) [ OK ]
ns4 MPTCP -> ns2 (dead:beef:1::2:10035) MPTCP   (duration  2108ms) [ OK ]
ns4 MPTCP -> ns2 (10.0.2.1:10036      ) MPTCP   (duration  1959ms) [ OK ]
ns4 MPTCP -> ns2 (dead:beef:2::1:10037) MPTCP   (duration  1990ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.2.2:10038      ) MPTCP   (duration    46ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:2::2:10039) MPTCP   (duration    43ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10040      ) MPTCP   (duration    43ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10041) MPTCP   (duration    44ms) [ OK ]
Time: 28 seconds
###########End 30 test#################
Comment 6 shanxifanshi alibaba_cloud_group 2023-07-10 13:49:25 UTC
See the comment above. Verification OK; closing the bug.