Bug 11530 - Mellanox bonding failover fails with loss of RoCE connection
Summary: Mellanox bonding failover fails with loss of RoCE connection
Status: NEW
Alias: None
Product: ANCK 6.6 Dev
Classification: ANCK
Component: drivers
Version: 6.6.y-3
Hardware: All Linux
Importance: P3-Medium S2-major
Target Milestone: ---
Assignee: GuixinLiu
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-10-25 14:38 UTC by wangchuanguo_lc
Modified: 2024-10-25 14:39 UTC
CC List: 3 users

See Also:


Attachments

Description wangchuanguo_lc inspur_group 2024-10-25 14:38:02 UTC
Description of problem:
A Mellanox ConnectX-5 (CX5) NIC with its two interfaces bonded together is connected to another Mellanox CX5 NIC through a Fibre Channel switch. When the bond fails over, the ib_send_bw client fails.

Version-Release number of selected component (if applicable):
The mlx module in kernels 4.19, 5.10, 6.6, and 6.12 all has this issue.

How reproducible:
server:
modprobe bonding
echo "+bond0" > /sys/class/net/bonding_masters
echo 1 > /sys/class/net/bond0/bonding/mode      # mode 1 = active-backup
echo 100 > /sys/class/net/bond0/bonding/miimon  # check link state every 100 ms
ip link set bond0 up
ifenslave bond0 enP3p1s0f0 enP3p1s0f1
ip addr add 5.5.5.1/24 dev bond0
# Start the ib_send_bw server
ib_send_bw -n 1000000


client:
modprobe bonding
echo "+bond0" > /sys/class/net/bonding_masters
echo 1 > /sys/class/net/bond0/bonding/mode      # mode 1 = active-backup
echo 100 > /sys/class/net/bond0/bonding/miimon  # check link state every 100 ms
ip link set bond0 up
ifenslave bond0 enP3p1s0f0 enP3p1s0f1
ip addr add 5.5.5.2/24 dev bond0
# Start the ib_send_bw client
ib_send_bw 5.5.5.1 -n 1000000

#Simulate a link disconnection by bringing down the active slave network interface on the client side.
ip link set enP3p1s0f0 down
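
To confirm that the bond really switched to the backup interface after the link is brought down, the active slave can be read back from the bonding driver's status file. The following is a minimal sketch and not part of the original reproduction steps; the helper name active_slave_from_proc is hypothetical, and it assumes the usual "Currently Active Slave:" line that the bonding driver prints in /proc/net/bonding/<bond> for active-backup mode.

```shell
# Hypothetical helper (an assumption, not from the original report):
# read bonding status text on stdin and print the active slave name.
active_slave_from_proc() {
    # In active-backup mode the bonding driver prints a line such as
    #   "Currently Active Slave: enP3p1s0f0"
    # Strip the label and print only the interface name.
    sed -n 's/^Currently Active Slave: *//p'
}

# Usage on a live system (assumes bond0 exists):
#   active_slave_from_proc < /proc/net/bonding/bond0
```

Running the helper before and after "ip link set enP3p1s0f0 down" should show the name change from enP3p1s0f0 to enP3p1s0f1 if the failover itself took place, which helps separate a bonding problem from a RoCE problem.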

The client command "ib_send_bw 5.5.5.1 -n 1000000" then fails with:
Completion with error at client
Failed status 12:wr_id 0 syndrom 0x81
scnt=40053 ccnt=39925
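
For readers unfamiliar with the numeric status, the "Failed status 12" above corresponds to IBV_WC_RETRY_EXC_ERR in the libibverbs ibv_wc_status enum, meaning the transport retry counter was exceeded because the remote side stopped acknowledging. The sketch below maps a few of these codes to their names; the function name wc_status_name is hypothetical, only a subset of the enum is covered, and the vendor syndrome 0x81 is not interpreted here.

```shell
# Hypothetical helper (an assumption, not from the original report):
# map a numeric ibv_wc_status value to its libibverbs enum name.
# Only a subset of the enum is listed.
wc_status_name() {
    case "$1" in
        0)  echo "IBV_WC_SUCCESS" ;;
        5)  echo "IBV_WC_WR_FLUSH_ERR" ;;
        12) echo "IBV_WC_RETRY_EXC_ERR" ;;      # transport retry counter exceeded
        13) echo "IBV_WC_RNR_RETRY_EXC_ERR" ;;  # receiver-not-ready retry counter exceeded
        *)  echo "unknown" ;;
    esac
}

# Usage: wc_status_name 12
```

A retry-exceeded error is consistent with the queue pair continuing to send on a path the peer can no longer be reached through after the failover, although the report itself does not establish the root cause.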
Comment 1 wangchuanguo_lc inspur_group 2024-10-25 14:39:48 UTC
The official 5.8-LTS driver does not have this problem.