Bug 27529 - Network hardware bandwidth bottleneck on Hygon CPUs caused by PCIe ordering
Summary: Network hardware bandwidth bottleneck on Hygon CPUs caused by PCIe ordering
Status: NEW
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: net
Version: unspecified
Hardware: All
OS: Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: XuanZhuo
QA Contact:
URL:
Whiteboard:
Keywords: Performance
Depends on:
Blocks:
 
Reported: 2025-12-05 14:46 UTC by deven_zhu
Modified: 2025-12-05 17:00 UTC
CC List: 0 users

See Also:


Attachments

Description deven_zhu 2025-12-05 14:46:34 UTC
On the Hygon platform, when the NUMA node on which the workload runs differs from the NUMA node where the NIC resides, a severe PCIe ordering problem occurs whenever the workload's transmit path and the NIC's receive path run concurrently. Because the memory targeted by the NIC's DMA read operations and its DMA write operations is spread across different NUMA nodes, the PCIe ordering mechanism forces the two operations to wait on each other and execute serially. In this scenario the network bandwidth drops sharply and overall network performance degrades badly.
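
A quick way to confirm the cross-node layout described above is to compare the NIC's NUMA node with the node the workload is pinned to. A minimal sketch, assuming the physical NIC is named eth0 (the interface name is only an illustration):

  # NUMA node the PCIe NIC is attached to (-1 means no affinity is reported)
  cat /sys/class/net/eth0/device/numa_node
  # NUMA topology of the machine, to see which CPUs belong to that node
  numactl --hardware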

Test environment:
  Two machines are directly connected through two physical NICs (100Gb/s each). On each machine the two directly connected physical NICs form a mode=4 bonding interface, and both physical NICs are on NUMA node 5.

Test command:
  numactl -N 4 iperf -c 10.1.0.13 -i 5 -t 1000 -p 6001 -P 16

Test results:
  When only one machine sends data, the maximum bandwidth of the bond4 interface reaches 176Gb/s.
  When both machines send data to each other, the maximum bandwidth of the bond4 interface only reaches 46Gb/s.
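
As a cross-check of the NUMA explanation above (not part of the original test), one would expect the regression to shrink when the sender is pinned to the NIC's own node instead of a remote one. A hypothetical variant of the test command, assuming the NICs sit on NUMA node 5 as described in the test environment:

  # pin iperf to the NIC's NUMA node (node 5) instead of the remote node 4
  numactl -N 5 iperf -c 10.1.0.13 -i 5 -t 1000 -p 6001 -P 16
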
Comment 1 小龙 admin 2025-12-05 17:00:08 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/6119