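Bug 4050 - [4.19] Network performance regression compared with the RHEL 8 kernel during network performance testing
Summary: [4.19] Network performance regression compared with the RHEL 8 kernel during network performance testing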
Status: RESOLVED FIXED
Alias: None
Product: ANCK 4.19 Dev
Classification: ANCK
Component: net
Version: 4.19-026.x
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: XuanZhuo
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-13 20:00 UTC by cuishw
Modified: 2023-02-28 19:37 UTC

See Also:


Attachments

Description cuishw inspur_group 2023-02-13 20:00:57 UTC
Description of problem:

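While running performance tests on the 4.19 kernel, we observed a performance regression, with softirq processing consuming up to 50% of CPU time.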

1. Run mpstat 5 10: the %soft column shows CPU usage as high as 50%.
2. Run perf top: queued_spin_lock_slowpath accounts for 70% of the samples.
3. Run bpftrace -e 'kprobe:queued_spin_lock_slowpath { @[kstack] = count(); }': in the collected stacks, the most frequent code path is packet_rcv+797.
4. Run crash /dev/mem /lib/debug/lib/modules/4.19.91-26.4.3.kos5.x86_64/vmlinux and then dis -s packet_rcv+797: the address maps to the following source lines:
  2125    drop_n_acct:
  2126          is_drop_n_account = true;
  2127          spin_lock(&sk->sk_receive_queue.lock);  ----> here
* 2128          po->stats.stats1.tp_drops++;
  2129          atomic_inc(&sk->sk_drops);
  2130          spin_unlock(&sk->sk_receive_queue.lock);
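5. This path runs whenever a packet is dropped, so a high drop rate causes contention on this lock. Upstream has already fixed the problem: net/packet: fix overflow in tpacket_rcv
commit 8e8e2951e3095732d7e780c241f61ea130955a57 upstream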
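
To illustrate why the drop path in step 4 contends, here is a minimal user-space sketch in C11, not kernel code. It contrasts a drop counter guarded by a lock with one updated atomically, which is one common way to take the lock out of such a path; the report does not say whether the referenced upstream commit does exactly this. The names demo_sock, demo_spin_lock, drop_and_account_locked and drop_and_account_atomic are hypothetical and exist only for this example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the socket state touched on the drop path. */
struct demo_sock {
    atomic_flag queue_lock;       /* plays the role of sk_receive_queue.lock */
    uint32_t    tp_drops_locked;  /* plain counter, only valid under queue_lock */
    atomic_uint tp_drops_atomic;  /* atomic counter, needs no lock */
    atomic_uint sk_drops;         /* atomic in the quoted snippet as well */
};

static void demo_sock_init(struct demo_sock *sk)
{
    atomic_flag_clear(&sk->queue_lock);
    sk->tp_drops_locked = 0;
    atomic_init(&sk->tp_drops_atomic, 0);
    atomic_init(&sk->sk_drops, 0);
}

/* Minimal test-and-set spinlock: every waiter spins on the same flag. */
static void demo_spin_lock(atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ; /* busy-wait until the holder releases the lock */
}

static void demo_spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Pattern from the quoted source lines: each drop serializes on the lock,
 * so many CPUs dropping packets at once all queue up here. */
static void drop_and_account_locked(struct demo_sock *sk)
{
    demo_spin_lock(&sk->queue_lock);
    sk->tp_drops_locked++;
    atomic_fetch_add_explicit(&sk->sk_drops, 1, memory_order_relaxed);
    demo_spin_unlock(&sk->queue_lock);
}

/* Lock-free alternative (an assumption for illustration): both counters are
 * updated with atomic adds, so the drop path never touches the queue lock. */
static void drop_and_account_atomic(struct demo_sock *sk)
{
    atomic_fetch_add_explicit(&sk->tp_drops_atomic, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&sk->sk_drops, 1, memory_order_relaxed);
}
```

Both functions record the same information. The difference is that the locked variant makes every dropping CPU wait for the receive-queue lock, which is consistent with queued_spin_lock_slowpath dominating perf top, while the atomic variant only pays for two atomic adds.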
Comment 1 maqiao alibaba_cloud_group 2023-02-28 19:37:39 UTC
merged: https://gitee.com/anolis/cloud-kernel/pulls/1184