Bug 8557 - net/smc: hung in epoll_wait for EPOLL_OUT event when sndbuffer have spare space
Summary: net/smc: hung in epoll_wait for EPOLL_OUT event when sndbuffer have spare space
Status: NEW
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: net
Version: 5.10.y-16
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: XuanZhuo
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-03-16 12:15 UTC by wangguangguan
Modified: 2024-04-08 13:02 UTC

See Also:


Attachments

Description wangguangguan alibaba_cloud_group 2024-03-16 12:15:56 UTC
Description of problem:

When testing with OpenMessaging Benchmark + Kafka + SMC-R, epoll_wait is never woken up.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 小龙 admin 2024-03-16 12:27:55 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/2894
Comment 2 dust.li alibaba_cloud_group 2024-04-08 13:02:29 UTC
(In reply to wangguangguan from comment #0)
> Description of problem:
> 
> When testing with OpenMessaging Benchmark + Kafka + SMC-R, epoll_wait is never woken up.
> 
> Version-Release number of selected component (if applicable):
> 
> 
> How reproducible:
> 
> 
> Steps to Reproduce:
> 1.
> 2.
> 3.
> 
> Actual results:
> 
> 
> Expected results:
> 
> 
> Additional info:

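The problem is triggered when all of the following hold: smc_tx_sendmsg hits a full send buffer, the sender relies on an EPOLL_OUT wakeup to resume sending, and there is only one in-flight I/O. It is especially likely when the data being sent is larger than the send buffer. In the test case, Kafka sends >= 256K of data each time while the send buffer is 128K, so every send fills the buffer and then waits for the next EPOLL_OUT. That is why the problem reproduces every time.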
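
For illustration only, here is a minimal userspace sketch of the send pattern described above: a single non-blocking writer pushes a payload larger than the send buffer and, once the buffer is full, parks until the socket is reported writable. It is not the Kafka code or the SMC code path. It runs over an ordinary `socket.socketpair()`, where the wakeup does arrive, and it uses Python's `selectors` module (backed by epoll on Linux). The names `send_all_nonblocking`, `drain` and `run_demo` are hypothetical; the 256K and 128K sizes are taken from the test case above.

```python
import selectors
import socket
import threading


def send_all_nonblocking(sock, payload, timeout=5.0):
    """Send payload on a non-blocking socket; wait for writability when full.

    Returns (bytes_sent, number_of_waits).
    """
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_WRITE)
    view = memoryview(payload)
    sent = 0
    waits = 0
    try:
        while sent < len(view):
            try:
                sent += sock.send(view[sent:])
            except BlockingIOError:
                # Send buffer is full. The single in-flight write now depends
                # entirely on a writable event to make further progress.
                waits += 1
                if not sel.select(timeout):
                    # A missing wakeup here is the symptom reported in this bug.
                    raise TimeoutError("no writable event before timeout")
    finally:
        sel.close()
    return sent, waits


def drain(sock, total, out):
    """Peer side: read until total bytes have arrived or the sender closes."""
    received = 0
    while received < total:
        chunk = sock.recv(65536)
        if not chunk:
            break
        received += len(chunk)
    out.append(received)


def run_demo(payload_size=256 * 1024, sndbuf=128 * 1024):
    """Send payload_size bytes through a socket with a smaller send buffer."""
    a, b = socket.socketpair()
    # The kernel may round or adjust the requested buffer size.
    a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    a.setblocking(False)
    out = []
    reader = threading.Thread(target=drain, args=(b, payload_size, out))
    reader.start()
    try:
        sent, waits = send_all_nonblocking(a, bytes(payload_size))
    finally:
        a.close()
        reader.join()
        b.close()
    return sent, out[0], waits


if __name__ == "__main__":
    print(run_demo())
```

On a healthy transport the sender finishes and the peer receives every byte. How many times the sender has to wait depends on scheduling and on the buffer size the kernel actually applies. With the behavior reported here, the `select` call would instead time out even though the send buffer has free space again.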