Bug 816 - [ANCK 4.19-devel] rpc bug: when sunrpc.tcp_max_slot_table_entries is set low, killing xprt tasks is very likely to hang NFS
Summary: [ANCK 4.19-devel] rpc bug: when sunrpc.tcp_max_slot_table_entries is low, kill xprt task...
Status: CONFIRMED
Alias: None
Product: ANCK 4.19 Dev
Classification: ANCK
Component: net
Version: 4.19-025.x
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: XuanZhuo
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-04-07 15:37 UTC by Bixuan Cui
Modified: 2022-04-07 16:29 UTC
CC List: 2 users

See Also:


Attachments

Description Bixuan Cui 2022-04-07 15:37:24 UTC
Description of problem:
When sunrpc.tcp_max_slot_table_entries is set to a low value, killing xprt tasks is very likely to hang NFS.

Version-Release number of selected component (if applicable):
4.19.91-25.1.al7.x86_64

How reproducible:


Steps to Reproduce:
1. Limit the RPC slot table to 2 entries, then reboot:
echo "options sunrpc tcp_slot_table_entries=2" >> /etc/modprobe.d/sunrpc.conf
echo "options sunrpc tcp_max_slot_table_entries=2" >> /etc/modprobe.d/sunrpc.conf

2. Remount NFSv3: sudo mount -t nfs -o vers=3,nolock,proto=tcp 127.0.0.1:/root/nfs_server/ /root/nfs_client/

3. Run the test case:
# cat case.sh 
while true
do
    ls &
done &

while true
do
    killall ls
done &

[root@cuibixuan nfs_client]# cd /root/nfs_client/; sh case.sh

4. After waiting 10 seconds, run 'killall sh' to stop the test case.

5. NFS hangs and xprt->state is 0x212 (XPRT_CONGESTED is set):
[root@cuibixuan sunrpc]# cat /sys/kernel/debug/sunrpc/rpc_xprt/4/info 
netid: tcp
addr:  127.0.0.1
port:  2049
state: 0x212

[root@cuibixuan sunrpc]# cat /sys/kernel/debug/sunrpc/rpc_clnt/4/tasks 
  155 0080    -11 0x4 0x0        0 rpc_default_ops [sunrpc] nfsv3 GETATTR a:call_reserveresult [sunrpc] q:xprt_backlog
15418 0080    -11 0x4 0x0        0 rpc_default_ops [sunrpc] nfsv3 GETATTR a:call_reserveresult [sunrpc] q:xprt_backlog

6. Enable ftrace
 # echo 1 > /sys/kernel/debug/tracing/events/sunrpc/enable
 # cat /sys/kernel/debug/tracing/trace
	      ...
              ls-18792 [001] ....   668.541062: rpc_task_run_action: task:28885@4 flags=0080 state=0005 status=0 action=call_start [sunrpc]
              ls-18792 [001] ....   668.541063: rpc_request: task:28885@4 nfsv3 GETATTR (sync)
              ls-18792 [001] ....   668.541063: rpc_task_run_action: task:28885@4 flags=0080 state=0005 status=0 action=call_reserve [sunrpc]
              ls-18792 [001] ....   668.541065: rpc_task_sleep: task:28885@4 flags=0080 state=0005 status=-11 timeout=0 queue=xprt_backlog
              ls-18792 [000] ....   670.815066: rpc_task_wakeup: task:28885@4 flags=0180 state=0006 status=-512 timeout=0 queue=xprt_backlog
              ls-18792 [000] ....   670.815069: rpc_task_run_action: task:28885@4 flags=0180 state=0005 status=-512 action=rpc_exit_task [sunrpc]
              ls-18846 [001] ....   682.302970: rpc_task_begin: task:28886@4 flags=0080 state=0004 status=0 action=          (null)
              ls-18846 [001] ....   682.302972: rpc_task_run_action: task:28886@4 flags=0080 state=0005 status=0 action=call_start [sunrpc]
              ls-18846 [001] ....   682.302973: rpc_request: task:28886@4 nfsv3 GETATTR (sync)
              ls-18846 [001] ....   682.302973: rpc_task_run_action: task:28886@4 flags=0080 state=0005 status=0 action=call_reserve [sunrpc]
              ls-18846 [001] ....   682.302974: rpc_task_sleep: task:28886@4 flags=0080 state=0005 status=-11 timeout=0 queue=xprt_backlog
              ls-18846 [000] ....   684.690924: rpc_task_wakeup: task:28886@4 flags=0180 state=0006 status=-512 timeout=0 queue=xprt_backlog
              ls-18846 [000] ....   684.690928: rpc_task_run_action: task:28886@4 flags=0180 state=0005 status=-512 action=rpc_exit_task [sunrpc]
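
In the trace above, each killed ls first sleeps on xprt_backlog with status=-11 (EAGAIN) and is later woken with status=-512 (ERESTARTSYS) before leaving through rpc_exit_task. The following is a minimal sketch for pulling those signal-driven wakeups out of a saved trace. The helper name backlog_kill_wakeups and the trace file path are hypothetical, and the match strings assume the tracepoint format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of RPC tasks that were woken from
# xprt_backlog with status=-512 (ERESTARTSYS), i.e. interrupted by a signal.
# Assumes the rpc_task_wakeup line format shown in the trace above.
backlog_kill_wakeups() {
    # $1: path to a saved copy of /sys/kernel/debug/tracing/trace
    grep 'rpc_task_wakeup:' "$1" \
        | grep 'status=-512' \
        | grep 'queue=xprt_backlog' \
        | sed -n 's/.*task:\([0-9]*@[0-9]*\).*/\1/p'
}
```

Possible usage: save the trace with `cat /sys/kernel/debug/tracing/trace > /tmp/trace.txt`, then run `backlog_kill_wakeups /tmp/trace.txt` to list the affected task IDs.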

Actual results:
NFS hangs and commands block.
xprt->state is 0x212 (XPRT_CONGESTED is set):
[root@cuibixuan sunrpc]# cat /sys/kernel/debug/sunrpc/rpc_xprt/4/info 
netid: tcp
addr:  127.0.0.1
port:  2049
state: 0x212
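
The state value is a bitmask, so 0x212 carries more than the congestion flag. The sketch below decodes it into flag names. The helper name decode_xprt_state is hypothetical, and the bit numbers are an assumption taken from include/linux/sunrpc/xprt.h in 4.19; check them against the kernel tree being debugged.

```shell
#!/bin/sh
# Hypothetical helper: decode an xprt->state value into flag names.
# Bit numbers are assumed from include/linux/sunrpc/xprt.h (4.19);
# verify them against the kernel tree you are debugging.
decode_xprt_state() {
    state=$(( $1 ))   # accepts hex input such as 0x212
    out=""
    for entry in 0:XPRT_LOCKED 1:XPRT_CONNECTED 2:XPRT_CONNECTING \
                 3:XPRT_CLOSE_WAIT 4:XPRT_BOUND 5:XPRT_BINDING \
                 6:XPRT_CLOSING 9:XPRT_CONGESTED; do
        bit=${entry%%:*}     # text before the colon: bit number
        name=${entry#*:}     # text after the colon: flag name
        if [ $(( (state >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

decode_xprt_state 0x212
```

Under those assumed bit numbers, 0x212 has bits 1, 4 and 9 set, so the transport is connected and bound but still marked congested.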

[root@cuibixuan sunrpc]# cat /sys/kernel/debug/sunrpc/rpc_clnt/4/tasks 
  155 0080    -11 0x4 0x0        0 rpc_default_ops [sunrpc] nfsv3 GETATTR a:call_reserveresult [sunrpc] q:xprt_backlog
15418 0080    -11 0x4 0x0        0 rpc_default_ops [sunrpc] nfsv3 GETATTR a:call_reserveresult [sunrpc] q:xprt_backlog
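
Both remaining tasks are parked on xprt_backlog with status -11 (EAGAIN), which is consistent with a free slot never being handed to them. A small sketch for summarizing where tasks are waiting is given here; the helper name count_tasks_by_queue is hypothetical, and it assumes the queue name is the last field of each line, prefixed with "q:", as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: count RPC tasks per wait queue in a debugfs tasks file.
# Assumes the queue name is the last field, prefixed with "q:", as shown above.
count_tasks_by_queue() {
    # $1: path such as /sys/kernel/debug/sunrpc/rpc_clnt/4/tasks
    awk '$NF ~ /^q:/ { n[substr($NF, 3)]++ }
         END { for (q in n) print q, n[q] }' "$1" | sort
}
```

Possible usage: `count_tasks_by_queue /sys/kernel/debug/sunrpc/rpc_clnt/4/tasks`. A count on xprt_backlog that stays constant across repeated runs while no new requests are issued suggests the waiters are stuck rather than merely queued.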


Expected results:
NFS does not hang and the tasks exit normally.

Additional info:
Comment 1 Bixuan Cui 2022-04-07 16:29:51 UTC
The following patches from newer upstream kernels resolve this issue:
e877a88d1f069edced4160792f42c2a8e2dba942 SUNRPC in case of backlog, hand free slots directly to waiting task
e86be3a04bc4aeaf12f93af35f08f8d4385bcd98 SUNRPC: More fixes for backlog congestion
Verified on 4.19.91-25.1.al7.x86_64.