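Bug 19213 - After running for a while, the machine accumulates many D-state processes; a reboot clears them, and hardware checks find no fault
Summary: After running for a while, the machine accumulates many D-state processes; a reboot clears them, and hardware checks find no fault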
Status: NEW
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: fs
Version: 5.10.y-14
Hardware: x86_64 Linux
Importance: P3-Medium S2-major
Target Milestone: ---
Assignee: Ferry Meng
 
Reported: 2025-03-05 11:42 UTC by jinriyang
Modified: 2025-03-05 13:14 UTC

Attachments
Stack traces of all D-state processes (1.43 MB, text/plain)
2025-03-05 11:42 UTC, jinriyang

Description jinriyang 2025-03-05 11:42:56 UTC
Created attachment 1319
Stack traces of all D-state processes

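The machine runs crawler-type workloads on k8s + containerd. After it has been running for a while, many processes end up in D state;
ls /var/lib/containerd/ hangs on that directory;
the xfsaild/dm-2 process is in D state.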
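
The report does not say how the D-state processes were enumerated. As a minimal sketch, assuming a Linux procfs mounted at /proc, the following lists processes whose state field is D (uninterruptible sleep). The function names parse_stat_line and d_state_processes are hypothetical helpers, not part of any tool mentioned in this bug.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so locate the last ')' instead of splitting on whitespace.
    """
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state D.

    Only top-level /proc/<pid> entries are scanned; individual threads
    under /proc/<pid>/task are not (a simplification of this sketch).
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

On an affected host, xfsaild/dm-2 would be expected to appear in this list, matching the observation above.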

Kernel version:
5.10.134-14.an8.x86_64

The stack traces of all D-state processes are in the attachment.


Output of cat /sys/fs/xfs/dm-2/stats/stats, sampled twice 30 seconds apart:

[root@vpcnode1060 jinriyang]# cat /sys/fs/xfs/dm-2/stats/stats
extent_alloc 1160856387 2627073013 2274315025 2601590680
abt 0 0 0 0
blk_map 2754482717 3544299085 1558139656 1913385607 1111235633 2472686723 0
bmbt 0 0 0 0
dir 4292998382 888169519 885812514 1216879605
trans 4 1769964795 61
ig 821606542 755217349 790 66389193 0 64051311 1140939262
log 628940541 2635169629 13059130 262749769 258903716
push_ail 2957751021 27864 27544253 240051019 0 69331474 905355824 249707388 0 9701887
xstrat 934207619 0
rw 4278499844 741434797
attr 2321151429 18180488 6900 39576
icluster 0 203207093 630296879
vnodes 2337882 0 0 0 819266444 819266355 819266444 0
buf 3354229397 1249135975 2105093425 414525473 169560402 1249135972 0 1250048397 704
abtb2 862891831 85690381 874253348 874149785 23478 23474 62240870 21392868 19571227 24740689 218631 218275 242109 241749 1283990052
abtc2 327573388 3309870971 2606281968 2606183661 28753 28749 37859706 7295242 11439338 15393769 166838 166463 195591 195212 4122066812
bmbt2 53329022 380778920 20386537 21939183 0 0 260868 266153 259589 260714 5518 5459 5518 5459 113373479
ibt2 1641022425 3321918741 152724 89323 4 0 39911 14605 61620 23858 655 320 659 320 30496375
fibt2 2358705300 2391510186 446458702 446503478 73433 73429 21098086 58558 11688676 8876022 775034 774872 848466 848301 427452997
rmapbt 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
refcntbt 133804 178496 560 552 0 0 0 0 0 0 0 0 0 0 3645
qm 0 0 0 1 7 0 1 0 0
xpc 15042580467712 13136880428141 17089657278129
defer_relog 15
debug 0
[root@vpcnode1060 jinriyang]# cat /sys/fs/xfs/dm-2/stats/stats
extent_alloc 1160856387 2627073013 2274315025 2601590680
abt 0 0 0 0
blk_map 2754482717 3544299085 1558139656 1913385607 1111235633 2472686723 0
bmbt 0 0 0 0
dir 4292998382 888169519 885812514 1216879605
trans 4 1769964795 61
ig 821606542 755217349 790 66389193 0 64051311 1140939262
log 628940541 2635169629 13059130 262750515 258903716
push_ail 2957751021 27864 27544999 240051019 0 69336696 905425948 249707388 0 9702633
xstrat 934207619 0
rw 4278499844 741434797
attr 2321151460 18180488 6900 39576
icluster 0 203207093 630296879
vnodes 2337882 0 0 0 819266444 819266355 819266444 0
buf 3354229397 1249135975 2105093425 414525473 169560402 1249135972 0 1250048397 704
abtb2 862891831 85690381 874253348 874149785 23478 23474 62240870 21392868 19571227 24740689 218631 218275 242109 241749 1283990052
abtc2 327573388 3309870971 2606281968 2606183661 28753 28749 37859706 7295242 11439338 15393769 166838 166463 195591 195212 4122066812
bmbt2 53329022 380778920 20386537 21939183 0 0 260868 266153 259589 260714 5518 5459 5518 5459 113373479
ibt2 1641022425 3321918741 152724 89323 4 0 39911 14605 61620 23858 655 320 659 320 30496375
fibt2 2358705300 2391510186 446458702 446503478 73433 73429 21098086 58558 11688676 8876022 775034 774872 848466 848301 427452997
rmapbt 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
refcntbt 133804 178496 560 552 0 0 0 0 0 0 0 0 0 0 3645
qm 0 0 0 1 7 0 1 0 0
xpc 15042580467712 13136880428141 17089657278129
defer_relog 15
debug 0
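
Because the two snapshots are almost identical, it helps to diff them mechanically. The sketch below is one way to do that: it parses each snapshot into rows of integers and reports which positional fields changed. The report does not name the columns, so fields are identified only by a 0-based index after the row name; parse_stats and diff_stats are hypothetical helper names, and only four rows copied from the snapshots above are embedded to keep the example short.

```python
def parse_stats(text):
    """Parse XFS stats text into {row_name: [int, ...]}."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        rows[parts[0]] = [int(v) for v in parts[1:]]
    return rows


def diff_stats(before, after):
    """Return {row_name: [(field_index, delta), ...]} for changed counters."""
    changes = {}
    for name, old in before.items():
        new = after.get(name)
        if new is None or len(new) != len(old):
            continue  # row missing or shaped differently; skip it
        moved = [(i, b - a) for i, (a, b) in enumerate(zip(old, new)) if a != b]
        if moved:
            changes[name] = moved
    return changes


# Rows copied from the first and second snapshots in this report.
FIRST = """\
trans 4 1769964795 61
log 628940541 2635169629 13059130 262749769 258903716
push_ail 2957751021 27864 27544253 240051019 0 69331474 905355824 249707388 0 9701887
attr 2321151429 18180488 6900 39576
"""

SECOND = """\
trans 4 1769964795 61
log 628940541 2635169629 13059130 262750515 258903716
push_ail 2957751021 27864 27544999 240051019 0 69336696 905425948 249707388 0 9702633
attr 2321151460 18180488 6900 39576
"""

if __name__ == "__main__":
    for name, moved in diff_stats(parse_stats(FIRST), parse_stats(SECOND)).items():
        print(name, moved)
```

For these rows the trans counters do not move at all over the 30 seconds, while one log field and four push_ail fields keep increasing. That pattern would be consistent with a filesystem that keeps trying to push the AIL without completing any transactions, although the column meanings themselves are not stated in this report.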
Comment 1 gaoxiang alibaba_cloud_group 2025-03-05 13:13:19 UTC
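Also, were there any XFS errors in the earlier dmesg output?
So far it looks like the log is full, but everything in it is a pinned buffer, so the log cannot drain. The cause is still unknown.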
Comment 2 gaoxiang alibaba_cloud_group 2025-03-05 13:14:21 UTC
Also, are you only running 5.10.134-14.an8.x86_64? Do other, newer 5.10 kernels show the same problem?