Description of problem:

A soft lockup occurs when running LTP tests on the 5.10 kernel.

Call Trace:
wake_up_page_bit+0x8a/0x110
iomap_finish_ioend+0xd7/0x1c0
iomap_finish_ioends+0x7f/0xb0
xfs_end_ioend+0x6b/0x100 [xfs]
xfs_end_io+0xb9/0xe0 [xfs]
process_one_work+0x1a7/0x360
worker_thread+0x1fa/0x390
kthread+0x116/0x130
ret_from_fork+0x35/0x40
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/2222
(In reply to ljubomir from comment #0)
> Description of problem:
>
> A soft lockup occurs when running LTP tests on the 5.10 kernel.
> Call Trace:
> wake_up_page_bit+0x8a/0x110
> iomap_finish_ioend+0xd7/0x1c0
> iomap_finish_ioends+0x7f/0xb0
> xfs_end_ioend+0x6b/0x100 [xfs]
> xfs_end_io+0xb9/0xe0 [xfs]
> process_one_work+0x1a7/0x360
> worker_thread+0x1fa/0x390
> kthread+0x116/0x130
> ret_from_fork+0x35/0x40

Could you please provide more details, such as the kernel version, the specific LTP case, and the run log?
(In reply to josephqi from comment #2)
> (In reply to ljubomir from comment #0)
> > Description of problem:
> >
> > A soft lockup occurs when running LTP tests on the 5.10 kernel.
> > Call Trace:
> > wake_up_page_bit+0x8a/0x110
> > iomap_finish_ioend+0xd7/0x1c0
> > iomap_finish_ioends+0x7f/0xb0
> > xfs_end_ioend+0x6b/0x100 [xfs]
> > xfs_end_io+0xb9/0xe0 [xfs]
> > process_one_work+0x1a7/0x360
> > worker_thread+0x1fa/0x390
> > kthread+0x116/0x130
> > ret_from_fork+0x35/0x40
>
> Could you please provide more details?
> Such as kernel version, the specific ltp case and run log, etc.

Kernel version: 5.10.134-15.2

Test script:

#!/bin/bash
/opt/ltp/runltp -f dio -l /home/dio.log -d /tmp/ -o /home/dio.out.log -t 7d &
/opt/ltp/runltp -f fs -l /home/fs.log -d /tmp/ -o /home/fs.out.log -t 7d &

Actual result:

Sep 21 23:09:49 localhost kernel: watchdog: BUG: soft lockup - CPU#68 stuck for 22s! [kworker/68:2:1114440]
Sep 21 23:09:49 localhost kernel: Modules linked in: nfsv3 nfs_acl nfs lockd grace nfs_ssc fscache isofs cdrom binfmt_misc tun brd overlay vfat fat loop xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support iax_crypto intel_pmt_crashlog intel_pmt_telemetry intel_pmt_class ipmi_ssif kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate joydev mousedev mei_me idxd i2c_i801 isst_if_mbox_pci isst_if_mmio intel_uncore pcspkr isst_if_common intel_pmt idxd_bus i2c_smbus mei i2c_ismt acpi_ipmi ipmi_si acpi_pad acpi_power_meter xfs libcrc32c ast sd_mod drm_vram_helper sg
Sep 21 23:09:49 localhost kernel: drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm ahci igb nvme libahci dca drm nvme_core crc32c_intel i2c_algo_bit t10_pi libata i2c_core wmi dm_mirror dm_region_hash dm_log dm_mod fuse ipmi_devintf ipmi_msghandler [last unloaded: init_module]
Sep 21 23:09:49 localhost kernel: CPU: 68 PID: 1114440 Comm: kworker/68:2 Kdump: loaded Tainted: G S W OEL 5.10.134-15.2.kos5.x86_64 #1
Sep 21 23:09:49 localhost kernel: Hardware name: Inspur NF5280-M7-A0-R0-00/NF5280-M7-A0-R0-00, BIOS 05.07.00 07/14/2023
Sep 21 23:09:49 localhost kernel: Workqueue: xfs-conv/dm-0 xfs_end_io [xfs]
Sep 21 23:09:49 localhost kernel: RIP: 0010:_raw_spin_unlock_irqrestore+0xa/0x10
Sep 21 23:09:49 localhost kernel: Code: f0 0f b1 17 74 07 31 c0 c3 cc cc cc cc b8 01 00 00 00 c3 cc cc cc cc 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 c6 07 00 56 9d <c3> cc cc cc cc 90 0f 1f 44 00 00 8b 07 a9 ff 01 00 00 75 22 ba 00
Sep 21 23:09:49 localhost kernel: RSP: 0018:ffffbff6cfd63d10 EFLAGS: 00000202
Sep 21 23:09:49 localhost kernel: RAX: 0000000000000001 RBX: ffffffff94a07d00 RCX: dead000000000122
Sep 21 23:09:49 localhost kernel: RDX: ffffbff6f46bbd80 RSI: 0000000000000202 RDI: ffffffff94a07d00
Sep 21 23:09:49 localhost kernel: RBP: 0000000000000202 R08: ffffbff6f46bbd80 R09: 00000000000347c0
Sep 21 23:09:49 localhost kernel: R10: 00007ff152559eb3 R11: 0000000000000008 R12: 0000000000000180
Sep 21 23:09:49 localhost kernel: R13: ffffe01cc90b3bc0 R14: 0000000000001000 R15: 0000000000000000
Sep 21 23:09:49 localhost kernel: FS: 0000000000000000(0000) GS:ffffa0e9ffe00000(0000) knlGS:0000000000000000
Sep 21 23:09:49 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 21 23:09:49 localhost kernel: CR2: 000000000183d000 CR3: 00000001cf872002 CR4: 0000000002770ee0
Sep 21 23:09:49 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 21 23:09:49 localhost kernel: DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
Sep 21 23:09:49 localhost kernel: PKRU: 55555554
Sep 21 23:09:49 localhost kernel: Call Trace:
Sep 21 23:09:49 localhost kernel: wake_up_page_bit+0x7a/0xe0
Sep 21 23:09:49 localhost kernel: end_page_writeback+0x7f/0x1f0
Sep 21 23:09:49 localhost kernel: iomap_finish_ioend+0x169/0x320
Sep 21 23:09:49 localhost kernel: iomap_finish_ioends+0x6f/0x90
Sep 21 23:09:49 localhost kernel: xfs_end_ioend+0x5d/0x160 [xfs]
Sep 21 23:09:49 localhost kernel: ? xfs_setfilesize_ioend.constprop.12+0x50/0x50 [xfs]
Sep 21 23:09:49 localhost kernel: xfs_end_io+0xa9/0xc0 [xfs]
Sep 21 23:09:49 localhost kernel: process_one_work+0x19b/0x340
Sep 21 23:09:49 localhost kernel: ? process_one_work+0x340/0x340
Sep 21 23:09:49 localhost kernel: worker_thread+0x30/0x370
Sep 21 23:09:49 localhost kernel: ? process_one_work+0x340/0x340
Sep 21 23:09:49 localhost kernel: kthread+0x114/0x130
Sep 21 23:09:49 localhost kernel: ? __kthread_cancel_work+0x50/0x50
Sep 21 23:09:49 localhost kernel: ret_from_fork+0x1f/0x30
Sep 22 23:13:08 localhost kernel: Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.10.134-15.2.kos5.x86_64 root=/dev/mapper/keyarchos00-root ro resume=/dev/mapper/keyarchos00-swap rd.lvm.lv=keyarchos00/root rd.lvm.lv=keyarchos00/swap rhgb quiet cgroup.memory=nokmem crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M
Upstream mainline patch:
https://github.com/torvalds/linux/commit/ebb7fb1557b1d03b906b668aa2164b51e6b7d19a

The patch is based on folios, but 5.10 does not support folios, so it has to be adapted before it can be applied to this kernel.
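
For anyone following along, here is a rough userspace sketch of the idea, not the kernel code and not the actual backport. It assumes, from my reading of the mainline commit rather than from anything in this report, that the fix bounds how much completion work a single ioend carries and gives up the CPU between ioends. All names here (`fake_ioend`, `BATCH_PAGES`, `ioends_needed`, `finish_ioends`) are hypothetical, the cap value is only illustrative, and `sched_yield()` stands in for the kernel's `cond_resched()`. The sketch counts pages rather than folios, since that is the unit a 5.10 adaptation would presumably have to use.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical cap on the pages one ioend may carry (illustrative value). */
#define BATCH_PAGES 4096

/* Stand-in for an ioend: only the page count and the chain link matter here. */
struct fake_ioend {
    size_t nr_pages;
    struct fake_ioend *next;
};

/* Number of capped ioends needed to cover nr_pages of writeback. */
size_t ioends_needed(size_t nr_pages)
{
    return (nr_pages + BATCH_PAGES - 1) / BATCH_PAGES;
}

/*
 * Complete a chain of ioends. Each ioend is bounded by BATCH_PAGES, so the
 * work done between two yield points is bounded as well. Returns the total
 * number of pages completed and stores the number of yields in *yields.
 */
size_t finish_ioends(struct fake_ioend *head, size_t *yields)
{
    size_t completed = 0;

    *yields = 0;
    for (struct fake_ioend *ioend = head; ioend; ioend = ioend->next) {
        /* The submission side is assumed to have enforced the cap. */
        assert(ioend->nr_pages <= BATCH_PAGES);
        /* Stand-in for ending writeback on every page of this ioend. */
        completed += ioend->nr_pages;
        /* Userspace analogue of a reschedule point between ioends. */
        sched_yield();
        (*yields)++;
    }
    return completed;
}
```

With the cap in place, a long writeback is split into several ioends, and the worker reaches a yield point after each one instead of looping over an unbounded number of pages in one go.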