Bug 6665 - [ANCK 5.10] Soft lockup during LTP testing
Summary: [ANCK 5.10] Soft lockup during LTP testing
Status: NEW
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: fs
Version: 5.10.y-15
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: Ferry Meng
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-21 16:56 UTC by ljubomir
Modified: 2024-04-28 09:47 UTC (History)
CC List: 2 users

See Also:


Description ljubomir inspur_group 2023-09-21 16:56:59 UTC
Description of problem:

A soft lockup occurs when running LTP tests on the 5.10 kernel.
     Call Trace:
      wake_up_page_bit+0x8a/0x110
      iomap_finish_ioend+0xd7/0x1c0
      iomap_finish_ioends+0x7f/0xb0
      xfs_end_ioend+0x6b/0x100 [xfs]
      xfs_end_io+0xb9/0xe0 [xfs]
      process_one_work+0x1a7/0x360
      worker_thread+0x1fa/0x390
      kthread+0x116/0x130
      ret_from_fork+0x35/0x40
Comment 1 小龙 admin 2023-09-21 17:01:31 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/2222
Comment 2 Joseph Qi alibaba_cloud_group 2023-09-21 17:02:36 UTC
(In reply to ljubomir from comment #0)
> Description of problem:
> 
> A soft lockup occurs when running LTP tests on the 5.10 kernel.
>      Call Trace:
>       wake_up_page_bit+0x8a/0x110
>       iomap_finish_ioend+0xd7/0x1c0
>       iomap_finish_ioends+0x7f/0xb0
>       xfs_end_ioend+0x6b/0x100 [xfs]
>       xfs_end_io+0xb9/0xe0 [xfs]
>       process_one_work+0x1a7/0x360
>       worker_thread+0x1fa/0x390
>       kthread+0x116/0x130
>       ret_from_fork+0x35/0x40

Could you please provide more details?
Such as kernel version, the specific ltp case and run log, etc.
Comment 3 ljubomir inspur_group 2023-09-21 18:02:38 UTC
(In reply to josephqi from comment #2)
> (In reply to ljubomir from comment #0)
> > Description of problem:
> > 
> > A soft lockup occurs when running LTP tests on the 5.10 kernel.
> >      Call Trace:
> >       wake_up_page_bit+0x8a/0x110
> >       iomap_finish_ioend+0xd7/0x1c0
> >       iomap_finish_ioends+0x7f/0xb0
> >       xfs_end_ioend+0x6b/0x100 [xfs]
> >       xfs_end_io+0xb9/0xe0 [xfs]
> >       process_one_work+0x1a7/0x360
> >       worker_thread+0x1fa/0x390
> >       kthread+0x116/0x130
> >       ret_from_fork+0x35/0x40
> 
> Could you please provide more details?
> Such as kernel version, the specific ltp case and run log, etc.

Kernel version: 5.10.134-15.2
Test script:
#!/bin/bash

/opt/ltp/runltp -f dio -l /home/dio.log -d /tmp/ -o /home/dio.out.log -t 7d &
/opt/ltp/runltp -f fs -l /home/fs.log -d /tmp/ -o /home/fs.out.log -t 7d &

Actual result:

Sep 21 23:09:49 localhost kernel: watchdog: BUG: soft lockup - CPU#68 stuck for 22s! [kworker/68:2:1114440]
Sep 21 23:09:49 localhost kernel: Modules linked in: nfsv3 nfs_acl nfs lockd grace nfs_ssc fscache isofs cdrom binfmt_misc tun brd overlay vfat fat loop xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support iax_crypto intel_pmt_crashlog intel_pmt_telemetry intel_pmt_class ipmi_ssif kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate joydev mousedev mei_me idxd i2c_i801 isst_if_mbox_pci isst_if_mmio intel_uncore pcspkr isst_if_common intel_pmt idxd_bus i2c_smbus mei i2c_ismt acpi_ipmi ipmi_si acpi_pad acpi_power_meter xfs libcrc32c ast sd_mod drm_vram_helper sg
Sep 21 23:09:49 localhost kernel: drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm ahci igb nvme libahci dca drm nvme_core crc32c_intel i2c_algo_bit t10_pi libata i2c_core wmi dm_mirror dm_region_hash dm_log dm_mod fuse ipmi_devintf ipmi_msghandler [last unloaded: init_module]
Sep 21 23:09:49 localhost kernel: CPU: 68 PID: 1114440 Comm: kworker/68:2 Kdump: loaded Tainted: G S      W  OEL    5.10.134-15.2.kos5.x86_64 #1
Sep 21 23:09:49 localhost kernel: Hardware name: Inspur NF5280-M7-A0-R0-00/NF5280-M7-A0-R0-00, BIOS 05.07.00 07/14/2023
Sep 21 23:09:49 localhost kernel: Workqueue: xfs-conv/dm-0 xfs_end_io [xfs]
Sep 21 23:09:49 localhost kernel: RIP: 0010:_raw_spin_unlock_irqrestore+0xa/0x10
Sep 21 23:09:49 localhost kernel: Code: f0 0f b1 17 74 07 31 c0 c3 cc cc cc cc b8 01 00 00 00 c3 cc cc cc cc 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 c6 07 00 56 9d <c3> cc cc cc cc 90 0f 1f 44 00 00 8b 07 a9 ff 01 00 00 75 22 ba 00
Sep 21 23:09:49 localhost kernel: RSP: 0018:ffffbff6cfd63d10 EFLAGS: 00000202
Sep 21 23:09:49 localhost kernel: RAX: 0000000000000001 RBX: ffffffff94a07d00 RCX: dead000000000122
Sep 21 23:09:49 localhost kernel: RDX: ffffbff6f46bbd80 RSI: 0000000000000202 RDI: ffffffff94a07d00
Sep 21 23:09:49 localhost kernel: RBP: 0000000000000202 R08: ffffbff6f46bbd80 R09: 00000000000347c0
Sep 21 23:09:49 localhost kernel: R10: 00007ff152559eb3 R11: 0000000000000008 R12: 0000000000000180
Sep 21 23:09:49 localhost kernel: R13: ffffe01cc90b3bc0 R14: 0000000000001000 R15: 0000000000000000
Sep 21 23:09:49 localhost kernel: FS:  0000000000000000(0000) GS:ffffa0e9ffe00000(0000) knlGS:0000000000000000
Sep 21 23:09:49 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 21 23:09:49 localhost kernel: CR2: 000000000183d000 CR3: 00000001cf872002 CR4: 0000000002770ee0
Sep 21 23:09:49 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 21 23:09:49 localhost kernel: DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
Sep 21 23:09:49 localhost kernel: PKRU: 55555554
Sep 21 23:09:49 localhost kernel: Call Trace:
Sep 21 23:09:49 localhost kernel: wake_up_page_bit+0x7a/0xe0
Sep 21 23:09:49 localhost kernel: end_page_writeback+0x7f/0x1f0
Sep 21 23:09:49 localhost kernel: iomap_finish_ioend+0x169/0x320
Sep 21 23:09:49 localhost kernel: iomap_finish_ioends+0x6f/0x90
Sep 21 23:09:49 localhost kernel: xfs_end_ioend+0x5d/0x160 [xfs]
Sep 21 23:09:49 localhost kernel: ? xfs_setfilesize_ioend.constprop.12+0x50/0x50 [xfs]
Sep 21 23:09:49 localhost kernel: xfs_end_io+0xa9/0xc0 [xfs]
Sep 21 23:09:49 localhost kernel: process_one_work+0x19b/0x340
Sep 21 23:09:49 localhost kernel: ? process_one_work+0x340/0x340
Sep 21 23:09:49 localhost kernel: worker_thread+0x30/0x370
Sep 21 23:09:49 localhost kernel: ? process_one_work+0x340/0x340
Sep 21 23:09:49 localhost kernel: kthread+0x114/0x130
Sep 21 23:09:49 localhost kernel: ? __kthread_cancel_work+0x50/0x50
Sep 21 23:09:49 localhost kernel: ret_from_fork+0x1f/0x30
Sep 22 23:13:08 localhost kernel: Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.10.134-15.2.kos5.x86_64 root=/dev/mapper/keyarchos00-root ro resume=/dev/mapper/keyarchos00-swap rd.lvm.lv=keyarchos00/root rd.lvm.lv=keyarchos00/swap rhgb quiet cgroup.memory=nokmem crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M
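
To pull the soft lockup records out of a long kernel log like the one above, something along these lines may help. It is only a sketch: the function name `soft_lockups` and the default log path are illustrative, and the `sed` pattern assumes the watchdog message format shown in this report.

```shell
#!/bin/bash
# Print "<cpu> <seconds> <task>" for every soft lockup record in a kernel log.
# Assumes the watchdog message format seen in this report:
#   watchdog: BUG: soft lockup - CPU#68 stuck for 22s! [kworker/68:2:1114440]
soft_lockups() {
    # $1: path to a log file (the default is a guess; adjust for your system)
    local log="${1:-/var/log/messages}"
    sed -n 's/.*BUG: soft lockup - CPU#\([0-9]*\) stuck for \([0-9]*\)s! \[\(.*\)]$/\1 \2 \3/p' "$log"
}
```

Running `soft_lockups /var/log/messages` prints one line per lockup, which makes it easier to see whether the same CPU or the same kworker keeps getting stuck across a multi-day run.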
Comment 4 Joseph Qi alibaba_cloud_group 2024-04-28 09:47:10 UTC
Upstream mainline patch:
https://github.com/torvalds/linux/commit/ebb7fb1557b1d03b906b668aa2164b51e6b7d19a

That patch is based on folios, but 5.10 does not support folios, so it has to be adapted for this code base.
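
Before backporting, it can help to confirm what a given source tree already has. The sketch below is a rough check and not part of the upstream patch: `check_tree` is a hypothetical helper, and it assumes that folio support shows up as `struct folio {` in `include/linux/mm_types.h` and that the upstream fix can be recognized by an `IOEND_BATCH_SIZE` macro in `fs/iomap/buffered-io.c`. Both markers should be verified against the commit linked above.

```shell
#!/bin/bash
# Report whether a kernel source tree has folio support and whether it
# already appears to carry the ioend batching fix.
# Assumptions (verify against the upstream commit):
#   - folio support is detected via "struct folio {" in mm_types.h
#   - the fix is detected via the IOEND_BATCH_SIZE macro
check_tree() {
    # $1: path to the root of a kernel source tree
    local tree="$1"
    if grep -qs 'struct folio {' "$tree/include/linux/mm_types.h"; then
        echo "folio: yes"
    else
        echo "folio: no"
    fi
    if grep -qs 'IOEND_BATCH_SIZE' "$tree/fs/iomap/buffered-io.c"; then
        echo "ioend-batch-fix: present"
    else
        echo "ioend-batch-fix: missing"
    fi
}
```

A tree that reports `folio: no` and `ioend-batch-fix: missing` is the case described here, where the change would have to be reworked in terms of pages rather than cherry-picked directly.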