Description of problem: A cluster of 4 nodes, deployed for the local-cache scenario in dual-DB mode; a coredump occurred on one of the nodes.

[ 1351.279690] nvme0n1:
[ 1352.280711] nvme0n1:
[ 1352.313713] nvme0n1:
[ 1352.394724] nvme0n1:
[ 1354.435748] nvme0n1: p1
[ 1356.263495] nvme0n1: p1
[ 1356.291780] nvme0n1: p1
[ 1357.412220] SGI XFS with ACLs, security attributes, no debug enabled
[ 1357.418867] XFS (nvme0n1p1): Mounting V5 Filesystem
[ 1357.422917] XFS (nvme0n1p1): Ending clean mount
[ 1376.667472] XFS (nvme0n1p1): Unmounting Filesystem
[ 1376.686368] XFS (nvme0n1p1): Mounting V5 Filesystem
[ 1376.689722] XFS (nvme0n1p1): Ending clean mount
[ 1378.146472] XFS (nvme0n1p1): Unmounting Filesystem
[ 1378.165892] XFS (nvme0n1p1): Mounting V5 Filesystem
[ 1378.169507] XFS (nvme0n1p1): Ending clean mount
[ 1667.199010] nvme1n1:
[ 1667.228571] nvme1n1: p1
[ 1670.348062] md/raid1:md0: not clean -- starting background reconstruction
[ 1670.348064] md/raid1:md0: active with 2 out of 2 mirrors
[ 1670.348094] md: pers->run() failed ...
[ 1670.348111] md: md0 stopped.
[ 1670.380945] BUG: Bad page state in process mdadm pfn:283ff24
[ 1670.380994] page:fffff26660ffc900 count:-1 mapcount:0 mapping:0000000000000000 index:0x0
[ 1670.381028] flags: 0x197ffffc0000000()
[ 1670.381049] raw: 0197ffffc0000000 dead000000000100 dead000000000200 0000000000000000
[ 1670.381079] raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
[ 1670.381109] page dumped because: nonzero _refcount
[ 1670.381130] Modules linked in: raid1 xfs libcrc32c nft_counter ip_tables nft_compat uio_pci_generic uio nf_tables vfio_pci vfio_virqfd nfnetlink vfio_iommu_type1 vfio irqbypass cuse fuse 8021q garp stp mrp llc pcc_cpufreq bonding amd64_edac_mod edac_mce_amd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel ses glue_helper pcspkr enclosure ipmi_si ccp k10temp i2c_piix4 ipmi_watchdog sunrpc vfat fat sch_fq_codel knem(OE) sd_mod sg ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci nvme smartpqi(OE) libahci crc32c_intel nvme_core scsi_transport_sas i40e(OE) drm libata i2c_core ngbe(OE) xpmem(OE) ipmi_devintf ipmi_msghandler
[ 1670.381177] CPU: 37 PID: 76406 Comm: mdadm Kdump: loaded Tainted: G OE 4.19.91-26.6.5.kos5.x86_64 #1
[ 1670.381178] Hardware name: Inspur AS13000G5-CG12/AS13000G5-CG12, BIOS 5.03.09 2022-10-26
[ 1670.381180] Call Trace:
[ 1670.381194] dump_stack+0x66/0x90
[ 1670.381201] bad_page.cold.29+0xa0/0xbe
[ 1670.381206] free_pcppages_bulk+0x337/0x770
[ 1670.381213] ? memcg_check_events+0xf0/0x210
[ 1670.381217] ? page_remove_rmap+0xed/0x420
[ 1670.381219] free_unref_page_list+0xf3/0x190
[ 1670.381221] release_pages+0x3a1/0x430
[ 1670.381225] tlb_flush_mmu_free+0x36/0x50
[ 1670.381227] arch_tlb_finish_mmu+0x29/0x50
[ 1670.381229] tlb_finish_mmu+0x1f/0x30
[ 1670.381232] exit_mmap+0xb7/0x150
[ 1670.381236] ? __switch_to_asm+0x35/0x70
[ 1670.381242] mmput+0x53/0x110
[ 1670.381245] do_exit+0x34a/0xba0
[ 1670.381250] ? syscall_trace_enter+0x1c2/0x2a0
[ 1670.381252] do_group_exit+0x33/0xa0
[ 1670.381254] __x64_sys_exit_group+0x14/0x20
[ 1670.381255] do_syscall_64+0x5f/0x1b0
[ 1670.381256] ? prepare_exit_to_usermode+0x4c/0xb0
[ 1670.381258] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1670.381261] RIP: 0033:0x7f49272fa8d6
[ 1670.381266] Code: Bad RIP value.
[ 1670.381268] RSP: 002b:00007ffd7c76fb28 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 1670.381269] RAX: ffffffffffffffda RBX: 00007f49275bb860 RCX: 00007f49272fa8d6
[ 1670.381270] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
[ 1670.381271] RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffff28
[ 1670.381272] R10: 00007ffd7c76f9a8 R11: 0000000000000246 R12: 00007f49275bb860
[ 1670.381273] R13: 0000000000000001 R14: 00007f49275c4388 R15: 0000000000000000
[ 1670.381275] Disabling lock debugging due to kernel taint
[ 1716.554608] BUG: unable to handle kernel paging request at 000000e7000034d4
[ 1716.554657] PGD 0 P4D 0
[ 1716.554675] Oops: 0000 [#1] SMP NOPTI
[ 1716.554696] CPU: 36 PID: 77401 Comm: check_sysdisk.p Kdump: loaded Tainted: G B OE 4.19.91-26.6.5.kos5.x86_64 #1
[ 1716.554733] Hardware name: Inspur AS13000G5-CG12/AS13000G5-CG12, BIOS 5.03.09 2022-10-26
[ 1716.554771] RIP: 0010:free_pipe_info+0x55/0x90
[ 1716.554792] Code: 40 85 c0 74 36 31 db 48 63 c3 48 8d 14 80 48 8b 45 78 48 8d 34 d0 48 8b 46 10 48 85 c0 74 14 48 c7 46 10 00 00 00 00 48 89 ef <48> 8b 40 10 e8 42 32 94 00 83 c3 01 39 5d 40 77 cc 48 8b 7d 60 48
[ 1716.554853] RSP: 0018:ffffb086980bbe78 EFLAGS: 00010202
[ 1716.554875] RAX: 000000e7000034c4 RBX: 0000000000000000 RCX: 0000000000000000
[ 1716.554901] RDX: 0000000000000000 RSI: ffff98c0a41ad000 RDI: ffff98b8ec580840
[ 1716.554928] RBP: ffff98b8ec580840 R08: 0000000000000000 R09: 0000000000000000
[ 1716.554954] R10: ffffb086980bbeb0 R11: 0000000000000001 R12: ffff98b8e8379d40
[ 1716.554981] R13: ffff989cba73e410 R14: ffff98c0db6122a0 R15: ffff98b8ec2dff10
[ 1716.555009] FS: 00007f81583ac580(0000) GS:ffff98b8ef800000(0000) knlGS:0000000000000000
[ 1716.555038] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1716.555060] CR2: 000000e7000034d4 CR3: 000000283fc50000 CR4: 00000000003506e0
[ 1716.555087] Call Trace:
[ 1716.555106] pipe_release+0x97/0xa0
[ 1716.555126] __fput+0xad/0x210
[ 1716.555148] task_work_run+0x84/0xa0
[ 1716.555170] exit_to_usermode_loop+0xfb/0x100
[ 1716.555191] do_syscall_64+0x178/0x1b0
[ 1716.555210] ? prepare_exit_to_usermode+0x4c/0xb0
[ 1716.555234] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1716.555257] RIP: 0033:0x7f81569256cb
[ 1716.555275] Code: c3 48 8b 15 bf 97 29 00 f7 d8 64 89 02 b8 ff ff ff ff eb b8 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 89 97 29 00 f7 d8
[ 1716.555333] RSP: 002b:00007ffca8e6da78 EFLAGS: 00000213 ORIG_RAX: 0000000000000003
[ 1716.555362] RAX: 0000000000000000 RBX: 00005650a9f742d0 RCX: 00007f81569256cb
[ 1716.555388] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
[ 1716.555414] RBP: 00007ffca8e6da80 R08: 00007f81583ac580 R09: 00005650a9ea0330
[ 1716.555440] R10: 00007f8157c689e0 R11: 0000000000000213 R12: 0000000000000001
[ 1716.555467] R13: 0000000000000000 R14: 00005650a9ea0330 R15: 00007f81580280f0
[ 1716.555500] Modules linked in: raid1 xfs libcrc32c nft_counter ip_tables nft_compat uio_pci_generic uio nf_tables vfio_pci vfio_virqfd nfnetlink vfio_iommu_type1 vfio irqbypass cuse fuse 8021q garp stp mrp llc pcc_cpufreq bonding amd64_edac_mod edac_mce_amd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel ses glue_helper pcspkr enclosure ipmi_si ccp k10temp i2c_piix4 ipmi_watchdog sunrpc vfat fat sch_fq_codel knem(OE) sd_mod sg ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci nvme smartpqi(OE) libahci crc32c_intel nvme_core scsi_transport_sas i40e(OE) drm libata i2c_core ngbe(OE) xpmem(OE) ipmi_devintf ipmi_msghandler
[ 1716.557347] CR2: 000000e7000034d4
[ 1716.558126] ---[ end trace 838c7c9b2b36c78b ]---
[ 1716.558898] RIP: 0010:free_pipe_info+0x55/0x90
[ 1716.559651] Code: 40 85 c0 74 36 31 db 48 63 c3 48 8d 14 80 48 8b 45 78 48 8d 34 d0 48 8b 46 10 48 85 c0 74 14 48 c7 46 10 00 00 00 00 48 89 ef <48> 8b 40 10 e8 42 32 94 00 83 c3 01 39 5d 40 77 cc 48 8b 7d 60 48
[ 1716.561189] RSP: 0018:ffffb086980bbe78 EFLAGS: 00010202
[ 1716.561961] RAX: 000000e7000034c4 RBX: 0000000000000000 RCX: 0000000000000000
[ 1716.562736] RDX: 0000000000000000 RSI: ffff98c0a41ad000 RDI: ffff98b8ec580840
[ 1716.563501] RBP: ffff98b8ec580840 R08: 0000000000000000 R09: 0000000000000000
[ 1716.564264] R10: ffffb086980bbeb0 R11: 0000000000000001 R12: ffff98b8e8379d40
[ 1716.565031] R13: ffff989cba73e410 R14: ffff98c0db6122a0 R15: ffff98b8ec2dff10
[ 1716.565788] FS: 00007f81583ac580(0000) GS:ffff98b8ef800000(0000) knlGS:0000000000000000
[ 1716.566541] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1716.567282] CR2: 000000e7000034d4 CR3: 000000283fc50000 CR4: 00000000003506e0
[ 1716.568023] Kernel panic - not syncing: Fatal exception
[ 1716.633434] Kernel Offset: 0x31000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

System environment:
Architecture: x86_64
CPU: Hygon C86 7265 24-core Processor
Is this kernel, 4.19.91-26.6.5.kos5.x86_64, an Anolis (龙蜥) kernel? The Anolis kernel does not necessarily have this problem.