Bug 5358 - BUG: Deploying local dual DB on a cluster triggers a kernel coredump
Summary: BUG: Deploying local dual DB on a cluster triggers a kernel coredump
Status: NEW
Alias: None
Product: ANCK 4.19 Dev
Classification: ANCK
Component: mm
Version: 4.19-026.x
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: wangrongwei
QA Contact: shuming
Reported: 2023-06-01 16:17 UTC by ljubomir
Modified: 2023-06-01 21:56 UTC

Description ljubomir inspur_group 2023-06-01 16:17:01 UTC
Description of problem:
In a 4-node cluster with the local-cache scenario deployed in dual-DB mode, one of the nodes hit a kernel coredump.

[ 1351.279690]  nvme0n1:
[ 1352.280711]  nvme0n1:
[ 1352.313713]  nvme0n1:
[ 1352.394724]  nvme0n1:
[ 1354.435748]  nvme0n1: p1
[ 1356.263495]  nvme0n1: p1
[ 1356.291780]  nvme0n1: p1
[ 1357.412220] SGI XFS with ACLs, security attributes, no debug enabled
[ 1357.418867] XFS (nvme0n1p1): Mounting V5 Filesystem
[ 1357.422917] XFS (nvme0n1p1): Ending clean mount
[ 1376.667472] XFS (nvme0n1p1): Unmounting Filesystem
[ 1376.686368] XFS (nvme0n1p1): Mounting V5 Filesystem
[ 1376.689722] XFS (nvme0n1p1): Ending clean mount
[ 1378.146472] XFS (nvme0n1p1): Unmounting Filesystem
[ 1378.165892] XFS (nvme0n1p1): Mounting V5 Filesystem
[ 1378.169507] XFS (nvme0n1p1): Ending clean mount
[ 1667.199010]  nvme1n1:
[ 1667.228571]  nvme1n1: p1
[ 1670.348062] md/raid1:md0: not clean -- starting background reconstruction
[ 1670.348064] md/raid1:md0: active with 2 out of 2 mirrors
[ 1670.348094] md: pers->run() failed ...
[ 1670.348111] md: md0 stopped.
[ 1670.380945] BUG: Bad page state in process mdadm  pfn:283ff24
[ 1670.380994] page:fffff26660ffc900 count:-1 mapcount:0 mapping:0000000000000000 index:0x0
[ 1670.381028] flags: 0x197ffffc0000000()
[ 1670.381049] raw: 0197ffffc0000000 dead000000000100 dead000000000200 0000000000000000
[ 1670.381079] raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
[ 1670.381109] page dumped because: nonzero _refcount
[ 1670.381130] Modules linked in: raid1 xfs libcrc32c nft_counter ip_tables nft_compat uio_pci_generic uio nf_tables vfio_pci vfio_virqfd nfnetlink vfio_iommu_type1 vfio irqbypass cuse fuse 8021q garp stp mrp llc pcc_cpufreq bonding amd64_edac_mod edac_mce_amd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel ses glue_helper pcspkr enclosure ipmi_si ccp k10temp i2c_piix4 ipmi_watchdog sunrpc vfat fat sch_fq_codel knem(OE) sd_mod sg ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci nvme smartpqi(OE) libahci crc32c_intel nvme_core scsi_transport_sas i40e(OE) drm libata i2c_core ngbe(OE) xpmem(OE) ipmi_devintf ipmi_msghandler
[ 1670.381177] CPU: 37 PID: 76406 Comm: mdadm Kdump: loaded Tainted: G           OE     4.19.91-26.6.5.kos5.x86_64 #1
[ 1670.381178] Hardware name: Inspur AS13000G5-CG12/AS13000G5-CG12, BIOS 5.03.09 2022-10-26
[ 1670.381180] Call Trace:
[ 1670.381194]  dump_stack+0x66/0x90
[ 1670.381201]  bad_page.cold.29+0xa0/0xbe
[ 1670.381206]  free_pcppages_bulk+0x337/0x770
[ 1670.381213]  ? memcg_check_events+0xf0/0x210
[ 1670.381217]  ? page_remove_rmap+0xed/0x420
[ 1670.381219]  free_unref_page_list+0xf3/0x190
[ 1670.381221]  release_pages+0x3a1/0x430
[ 1670.381225]  tlb_flush_mmu_free+0x36/0x50
[ 1670.381227]  arch_tlb_finish_mmu+0x29/0x50
[ 1670.381229]  tlb_finish_mmu+0x1f/0x30
[ 1670.381232]  exit_mmap+0xb7/0x150
[ 1670.381236]  ? __switch_to_asm+0x35/0x70
[ 1670.381242]  mmput+0x53/0x110
[ 1670.381245]  do_exit+0x34a/0xba0
[ 1670.381250]  ? syscall_trace_enter+0x1c2/0x2a0
[ 1670.381252]  do_group_exit+0x33/0xa0
[ 1670.381254]  __x64_sys_exit_group+0x14/0x20
[ 1670.381255]  do_syscall_64+0x5f/0x1b0
[ 1670.381256]  ? prepare_exit_to_usermode+0x4c/0xb0
[ 1670.381258]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1670.381261] RIP: 0033:0x7f49272fa8d6
[ 1670.381266] Code: Bad RIP value.
[ 1670.381268] RSP: 002b:00007ffd7c76fb28 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 1670.381269] RAX: ffffffffffffffda RBX: 00007f49275bb860 RCX: 00007f49272fa8d6
[ 1670.381270] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
[ 1670.381271] RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffff28
[ 1670.381272] R10: 00007ffd7c76f9a8 R11: 0000000000000246 R12: 00007f49275bb860
[ 1670.381273] R13: 0000000000000001 R14: 00007f49275c4388 R15: 0000000000000000
[ 1670.381275] Disabling lock debugging due to kernel taint
[ 1716.554608] BUG: unable to handle kernel paging request at 000000e7000034d4
[ 1716.554657] PGD 0 P4D 0 
[ 1716.554675] Oops: 0000 [#1] SMP NOPTI
[ 1716.554696] CPU: 36 PID: 77401 Comm: check_sysdisk.p Kdump: loaded Tainted: G    B      OE     4.19.91-26.6.5.kos5.x86_64 #1
[ 1716.554733] Hardware name: Inspur AS13000G5-CG12/AS13000G5-CG12, BIOS 5.03.09 2022-10-26
[ 1716.554771] RIP: 0010:free_pipe_info+0x55/0x90
[ 1716.554792] Code: 40 85 c0 74 36 31 db 48 63 c3 48 8d 14 80 48 8b 45 78 48 8d 34 d0 48 8b 46 10 48 85 c0 74 14 48 c7 46 10 00 00 00 00 48 89 ef <48> 8b 40 10 e8 42 32 94 00 83 c3 01 39 5d 40 77 cc 48 8b 7d 60 48
[ 1716.554853] RSP: 0018:ffffb086980bbe78 EFLAGS: 00010202
[ 1716.554875] RAX: 000000e7000034c4 RBX: 0000000000000000 RCX: 0000000000000000
[ 1716.554901] RDX: 0000000000000000 RSI: ffff98c0a41ad000 RDI: ffff98b8ec580840
[ 1716.554928] RBP: ffff98b8ec580840 R08: 0000000000000000 R09: 0000000000000000
[ 1716.554954] R10: ffffb086980bbeb0 R11: 0000000000000001 R12: ffff98b8e8379d40
[ 1716.554981] R13: ffff989cba73e410 R14: ffff98c0db6122a0 R15: ffff98b8ec2dff10
[ 1716.555009] FS:  00007f81583ac580(0000) GS:ffff98b8ef800000(0000) knlGS:0000000000000000
[ 1716.555038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1716.555060] CR2: 000000e7000034d4 CR3: 000000283fc50000 CR4: 00000000003506e0
[ 1716.555087] Call Trace:
[ 1716.555106]  pipe_release+0x97/0xa0
[ 1716.555126]  __fput+0xad/0x210
[ 1716.555148]  task_work_run+0x84/0xa0
[ 1716.555170]  exit_to_usermode_loop+0xfb/0x100
[ 1716.555191]  do_syscall_64+0x178/0x1b0
[ 1716.555210]  ? prepare_exit_to_usermode+0x4c/0xb0
[ 1716.555234]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1716.555257] RIP: 0033:0x7f81569256cb
[ 1716.555275] Code: c3 48 8b 15 bf 97 29 00 f7 d8 64 89 02 b8 ff ff ff ff eb b8 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 89 97 29 00 f7 d8
[ 1716.555333] RSP: 002b:00007ffca8e6da78 EFLAGS: 00000213 ORIG_RAX: 0000000000000003
[ 1716.555362] RAX: 0000000000000000 RBX: 00005650a9f742d0 RCX: 00007f81569256cb
[ 1716.555388] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
[ 1716.555414] RBP: 00007ffca8e6da80 R08: 00007f81583ac580 R09: 00005650a9ea0330
[ 1716.555440] R10: 00007f8157c689e0 R11: 0000000000000213 R12: 0000000000000001
[ 1716.555467] R13: 0000000000000000 R14: 00005650a9ea0330 R15: 00007f81580280f0
[ 1716.555500] Modules linked in: raid1 xfs libcrc32c nft_counter ip_tables nft_compat uio_pci_generic uio nf_tables vfio_pci vfio_virqfd nfnetlink vfio_iommu_type1 vfio irqbypass cuse fuse 8021q garp stp mrp llc pcc_cpufreq bonding amd64_edac_mod edac_mce_amd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel ses glue_helper pcspkr enclosure ipmi_si ccp k10temp i2c_piix4 ipmi_watchdog sunrpc vfat fat sch_fq_codel knem(OE) sd_mod sg ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci nvme smartpqi(OE) libahci crc32c_intel nvme_core scsi_transport_sas i40e(OE) drm libata i2c_core ngbe(OE) xpmem(OE) ipmi_devintf ipmi_msghandler
[ 1716.557347] CR2: 000000e7000034d4
[ 1716.558126] ---[ end trace 838c7c9b2b36c78b ]---
[ 1716.558898] RIP: 0010:free_pipe_info+0x55/0x90
[ 1716.559651] Code: 40 85 c0 74 36 31 db 48 63 c3 48 8d 14 80 48 8b 45 78 48 8d 34 d0 48 8b 46 10 48 85 c0 74 14 48 c7 46 10 00 00 00 00 48 89 ef <48> 8b 40 10 e8 42 32 94 00 83 c3 01 39 5d 40 77 cc 48 8b 7d 60 48
[ 1716.561189] RSP: 0018:ffffb086980bbe78 EFLAGS: 00010202
[ 1716.561961] RAX: 000000e7000034c4 RBX: 0000000000000000 RCX: 0000000000000000
[ 1716.562736] RDX: 0000000000000000 RSI: ffff98c0a41ad000 RDI: ffff98b8ec580840
[ 1716.563501] RBP: ffff98b8ec580840 R08: 0000000000000000 R09: 0000000000000000
[ 1716.564264] R10: ffffb086980bbeb0 R11: 0000000000000001 R12: ffff98b8e8379d40
[ 1716.565031] R13: ffff989cba73e410 R14: ffff98c0db6122a0 R15: ffff98b8ec2dff10
[ 1716.565788] FS:  00007f81583ac580(0000) GS:ffff98b8ef800000(0000) knlGS:0000000000000000
[ 1716.566541] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1716.567282] CR2: 000000e7000034d4 CR3: 000000283fc50000 CR4: 00000000003506e0
[ 1716.568023] Kernel panic - not syncing: Fatal exception
[ 1716.633434] Kernel Offset: 0x31000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
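
A quick consistency check on the Oops register dump above, offered as a hedged reading of the log rather than a root-cause analysis. The faulting instruction bytes `<48> 8b 40 10` decode to `mov 0x10(%rax),%rax`, so the faulting address should equal RAX + 0x10. The sketch below uses only values copied from the log; the helper names are hypothetical. It confirms that CR2 matches, and that the address is not in the kernel (upper canonical) half, which suggests the pointer held in RAX was already invalid when `free_pipe_info` dereferenced it. Whether that is connected to the earlier "Bad page state" report cannot be determined from the log alone.

```python
# Values copied verbatim from the Oops at [ 1716.554608] above.
RAX = 0x000000E7000034C4
CR2 = 0x000000E7000034D4
CR4 = 0x00000000003506E0

# `48 8b 40 10` is `mov 0x10(%rax),%rax`: an 8-byte load from RAX + 0x10.
DISPLACEMENT = 0x10


def faulting_address(base: int, disp: int) -> int:
    """Effective address of a base+displacement load, wrapped to 64 bits."""
    return (base + disp) & 0xFFFF_FFFF_FFFF_FFFF


def is_kernel_half(addr: int, la57: bool) -> bool:
    """True if addr lies in the upper canonical half of the address space."""
    va_bits = 57 if la57 else 48
    # In the upper half, every bit from (va_bits - 1) up to bit 63 is set.
    return (addr >> (va_bits - 1)) == (1 << (64 - va_bits + 1)) - 1


# CR4 bit 12 (LA57) selects 5-level paging; it is clear in this dump.
la57 = bool((CR4 >> 12) & 1)

print(f"RAX + 0x10  = {faulting_address(RAX, DISPLACEMENT):#018x}")
print(f"CR2         = {CR2:#018x}")
print(f"LA57 set    = {la57}")
print(f"kernel half = {is_kernel_half(CR2, la57)}")
```

For comparison, the other pointers in the same dump (for example RDI = ffff98b8ec580840) do fall in the kernel half, so only the value loaded into RAX looks out of place.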


System environment:
Architecture: x86_64
CPU: Hygon C86 7265 24-core Processor
Comment 1 wangrongwei alibaba_cloud_group 2023-06-01 21:56:38 UTC
4.19.91-26.6.5.kos5.x86_64 
Is this kernel an Anolis (龙蜥) kernel? The Anolis kernel does not necessarily have this problem.