[Defect description]: After the 64K-page kernel is installed on a Yitian machine, the system fails to boot normally. The serial console shows a Call Trace at folio_remove_rmap_ptes+0x68/0x140.

[Machine information]:
Environment: bare-metal server
Model: Yitian
Original kernel version:
# uname -r
6.6.71-3_rc2.al8.aarch64
Memory:
# free -h
              total        used        free      shared  buff/cache   available
Mem:          503Gi       3.0Gi       476Gi        12Mi        23Gi       497Gi
Swap:         2.0Gi          0B       2.0Gi
CPU:
# lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 1
Core(s) per socket: 128
Socket(s): 1
NUMA node(s): 2
Vendor ID: ARM
BIOS Vendor ID: T-HEAD
Model: 0
Model name: Neoverse-N2
BIOS Model name: Yitian710-128
Stepping: r0p0
CPU MHz: 2750.001
BogoMIPS: 100.00
L1d cache: 64K
L1i cache: 64K
L2 cache: 1024K
L3 cache: 65536K
NUMA node0 CPU(s): 0-63
NUMA node1 CPU(s): 64-127
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh
CMDLINE:
# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/boot/vmlinuz-6.6.71-3_rc2.al8.aarch64 root=UUID=1e1d9fc1-be93-4b6b-bb50-9f86448f8a4d ro biosdevname=0 rd.driver.pre=ahci console=ttyS0,115200 fsck.repair=yes cgroup.memory=nokmem crashkernel=0M-2G:0M,2G-64G:256M,64G-:384M iommu.passthrough=1 iommu.strict=0 ssbd=force-off nospectre_bhb no_hash_pointers transparent_hugepage_tmpfs=always thp_shmem=64K:always thp_anon=64K:always thp_file=2M:always+exec

[Steps to reproduce]:
rpm -ivh --force http://koji.alibaba-inc.com/kojifiles/work/tasks/1956/731956/kernel-6.6.71-3.64k_rc1.al8.aarch64.rpm
rpm -ivh --force https://koji.alibaba-inc.com/kojifiles/work/tasks/1956/731956/kernel-devel-6.6.71-3.64k_rc1.al8.aarch64.rpm
rpm -ivh --force https://koji.alibaba-inc.com/kojifiles/work/tasks/1956/731956/kernel-headers-6.6.71-3.64k_rc1.al8.aarch64.rpm
reboot

[Expected result]: The machine boots normally and no Call Trace appears on the serial console.

[Actual result]: The machine fails to boot normally and a Call Trace appears on the serial console. Serial log:

[ 17.450903] swapper pgtable: 64k pages, 48-bit VAs, pgdp=00000000f1bb0000
[ 17.539629] [ffffffc10020001c] pgd=00000000f2960003, p4d=00000000f2960003, pud=00000000f2960003, pmd=10000447fff50003, pte=0000000000000000
[ 17.539638] Internal error: Oops: 0000000096000007 [#1] SMP
[ 17.557702] Modules linked in: libcrc32c(E) nfnetlink(E) ipmi_ssif(E) coresight_catu(E) crct10dif_ce(E) ghash_ce(E) sm4_ce_gcm(E) sm4_ce_ccm(E) sm4_ce(E) mlx5_ib(E) sm4_ce_cipher(E) sm4(E) sm3_ce(E) ib_uverbs(E) sha1_ce(E) ib_core(E) sbsa_gwdt(E) acpi_ipmi(E) arm_spe_pmu(E) arm_smmuv3_pmu(E) ipmi_si(E) alibaba_uncore_drw_pmu(E) coresight_stm(E) stm_core(E) coresight_etm4x(E) coresight_funnel(E) coresight_tmc(E) coresight(E) vfat(E) fat(E) ast(E) i2c_algo_bit(E) drm_shmem_helper(E) mlx5_core(E) sha2_ce(E) drm_kms_helper(E) nvme(E) sha256_arm64(E) mlxfw(E) drm(E) pci_hyperv_intf(E) nvme_core(E) psample(E) sd_mod(E) t10_pi(E) sg(E) ahci(E) libahci(E) libata(E) ipmi_devintf(E) ipmi_msghandler(E)
Starting Hardware RNG Entropy Gatherer Wake threshold service...
[ 17.618942] CPU: 113 PID: 2133 Comm: (chronyd) Tainted: G E 6.6.71-3.64k_rc1.al8.aarch64 #1
[ 17.618945] Hardware name: AlibabaCloud AS1212MG1/AS03MB07, BIOS 1.2.M1.AL.P.160.01 12/21/2023
[ 17.618947] pstate: 63401009 (nZCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
[ 17.618950] pc : folio_remove_rmap_ptes+0x68/0x140
Starting System Logger Daemon...
[ 17.655500] lr : zap_present_ptes+0x210/0x618
[ OK ] Reached target sshd-keygen.target.
[ 17.668522] sp : ffff8000a798f720
[ 17.668523] x29: ffff8000a798f720 x28: ffff8000a798f9e0 x27: 0000000000000015
[ 17.668526] x26: ffff040016b09840 x25: 0000000000000001 x24: ffffffc1001fffc0
[ 17.668528] x23: ffff8000a798f868 x22: 0000aaaabeee0000 x21: ffff04080fa2f770
Starting IBM Power Raid dump daemon...
[ 17.668531] x20: ffff040016b09840 x19: ffffffc1001fffc0 x18: ffff8000a798fa78
[ 17.668534] x17: 00000000ffffffff x16: ffff800080ec65c8 x15: 0000ffffa2d2ffff
[ 17.704568] x14: 0000000000000000 x13: 1fffe080030748c1 x12: ffff8000a798fa78
[ 17.704570] x11: 0000000000000001 x10: ffff0400183a460c x9 : ffff800080304248
[ 17.704572] x8 : 00000000007fffff x7 : ffffffc10020001c x6 : 0000000000000001
[ OK ] Started Self Monitoring and Reporting Technology (SMART) Daemon.
[ 17.704575] x5 : 00000000ffffffff x4 : 00000000ffffffff x3 : 0000000000000012
[ 17.733056] x2 : 00000000ffffffff x1 : ffff04004ba88421 x0 : ffffffc1001fffc0
[FAILED] Failed to start Configure CPU power related settings.
[ 17.733058] Call trace:
[ 17.733060] folio_remove_rmap_ptes+0x68/0x140
[ 17.733063] zap_present_ptes+0x210/0x618
[ 17.733066] zap_pte_range+0x2fc/0x670
[ 17.776118] zap_pmd_range+0xe8/0x1c8
[ 17.776121] unmap_page_range+0xd8/0x190
See 'systemctl status cpupower.service' for details.
[ 17.788456] unmap_single_vma.constprop.0+0x8c/0x108
[ 17.788459] unmap_vmas+0x84/0x3d8
[ 17.788461] exit_mmap+0xbc/0x3d0
[ 17.788462] __mmput+0x40/0x180
[FAILED] Failed to start TCG Core Services Daemon.
[ 17.788466] mmput+0x6c/0x80
[ 17.788468] exec_mmap+0x148/0x268
[ 17.815128] begin_new_exec+0x10c/0x370
[ 17.815130] load_elf_binary+0x304/0xbc8
[ 17.822864] search_binary_handler+0xd4/0x260
See 'systemctl status tcsd.service' for details.
[ 17.827211] exec_binprm+0x5c/0x1e0
[ 17.827212] bprm_execve.part.0+0x190/0x228
[ 17.827214] bprm_execve+0x60/0x98
[ 17.827214] do_execveat_common+0x184/0x220
[ OK ] Started Restore /run/initramfs on shutdown.
[ 17.827216] __arm64_sys_execve+0x3c/0x58
[ 17.827217] do_el0_svc+0x70/0xf8
[ 17.827222] el0_svc+0x50/0x218
[ 17.862812] el0t_64_sync_handler+0xf8/0x128
[ 17.862814] el0t_64_sync+0x17c/0x180
[ 17.862817] Code: 52800243 4b0603e2 aa1303e0 f9400e61 (b9405e75)
[ 17.862821] ---[ end trace 0000000000000000 ]---
[ 17.862822] Kernel panic - not syncing: Oops: Fatal exception
[ 17.862825] SMP: stopping secondary CPUs
[ 17.903848] Kernel Offset: disabled
[ 17.903851] CPU features: 0x2,00380001,e022cd43,1047fe0b
[ 17.903852] Memory Limit: none
smc_fid: 84000009
INFO: mpidr:181310000, stop s-wtd.
INFO: PSCI Power Domain Map:
INFO: Domain Node : Level 1, parent_node 4294967295, State ON (0x0)
INFO: Domain Node : Level 1, parent_node 4294967295, State ON (0x0)
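The Oops header and the `Code:` line can be cross-checked by hand. Below is a minimal Python sketch of that check. It decodes the ESR value 0x96000007 using the Arm ESR_ELx field layout: EC in [31:26], IL in [25], ISS in [24:0], with WnR at ISS[6] and DFSC at ISS[5:0]. It also decodes the faulting instruction `b9405e75` as a 32-bit `LDR (immediate, unsigned offset)`. The helper names `decode_esr` and `decode_ldr_w_uimm` are illustrative only and do not belong to any kernel tool.

```python
"""Decode the arm64 fault details printed in the Oops above (sketch)."""

ESR = 0x96000007                 # from "Internal error: Oops: 0000000096000007"
INSN = 0xB9405E75                # faulting instruction, "(b9405e75)" in the Code: line
X19 = 0xFFFFFFC1001FFFC0         # x19 from the register dump
FAULT_ADDR = 0xFFFFFFC10020001C  # address shown in the page-table walk line


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into the fields relevant to a data abort."""
    iss = esr & 0x1FFFFFF
    return {
        "ec": (esr >> 26) & 0x3F,  # 0x25: data abort without a change in exception level
        "il": (esr >> 25) & 0x1,   # 1: trapped instruction was 32 bits long
        "wnr": (iss >> 6) & 0x1,   # 0: the faulting access was a read
        "dfsc": iss & 0x3F,        # 0x07: translation fault, level 3
    }


def decode_ldr_w_uimm(insn: int) -> tuple:
    """Decode 'LDR Wt, [Xn, #imm]' (32-bit load, unsigned scaled offset)."""
    # Fixed bits of this encoding: size=0b10, 0b111001, opc=0b01.
    assert (insn >> 22) == 0b1011100101, "not a 32-bit LDR (immediate, unsigned offset)"
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 4  # 32-bit loads scale imm12 by 4 bytes


esr = decode_esr(ESR)
rt, rn, offset = decode_ldr_w_uimm(INSN)
print({name: hex(value) for name, value in esr.items()})
print(f"ldr w{rt}, [x{rn}, #{offset:#x}]")
print(hex(X19 + offset), X19 + offset == FAULT_ADDR)
```

With the register values from this log, the instruction decodes to `ldr w21, [x19, #0x5c]`. The sum x19 + 0x5c equals ffffffc10020001c, which is the address whose PTE the page-table walk reports as 0. This fits the level-3 translation fault on a read that the ESR encodes.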
The machine then entered a reboot loop. It later booted successfully, and the system could be logged into.
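Once the machine does come up, it helps to confirm which kernel actually booted. Below is a minimal Python sketch of that check. It assumes that the 64K-page packages in this report carry a `.64k` marker in the release string, as in `6.6.71-3.64k_rc1.al8.aarch64`. It also assumes that the 64K build reports a 65536-byte page size, matching the "64k pages" line in the boot log. `is_64k_kernel` is a hypothetical helper name.

```python
"""Check that the running kernel is the 64K-page build (sketch)."""
import mmap
import os

EXPECTED_PAGE_SIZE = 64 * 1024  # boot log: "swapper pgtable: 64k pages"


def is_64k_kernel(release: str, page_size: int) -> bool:
    """Return True if both the release string and the page size indicate the 64k build.

    Assumption: the 64K-page package names its release like
    "6.6.71-3.64k_rc1.al8.aarch64", i.e. with a ".64k" marker.
    """
    return ".64k" in release and page_size == EXPECTED_PAGE_SIZE


if __name__ == "__main__":
    running_release = os.uname().release  # same string as `uname -r`
    running_page_size = mmap.PAGESIZE     # system page size in bytes
    print(running_release, running_page_size,
          is_64k_kernel(running_release, running_page_size))
```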
On a different Yitian machine, the system crashed once immediately after the kernel was installed. The symptoms were similar. The vmcore analysis is as follows:

WARNING: active task ffff040043e28000 on cpu 112 not found in PID hash
KERNEL: /usr/lib/debug/usr/lib/modules/6.6.71-3.64k_rc1.al8.aarch64/vmlinux [TAINTED]
DUMPFILE: /var/crash/127.0.0.1-2025-03-10-13:55:53/vmcore [PARTIAL DUMP]
CPUS: 124
DATE: Mon Mar 10 13:54:44 CST 2025
UPTIME: 00:00:35
LOAD AVERAGE: 2.91, 0.77, 0.26
TASKS: 1323
NODENAME: v43c07454.sqa.na131
RELEASE: 6.6.71-3.64k_rc1.al8.aarch64
VERSION: #1 SMP PREEMPT_DYNAMIC Fri Feb 28 10:42:23 CST 2025
MACHINE: aarch64 (unknown Mhz)
MEMORY: 256 GB
PANIC: "Unable to handle kernel paging request at virtual address ffffffc10020001c"
PID: 11601
COMMAND: "user_account.sh"
TASK: ffff040043e28000 [THREAD_INFO: ffff040043e28000]
CPU: 112
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 11601 TASK: ffff040043e28000 CPU: 112 COMMAND: "user_account.sh"
#0 [ffff8000dd66f240] machine_kexec at ffff80008002ffa0
#1 [ffff8000dd66f270] __crash_kexec at ffff800080152adc
#2 [ffff8000dd66f3f0] crash_kexec at ffff800080153cb4
#3 [ffff8000dd66f410] die at ffff80008001f07c
#4 [ffff8000dd66f4c0] die_kernel_fault at ffff800080033420
#5 [ffff8000dd66f500] __do_kernel_fault at ffff8000800336e4
#6 [ffff8000dd66f530] do_bad_area at ffff80008003377c
#7 [ffff8000dd66f550] do_translation_fault at ffff800080d27cbc
#8 [ffff8000dd66f560] do_mem_abort at ffff80008003352c
#9 [ffff8000dd66f590] el1_abort at ffff800080d1487c
#10 [ffff8000dd66f5c0] el1h_64_sync_handler at ffff800080d1696c
#11 [ffff8000dd66f700] el1h_64_sync at ffff800080011304
#12 [ffff8000dd66f720] folio_remove_rmap_ptes at ffff80008032077c
#13 [ffff8000dd66f750] zap_present_ptes at ffff800080304244
#14 [ffff8000dd66f7d0] zap_pte_range at ffff800080304948
#15 [ffff8000dd66f880] zap_pmd_range at ffff80008030ab94
#16 [ffff8000dd66f8f0] unmap_page_range at ffff80008030ad4c
#17 [ffff8000dd66f950] unmap_single_vma.constprop.0 at ffff80008030ae90
#18 [ffff8000dd66f990] unmap_vmas at ffff80008030c5f0
#19 [ffff8000dd66fa20] exit_mmap at ffff800080313db0
#20 [ffff8000dd66fb40] __mmput at ffff80008004b05c
#21 [ffff8000dd66fb70] mmput at ffff80008004b208
#22 [ffff8000dd66fb90] exec_mmap at ffff8000803d15cc
#23 [ffff8000dd66fbd0] begin_new_exec at ffff8000803d3000
#24 [ffff8000dd66fc00] load_elf_binary at ffff800080449440
#25 [ffff8000dd66fce0] search_binary_handler at ffff8000803d0488
#26 [ffff8000dd66fd30] exec_binprm at ffff8000803d0b78
#27 [ffff8000dd66fd70] bprm_execve at ffff8000803d103c
#28 [ffff8000dd66fdb0] bprm_execve at ffff8000803d1134
#29 [ffff8000dd66fdf0] do_execveat_common at ffff8000803d2600
#30 [ffff8000dd66fe40] __arm64_sys_execve at ffff8000803d26d8
#31 [ffff8000dd66fe60] do_el0_svc at ffff800080026e5c
#32 [ffff8000dd66fe80] el0_svc at ffff800080d1660c
#33 [ffff8000dd66fea0] el0t_64_sync_handler at ffff800080d16b14
#34 [ffff8000dd66ffe0] el0t_64_sync at ffff800080011608
PC: 0000ffffb89c8e4c LR: 0000aaaacf176dec SP: 0000ffffdf132cf0
X29: 0000ffffdf132cf0 X28: 0000aaab00765410 X27: 0000000000000000
X26: 00000000ffffffff X25: 0000aaab00751c30 X24: 0000aaab007b2a70
X23: 0000aaaacf26f944 X22: 0000000000000000 X21: 0000aaab007af6f0
X20: 0000aaaacf25f000 X19: 0000aaab007af790 X18: 0000aaab00740018
X17: 0000ffffb89c8e40 X16: 0000aaaacf25ed60 X15: 0000000000000030
X14: 0000000000000002 X13: 0000000000000001 X12: 0000000000000000
X11: 0000000000000000 X10: 0000000000000000 X9: 0000aaab007b2a60
X8: 00000000000000dd X7: 0000000000002791 X6: 0000000000000031
X5: 0000aaab007b2a90 X4: 0000000000000000 X3: 0000ffffb8baf5d8
X2: 0000aaab00751c30 X1: 0000aaab007b2a70 X0: 0000aaab007af790
ORIG_X0: 0000aaab007af790 SYSCALLNO: dd PSTATE: 60001000
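The register block at the end of the `bt` output is the user-space state saved when the task entered the kernel. Below is a small Python sketch that interprets two of those values. It assumes the generic syscall numbering that arm64 uses (include/uapi/asm-generic/unistd.h), with only four entries reproduced, and the AArch64 SPSR bit layout. `decode_pstate` and `ARM64_SYSCALLS` are illustrative names, not part of the crash utility.

```python
"""Interpret the user-space register state saved in the vmcore (sketch)."""

SYSCALLNO = 0xDD     # "SYSCALLNO: dd" from crash's bt output
PSTATE = 0x60001000  # "PSTATE: 60001000"

# A few entries from the generic syscall table used by arm64
# (include/uapi/asm-generic/unistd.h); only those needed here.
ARM64_SYSCALLS = {215: "munmap", 220: "clone", 221: "execve", 222: "mmap"}


def decode_pstate(pstate: int) -> dict:
    """Extract the NZCV condition flags, the SSBS bit, and the mode field M[3:0]."""
    return {
        "N": (pstate >> 31) & 1,
        "Z": (pstate >> 30) & 1,
        "C": (pstate >> 29) & 1,
        "V": (pstate >> 28) & 1,
        "SSBS": (pstate >> 12) & 1,
        "mode": pstate & 0xF,  # 0b0000 = EL0t: the task was running in user space
    }


print(SYSCALLNO, ARM64_SYSCALLS.get(SYSCALLNO, "unknown"))
print(decode_pstate(PSTATE))
```

Syscall 0xdd (221) is `execve`, and the saved mode is EL0t. This agrees with the `__arm64_sys_execve` and `exec_mmap` frames in the backtrace: the panic happened while `user_account.sh` was exec'ing a new image and tearing down its old address space.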
This issue is fixed in the rc2 kernel.