1. Description of problem: (1)该问题发生在按键热拔事件中,由于remove_board函数中先执行pciehp_set_indicators设置电源指示灯熄灭,再执行pciehp_unconfigure_device进行卸载。考虑到某些外设卸载时间较长(需要5-9秒),因此电源指示灯熄灭后用户立即拔盘,就会因为OS内卸载流程未完成导致如下问题: (a)按键热拔时,按键后,电源指示灯闪烁5秒熄灭,此时立即拔盘。系统日志会报错“PCIe Bus Error”,导致OS内卸载进程阻塞在i40e_remove函数,这时候卸载进程还未释放pci互斥锁(mutex)。与此同时,OS内会调用pcie_do_recovery函数尝试恢复,但卸载进程还未释放互斥锁,导致pcie_do_recovery流程中获取互斥锁失败,因此pcie_do_recovery进程也阻塞。 (b)如果继续执行热插操作,同样会因为获取pci互斥锁失败,使插盘初始化进程阻塞在pci_lock_rescan_remove函数,最终导致系统无法加载识别插入的磁盘。 (2)外设卸载耗时的测试数据: [root@host vtu]# time echo 0 > power [ 3697.552681] pci 0001:01:00.3: Removing from iommu group 39 [ 3699.780722] pci 0001:01:00.2: Removing from iommu group 37 [ 3701.496712] pci 0001:01:00.1: Removing from iommu group 34 [ 3701.508163] pci 0001:01:00.0: Removed PTP clock [ 3703.300751] pci 0001:01:00.0: Removing from iommu group 29 real 0m9.044s user 0m0.000s sys 0m0.000s (3)pcie热插拔函数调用流程分析: a. 先按键热拔操作 pciehp_ist //pcie中断处理函数 -->if (events & PCI_EXP_SLTSTA_ABP) -->pciehp_handle_button_press -->if (events & DISABLE_SLOT) -->pciehp_handle_disable_request -->pciehp_disable_slot --> __pciehp_disable_slot -->remove_board -->pciehp_set_indicators //先设置电源指示灯熄灭 -->pciehp_unconfigure_device //然后执行卸载 -->pci_lock_rescan_remove ... ------------> 拔盘导致pcie bus error, 此时还没有释放pci互斥锁 -->pci_unlock_rescan_remove b. 然后热插操作 pciehp_ist -->if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC)) -->pciehp_handle_presence_or_link_change -->link_active = pciehp_check_link_active(ctrl) -->case OFF_STATE -->ctrl_info(ctrl, "Slot(%s): Card present\n",slot_name(ctrl)); -->pciehp_enable_slot --> __pciehp_enable_slot -->board_added -->pciehp_set_indicators //先设置电源指示灯常亮 -->pciehp_configure_device //然后设备初始化 -->pci_lock_rescan_remove -----------> 获取pci互斥锁失败 ... -->pci_unlock_rescan_remove 2. Version-Release number of selected component (if applicable): 3. How reproducible: 该问题必现 4.Steps to Reproduce: (1)执行按键热拔操作,先按键,电源指示灯熄灭则立即拔盘; (2)执行热插操作. 5. Actual results: (1)按键热拔时,按键后,电源指示灯闪烁5秒熄灭,此时立即拔盘。系统日志会报错“PCIe Bus Error”,导致OS内卸载进程阻塞在i40e_remove函数,这时候卸载进程还未释放pci互斥锁(mutex)。与此同时,OS内会调用pcie_do_recovery函数尝试恢复设备,但卸载进程还未释放互斥锁,导致pcie_do_recovery流程中获取互斥锁失败,因此pcie_do_recovery进程也阻塞。 按键拔盘后系统日志报错“PCIe Bus Error”: Dec 11 05:40:55 host kernel: [ 525.057572] pcieport 000d:80:00.0: pciehp: Slot(3): Attention button pressed Dec 11 05:40:55 host kernel: [ 525.064640] pcieport 000d:80:00.0: pciehp: Slot(3): Powering off due to button press Dec 11 05:40:59 host systemd[1]: Started Runs mimosa scheduler timed cpu-rt-runtime-task1. Dec 11 05:40:59 host systemd[1]: cpu-rt-runtime-task1.service: Succeeded. Dec 11 05:41:00 host kernel: [ 530.080038] i40e 000d:81:00.3: removed PHC from enP13s3f3. Dec 11 05:41:02 host kernel: [ 531.602363] pcieport 000d:80:00.0: AER: Multiple Corrected error received: 000d:80:00.0 Dec 11 05:41:02 host kernel: [ 531.653349] pcieport 000d:80:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID) Dec 11 05:41:02 host kernel: [ 531.662911] pcieport 000d:80:00.0: device [1cf2:8301] error status/mask=00000001/00002000 Dec 11 05:41:02 host kernel: [ 531.671263] pcieport 000d:80:00.0: [ 0] RxErr (First) Dec 11 05:41:02 host kernel: [ 531.678066] pcieport 000d:80:00.0: DPC: containment event, status:0x0001 source:0x0000 Dec 11 05:41:02 host kernel: [ 531.685986] pcieport 000d:80:00.0: DPC: unmasked uncorrectable error detected Dec 11 05:41:02 host kernel: [ 531.693122] pcieport 000d:80:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID) Dec 11 05:41:02 host kernel: [ 531.704251] pcieport 000d:80:00.0: device [1cf2:8301] error status/mask=00004000/00108000 Dec 11 05:41:02 host kernel: [ 531.712609] pcieport 000d:80:00.0: [14] CmpltTO (First) Dec 11 05:41:02 host kernel: [ 531.719413] i40e 000d:81:00.0: i40e_pci_error_detected: error 2 Dec 11 05:41:02 host kernel: [ 531.744183] i40e 000d:81:00.0: VSI seid 400 Tx ring 383 disable timeout 卸载进程和pcie_do_recovery进程的阻塞堆栈信息如下: [root@host vtu]# ps aux | grep 'D ' root 1765 0.0 0.0 0 0 ? D 05:32 0:00 [irq/310-pciehp] root 1768 0.2 0.0 0 0 ? D 05:32 0:01 [irq/311-pciehp] root 1769 0.0 0.0 0 0 ? D 05:32 0:00 [irq/311-pcie-dp] [root@host vtu]# cat /proc/1768/stack [<0>] __switch_to+0x108/0x168 [<0>] i40e_remove+0x7c/0x420 [i40e] [<0>] pci_device_remove+0x44/0xe8 [<0>] device_release_driver_internal+0x110/0x1f0 [<0>] device_release_driver+0x1c/0x28 [<0>] pci_stop_bus_device+0x84/0xd8 [<0>] pci_stop_and_remove_bus_device+0x1c/0x30 [<0>] pciehp_unconfigure_device+0x90/0x158 [<0>] pciehp_disable_slot+0x84/0x150 [<0>] pciehp_handle_disable_request+0x50/0x70 [<0>] pciehp_ist+0x210/0x260 [<0>] irq_thread_fn+0x30/0xa0 [<0>] irq_thread+0x178/0x308 [<0>] kthread+0x114/0x118 [root@host vtu]# cat /proc/1769/stack [<0>] __switch_to+0x108/0x168 [<0>] report_error_detected+0x30/0x120 [<0>] report_frozen_detected+0x1c/0x28 [<0>] pci_walk_bus+0x68/0xc0 [<0>] pci_walk_bridge+0x20/0x40 [<0>] pcie_do_recovery+0x168/0x290 [<0>] dpc_handler+0x34/0x48 [<0>] irq_thread_fn+0x30/0xa0 [<0>] irq_thread+0x178/0x308 [<0>] kthread+0x114/0x118 Dec 11 05:52:58 host kernel: [ 1247.647752] INFO: task irq/311-pcie-dp:1769 blocked for more than 606 seconds. Dec 11 05:52:58 host kernel: [ 1247.654975] Tainted: GF OE 5.10.134-14.zncgsl6.aarch64 #1 Dec 11 05:52:58 host kernel: [ 1247.662020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 11 05:52:58 host kernel: [ 1247.669842] task:irq/311-pcie-dp state:D stack: 0 pid: 1769 ppid: 2 flags:0x00000208 Dec 11 05:52:58 host kernel: [ 1247.678185] Call trace: Dec 11 05:52:58 host kernel: [ 1247.680632] __switch_to+0x108/0x168 Dec 11 05:52:58 host kernel: [ 1247.684203] __schedule+0x2f0/0x7f8 Dec 11 05:52:58 host kernel: [ 1247.687685] schedule+0x48/0xd0 Dec 11 05:52:58 host kernel: [ 1247.690816] schedule_preempt_disabled+0x14/0x20 Dec 11 05:52:58 host kernel: [ 1247.695424] __mutex_lock.isra.9+0x260/0x5a0 Dec 11 05:52:58 host kernel: [ 1247.699685] __mutex_lock_slowpath+0x18/0x20 Dec 11 05:52:58 host kernel: [ 1247.703946] mutex_lock+0x48/0x60 Dec 11 05:52:58 host kernel: [ 1247.707251] report_error_detected+0x30/0x120 Dec 11 05:52:58 host kernel: [ 1247.711598] report_frozen_detected+0x1c/0x28 Dec 11 05:52:58 host kernel: [ 1247.715949] pci_walk_bus+0x68/0xc0 Dec 11 05:52:58 host kernel: [ 1247.719425] pci_walk_bridge+0x20/0x40 Dec 11 05:52:58 host kernel: [ 1247.723165] pcie_do_recovery+0x168/0x290 Dec 11 05:52:58 host kernel: [ 1247.727168] dpc_handler+0x34/0x48 Dec 11 05:52:58 host kernel: [ 1247.730564] irq_thread_fn+0x30/0xa0 Dec 11 05:52:58 host kernel: [ 1247.734133] irq_thread+0x178/0x308 Dec 11 05:52:58 host kernel: [ 1247.737615] kthread+0x114/0x118 (2)继续执行按键热插操作,同样会因为获取pci互斥锁失败,使插盘初始化进程阻塞在pci_lock_rescan_remove函数,最终导致系统无法加载识别插入的磁盘。系统日志中堆栈信息如下: Dec 11 06:03:04 host kernel: [ 1853.855730] INFO: task irq/310-pciehp:1765 blocked for more than 606 seconds. Dec 11 06:03:04 host kernel: [ 1853.862868] Tainted: GF OE 5.10.134-14.zncgsl6.aarch64 #1 Dec 11 06:03:04 host kernel: [ 1853.869910] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 11 06:03:04 host kernel: [ 1853.877731] task:irq/310-pciehp state:D stack: 0 pid: 1765 ppid: 2 flags:0x00000208 Dec 11 06:03:04 host kernel: [ 1853.886075] Call trace: Dec 11 06:03:04 host kernel: [ 1853.888521] __switch_to+0x108/0x168 Dec 11 06:03:04 host kernel: [ 1853.892090] __schedule+0x2f0/0x7f8 Dec 11 06:03:04 host kernel: [ 1853.895572] schedule+0x48/0xd0 Dec 11 06:03:04 host kernel: [ 1853.898700] schedule_preempt_disabled+0x14/0x20 Dec 11 06:03:04 host kernel: [ 1853.903310] __mutex_lock.isra.9+0x260/0x5a0 Dec 11 06:03:04 host kernel: [ 1853.907575] __mutex_lock_slowpath+0x18/0x20 Dec 11 06:03:04 host kernel: [ 1853.911838] mutex_lock+0x48/0x60 Dec 11 06:03:04 host kernel: [ 1853.915142] pci_lock_rescan_remove+0x20/0x28 Dec 11 06:03:04 host kernel: [ 1853.919493] pciehp_configure_device+0x2c/0x140 Dec 11 06:03:04 host kernel: [ 1853.924017] pciehp_handle_presence_or_link_change+0x200/0x3e8 Dec 11 06:03:04 host kernel: [ 1853.929842] pciehp_ist+0x220/0x260 Dec 11 06:03:04 host kernel: [ 1853.933327] irq_thread_fn+0x30/0xa0 Dec 11 06:03:04 host kernel: [ 1853.936897] irq_thread+0x178/0x308 Dec 11 06:03:04 host kernel: [ 1853.940379] kthread+0x114/0x118 Dec 11 06:03:04 host kernel: [ 1853.943600] INFO: task irq/311-pcie-dp:1769 blocked for more than 1212 seconds. Dec 11 06:03:04 host kernel: [ 1853.950902] Tainted: GF OE 5.10.134-14.zncgsl6.aarch64 #1 Dec 11 06:03:04 host kernel: [ 1853.957942] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 11 06:03:04 host kernel: [ 1853.965761] task:irq/311-pcie-dp state:D stack: 0 pid: 1769 ppid: 2 flags:0x00000208 Dec 11 06:03:04 host kernel: [ 1853.974103] Call trace: Dec 11 06:03:04 host kernel: [ 1853.976542] __switch_to+0x108/0x168 Dec 11 06:03:04 host kernel: [ 1853.980109] __schedule+0x2f0/0x7f8 Dec 11 06:03:04 host kernel: [ 1853.983591] schedule+0x48/0xd0 Dec 11 06:03:04 host kernel: [ 1853.986720] schedule_preempt_disabled+0x14/0x20 Dec 11 06:03:04 host kernel: [ 1853.991328] __mutex_lock.isra.9+0x260/0x5a0 Dec 11 06:03:04 host kernel: [ 1853.995588] __mutex_lock_slowpath+0x18/0x20 Dec 11 06:03:04 host kernel: [ 1853.999851] mutex_lock+0x48/0x60 Dec 11 06:03:04 host kernel: [ 1854.003157] report_error_detected+0x30/0x120 Dec 11 06:03:04 host kernel: [ 1854.007506] report_frozen_detected+0x1c/0x28 Dec 11 06:03:04 host kernel: [ 1854.011856] pci_walk_bus+0x68/0xc0 Dec 11 06:03:04 host kernel: [ 1854.015331] pci_walk_bridge+0x20/0x40 Dec 11 06:03:04 host kernel: [ 1854.019071] pcie_do_recovery+0x168/0x290 Dec 11 06:03:04 host kernel: [ 1854.023074] dpc_handler+0x34/0x48 Dec 11 06:03:04 host kernel: [ 1854.026467] irq_thread_fn+0x30/0xa0 Dec 11 06:03:04 host kernel: [ 1854.030034] irq_thread+0x178/0x308 Dec 11 06:03:04 host kernel: [ 1854.033516] kthread+0x114/0x118 6. Expected results: 按键热插拔操作成功执行,能够成功卸载并加载磁盘。 7. Additional info: 龙蜥社区对nvme热插拔流程做了优化,更改了remove_board函数中灭灯(pciehp_set_indicators)和卸载(pciehp_unconfigure_device)的顺序,补丁链接为: https://gitee.com/anolis/cloud-kernel/commit/c58940f9cdb18b2ab386ef953f73b0345b2c3845 回退上述补丁后,pcie按键热插拔测试验证通过。
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3101
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3103
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3302
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3303
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3339