Bug 25255 - [devel-6.6] usb: storage: Fix deadlock issue when remove TBT dock in system sleep state
Status: NEW
Alias: None
Product: ANCK 6.6 Dev
Classification: ANCK
Component: X86
Version: unspecified
Hardware: All, OS: Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: Guanjun
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-09-22 15:18 UTC by LeoLiu-oc
Modified: 2025-09-22 15:31 UTC
CC List: 3 users

See Also:


Description LeoLiu-oc zhaoxin_group 2025-09-22 15:18:39 UTC
zhaoxin inclusion
category: feature

-------------------

On a platform with USB4 support, plug a TBT3 dock into a Type-C port, then
plug an ext4-formatted USB flash drive (Udisk) into the TBT3 dock. After the
device is enumerated, put the system into hibernation. If the TBT3 dock is
unplugged while the system is asleep and the system is then woken up, the
system may randomly deadlock during the restore phase of hibernation, and
the restore never completes.

The deadlock in more detail:
The TBT3 dock consists of a PCIe switch (one upstream port and two
downstream ports) and a PCIe endpoint (the Thunderbolt 3 USB controller):
  RP-- 00.0-+              [8086:15ef]  Upstream Port
        +-02.0-+       [8086:15ef]  Downstream Port
        |      +-00.0  [8086:15f0]  Thunderbolt 3 USB Controller
        +-04.0         [8086:15ef]  Downstream Port

During resume, the PCIe hotplug driver (pciehp) detects that the switch
below the root port (RP) has been disconnected and starts hot-unplug
processing, which removes the entire PCIe hierarchy below the RP.

The removal process is as follows:
pciehp_unconfigure_device
    pci_stop_and_remove_bus_device
        pci_stop_dev
    ...
            xhci_pci_remove
                usb_remove_hcd
                    usb_disconnect
                        usb_disable_device
                            usb_unbind_interface
                                usb_stor_disconnect
                                    quiesce_and_remove_host
                                        scsi_remove_host
                                            scsi_forget_host
                                                __scsi_remove_device
                                                    sd_remove

sd_remove then calls del_gendisk, which syncs the filesystem on the disk:

del_gendisk
    invalidate_partition
        fsync_bdev
            sync_filesystem
                __sync_filesystem
                    sb->s_op->sync_fs
                        ext4_sync_fs
                            blkdev_issue_flush
                                submit_bio_wait
                                    submit_bio
                                        generic_make_request
                                            blk_queue_enter

Finally, the flush gets stuck in blk_queue_enter and never returns, because
the request queue is not marked dying and is still in PM-only mode, in which
only PM requests are allowed to enter. The queue would leave PM-only mode
once the sd device resumes, but resuming the Udisk and the sd device
requires device_lock, which the removal path already holds. Therefore, a
deadlock occurs.
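
The admission rule described above can be sketched as a small userspace C
model. Everything in it is hypothetical: model_queue, model_queue_enter and
model_wait_can_end are illustrative names, not kernel APIs, and the model
keeps only the pm_only and dying state that matters for this deadlock. The
real blk_queue_enter has further conditions (queue freezing, runtime-PM
status) that are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a block request queue: only the two pieces of
 * state that matter for this deadlock are kept. */
struct model_queue {
    bool pm_only; /* set while the SCSI device is quiesced for system sleep */
    bool dying;   /* set once the queue is being torn down */
};

enum enter_result {
    ENTER_OK,    /* the request may enter the queue */
    ENTER_WAIT,  /* the caller sleeps until pm_only or dying changes */
    ENTER_NODEV, /* the queue is dying: the caller gets -ENODEV */
};

/* Loosely mirrors the admission decision in blk_queue_enter: a dying
 * queue rejects everything, and a PM-only queue admits only PM requests. */
enum enter_result model_queue_enter(const struct model_queue *q,
                                    bool pm_request)
{
    assert(q != NULL);
    if (q->dying)
        return ENTER_NODEV;
    if (q->pm_only && !pm_request)
        return ENTER_WAIT;
    return ENTER_OK;
}

/* Hypothetical helper: an ENTER_WAIT can only end if the sd device
 * resumes (clearing pm_only) or the queue is marked dying. Resume needs
 * device_lock, so while the removal path holds it and nothing marks the
 * queue dying, the wait never ends. */
bool model_wait_can_end(bool remover_holds_device_lock,
                        bool queue_marked_dying)
{
    bool resume_can_run = !remover_holds_device_lock;
    return resume_can_run || queue_marked_dying;
}
```

In this model, with pm_only set and dying clear, the flush bio from
ext4_sync_fs (a non-PM request) gets ENTER_WAIT, and model_wait_can_end(true,
false) is false, which is the deadlock; setting dying turns the same call
into ENTER_NODEV.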

To fix this issue, when a SCSI device is deleted and it is detected that the
device was surprise-removed, mark the device's request queue as dying, so
that blk_queue_enter fails instead of waiting forever. At the same time, add
a callback to the usb-storage driver so that a surprise removal can be
identified.
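
A minimal sketch of the fix, again as a hypothetical userspace C model. The
callback name surprise_removed, the model_* structs and the helper functions
are assumptions made for illustration; they are not the identifiers used in
the actual patch, and how usb-storage really detects the surprise removal is
not shown (it is reduced to a single "disconnected" flag).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI host template: the low-level driver
 * may supply an optional callback that reports a surprise removal. */
struct model_host_template {
    bool (*surprise_removed)(void *host_data);
};

/* Hypothetical SCSI device: its host template, driver data, and the
 * "dying" state of its request queue. */
struct model_sdev {
    const struct model_host_template *hostt;
    void *host_data;
    bool queue_dying;
};

/* Hypothetical usb-storage private data, reduced to one flag. */
struct model_us_data {
    bool disconnected;
};

/* Hypothetical usb-storage callback: report a surprise removal when the
 * underlying USB device is already gone. */
bool model_usb_stor_surprise_removed(void *host_data)
{
    const struct model_us_data *us = host_data;
    return us->disconnected;
}

/* Removal path: before the disk is deleted (and the filesystem synced),
 * mark the queue dying if the driver reports a surprise removal, so the
 * flush fails fast instead of waiting on a PM-only queue. */
void model_remove_device(struct model_sdev *sdev)
{
    assert(sdev != NULL && sdev->hostt != NULL);
    if (sdev->hostt->surprise_removed != NULL &&
        sdev->hostt->surprise_removed(sdev->host_data))
        sdev->queue_dying = true;
    /* The del_gendisk-like teardown would follow here. */
}
```

In this model the callback is optional: hosts that do not provide it keep
the existing removal behaviour, so only drivers that can report a surprise
removal are affected.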

Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
Comment 1 小龙 admin 2025-09-22 15:31:20 UTC
PR link: https://gitee.com/anolis/cloud-kernel/pulls/5799