Bug 4491 - [acpi] ACPI: NFIT: a potential deadlock during NFIT teardown
Summary: [acpi] ACPI: NFIT: a potential deadlock during NFIT teardown
Status: NEW
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: general/others (show other bugs) general/others
Version: 5.10.y-9
Hardware: All Linux
: P3-Medium S3-normal
Target Milestone: ---
Assignee: liusong
QA Contact: shuming
URL:
Whiteboard:
Keywords: Bugfix
Depends on:
Blocks:
 
Reported: 2023-03-11 14:16 UTC by liusong
Modified: 2023-03-13 09:58 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description liusong 2023-03-11 14:16:35 UTC
Description of problem:

Lockdep reports that acpi_nfit_shutdown() may deadlock against an
opportune acpi_nfit_scrub(). acpi_nfit_scrub () is run from inside a
'work' and therefore has already acquired workqueue-internal locks. It
also acquiires acpi_desc->init_mutex. acpi_nfit_shutdown() first
acquires init_mutex, and was subsequently attempting to cancel any
pending workqueue items. This reversed locking order causes a potential
deadlock:

    ======================================================
    WARNING: possible circular locking dependency detected
    6.2.0-rc3 #116 Tainted: G           O     N
    ------------------------------------------------------
    libndctl/1958 is trying to acquire lock:
    ffff888129b461c0 ((work_completion)(&(&acpi_desc->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x43/0x450

    but task is already holding lock:
    ffff888129b460e8 (&acpi_desc->init_mutex){+.+.}-{3:3}, at: acpi_nfit_shutdown+0x87/0xd0 [nfit]

    which lock already depends on the new lock.

    ...

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&acpi_desc->init_mutex);
                                  lock((work_completion)(&(&acpi_desc->dwork)->work));
                                  lock(&acpi_desc->init_mutex);
     lock((work_completion)(&(&acpi_desc->dwork)->work));

    *** DEADLOCK ***

Since the workqueue manipulation is protected by its own internal locking,
the cancellation of pending work doesn't need to be done under
acpi_desc->init_mutex. Move cancel_delayed_work_sync() outside the
init_mutex to fix the deadlock. Any work that starts after
acpi_nfit_shutdown() drops the lock will see ARS_CANCEL, and the
cancel_delayed_work_sync() will safely flush it out.

upstream: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=fb6df4366f86dd252bfa3049edffa52d17e7b895

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info: