Bug 20687 - Hygon: Fix concurrent longterm pin failure when passing through a device to a CSV3 VM
Summary: Hygon: Fix concurrent longterm pin failure when passing through a device to a CSV3 VM
Status: NEW
Alias: None
Product: ANCK 6.6 Dev
Classification: ANCK
Component: X86
Version: unspecified
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: Guanjun
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-04-26 18:04 UTC by wojiaohanliyang
Modified: 2025-04-26 19:53 UTC

See Also:


Attachments

Description wojiaohanliyang hygon_group 2025-04-26 18:04:05 UTC
Description of problem:

If a large amount of CMA memory is configured in the system (for example, CMA accounts for 50% of system memory), starting a virtual machine with device passthrough calls pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin memory. Normally, if a page is present and lies in a CMA area, pin_user_pages_remote() migrates it from the CMA area to a non-CMA area because of the FOLL_LONGTERM flag. With the current code, however, this migration can fail due to unexpected page refcounts, and the virtual machine eventually fails to start.

During CSV3 virtual machine startup, the #NPF handler also calls pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin shared memory. If pin_user_pages_remote() and pin_user_pages_fast() pin the same page concurrently, the page may end up with unexpected refcounts.
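
The effect of an unexpected reference on migration can be illustrated with a small userspace model. This is only a sketch and not kernel code: `struct model_page`, `expected_refs` and the `model_*` helpers are hypothetical names invented for this illustration, and the check is a simplification of what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a page; not the kernel's struct page. */
struct model_page {
    int refcount;       /* references currently held on the page */
    int expected_refs;  /* references migration expects to see */
    bool in_cma;        /* true while the page lives in a CMA area */
};

/* A pinner (or any other user) takes one reference. */
static void model_get_ref(struct model_page *p)
{
    p->refcount++;
}

/* Drop a reference that was taken with model_get_ref(). */
static void model_put_ref(struct model_page *p)
{
    assert(p->refcount > 0);
    p->refcount--;
}

/*
 * In this model, migration succeeds only when the refcount matches
 * the expected value. Any extra reference makes it give up.
 */
static bool model_migrate_out_of_cma(struct model_page *p)
{
    if (p->refcount != p->expected_refs)
        return false;   /* unexpected reference: migration fails */
    p->in_cma = false;  /* page now lives outside the CMA area */
    return true;
}
```

In this model, a page with `refcount == expected_refs` migrates successfully. If a second pinner has called `model_get_ref()` and not yet dropped its reference, `model_migrate_out_of_cma()` returns false, which corresponds to the failure described above.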

To solve this problem, we use mmap_write_lock()/mmap_write_unlock() to serialize the execution of pin_user_pages_remote() and pin_user_pages_fast().
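
The idea of serializing the two pin paths can be sketched in userspace with pthreads. This is an assumption-laden model, not the actual patch (see the PR linked in the comments for the real change): the mutex merely stands in for the mmap write lock, and `struct model_mm`, `model_pin_longterm()` and the other `model_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NPAGES 1024

struct model_page {
    int refcount;   /* starts at 1: the mapping's own reference */
    bool in_cma;
};

struct model_mm {
    pthread_mutex_t lock;   /* stands in for the mmap write lock */
    struct model_page pages[MODEL_NPAGES];
    int migrate_failures;
};

static void model_mm_init(struct model_mm *mm)
{
    pthread_mutex_init(&mm->lock, NULL);
    mm->migrate_failures = 0;
    for (size_t i = 0; i < MODEL_NPAGES; i++) {
        mm->pages[i].refcount = 1;
        mm->pages[i].in_cma = true;
    }
}

/* Long-term pin in this model: leave CMA first, then keep a reference. */
static bool model_pin_longterm(struct model_page *p)
{
    assert(p->refcount >= 1);
    p->refcount++;          /* reference taken while looking the page up */
    if (p->in_cma) {
        p->refcount--;      /* drop it so migration can proceed */
        if (p->refcount != 1)
            return false;   /* another reference is in flight */
        p->in_cma = false;  /* "migrated" to a non-CMA location */
        p->refcount++;      /* re-take the reference after migration */
    }
    return true;
}

/* Each pinner holds the lock across the whole pin of a page. */
static void *model_pinner(void *arg)
{
    struct model_mm *mm = arg;
    for (size_t i = 0; i < MODEL_NPAGES; i++) {
        pthread_mutex_lock(&mm->lock);
        if (!model_pin_longterm(&mm->pages[i]))
            mm->migrate_failures++;
        pthread_mutex_unlock(&mm->lock);
    }
    return NULL;
}

/* Run two concurrent pinners and report how many migrations failed. */
static int model_run_two_pinners(struct model_mm *mm)
{
    pthread_t a, b;
    model_mm_init(mm);
    pthread_create(&a, NULL, model_pinner, mm);
    pthread_create(&b, NULL, model_pinner, mm);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return mm->migrate_failures;
}
```

Because the lock covers the whole lookup, migrate and re-pin sequence, neither pinner can observe the other's temporary reference, so `model_run_two_pinners()` reports no migration failures in this model. Without the lock, the two sequences could interleave and the refcount check could fail.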

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 小龙 admin 2025-04-26 19:53:13 UTC
PR link: https://gitee.com/anolis/cloud-kernel/pulls/5163