Bug 5741 - Intel: Backport needed to fix UPI uncore discovery warnings on SPR MCC
Summary: Intel: Backport needed to fix UPI uncore discovery warnings on SPR MCC
Status: NEW
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: X86
Version: unspecified
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: Guanjun
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-05 16:04 UTC by yunyings
Modified: 2023-07-05 16:30 UTC
CC List: 2 users

See Also:


Attachments

Description yunyings intel_group 2023-07-05 16:04:29 UTC
Description of problem:
The UPI discovery table on SPR MCC is broken. The broken table triggers a kernel warning and call trace on SPR MCC, which is overkill:
"WARNING: CPU: xx PID: xx at arch/x86/events/intel/uncore_discovery.c:184 intel_uncore_has_discovery_tables+"

The patch series mitigates the issue by providing a hard-coded, predefined table, and it also refines the error-handling code.

Commits from mainline kernel v6.3-rc1:
5d515ee40cb5 perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table
65248a9a9ee1 perf/x86/uncore: Add a quirk for UPI on SPR
bd9514a4d5ec perf/x86/uncore: Ignore broken units in discovery table
3af548f23610 perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name
dbf061b26221 perf/x86/uncore: Factor out uncore_device_to_die()
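
A minimal sketch of how the series might be applied to a backport branch. It assumes the list above is in newest-first order (as `git log` prints it), so the commits are picked in reverse; that ordering is an assumption and should be checked against mainline. The helper names `reverse_commits` and `print_backport_cmds` are hypothetical, and the function only prints the commands (a dry run) rather than running them:

```shell
# Commit IDs copied from the list above, in the order listed there.
COMMITS="5d515ee40cb5 65248a9a9ee1 bd9514a4d5ec 3af548f23610 dbf061b26221"

# Print the list reversed.
# Assumption: the bug lists the commits newest first.
reverse_commits() {
    rev=""
    for c in $COMMITS; do
        rev="$c${rev:+ $rev}"
    done
    echo "$rev"
}

# Dry run: print the cherry-pick commands instead of executing them.
# -x records the original mainline commit ID in each backported commit message.
print_backport_cmds() {
    for c in $(reverse_commits); do
        echo "git cherry-pick -x $c"
    done
}
```

Running `print_backport_cmds` inside a checkout of the target branch shows the commands to review before executing them; conflicts against the 5.10 code base would still need to be resolved by hand.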

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Boot the devel-5.10 kernel on an SPR MCC system.
2. Check dmesg for uncore warnings: "dmesg | grep -i uncore".
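
The check in step 2 could be scripted roughly as follows. This is only a sketch: `classify_uncore_log` is a hypothetical helper name, and the patterns are taken from the messages quoted in this report, so they may need adjusting for other kernels:

```shell
# Hypothetical helper: classify kernel log text read from stdin.
#   warn              - the uncore_discovery.c WARNING is present
#   duplicate-dropped - only the "Duplicate uncore type" messages are present
#   clean             - neither message is present
classify_uncore_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'WARNING: .*uncore_discovery\.c'; then
        echo "warn"
    elif printf '%s\n' "$log" | grep -q 'intel_uncore: Duplicate uncore type'; then
        echo "duplicate-dropped"
    else
        echo "clean"
    fi
}
```

Usage: `dmesg | classify_uncore_log`. On an affected kernel this should print `warn`.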

Actual results:
Warning and call trace found with:
"WARNING: CPU: 32 PID: 1151 at arch/x86/events/intel/uncore_discovery.c:184"

Expected results:
No such warnings.

Additional info:
It has been verified on Intel SPR MCC that, after the backport, the warning and call trace above are gone. Instead, new uncore informational messages are printed. This is expected: they indicate that the BIOS contains a broken uncore discovery table:
"[ 6.801611] intel_uncore: Duplicate uncore type 3 box ID 7 is detected, Drop the duplicate uncore unit.
[ 6.801638] intel_uncore: Duplicate uncore type 1 box ID 7 is detected, Drop the duplicate uncore unit.
[ 6.801663] intel_uncore: Duplicate uncore type 2 box ID 7 is detected, Drop the duplicate uncore unit."
Comment 1 小龙 admin 2023-07-05 16:30:52 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/1834