While testing Granite Rapids (GNR) and Clearwater Forest (CWF) systems in SNC-3 mode, we encountered sched domain build errors in dmesg. The sched domain code did not expect a local node to have asymmetric distances to the multiple nodes of a remote package. As a result, remote nodes were grouped partially, and asymmetrically, with local nodes, and too many levels were created in the NUMA sched domain hierarchy.

To address this, we simplify remote node distances for the purpose of sched domain construction on GNR and CWF. Specifically, we replace the individual distances to the nodes within the same remote package with their average distance. This resolves the domain build errors and reduces the number of NUMA sched domain levels. The actual SLIT NUMA node distances are still preserved separately, in case they are needed when building sched domains, and NUMA balancing continues to use the true distances when selecting a closer remote node for a task’s numa_group.

This PR backports the following two commits, along with their necessary dependencies, if any:
- 0001-sched-Create-architecture-specific-sched-domain-dist.patch
- 0002-sched-topology-Fix-sched-domain-build-error-for-GNR-.patch
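As a rough illustration of the remote-package averaging rule, the Python sketch below models a hypothetical two-package SNC-3 system with six NUMA nodes. The SLIT values are made up, and the contiguous node-to-package numbering, the integer division, and the helper names (`node_to_pkg`, `sched_distance`, `sched_distance_table`, `unique_distances`) are assumptions for illustration only; this is not the kernel code in the backported patches.

```python
# Hypothetical SNC-3 layout: 2 packages x 3 NUMA nodes each (assumption).
NODES_PER_PKG = 3

# Made-up, symmetric SLIT table for illustration (not real GNR/CWF values).
# Each local node sees different distances to the 3 nodes of the remote package.
SLIT = [
    [10, 12, 12, 21, 24, 27],
    [12, 10, 12, 27, 21, 24],
    [12, 12, 10, 24, 27, 21],
    [21, 27, 24, 10, 12, 12],
    [24, 21, 27, 12, 10, 12],
    [27, 24, 21, 12, 12, 10],
]


def node_to_pkg(node: int) -> int:
    """Package of a NUMA node, assuming contiguous node numbering per package."""
    return node // NODES_PER_PKG


def sched_distance(slit: list[list[int]], src: int, dst: int) -> int:
    """Distance used only for sched domain construction (hypothetical helper).

    Same-package distances are returned unchanged. A distance to a node in a
    remote package is replaced by the average distance from src to all nodes
    in that package (integer division is an assumption of this sketch).
    """
    pkg = node_to_pkg(dst)
    if pkg == node_to_pkg(src):
        return slit[src][dst]
    remote = [slit[src][n] for n in range(len(slit)) if node_to_pkg(n) == pkg]
    return sum(remote) // len(remote)


def sched_distance_table(slit: list[list[int]]) -> list[list[int]]:
    """Build the simplified table; the original SLIT table is not modified."""
    n = len(slit)
    return [[sched_distance(slit, i, j) for j in range(n)] for i in range(n)]


def unique_distances(table: list[list[int]]) -> list[int]:
    """Sorted set of distinct distance values in a table."""
    return sorted({d for row in table for d in row})


if __name__ == "__main__":
    simplified = sched_distance_table(SLIT)
    for row in simplified:
        print(row)
    print("unique (SLIT):      ", unique_distances(SLIT))
    print("unique (simplified):", unique_distances(simplified))
```

With these made-up values, the original table contains five distinct distances (10, 12, 21, 24, 27), while the simplified table contains three (10, 12, 24), and every local node now sees the same distance to every node of the remote package. Because the kernel derives the NUMA sched domain levels roughly from the set of distinct node distances, fewer distinct values means fewer levels, while the untouched SLIT table remains available for uses such as NUMA balancing.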
PR link: https://gitee.com/anolis/cloud-kernel/pulls/6206