Description of problem:

Since commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), pending softirqs are no longer always handled immediately. Instead, if there are pending softirqs and ksoftirqd is in state TASK_RUNNING, handling of the softirqs is deferred to ksoftirqd, which is supposed to handle them when it gets scheduled.

If a user space process with a real-time policy, or a kernel function, misbehaves by never relinquishing the CPU while ksoftirqd is in state TASK_RUNNING, all softirqs get deferred, while ksoftirqd, which is supposed to handle the deferred softirqs, never gets to run.

Real world problems I have seen so far:
1. OS hang (related to rtnl and rcu_barrier()) due to RCU_SOFTIRQ starvation
2. Timekeeping watchdog issue due to TIMER_SOFTIRQ starvation
3. P99 latency issue due to NET_RX_SOFTIRQ starvation

Proposal:

Please consider cherry-picking commit d15121be7485 ("Revert "softirq: Let ksoftirqd do its job""). RHEL 9 did so last year; see https://access.redhat.com/errata/RHSA-2023:7370.
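To make the starvation scenario described above easier to follow, here is a minimal user-space sketch of the deferral logic. It is a toy model, not kernel code: the struct cpu_model type and the functions irq_exit_model(), schedule_model() and simulate() are hypothetical names made up for illustration, and the model assumes one softirq is raised per tick and that ksoftirqd starts out runnable.

```c
#include <stdbool.h>

/* Toy model of one CPU. All names here are illustrative, not kernel symbols. */
struct cpu_model {
    bool ksoftirqd_runnable; /* models ksoftirqd being in TASK_RUNNING */
    bool cpu_hogged;         /* a task never relinquishes the CPU */
    unsigned int pending;    /* softirqs raised but not yet handled */
    unsigned int handled;    /* softirqs handled so far */
};

/*
 * Models interrupt exit. With the deferral policy enabled, pending
 * softirqs are left for ksoftirqd whenever it is already runnable.
 * Without it, they are handled inline right away.
 */
void irq_exit_model(struct cpu_model *cpu, bool defer_to_ksoftirqd)
{
    if (cpu->pending == 0)
        return;
    if (defer_to_ksoftirqd && cpu->ksoftirqd_runnable)
        return; /* deferred: ksoftirqd is expected to pick these up */
    cpu->handled += cpu->pending;
    cpu->pending = 0;
}

/*
 * Models the scheduler offering the CPU to ksoftirqd. If another task
 * hogs the CPU, ksoftirqd never runs and nothing is handled here.
 */
void schedule_model(struct cpu_model *cpu)
{
    if (cpu->cpu_hogged || !cpu->ksoftirqd_runnable)
        return;
    cpu->handled += cpu->pending;
    cpu->pending = 0;
    cpu->ksoftirqd_runnable = false; /* ksoftirqd goes back to sleep */
}

/*
 * Raises one softirq per tick for the given number of ticks and
 * returns how many softirqs are still unhandled at the end.
 */
unsigned int simulate(bool defer_to_ksoftirqd, bool cpu_hogged,
                      unsigned int ticks)
{
    struct cpu_model cpu = {
        .ksoftirqd_runnable = true, /* assumption: ksoftirqd was already woken */
        .cpu_hogged = cpu_hogged,
    };

    for (unsigned int i = 0; i < ticks; i++) {
        cpu.pending++;
        irq_exit_model(&cpu, defer_to_ksoftirqd);
        schedule_model(&cpu);
    }
    return cpu.pending;
}
```

In this model, simulate(true, true, 100) leaves all 100 softirqs unhandled, which corresponds to the starvation case, while simulate(false, true, 100) (the behavior after the revert) and simulate(true, false, 100) (no CPU hog) both leave none.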
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/4236
done