Bug 21075 - Soft lockup takes more than 200 seconds to trigger with watchdog_thresh set to 60 (kernel 5.10.134-18)
Summary: Soft lockup takes more than 200 seconds to trigger with watchdog_thresh set to 60 (kernel 5.10.134-18)
Status: NEW
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: sched
Version: 5.10.y-18
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: dtcccc
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-05-16 10:25 UTC by XRender
Modified: 2025-05-26 20:18 UTC
CC List: 1 user

See Also:


Attachments

Description XRender 2025-05-16 10:25:25 UTC
Description of problem:
With watchdog_thresh set to 60, the soft lockup is reported only after more than 200 seconds.

Version-Release number of selected component (if applicable):

Linux localhost.localdomain 5.10.134-18.an8.x86_64 #1 SMP Fri Dec 13 16:32:58 CST 2024 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

Steps to Reproduce:
1. echo 1 > /proc/sys/kernel/watchdog
2. echo 60 > /proc/sys/kernel/watchdog_thresh
3. echo 1 > /proc/sys/kernel/softlockup_panic
4. Inject a kernel-mode infinite-loop fault (using the module under Additional info).
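Before step 4, it can help to confirm that the three writes actually took effect. A minimal userspace sketch in C, assuming the standard /proc/sys/kernel paths used in steps 1-3; the helper names read_sysctl_int and print_watchdog_settings are hypothetical, not part of any kernel or libc API:

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a sysctl-style file such as
 * /proc/sys/kernel/watchdog_thresh. Returns 0 on success, -1 on failure.
 */
static int read_sysctl_int(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = fscanf(f, "%d", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: print the three settings from steps 1-3.
 * Returns the number of files that could not be read.
 */
static int print_watchdog_settings(void)
{
    static const char *const paths[] = {
        "/proc/sys/kernel/watchdog",
        "/proc/sys/kernel/watchdog_thresh",
        "/proc/sys/kernel/softlockup_panic",
    };
    int failures = 0;

    for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
        int value;
        if (read_sysctl_int(paths[i], &value) == 0) {
            printf("%s = %d\n", paths[i], value);
        } else {
            printf("%s: unreadable\n", paths[i]);
            failures++;
        }
    }
    return failures;
}
```

If steps 1-3 succeeded, calling print_watchdog_settings() from a small main() should show watchdog = 1, watchdog_thresh = 60 and softlockup_panic = 1.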

Actual results:

Soft lockup is triggered only after more than 200 seconds.
Expected results:
Soft lockup fires, and the system resets, at around 120 seconds.
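The 120-second expectation follows from how the watchdog derives its timing. As far as I can tell from mainline 5.10 kernel/watchdog.c, get_softlockup_thresh() returns 2 * watchdog_thresh and the hrtimer sample period is that threshold divided by 5, which matches the "every 24s" utilization lines in Comment 1. A sketch of that arithmetic, assuming the ANCK tree keeps the same formulas (the function names below are illustrative, not kernel symbols):

```c
/*
 * Sketch of the soft-lockup timing arithmetic, assuming the mainline 5.10
 * formulas in kernel/watchdog.c. These helpers are illustrative only.
 */

/* Soft-lockup threshold in seconds: 2 * watchdog_thresh. */
static int softlockup_thresh_secs(int watchdog_thresh)
{
    return watchdog_thresh * 2;
}

/* hrtimer sample period in seconds: threshold / 5. */
static double sample_period_secs(int watchdog_thresh)
{
    return softlockup_thresh_secs(watchdog_thresh) / 5.0;
}

/* How many sample periods past the threshold an observed delay lies. */
static double periods_past_threshold(int watchdog_thresh, double observed_secs)
{
    return (observed_secs - softlockup_thresh_secs(watchdog_thresh)) /
           sample_period_secs(watchdog_thresh);
}
```

With watchdog_thresh = 60 this gives a 120 s threshold and a 24 s sample period. Because the check only runs once per sample period, a report somewhat after 120 s is plausible; 157 s is about 1.5 periods late, while 200 s would be about 3.3 periods late.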

Additional info:

Fault-injection code:
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/cpumask.h>

static struct task_struct *my_thread;

/* CPU to pin the busy-loop thread to (default: CPU 0). */
static int cpu_id;
module_param(cpu_id, int, 0644);

/* Not used by the busy loop; kept so existing insmod command lines still work. */
static int delay_time;
module_param(delay_time, int, 0644);

/*
 * Spin in kernel mode without sleeping or calling cond_resched(). On a
 * kernel without full preemption nothing else gets to run on this CPU,
 * so the soft-lockup detector eventually reports it.
 */
static int my_thread_fn(void *data)
{
    pr_info("[my_thread] running on CPU %d\n", smp_processor_id());

    while (!kthread_should_stop())
        cpu_relax();

    return 0;
}

static int __init my_module_init(void)
{
    pr_info("[my_module] init\n");

    if (cpu_id < 0 || (unsigned int)cpu_id >= nr_cpu_ids || !cpu_online(cpu_id)) {
        pr_err("[my_module] CPU %d is not online\n", cpu_id);
        return -EINVAL;
    }

    my_thread = kthread_create(my_thread_fn, NULL, "my_kthread");
    if (IS_ERR(my_thread)) {
        pr_err("[my_module] Failed to create kthread\n");
        return PTR_ERR(my_thread);
    }

    /* Bind to cpu_id before the thread first runs. */
    kthread_bind(my_thread, cpu_id);
    wake_up_process(my_thread);

    return 0;
}

static void __exit my_module_exit(void)
{
    pr_info("[my_module] exit\n");

    if (my_thread)
        kthread_stop(my_thread);
}

module_init(my_module_init);
module_exit(my_module_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("ChatGPT");
MODULE_DESCRIPTION("Kernel-mode busy loop pinned to one CPU to trigger a soft lockup");
Comment 1 dtcccc alibaba_cloud_group 2025-05-26 12:11:49 UTC
[  248.899325] [my_module] init
[  248.899371] [my_thread] running on CPU 0
[  414.615575] watchdog: BUG: soft lockup - CPU#0 stuck for 157s! [my_kthread:3225]
[  414.616225] CPU#0 Utilization every 24s during lockup:
[  414.616635]  #1: 100% system,          0% softirq,     1% hardirq,     0% idle
[  414.617142]  #2: 100% system,          0% softirq,     1% hardirq,     0% idle
[  414.617638]  #3: 100% system,          0% softirq,     1% hardirq,     0% idle
[  414.618142]  #4: 100% system,          0% softirq,     1% hardirq,     0% idle
[  414.618635]  #5: 100% system,          0% softirq,     1% hardirq,     0% idle

In my test it does take somewhat longer than 120s, but nowhere near 200s.
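The dmesg timestamps in the log above give a second data point: module init is logged at 248.899325 and the lockup report at 414.615575, about 165.7 s apart, versus the 157 s the detector itself reports. A small C sketch for pulling that delta out of two dmesg lines, assuming the default "[ seconds.microseconds]" prefix (parse_dmesg_ts and dmesg_delta are hypothetical helpers):

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse the "[ seconds.micro]" prefix of a dmesg
 * line. Stores the value in *secs and returns 0, or returns -1 if the
 * line has no timestamp prefix.
 */
static int parse_dmesg_ts(const char *line, double *secs)
{
    return sscanf(line, " [ %lf]", secs) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: seconds elapsed between two dmesg lines,
 * or -1.0 if either line cannot be parsed.
 */
static double dmesg_delta(const char *start_line, const char *end_line)
{
    double t0, t1;

    if (parse_dmesg_ts(start_line, &t0) != 0 ||
        parse_dmesg_ts(end_line, &t1) != 0)
        return -1.0;
    return t1 - t0;
}
```

Running the same calculation on the reporter's own dmesg (module init line vs. soft lockup line) would show whether the 200+ s figure is measured from module load or taken from the "stuck for" value.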
Comment 2 XRender 2025-05-26 20:18:35 UTC
Source site:
https://mirrors.aliyun.com/anolis/8/kernel-5.10/source/Packages/?spm=a2c6h.25603864.0.0.349a715fSNXlOX
Downloaded RPM package:
kernel-5.10.134-18.an8.src.rpm
Build configuration used:
kernel-5.10.134-x86_64.config

After installing this kernel, apply the settings given earlier:
1. echo 1 > /proc/sys/kernel/watchdog
2. echo 60 > /proc/sys/kernel/watchdog_thresh
3. echo 1 > /proc/sys/kernel/softlockup_panic

Injecting the soft-lockup fault then shows a delay of more than 200 seconds. Please confirm the issue.