Bug 394 - [ANCK 4.19-devel] Host panic when running offline virtualized machine for a long time
Status: CONFIRMED
Alias: None
Product: ANCK 4.19 Dev
Classification: ANCK
Component: sched
Version: 4.19-026.x
Hardware: All, OS: Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: CruzZhao
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-25 17:33 UTC by baka233
Modified: 2022-01-29 15:30 UTC
0 users

See Also:


Attachments

Description baka233 2022-01-25 17:33:32 UTC
Description of problem:

When an offline virtualized container runs for a long time, the host kernel occasionally crashes with a divide-by-zero error in __cpuacct_get_usage_result.


Version-Release number of selected component (if applicable):


How reproducible:

Intermittent. Run an offline virtualized workload and repeatedly run `cat cpuacct.proc_stat_show` for a long time; the panic may or may not occur.

Steps to Reproduce:


Actual results:
The host panics if the offline virtualized container runs for a long time.

Expected results:
The host keeps working normally.

Additional info:
This bug is caused by a race condition when reading the per-CPU `kcpustats` variable: tick_user and tick_guest are not read consistently, which can make `tick_user - tick_guest` negative.
Comment 1 CruzZhao alibaba_cloud_group 2022-01-29 15:30:01 UTC
This is a rich container problem; Xunlei Pang may be able to help.