Bug 19203 - [ANCK6.6.71-3 rc1][aarch64][Yitian 710 machine] LTP test: cpuacct.usage in the subgroup used by test case controllers/cpuacct_1_1 does not match the expected value
Status: NEW
Alias: None
Product: Antest
Classification: Infrastructures
Component: Test cases
Version: unspecified
Hardware: aarch64 Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: lv0322
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-03-04 11:58 UTC by zhixin01
Modified: 2025-03-17 19:17 UTC
Description zhixin01 alibaba_cloud_group 2025-03-04 11:58:19 UTC
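[Defect description]:
In the LTP test suite, the cpuacct.usage value in the subgroup used by test case controllers/cpuacct_1_1 does not match the expected value.

The test log is as follows: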
<<<test_start>>>
tag=cpuacct_1_1 stime=1741060128
cmdline="cpuacct.sh 1 1"
contacts=""
analysis=exit
<<<test_output>>>
cpuacct 1 TINFO: Running: cpuacct.sh 1 1
cpuacct 1 TINFO: timeout per run is 0h 5m 0s
tst_pid.c:84: TINFO: Cannot read session user limits from '/sys/fs/cgroup/user.slice/user-1377975.slice/pids.max'
tst_pid.c:94: TINFO: Found limit of processes 1648358 (from /sys/fs/cgroup/pids/user.slice/user-1377975.slice/pids.max)
cpuacct 1 TINFO: task limit fulfilled (approximate need 1, limit 1646945)
cpuacct 1 TINFO: cpuacct: /sys/fs/cgroup/cpuset,cpu,cpuacct
cpuacct 1 TINFO: Creating 1 subgroups each with 1 processes
cpuacct 1 TFAIL: cpuacct.usage is not equal to 0 for 1 subgroups
cpuacct 1 TPASS: cpuacct.usage equal to subgroup*/cpuacct.usage
cpuacct 2 TINFO: removing created directories

Summary:
passed   1
failed   1
broken   0
skipped  0
warnings 0
<<<execution_status>>>
initiation_status="ok"
duration=0 termination_type=exited termination_id=1 corefile=no
cutime=4 cstime=3
<<<test_end>>>
<<<test_start>>>
tag=cpuacct_1_10 stime=1741060128
cmdline="cpuacct.sh 1 10"
contacts=""
analysis=exit
<<<test_output>>>
cpuacct 1 TINFO: Running: cpuacct.sh 1 10
cpuacct 1 TINFO: timeout per run is 0h 5m 0s
tst_pid.c:84: TINFO: Cannot read session user limits from '/sys/fs/cgroup/user.slice/user-1377975.slice/pids.max'
tst_pid.c:94: TINFO: Found limit of processes 1648358 (from /sys/fs/cgroup/pids/user.slice/user-1377975.slice/pids.max)
cpuacct 1 TINFO: task limit fulfilled (approximate need 10, limit 1646945)
cpuacct 1 TINFO: cpuacct: /sys/fs/cgroup/cpuset,cpu,cpuacct
cpuacct 1 TINFO: Creating 1 subgroups each with 10 processes
cpuacct 1 TFAIL: cpuacct.usage is not equal to 0 for 1 subgroups
cpuacct 1 TPASS: cpuacct.usage equal to subgroup*/cpuacct.usage
cpuacct 2 TINFO: removing created directories

Summary:
passed   1
failed   1
broken   0
skipped  0
warnings 0
<<<execution_status>>>
initiation_status="ok"
duration=1 termination_type=exited termination_id=1 corefile=no
cutime=20 cstime=4
<<<test_end>>>
<<<test_start>>>
tag=cpuacct_1_100 stime=1741060129
cmdline="cpuacct.sh 1 100"
contacts=""
analysis=exit
<<<test_output>>>
incrementing stop
cpuacct 1 TINFO: Running: cpuacct.sh 1 100
cpuacct 1 TINFO: timeout per run is 0h 5m 0s
tst_pid.c:84: TINFO: Cannot read session user limits from '/sys/fs/cgroup/user.slice/user-1377975.slice/pids.max'
tst_pid.c:94: TINFO: Found limit of processes 1648358 (from /sys/fs/cgroup/pids/user.slice/user-1377975.slice/pids.max)
cpuacct 1 TINFO: task limit fulfilled (approximate need 100, limit 1646945)
cpuacct 1 TINFO: cpuacct: /sys/fs/cgroup/cpuset,cpu,cpuacct
cpuacct 1 TINFO: Creating 1 subgroups each with 100 processes
cpuacct 1 TFAIL: cpuacct.usage is not equal to 0 for 1 subgroups
cpuacct 1 TPASS: cpuacct.usage equal to subgroup*/cpuacct.usage
cpuacct 2 TINFO: removing created directories

Summary:
passed   1
failed   1
broken   0
skipped  0
warnings 0
<<<execution_status>>>
initiation_status="ok"
duration=0 termination_type=exited termination_id=1 corefile=no
cutime=158 cstime=13
<<<test_end>>>

[Reproduction rate]:
Always reproducible

[Reproduction environment]:
Environment: Yitian 710 machine
100.82.243.208

#uname -r
6.6.71-3_rc1.al8.aarch64

#cat /etc/os-release
NAME="Alibaba Cloud Linux"
VERSION="3 (Soaring Falcon)"
ID="alinux"
ID_LIKE="rhel fedora centos anolis"
VERSION_ID="3"
UPDATE_ID="10"
PLATFORM_ID="platform:al8"
PRETTY_NAME="Alibaba Cloud Linux 3 (Soaring Falcon)"
ANSI_COLOR="0;31"
HOME_URL="https://www.aliyun.com/"

#lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              124
On-line CPU(s) list: 0-123
Thread(s) per core:  1
Core(s) per socket:  124
Socket(s):           1
NUMA node(s):        2
Vendor ID:           ARM
BIOS Vendor ID:      T-HEAD
Model:               0
Model name:          Neoverse-N2
BIOS Model name:     Yitian710-124
Stepping:            r0p0
CPU MHz:             2750.002
BogoMIPS:            100.00
Hypervisor vendor:   Alibaba
Virtualization type: full
L1d cache:           64K
L1i cache:           64K
L2 cache:            1024K
L3 cache:            65536K
NUMA node0 CPU(s):   0-61
NUMA node1 CPU(s):   62-123
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh

#free -h
              total        used        free      shared  buff/cache   available
Mem:          251Gi       5.7Gi       243Gi       9.0Mi       4.0Gi       245Gi
Swap:         2.0Gi       116Mi       1.9Gi

#cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/boot/vmlinuz-6.6.71-3_rc1.al8.aarch64 root=UUID=5d4c9cac-5324-464b-8971-09deff261ae7 ro biosdevname=0 rd.driver.pre=ahci iommu.passthrough=1 iommu.strict=0 nospectre_bhb ssbd=force-off systemd.unified_cgroup_hierarchy=0 cgroup.memory=nokmem console=ttyS0,115200 fsck.repair=yes crashkernel=0M-2G:0M,2G-256G:256M,256G-1024G:320M,1024G-:384M

#rpm -qa | grep kernel | grep 6.6.71-3_rc1.al8
kernel-devel-6.6.71-3_rc1.al8.aarch64
kernel-headers-6.6.71-3_rc1.al8.aarch64
kernel-debuginfo-6.6.71-3_rc1.al8.aarch64
kernel-6.6.71-3_rc1.al8.aarch64
kernel-debuginfo-common-aarch64-6.6.71-3_rc1.al8.aarch64

#mount |grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset,cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup2 on /tmp/ltp-uAjZp8fYTs/cgroup_unified type cgroup2 (rw,relatime,memory_recursiveprot)
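
The mount table above shows cpuset, cpu and cpuacct sharing one cgroup v1 hierarchy. A minimal sketch for checking this programmatically follows. It assumes input in /proc/mounts format (not the `mount` output format shown above), and the function name cpuset_comounted_with_cpuacct is hypothetical, not part of LTP:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when some cgroup v1 mount
# lists both cpuset and cpuacct among its options.
# $1: file in /proc/mounts format (defaults to /proc/mounts).
cpuset_comounted_with_cpuacct()
{
    mounts_file="${1:-/proc/mounts}"
    # Field 3 is the filesystem type, field 4 the comma-separated options.
    awk '$3 == "cgroup" {
             n = split($4, opts, ",")
             has_cpuset = 0
             has_cpuacct = 0
             for (k = 1; k <= n; k++) {
                 if (opts[k] == "cpuset")  has_cpuset = 1
                 if (opts[k] == "cpuacct") has_cpuacct = 1
             }
             if (has_cpuset && has_cpuacct) found = 1
         }
         END { exit (found ? 0 : 1) }' "$mounts_file"
}
```

Calling `cpuset_comounted_with_cpuacct && echo "cpuset shares the cpuacct hierarchy"` on the affected machine would be one way to confirm the layout before running the test.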

[Steps to reproduce]:
# Download and build the test cases
git clone http://gitlab.alibaba-inc.com/alikernel/ltp.git -b Ali6000
cd ltp
make autotools
./configure
make
make install

# Run the test case
cd /opt/ltp
./runltp -f controllers -s cpuacct_1_1

Other test cases with the same problem:
cpuacct_1_10
cpuacct_10_10
cpuacct_1_100
cpuacct_100_1
cpuacct_100_100

[Expected result]:
The test case passes

[Actual result]:
The test case fails

[Analysis]
The relevant code is as follows:
vim testcases/bin/cpuacct.sh   
do_test()
{
    tst_res TINFO "Creating $max subgroups each with $nbprocess processes"

    # create and attach process to subgroups
    for i in `seq 1 $max`; do
        for j in `seq 1 $nbprocess`; do
            cpuacct_task $testpath/subgroup_$i/tasks &
            echo $! >> task_pids
        done
    done

    for pid in $(cat task_pids); do wait $pid; done
    rm -f task_pids

    acc=0
    fails=0
    for i in `seq 1 $max`; do
        tmp=`cat $testpath/subgroup_$i/cpuacct.usage`
        if [ "$tmp" -eq "0" ]; then
            fails=$((fails + 1))
        fi
        acc=$((acc + tmp))
    done

    ## check that cpuacct.usage != 0 for every subgroup
    if [ "$fails" -gt "0" ]; then
        tst_res TFAIL "cpuacct.usage is not equal to 0 for $fails subgroups"
    else
        tst_res TPASS "cpuacct.usage is not equal to 0 for every subgroup"
    fi

    ## check that ltp_subgroup/cpuacct.usage == sum ltp_subgroup/subgroup*/cpuacct.usage
    ref=`cat $testpath/cpuacct.usage`
    if [ "$ref" -ne "$acc" ]; then
        tst_res TFAIL "cpuacct.usage $ref not equal to subgroup*/cpuacct.usage $acc"
    else
        tst_res TPASS "cpuacct.usage equal to subgroup*/cpuacct.usage"
    fi
}
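
Note that the TFAIL message in do_test() is printed when a subgroup's cpuacct.usage IS 0, despite its wording. With a single subgroup reading 0, the sum is also 0, so the second check can still pass, which matches the logs above. The sketch below isolates that counting logic so it can be tried on any directory tree. The function name summarize_usage is hypothetical, and it reads plain files rather than a live cgroup:

```shell
#!/bin/sh
# Hypothetical helper mirroring the first loop in do_test():
# prints "<zero-count> <sum>" for subgroup_1..subgroup_$2 under $1.
summarize_usage()
{
    testpath="$1"
    max="$2"
    acc=0
    zeros=0
    i=1
    while [ "$i" -le "$max" ]; do
        usage=$(cat "$testpath/subgroup_$i/cpuacct.usage")
        # A zero reading means no CPU time was charged to this subgroup,
        # for example because no task was ever attached to it.
        if [ "$usage" -eq 0 ]; then
            zeros=$((zeros + 1))
        fi
        acc=$((acc + usage))
        i=$((i + 1))
    done
    echo "$zeros $acc"
}
```

A non-zero first number corresponds to the TFAIL seen in the logs.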
Comment 1 lv0322 alibaba_cloud_group 2025-03-17 19:17:43 UTC
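This is a test-case problem. When the machine uses cgroup v1 with cpuset mounted together with cpu,cpuacct, cpuset.cpus and cpuset.mems must be set for each cgroup after it is created; otherwise tasks cannot be added to the cgroup for testing. With no task attached, the subgroup's cpuacct.usage stays at 0 and the first check reports TFAIL.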
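
One possible way to apply this in the test script is sketched below. It is not the actual LTP fix. The helper name inherit_cpuset is hypothetical, and it assumes the parent cgroup already has non-empty cpuset.cpus and cpuset.mems:

```shell
#!/bin/sh
# Hypothetical helper: give a freshly created cgroup v1 directory the same
# cpuset.cpus and cpuset.mems as its parent directory.
# $1: path of the newly created cgroup.
inherit_cpuset()
{
    cgroup_dir="$1"
    parent_dir=$(dirname "$cgroup_dir")
    for f in cpuset.cpus cpuset.mems; do
        # Skip when cpuset is not part of this hierarchy.
        [ -f "$parent_dir/$f" ] || continue
        cat "$parent_dir/$f" > "$cgroup_dir/$f" || return 1
    done
    return 0
}
```

The helper would need to be applied top-down, to the test's parent cgroup first and then to each subgroup_$i, before any PID is written to a tasks file. An alternative worth evaluating is writing 1 to cgroup.clone_children in the parent, which makes new cpuset children copy the parent's configuration.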