Bug 19046 - [ANCK6.6.71-3 rc1][aarch64][Yitian 710 machine] alitests suite: ali_memcg_meminfo test case fails, FAIL: MemTotal != 2097152
Summary: [ANCK6.6.71-3 rc1][aarch64][Yitian 710 machine] alitests suite: ali_memcg_meminfo test case fails, FAIL: ...
Status: CLOSED WONTFIX
Alias: None
Product: Antest
Classification: Infrastructures
Component: Test cases
Version: unspecified
Hardware: aarch64 Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: shuancue
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-02-24 16:45 UTC by zhixin01
Modified: 2025-03-17 21:58 UTC
10 users

See Also:


Attachments

Description zhixin01 alibaba_cloud_group 2025-02-24 16:45:53 UTC
[Defect description]:
alitests suite: the ali_memcg_meminfo test case fails with FAIL: MemTotal != 2097152


[Reproducibility]:
Always reproducible

[Reproduction environment]:
Environment: Yitian 710 machine
11.163.178.238

#uname -r
6.6.71-3_rc1.al8.aarch64

#cat /etc/os-release
NAME="Alibaba Cloud Linux"
VERSION="3 (Soaring Falcon)"
ID="alinux"
ID_LIKE="rhel fedora centos anolis"
VERSION_ID="3"
UPDATE_ID="10"
PLATFORM_ID="platform:al8"
PRETTY_NAME="Alibaba Cloud Linux 3 (Soaring Falcon)"
ANSI_COLOR="0;31"
HOME_URL="https://www.aliyun.com/"

#lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  1
Core(s) per socket:  128
Socket(s):           1
NUMA node(s):        2
Vendor ID:           ARM
BIOS Vendor ID:      T-HEAD
Model:               0
Model name:          Neoverse-N2
BIOS Model name:     Yitian710-128
Stepping:            r0p0
CPU MHz:             2750.000
BogoMIPS:            100.00
Hypervisor vendor:   Alibaba
Virtualization type: full
L1d cache:           64K
L1i cache:           64K
L2 cache:            1024K
L3 cache:            65536K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh

#free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi       3.6Gi       121Gi        12Mi       1.0Gi       122Gi
Swap:         2.0Gi          0B       2.0Gi

#cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/boot/vmlinuz-6.6.71-3_rc1.al8.aarch64 root=UUID=d0af582f-7147-41de-85e3-deb2e14cde99 ro biosdevname=0 rd.driver.pre=ahci iommu.passthrough=1 iommu.strict=0 nospectre_bhb ssbd=force-off systemd.unified_cgroup_hierarchy=0 cgroup.memory=nokmem console=ttyS0,115200 fsck.repair=yes crashkernel=0M-2G:0M,2G-256G:256M,256G-1024G:320M,1024G-:384M

[Steps to reproduce]:
# Download and build the test cases
git clone http://gitlab-sp.alibaba-inc.com/os-quality/alitests.git
export CFLAGS="-fcommon"    # required with gcc 10 and later, which default to -fno-common
cd alitests
make autotools
./configure
make
make install

# Run the test
/opt/ltp/runltp -f alitests -s ali_memcg_meminfo

[Expected result]:
The test case passes.

[Actual result]:
The test case fails with FAIL: MemTotal != 2097152.
Log:
<<<test_start>>>
tag=ali_memcg_meminfo stime=1740386271
cmdline="ali_memcg_meminfo"
contacts=""
analysis=exit
<<<test_output>>>
incrementing stop
tst_test.c:1066: INFO: Timeout per run is 0h 05m 00s
ali_memcg_meminfo.c:246: INFO: check /proc/meminfo of host
MemTotal: 131715712    <================= host memory
MemFree: 126717552
MemAvailable: 127498852
Buffers: 164308
Cached: 1396256
Slab: 868180
ali_memcg_meminfo.c:250: INFO: memory size of container: 2097152 KB
ali_memcg_meminfo.c:254: INFO: check /proc/meminfo of container
ali_memcg_meminfo.c:174: INFO: test1: read rootfs/proc/meminfo from container top memcg
MemTotal: 2097152           <============== memory of the top-level cgroup
MemFree: 2096896
MemAvailable: 2096896
Buffers: 0
Cached: 256
Slab: 0
ali_memcg_meminfo.c:182: PASS: correct meminfo from top memcg of container
ali_memcg_meminfo.c:184: INFO: test2: read rootfs/proc/meminfo from container sub memcg
MemTotal: 1048576        <============== memory of the sub-cgroup; expected to match the top-level cgroup
MemFree: 1048320
MemAvailable: 1048320
Buffers: 0
Cached: 256
Slab: 0
ali_memcg_meminfo.c:148: FAIL: MemTotal != 2097152

Summary:
passed   1
failed   1
skipped  0
warnings 0
<<<execution_status>>>
initiation_status="ok"
duration=0 termination_type=exited termination_id=1 corefile=no
cutime=0 cstime=0
<<<test_end>>>
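To make the failing check concrete, the MemTotal comparison reported at ali_memcg_meminfo.c:148 can be approximated with a small shell sketch. This is not the alitests implementation: the helper names memtotal_kb and check_memtotal are hypothetical, and the sketch only reads the MemTotal field from a meminfo-style file and compares it with an expected value such as the 2097152 kB container memory size shown in the log.

```shell
#!/bin/sh
# Hypothetical helper (not part of alitests): print the MemTotal value
# from a meminfo-style file. Works for both "MemTotal: 2097152" (as in the
# test log) and "MemTotal:  2097152 kB" (as in /proc/meminfo).
memtotal_kb() {
    awk '$1 == "MemTotal:" { print $2; exit }' "$1"
}

# Hypothetical helper: succeed only if MemTotal in file $1 equals $2 (kB).
check_memtotal() {
    actual=$(memtotal_kb "$1")
    if [ "$actual" = "$2" ]; then
        echo "PASS: MemTotal == $2"
    else
        echo "FAIL: MemTotal ($actual) != $2"
        return 1
    fi
}
```

For example, running check_memtotal against a file containing the sub-memcg values from the log (MemTotal: 1048576) with an expected value of 2097152 prints a FAIL line and returns non-zero, which mirrors the test result above.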

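[Root cause analysis]:
According to the test case description, MemTotal in the sub-cgroup's meminfo is expected to show the top-level cgroup's MemTotal, but it actually shows the sub-cgroup's own value. Please confirm.

CONFIG_RICH_CONTAINER_CG_SWITCH is not enabled (the grep below returns nothing):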
#grep CONFIG_RICH_CONTAINER_CG_SWITCH /boot/config-6.6.71-3_rc1.al8.aarch64

[root@t50a07416.sqa.eu95 /var/tmp/tone/run/alitests]
#

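According to https://project.aone.alibaba-inc.com/v2/project/1101661/bug/57159880, the developers' analysis is that, normally, MemTotal in the sub-cgroup's meminfo is expected to show the sub-cgroup's own value. The top-level cgroup's MemTotal is shown only when CONFIG_RICH_CONTAINER_CG_SWITCH is enabled and the rich-container switch, cgroup.rich_container_source, and related settings are configured.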
Comment 2 escape alibaba_cloud_group 2025-03-10 11:56:48 UTC
CONFIG_RICH_CONTAINER_CG_SWITCH being disabled is expected. Rich-container statistics can be enabled with sysctl -w kernel.rich_container_enable=1.
Comment 3 zhixin01 alibaba_cloud_group 2025-03-11 17:09:03 UTC
(In reply to escape from comment #2)
> CONFIG_RICH_CONTAINER_CG_SWITCH being disabled is expected. Rich-container
> statistics can be enabled with sysctl -w kernel.rich_container_enable=1.


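After enabling rich-container statistics with sysctl -w kernel.rich_container_enable=1, ali_memcg_meminfo still fails in the same way. As stated above, CONFIG_RICH_CONTAINER_CG_SWITCH being disabled is expected, so this bug is being closed.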