[Defect description]: In the alitests test suite, the ali_memcg_meminfo case fails with "FAIL: MemTotal != 2097152".

[Reproduction rate]: Always

[Reproduction environment]:
Environment: Yitian 710 machine, 11.163.178.238

#uname -r
6.6.71-3_rc1.al8.aarch64

#cat /etc/os-release
NAME="Alibaba Cloud Linux"
VERSION="3 (Soaring Falcon)"
ID="alinux"
ID_LIKE="rhel fedora centos anolis"
VERSION_ID="3"
UPDATE_ID="10"
PLATFORM_ID="platform:al8"
PRETTY_NAME="Alibaba Cloud Linux 3 (Soaring Falcon)"
ANSI_COLOR="0;31"
HOME_URL="https://www.aliyun.com/"

#lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  1
Core(s) per socket:  128
Socket(s):           1
NUMA node(s):        2
Vendor ID:           ARM
BIOS Vendor ID:      T-HEAD
Model:               0
Model name:          Neoverse-N2
BIOS Model name:     Yitian710-128
Stepping:            r0p0
CPU MHz:             2750.000
BogoMIPS:            100.00
Hypervisor vendor:   Alibaba
Virtualization type: full
L1d cache:           64K
L1i cache:           64K
L2 cache:            1024K
L3 cache:            65536K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh

#free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi       3.6Gi       121Gi        12Mi       1.0Gi       122Gi
Swap:         2.0Gi          0B       2.0Gi

#cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/boot/vmlinuz-6.6.71-3_rc1.al8.aarch64 root=UUID=d0af582f-7147-41de-85e3-deb2e14cde99 ro biosdevname=0 rd.driver.pre=ahci iommu.passthrough=1 iommu.strict=0 nospectre_bhb ssbd=force-off systemd.unified_cgroup_hierarchy=0 cgroup.memory=nokmem console=ttyS0,115200 fsck.repair=yes crashkernel=0M-2G:0M,2G-256G:256M,256G-1024G:320M,1024G-:384M

[Reproduction steps]:
# Download and build the test cases
git clone http://gitlab-sp.alibaba-inc.com/os-quality/alitests.git
export CFLAGS="-fcommon"   # required with gcc 10
cd alitests
make autotools
./configure
make
make install
# Run the test
/opt/ltp/runltp -f alitests -s ali_memcg_meminfo

[Expected result]: The test case passes.

[Actual result]: The test case fails with "FAIL: MemTotal != 2097152". The log is as follows:
<<<test_start>>>
tag=ali_memcg_meminfo stime=1740386271
cmdline="ali_memcg_meminfo"
contacts=""
analysis=exit
<<<test_output>>>
incrementing stop
tst_test.c:1066: INFO: Timeout per run is 0h 05m 00s
ali_memcg_meminfo.c:246: INFO: check /proc/meminfo of host
MemTotal: 131715712   <================= host memory
MemFree: 126717552
MemAvailable: 127498852
Buffers: 164308
Cached: 1396256
Slab: 868180
ali_memcg_meminfo.c:250: INFO: memory size of container: 2097152 KB
ali_memcg_meminfo.c:254: INFO: check /proc/meminfo of container
ali_memcg_meminfo.c:174: INFO: test1: read rootfs/proc/meminfo from container top memcg
MemTotal: 2097152   <============== memory of the top-level cgroup directory
MemFree: 2096896
MemAvailable: 2096896
Buffers: 0
Cached: 256
Slab: 0
ali_memcg_meminfo.c:182: PASS: correct meminfo from top memcg of container
ali_memcg_meminfo.c:184: INFO: test2: read rootfs/proc/meminfo from container sub memcg
MemTotal: 1048576   <============== memory of the sub-cgroup directory; expected to match the top-level cgroup directory
MemFree: 1048320
MemAvailable: 1048320
Buffers: 0
Cached: 256
Slab: 0
ali_memcg_meminfo.c:148: FAIL: MemTotal != 2097152

Summary:
passed 1
failed 1
skipped 0
warnings 0
<<<execution_status>>>
initiation_status="ok"
duration=0 termination_type=exited termination_id=1 corefile=no
cutime=0 cstime=0
<<<test_end>>>

[Root cause analysis]: According to the test case description, MemTotal in the meminfo of the sub-cgroup directory is expected to show the MemTotal of the top-level cgroup directory, but it actually shows the sub-cgroup's own value. Please confirm. CONFIG_RICH_CONTAINER_CG_SWITCH is not enabled; the grep below produces no output:

#grep CONFIG_RICH_CONTAINER_CG_SWITCH /boot/config-6.6.71-3_rc1.al8.aarch64
[root@t50a07416.sqa.eu95 /var/tmp/tone/run/alitests]
#

According to https://project.aone.alibaba-inc.com/v2/project/1101661/bug/57159880, the developers' analysis is that under normal conditions MemTotal in the meminfo of a sub-cgroup directory is expected to show the sub-cgroup's own value. It shows the top-level cgroup's MemTotal only after CONFIG_RICH_CONTAINER_CG_SWITCH is enabled and the rich container switch, cgroup.rich_container_source, and the related settings are configured.
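The comparison that fails in test2 can be illustrated with a minimal sketch. This is not the code of ali_memcg_meminfo.c. The helper names `parse_meminfo` and `check_memtotal` are hypothetical, and the sample text is copied from the log above rather than read from a live container.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}. Hypothetical helper."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        # Accept both "MemTotal: 2097152" and "MemTotal: 2097152 kB".
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def check_memtotal(text, expected_kb):
    """Compare MemTotal with the container size, mimicking the log wording."""
    total = parse_meminfo(text).get("MemTotal")
    if total == expected_kb:
        return "PASS"
    return f"FAIL: MemTotal != {expected_kb}"


# Values taken from the test log in this report.
CONTAINER_KB = 2097152
TOP_MEMCG = "MemTotal: 2097152\nMemFree: 2096896\nMemAvailable: 2096896\n"
SUB_MEMCG = "MemTotal: 1048576\nMemFree: 1048320\nMemAvailable: 1048320\n"

if __name__ == "__main__":
    print("top memcg:", check_memtotal(TOP_MEMCG, CONTAINER_KB))
    print("sub memcg:", check_memtotal(SUB_MEMCG, CONTAINER_KB))
```

With the logged values, the top memcg matches the container size and the sub memcg does not, which is the failure reported at ali_memcg_meminfo.c:148.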
It is expected that CONFIG_RICH_CONTAINER_CG_SWITCH is not enabled. Rich container statistics can be enabled with sysctl -w kernel.rich_container_enable=1.
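(In reply to escape from comment #2)
> It is expected that CONFIG_RICH_CONTAINER_CG_SWITCH is not enabled. Rich
> container statistics can be enabled with sysctl -w
> kernel.rich_container_enable=1.

After enabling rich container statistics with sysctl -w kernel.rich_container_enable=1, ali_memcg_meminfo still fails in the same way. As stated above, it is expected that CONFIG_RICH_CONTAINER_CG_SWITCH is not enabled, so this bug is closed.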
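The empty grep output in the analysis means the option does not appear in the kernel config file at all, which is different from an explicit "is not set" line. A minimal sketch of that distinction follows. The function `config_option_state` and the sample text are hypothetical; CONFIG_EXAMPLE_OPTION is a made-up name used only for illustration. On a real machine the text would be read from the /boot/config-* file named in the report.

```python
import re


def config_option_state(config_text, option):
    """Return the assigned value (e.g. 'y' or 'm'), 'not set', or None if absent.

    Hypothetical helper for kernel config text in the usual format:
        CONFIG_FOO=y
        # CONFIG_BAR is not set
    """
    assigned = re.compile(rf"^{re.escape(option)}=(.*)$")
    unset = f"# {option} is not set"
    for line in config_text.splitlines():
        line = line.strip()
        match = assigned.match(line)
        if match:
            return match.group(1)
        if line == unset:
            return "not set"
    return None  # the option is not mentioned anywhere, like an empty grep


# Illustrative sample only, not the config of the machine in this report.
SAMPLE = """\
CONFIG_MEMCG=y
# CONFIG_EXAMPLE_OPTION is not set
"""

if __name__ == "__main__":
    for name in ("CONFIG_MEMCG", "CONFIG_EXAMPLE_OPTION",
                 "CONFIG_RICH_CONTAINER_CG_SWITCH"):
        print(name, "->", config_option_state(SAMPLE, name))
```

A result of `None` corresponds to the empty grep output shown in the root cause analysis.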