Bug 4432 - [Anolis23][x86_64][nightly][ANCK-5.10-14] kernel-selftests test case clone3.clone3_cap_checkpoint_restore fails
Summary: [Anolis23][x86_64][nightly][ANCK-5.10-14] kernel-selftests test case: clone3.clone...
Status: NEW
Alias: None
Product: Anolis OS 23
Classification: Anolis OS
Component: Others
Version: 23.0
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: happy_orange
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-08 11:23 UTC by anolislw
Modified: 2024-02-01 15:23 UTC

See Also:


Description anolislw alibaba_cloud_group 2023-03-08 11:23:49 UTC
Description of problem:
Anolis23 x86_64 ECS environment, community nightly build, kernel-selftests run: the clone3.clone3_cap_checkpoint_restore test fails.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1) Download the kernel source RPM (src.rpm)
2) rpm -i kernel-xxx.src.rpm
3) yum-builddep -y ~/rpmbuild/SPECS/kernel.spec
4) rpmbuild -bp ~/rpmbuild/SPECS/kernel.spec
5) cd ~/rpmbuild/BUILD/kernel-5.10.xxx.an23/linux-5.10.xxx.an23.x86_64/
6) cd tools/testing/selftests/clone3 && make && ./clone3_cap_checkpoint_restore

Actual results:
[root@qibo-anck14-an23-milan-tmp clone3]# ./clone3_cap_checkpoint_restore
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
#  RUN           global.clone3_cap_checkpoint_restore ...
# clone3() syscall supported
# clone3_cap_checkpoint_restore.c:155:clone3_cap_checkpoint_restore:Child has PID 873879
cap_set_proc: Operation not permitted
# clone3_cap_checkpoint_restore.c:164:clone3_cap_checkpoint_restore:Expected set_capability() (-1) == 0 (0)
# clone3_cap_checkpoint_restore.c:165:clone3_cap_checkpoint_restore:Could not set CAP_CHECKPOINT_RESTORE
# clone3_cap_checkpoint_restore: Test terminated by assertion
#          FAIL  global.clone3_cap_checkpoint_restore
not ok 1 global.clone3_cap_checkpoint_restore
# FAILED: 0 / 1 tests passed.
# Totals: pass:0 fail:1 xfail:0 xpass:0 skip:0 error:0
[root@qibo-anck14-an23-milan-tmp clone3]# vim clone3_cap_checkpoint_restore


Expected results:
The test case passes.

Additional info:
[root@qibo-anck14-an23-milan-tmp clone3]# uname -r
5.10.134-38.git.600961b9c9d4.an23.x86_64
[root@qibo-anck14-an23-milan-tmp clone3]# cat /etc/anolis-release
Anolis OS release 23
[root@qibo-anck14-an23-milan-tmp clone3]# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/boot/vmlinuz-5.10.134-38.git.600961b9c9d4.an23.x86_64 root=UUID=29bfdd09-2855-44bd-a225-8f2a34c15e78 ro rhgb cryptomgr.notests rcupdate.rcu_cpu_stall_timeout=300 quiet biosdevname=0 net.ifnames=0 console=tty0 console=ttyS0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295 cgroup.memory=nokmem crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M
[root@qibo-anck14-an23-milan-tmp clone3]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           6.2G  548K  6.2G   1% /run
/dev/vda2        40G   13G   28G  33% /
tmpfs            16G  1.4G   14G   9% /tmp
tmpfs           3.1G     0  3.1G   0% /run/user/0
[root@qibo-anck14-an23-milan-tmp clone3]# free -g
               total        used        free      shared  buff/cache   available
Mem:              30           0          24           1           5          28
Swap:              0           0           0
[root@qibo-anck14-an23-milan-tmp clone3]# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               AuthenticAMD
  BIOS Vendor ID:        Alibaba Cloud
  Model name:            AMD EPYC 7T83 64-Core Processor
    BIOS Model name:     pc-i440fx-2.1  CPU @ 0.0GHz
    BIOS CPU family:     1
    CPU family:          25
    Model:               1
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            1
    BogoMIPS:            5090.43
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext invpcid_single vmmcall tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr rdpru wbnoinvd arat vaes vpclmulqdq rdpid fsrm
Virtualization features:
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    2 MiB (4 instances)
  L3:                    32 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-7
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
Comment 1 yunmeng365524 2023-03-09 16:05:10 UTC
The nightly run on an8 is OK; please help confirm.
Comment 2 Alierwei alibaba_cloud_group 2023-03-13 10:29:53 UTC
It passes on an8. Has this problem been re-verified? Could it be related to the environment configuration?
Comment 3 anolislw alibaba_cloud_group 2023-03-13 15:22:45 UTC
(In reply to Alierwei from comment #2)
> It passes on an8. Has this problem been re-verified? Could it be related to the environment configuration?

On Anolis8-5.10-x86_64 I confirmed that this case passes:
-------------------
  {
    "testcase": "clone3.clone3_cap_checkpoint_restore",
    "value": "Pass"
  }


On the anolis23 5.10 x86 environment, however, this case fails every time:
-----------------------
  {
    "testcase": "clone3.clone3_cap_checkpoint_restore",
    "value": "Fail"
  }
Comment 4 Alierwei alibaba_cloud_group 2023-03-24 17:48:11 UTC
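The error message is "Operation not permitted", i.e. a permission failure.
First, this is not a kernel problem: CONFIG_CHECKPOINT_RESTORE is enabled in the kernel config, which indicates that the kernel allows the CAP_CHECKPOINT_RESTORE capability to be set. In addition, the same kernel works on an8 but not on an23, so a security restriction in the system may be what denies the operation. I will try again with the security-related components disabled.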
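
The two points in this comment (the kernel option is enabled, yet something denies the capability) can be checked from a shell. The following is a minimal sketch, not part of the original analysis. The helper name `cap_bit_set` is hypothetical, the `/boot/config-$(uname -r)` path is an assumption about where the distribution installs the kernel config, and 40 is the number of CAP_CHECKPOINT_RESTORE in linux/capability.h.

```shell
#!/bin/sh
# Hypothetical helper: succeed if bit $2 is set in the hex capability mask $1
# (the format printed in /proc/<pid>/status, without a "0x" prefix).
cap_bit_set() {
    mask=$((0x$1))
    [ $(( (mask >> $2) & 1 )) -eq 1 ]
}

# CAP_CHECKPOINT_RESTORE is capability number 40 in linux/capability.h.
CAP_CHECKPOINT_RESTORE=40

# Assumption: the kernel config is installed as /boot/config-<release>.
grep -s '^CONFIG_CHECKPOINT_RESTORE=' "/boot/config-$(uname -r)" \
    || echo "CONFIG_CHECKPOINT_RESTORE: not found in /boot/config-$(uname -r)"

# Report whether the capability is in the bounding set inherited by
# processes started from this shell.
if [ -r /proc/self/status ]; then
    bnd=$(awk '/^CapBnd:/ { print $2 }' /proc/self/status)
    if cap_bit_set "$bnd" "$CAP_CHECKPOINT_RESTORE"; then
        echo "CAP_CHECKPOINT_RESTORE is in the bounding set ($bnd)"
    else
        echo "CAP_CHECKPOINT_RESTORE is NOT in the bounding set ($bnd)"
    fi
fi
```

If the option is enabled and the bit is present in the bounding set of a root shell, a restriction at this level becomes a less likely explanation for the "Operation not permitted" error.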
Comment 5 Alierwei alibaba_cloud_group 2023-03-27 15:41:39 UTC
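Latest analysis:
From the kernel side there is no problem: the same kernel fails the test on an23 but passes on al8, and the relevant CONFIG options are all enabled.

However, the two OSes ship different libcap versions:
an23: libcap-2.67
al8 : libcap-2.48

I built libcap separately on an23; with version 2.48 built and installed, the test passes. I also bisected the libcap commits and found that commit aca0764435 introduced the failure (git://git.kernel.org/pub/scm/libs/libcap/libcap.git).

So this test failure appears to be caused by the libcap version. Base OS developers, please follow up on this as well.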
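
A bisect like the one described in this comment can be automated with `git bisect run`, which treats exit status 0 as good, 125 as "cannot test, skip", and other statuses from 1 to 127 as bad. The sketch below is an assumption about how such a run might be scripted, not the commands actually used here. The function names, the `make -C libcap` build step and the `SELFTEST` variable are hypothetical, and it relies on the test binary loading the freshly built library through `LD_LIBRARY_PATH`.

```shell
#!/bin/sh
# Map (build status, test status) to the exit code git bisect run expects:
# 125 = skip this commit, 0 = good, 1 = bad.
bisect_verdict() {
    build_status=$1
    test_status=$2
    if [ "$build_status" -ne 0 ]; then
        echo 125
    elif [ "$test_status" -eq 0 ]; then
        echo 0
    else
        echo 1
    fi
}

# Hypothetical driver, meant to be called from the top of a libcap checkout.
# SELFTEST must point at an already built clone3_cap_checkpoint_restore binary.
bisect_step() {
    make -C libcap clean >/dev/null 2>&1
    make -C libcap >/dev/null 2>&1
    build_status=$?
    test_status=1
    if [ "$build_status" -eq 0 ]; then
        # Assumption: the shared library is produced in ./libcap.
        LD_LIBRARY_PATH="$PWD/libcap" "$SELFTEST" >/dev/null 2>&1
        test_status=$?
    fi
    return "$(bisect_verdict "$build_status" "$test_status")"
}
```

Saved as a script that ends by calling `bisect_step` and exiting with its status, this could be driven by `git bisect start <bad> <good>` followed by `git bisect run` on that script, with the good and bad revisions corresponding to libcap 2.48 and 2.67.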
Comment 6 xuchunmei alibaba_cloud_group 2023-05-04 14:17:30 UTC
Is the result on an8 a skip or a pass?
The same problem has also been reported upstream:
https://www.spinics.net/lists/kernel/msg4580891.html
Comment 7 zhangjinglin loongson_group 2024-02-01 15:23:19 UTC
The same problem also exists on [Anolis23.1][RC1][loongarch64].
The libcap version in use is libcap-2.69.

Software environment:
# Kernel version:
# uname -r
5.10.134-16.2_rc1.an23.loongarch64

# OS version:
# cat /etc/os-release 
NAME="Anolis OS"
VERSION="23"
ID="anolis"
VERSION_ID="23"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"

ISO download:
https://build.openanolis.cn/kojifiles/output/anolis-23-20240111.16/compose/os/loongarch64/iso/anolis-23-loongarch64-dvd1-20240111.16.iso

Steps to Reproduce:
1) Download the kernel source RPM (src.rpm)
2) rpm -ivh kernel-xxx.src.rpm
3) cd rpmbuild/SOURCES/linux-5.10.134-16.2_xxx.an23/tools/testing/selftests/kselftest_install
4) ./run_kselftest.sh -t clone3:clone3_cap_checkpoint_restore

Actual results:
TAP version 13
1..1
# selftests: clone3: clone3_cap_checkpoint_restore
# TAP version 13
# 1..1
# # Starting 1 tests from 1 test cases.
# #  RUN           global.clone3_cap_checkpoint_restore ...
# # clone3_cap_checkpoint_restore.c:155:clone3_cap_checkpoint_restore:Child has PID 736435
# # clone3() syscall supported
# cap_set_proc: Operation not permitted
# # clone3_cap_checkpoint_restore.c:164:clone3_cap_checkpoint_restore:Expected set_capability() (-1) == 0 (0)
# # clone3_cap_checkpoint_restore.c:165:clone3_cap_checkpoint_restore:Could not set CAP_CHECKPOINT_RESTORE
# # clone3_cap_checkpoint_restore: Test terminated by assertion
# #          FAIL  global.clone3_cap_checkpoint_restore
# not ok 1 global.clone3_cap_checkpoint_restore
# # FAILED: 0 / 1 tests passed.
# # Totals: pass:0 fail:1 xfail:0 xpass:0 skip:0 error:0
not ok 1 selftests: clone3: clone3_cap_checkpoint_restore # exit=1