Bug 8873 - [Anolis23.1 GA][Beta][ANCK-6.6.25-2][aarch64/x86_64] bcc: the test_tools_smoke.py test case fails with "Failed to compile BPF module <text>" and "Failed to attach BPF program b'kprobe___cond_resched' to kprobe b'_cond_resched'"
Summary: [Anolis23.1 GA][Beta][ANCK-6.6.25-2][aarch64/x86_64] bcc: test_tools_smoke.py test case...
Status: CLOSED WONTFIX
Alias: None
Product: Anolis OS 23
Classification: Anolis OS
Component: ToBeTriaged
Version: 23.1
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: beta
Assignee: gaochang
Reported: 2024-04-24 10:38 UTC by zhixin01
Modified: 2024-05-22 10:40 UTC
Description zhixin01 alibaba_cloud_group 2024-04-24 10:38:41 UTC
[Defect description]:
bcc: the test_tools_smoke.py test case fails with "Failed to compile BPF module <text>" and "Failed to attach BPF program b'kprobe___cond_resched' to kprobe b'_cond_resched'".
The problem occurs on both the x86_64 and aarch64 architectures.

Software versions:
# rpm -qa |grep bcc
bcc-tools-0.27.0-1.an23.aarch64
bcc-0.27.0-1.an23.aarch64
python3-bcc-0.27.0-1.an23.noarch
bcc-devel-0.27.0-1.an23.aarch64

Failure log:
# python test_tools_smoke.py
In file included from /virtual/main.c:2:
In file included from include/uapi/linux/ptrace.h:183:
In file included from arch/x86/include/asm/ptrace.h:5:
In file included from arch/x86/include/asm/segment.h:7:
arch/x86/include/asm/ibt.h:77:8: warning: 'nocf_check' attribute ignored; use -fcf-protection to enable the attribute [-Wignored-attributes]
extern __noendbr u64 ibt_save(bool disable);
       ^
arch/x86/include/asm/ibt.h:32:34: note: expanded from macro '__noendbr'
#define __noendbr       __attribute__((nocf_check))
                                       ^
arch/x86/include/asm/ibt.h:78:8: warning: 'nocf_check' attribute ignored; use -fcf-protection to enable the attribute [-Wignored-attributes]
extern __noendbr void ibt_restore(u64 save);
       ^
arch/x86/include/asm/ibt.h:32:34: note: expanded from macro '__noendbr'
#define __noendbr       __attribute__((nocf_check))
                                       ^
/virtual/main.c:285:19: error: incomplete definition of type 'struct request'
        dev_t dev = (next->rq_disk)->part0.__dev.devt;
                     ~~~~^
include/linux/blkdev.h:32:8: note: forward declaration of 'struct request'
struct request;
       ^
/virtual/main.c:291:19: error: incomplete definition of type 'struct request'
        key.sector = next->__sector;
                     ~~~~^
include/linux/blkdev.h:32:8: note: forward declaration of 'struct request'
struct request;
       ^
2 warnings and 2 errors generated.
Traceback (most recent call last):
  File "/root/zx/tone/run/bcc/tests/python/../../tools/alibiolatency.py", line 296, in <module>
    b = BPF(text=bpf_text)
  File "/usr/lib/python3.10/site-packages/bcc/__init__.py", line 479, in __init__
    raise Exception("Failed to compile BPF module %s" % (src_file or "<text>"))
Exception: Failed to compile BPF module <text>
====================== (part of the log omitted) ======================
cannot attach kprobe, probe entry may not exist
Traceback (most recent call last):
  File "/root/zx/tone/run/bcc/tests/python/../../tools/alisysdelay.py", line 232, in <module>
    b = BPF(text=bpf_text)
  File "/usr/lib/python3.10/site-packages/bcc/__init__.py", line 487, in __init__
    self._trace_autoload()
  File "/usr/lib/python3.10/site-packages/bcc/__init__.py", line 1456, in _trace_autoload
    self.attach_kprobe(
  File "/usr/lib/python3.10/site-packages/bcc/__init__.py", line 845, in attach_kprobe
    raise Exception("Failed to attach BPF program %s to kprobe %s"
Exception: Failed to attach BPF program b'kprobe___cond_resched' to kprobe b'_cond_resched', it's not traceable (either non-existing, inlined, or marked as "notrace")
====================== (omitted) ======================
FAIL: test_biotop (__main__.SmokeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 97, in test_biotop
    self.run_with_duration("biotop.py 1 1")
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 38, in run_with_duration
    self.assertEqual(0,     # clean exit
AssertionError: 0 != 1

======================================================================
FAIL: test_bitesize (__main__.SmokeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 100, in test_bitesize
    self.run_with_int("biotop.py")
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 59, in run_with_int
    self.assertTrue((rc == 0 and allow_early) or rc == 124
AssertionError: False is not true : rc was 1
Command was expected to do one of:
        Be killed by SIGINT


======================================================================
FAIL: test_slabratetop (__main__.SmokeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 307, in test_slabratetop
    self.run_with_duration("slabratetop.py 1 1")
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 38, in run_with_duration
    self.assertEqual(0,     # clean exit
AssertionError: 0 != 1

======================================================================
FAIL: test_tcpcong (__main__.SmokeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 368, in test_tcpcong
    self.run_with_duration("tcpcong.py 1 1")
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 38, in run_with_duration
    self.assertEqual(0,     # clean exit
AssertionError: 0 != 1

======================================================================
FAIL: test_tcpretrans (__main__.SmokeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 357, in test_tcpretrans
    self.run_with_int("tcpretrans.py")
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 59, in run_with_int
    self.assertTrue((rc == 0 and allow_early) or rc == 124
AssertionError: False is not true : rc was 1
Command was expected to do one of:
        Be killed by SIGINT


======================================================================
FAIL: test_tcptop (__main__.SmokeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 365, in test_tcptop
    self.run_with_duration("tcptop.py 1 1")
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 38, in run_with_duration
    self.assertEqual(0,     # clean exit
AssertionError: 0 != 1

----------------------------------------------------------------------
Ran 95 tests in 281.370s

FAILED (failures=11)
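
The assertions in the log above compare each tool's exit status with an expected value (for example "0 != 1" or "rc was 1"). As a rough triage aid, the statuses can be mapped to likely causes. The sketch below assumes the tools are launched through GNU coreutils `timeout`, as the "timeout: failed to run command" messages elsewhere in this report suggest. `classify_rc` is a hypothetical helper and is not part of the bcc test suite.

```python
# Exit statuses that GNU coreutils `timeout` reserves for itself.
# Any other non-zero value is the exit status of the tool being run.
TIMEOUT_STATUS = {
    124: "timed out (stopped by timeout)",
    125: "timeout itself failed",
    126: "tool found but could not be executed",
    127: "tool not found",
}


def classify_rc(rc):
    """Return a short, human-readable reason for a tool's exit status.

    Hypothetical helper for reading the smoke-test log; not bcc code.
    """
    if rc == 0:
        return "clean exit"
    return TIMEOUT_STATUS.get(rc, "tool failed on its own (rc=%d)" % rc)
```

With this mapping, the "0 != 127" failures point at a missing script, while "rc was 1" points at a failure inside the tool, such as a BPF compile or attach error.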

[Reproduction rate]:
Always reproducible

[Reproduction environment]:
Kernel:
# uname -r
6.6.25-2_rc1.an23.x86_64

# cat /etc/os-release
NAME="Anolis OS"
VERSION="23"
ID="anolis"
VERSION_ID="23"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"

CPU information:
# lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          52 bits physical, 57 bits virtual
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         Alibaba Cloud
  Model name:             Intel(R) Xeon(R) Platinum 8475B
    BIOS Model name:      pc-q35-df-2.1  CPU @ 0.0GHz
    BIOS CPU family:      1
    CPU family:           6
    Model:                143
    Thread(s) per core:   2
    Core(s) per socket:   2
    Socket(s):            1
    Stepping:             8
    CPU(s) scaling MHz:   84%
    CPU max MHz:          3800.0000
    CPU min MHz:          800.0000
    BogoMIPS:             5400.00
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd ida arat hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk amx_bf16 avx512_fp16 amx_tile amx_int8 arch_capabilities
Virtualization features:
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    96 KiB (2 instances)
  L1i:                    64 KiB (2 instances)
  L2:                     4 MiB (2 instances)
  L3:                     97.5 MiB (1 instance)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-3
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Unknown: No mitigations
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Vulnerable
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                  Not affected
  Tsx async abort:        Not affected

Memory information:
# free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       585Mi       6.9Gi       364Mi       7.6Gi        13Gi
Swap:             0B          0B          0B


[Steps to reproduce]:
yum install bcc-tools yum-utils rpm-build python3-pyroute2.noarch
git clone https://gitee.com/src-anolis-os/bcc.git --branch a23
cd bcc
yum-builddep -y bcc.spec
rpmbuild -D "_topdir $(pwd)" \
        -D "_sourcedir $(pwd)" \
        -D "_builddir $(pwd)" \
        -bp bcc.spec
cd bcc-0.27.0/tests/python
python test_tools_smoke.py

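[Expected result]:
The test case passes.

[Actual result]:
The test case fails.

[Notes]
At first the test case failed because the tool Python scripts could not be found, so TOOLS_DIR was modified as described in Aone ticket 52865923. The log from a run before that change is shown below: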
# python test_tools_smoke.py
timeout: failed to run command ‘/bcc/tools/alibiolatency.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/aliexitsnoop.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/alihardirqs.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/aliext4writeslower.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/alimutexsnoop.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/alirunqinfo.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/alisoftirqs.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/alisyscount.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/alisysdelay.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/alisyslatency.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/aliworkslower.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/argdist.py’: No such file or directory
CRITICAL:root:WARNING! Test test_argdist (__main__.SmokeTests) failed, but marked as passed because it is decorated with @mayFail.
CRITICAL:root:  The reason why this mayFail was: This fails on github actions environment, and needs to be fixed
CRITICAL:root:  The failure was: "0 != 127"
CRITICAL:root:  Stacktrace: "Traceback (most recent call last):
  File "/root/zx/tone/run/bcc/tests/python/utils.py", line 35, in wrapper
    res = func(*args, **kwargs)
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 78, in test_argdist
    self.run_with_duration("argdist.py -v -C 'p::do_sys_open()' -n 1 -i 1")
  File "/root/zx/tone/run/bcc/tests/python/test_tools_smoke.py", line 37, in run_with_duration
    self.assertEqual(0,     # clean exit
  File "/usr/lib64/python3.10/unittest/case.py", line 845, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/usr/lib64/python3.10/unittest/case.py", line 838, in _baseAssertEqual
    raise self.failureException(msg)
AssertionError: 0 != 127
"
.timeout: failed to run command ‘/bcc/tools/bashreadline.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/bindsnoop.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/biolatency.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/biosnoop.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/biotop.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/biotop.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/bpflist.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/btrfsdist.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/btrfsslower.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/cachestat.py’: No such file or directory
F.timeout: failed to run command ‘/bcc/tools/capable.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/compactsnoop.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/cpudist.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/cpuunclaimed.py’: No such file or directory
F..timeout: failed to run command ‘/bcc/tools/dcsnoop.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/dcstat.py’: No such file or directory
F.timeout: failed to run command ‘/bcc/tools/drsnoop.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/execsnoop.py’: No such file or directory
Ftimeout: failed to run command ‘/bcc/tools/ext4dist.py’: No such file or directory
...
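
The "No such file or directory" errors above come from looking for the tools under /bcc/tools/. A minimal sketch of deriving the tools directory from the test file's own location is shown below, assuming the tests/python and tools layout visible in the traceback paths of this report. `resolve_tools_dir` and the `BCC_TOOLS_DIR` environment variable are hypothetical names used only for illustration; they are not part of bcc or of the fix referenced in the Aone ticket.

```python
import os


def resolve_tools_dir(test_file, env=None):
    """Return the directory that holds the bcc tool scripts.

    BCC_TOOLS_DIR is a hypothetical override variable. Without it, the
    path is derived from the test file location (tests/python -> tools).
    The result always ends with a path separator.
    """
    env = os.environ if env is None else env
    override = env.get("BCC_TOOLS_DIR")
    if override:
        return os.path.join(override, "")
    test_dir = os.path.dirname(os.path.abspath(test_file))
    return os.path.join(test_dir, "..", "..", "tools", "")
```

For the test file path seen in this report, the fallback branch yields /root/zx/tone/run/bcc/tests/python/../../tools/, which matches the tool paths in the failure log.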
Comment 1 zhangxinyi 2024-04-26 15:30:22 UTC
Some of the errors are timeouts, and some are tools that fail on their own.
The tools that fail on their own are:
The test_tcptop test case fails; we need to confirm whether the tcp_sendpage function can be traced.
Exception: Failed to attach BPF program b'kprobe__tcp_sendpage' to kprobe b'tcp_sendpage', it's not traceable (either non-existing, inlined, or marked as "notrace")

The test_tcpcong test case fails; the bpf_text needs to be checked.
"Failed to compile BPF module %s" % (src_file or "<text>"

The test_slabratetop test case fails; the bpf_text needs to be checked.
"Failed to compile BPF module %s" % (src_file or "<text>"

The test_biolatency test case fails; we need to confirm whether the blk_account_io_done function can be traced.
Exception: Failed to attach BPF program b'trace_req_done' to kprobe b'blk_account_io_done', it's not traceable (either non-existing, inlined, or marked as "notrace")

The test_biotop test case fails; we need to confirm whether the blk_account_io_start function can be traced.
Exception: Failed to attach BPF program b'trace_pid_start' to kprobe b'blk_account_io_start', it's not traceable (either non-existing, inlined, or marked as "notrace")
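
One way to confirm whether the functions named above can be traced is to look them up in the kernel's list of traceable functions. The sketch below is an approximation under stated assumptions: it assumes tracefs is mounted at /sys/kernel/tracing and that the available_filter_functions file is a reasonable proxy for what a kprobe can attach to. The helper names are hypothetical and the code does not use bcc itself.

```python
def parse_traceable(lines):
    """Collect function names from available_filter_functions-style lines.

    Each non-empty line is "name" or "name [module]"; only the name is kept.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def check_probes(wanted, traceable):
    """Map each wanted kernel function to True (listed) or False (absent)."""
    return {name: name in traceable for name in wanted}


def load_traceable(path="/sys/kernel/tracing/available_filter_functions"):
    """Read the list from tracefs (assumed path; typically needs root)."""
    with open(path) as f:
        return parse_traceable(f)
```

On the affected machine, something like `check_probes(["tcp_sendpage", "blk_account_io_done", "blk_account_io_start", "_cond_resched"], load_traceable())` would show which of the probe targets are absent from the list. An absent name does not say whether the function was removed, renamed, or inlined.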
Comment 2 xiangzao alibaba_cloud_group 2024-05-08 10:07:33 UTC
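1. argdist.py fails because bashreadline needs to be statically linked against libreadline. Following the approach in the link below, changing name=name to name="/lib64/libreadline.so.8" makes it run successfully.
https://github.com/iovisor/bcc/issues/1851

2. biolatency.py, biosnoop.py, and biotop.py fail because blk_account_io_done is inlined and cannot be traced.

3. alisysdelay.py fails because _cond_resched is inlined and cannot be traced.

4. tcpcong.py 1 1 runs without errors.

5. tcpretrans.py runs without errors.

6. tcptop.py fails because the tcp_sendpage function does not exist.

7. ttysnoop.py fails because the old ttysnoop tool has not been adapted to the new kernel version. It runs normally after upgrading bcc to 0.30 or applying the adaptation patch.
https://github.com/iovisor/bcc/commit/ce5c8938c494eb03f0784c6e4ae81507139ca779

The failing cases are mostly test cases that have not been adapted. bcc 0.27 is currently used, which corresponds to Linux kernel 6.2, while an23 is already on 6.6 LTS and includes some patches from later versions. Going forward, we recommend testing with the matching test cases from bcc 0.29 or 0.30. This bug is set to WONTFIX.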
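
The version mismatch described in this comment can be detected before running the smoke tests. The sketch below is only an illustration: it parses a `uname -r` string and compares it with the kernel version that this comment associates with bcc 0.27 (6.2). The helper names are hypothetical, and the (6, 2) default comes from this comment rather than from any bcc metadata.

```python
import re


def kernel_major_minor(release):
    """Extract (major, minor) from a `uname -r` string."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2))


def newer_than_target(release, target=(6, 2)):
    """Return True if the running kernel is newer than the target kernel.

    The default target (6, 2) is the kernel version this comment
    associates with bcc 0.27; it is an assumption, not bcc metadata.
    """
    return kernel_major_minor(release) > target
```

For the kernel in this report, "6.6.25-2_rc1.an23.x86_64" parses to (6, 6), which is newer than (6, 2), so failures caused by unadapted tools are to be expected.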
Comment 3 zhixin01 alibaba_cloud_group 2024-05-08 17:18:42 UTC
(In reply to xiangzao from comment #2)

Could you tell me which branch of the https://gitee.com/src-anolis-os/bcc.git repository should be downloaded? I am currently using the code from the a23 branch.
Comment 4 zhixin01 alibaba_cloud_group 2024-05-22 10:40:16 UTC
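As the developer's analysis above states, the matching test cases from bcc 0.29 or 0.30 should be used for testing going forward. Closing this bug.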