Bug 8859 - [Anolis23.1 GA][Beta][ANCK-6.6.25-2][aarch64/x86_64]bcc: test_clang.py fails, Failed to compile BPF module <text>
Summary: [Anolis23.1 GA][Beta][ANCK-6.6.25-2][aarch64/x86_64]bcc: test_clang.py fails, Faile...
Status: NEW
Alias: None
Product: Anolis OS 23
Classification: Anolis OS
Component: ToBeTriaged
Version: 23.1
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: beta
Assignee: gaochang
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-04-23 17:12 UTC by zhixin01
Modified: 2024-05-15 15:36 UTC

Description zhixin01 alibaba_cloud_group 2024-04-23 17:12:16 UTC
[Defect description]:
bcc: test_clang.py fails with "Failed to compile BPF module <text>"

Software versions:
# rpm -qa |grep bcc
bcc-tools-0.27.0-1.an23.aarch64
bcc-0.27.0-1.an23.aarch64
python3-bcc-0.27.0-1.an23.noarch
bcc-devel-0.27.0-1.an23.aarch64

Relevant part of the failure log:
# python test_clang.py
....../virtual/main.c:3:33: error: incomplete definition of type 'struct request'
    bpf_trace_printk("%s\n", req->rq_disk->disk_name);
                             ~~~^
include/linux/blkdev.h:32:8: note: forward declaration of 'struct request'
struct request;
       ^
1 error generated.
E........../virtual/main.c:2:1: error: field has incomplete type 'struct key_t'
BPF_HASH(drops, struct key_t);
^
/virtual/include/bcc/helpers.h:278:48: note: expanded from macro 'BPF_HASH'
  BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
                                               ^
/virtual/main.c:2:24: note: forward declaration of 'struct key_t'
BPF_HASH(drops, struct key_t);
                       ^
/virtual/main.c:2:1: error: field has incomplete type 'struct key_t'
BPF_HASH(drops, struct key_t);
^
/virtual/include/bcc/helpers.h:278:48: note: expanded from macro 'BPF_HASH'
  BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
                                               ^
/virtual/main.c:2:24: note: forward declaration of 'struct key_t'
BPF_HASH(drops, struct key_t);
                       ^
2 errors generated.
../virtual/main.c:6:12: error: cannot call non-static helper function
    return bar();
           ^
1 error generated.
./virtual/main.c:13:46: error: incomplete definition of type 'struct request'
    bpf_trace_printk("traced start %d\n", req->__data_len);
                                          ~~~^
include/linux/blkdev.h:32:8: note: forward declaration of 'struct request'
struct request;
       ^
1 error generated.
E/virtual/main.c:12:12: error: incomplete definition of type 'struct request'
    if (!rq->start_time_ns)
         ~~^
include/linux/blkdev.h:32:8: note: forward declaration of 'struct request'
struct request;
       ^
/virtual/main.c:15:12: error: incomplete definition of type 'struct request'
    if (!rq->rq_disk || rq->rq_disk->major != 5 ||
         ~~^
include/linux/blkdev.h:32:8: note: forward declaration of 'struct request'
struct request;
       ^
/virtual/main.c:15:27: error: incomplete definition of type 'struct request'
    if (!rq->rq_disk || rq->rq_disk->major != 5 ||
                        ~~^
include/linux/blkdev.h:32:8: note: forward declaration of 'struct request'
struct request;
       ^
================ partially omitted ================
1 error generated.
.cannot attach kprobe, probe entry may not exist
Current kernel does not have __vfs_read, try vfs_read instead
./virtual/main.c:4:14: error: incomplete definition of type 'struct request'
    if (!(req->bio->bi_flags & 1))
          ~~~^
include/linux/blkdev.h:32:8: note: forward declaration of 'struct request'
struct request;
       ^
/virtual/main.c:6:14: error: incomplete definition of type 'struct request'
    if (((req->bio->bi_flags)))
          ~~~^
include/linux/blkdev.h:32:8: note: forward declaration of 'struct request'
struct request;
       ^
2 errors generated.
E.
======================================================================
ERROR: test_char_array_probe (__main__.TestClang)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/bcc/bcc/bcc-0.27.0/tests/python/test_clang.py", line 319, in test_char_array_probe
    BPF(text=b"""#include <linux/blkdev.h>
  File "/usr/lib/python3.10/site-packages/bcc/__init__.py", line 479, in __init__
    raise Exception("Failed to compile BPF module %s" % (src_file or "<text>"))
Exception: Failed to compile BPF module <text>

======================================================================
ERROR: test_iosnoop (__main__.TestClang)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/bcc/bcc/bcc-0.27.0/tests/python/test_clang.py", line 247, in test_iosnoop
    b = BPF(text=text, debug=0)
  File "/usr/lib/python3.10/site-packages/bcc/__init__.py", line 479, in __init__
    raise Exception("Failed to compile BPF module %s" % (src_file or "<text>"))
Exception: Failed to compile BPF module <text>

[Reproduction rate]:
Always reproducible

[Reproduction environment]:
Kernel:
# uname -r
6.6.25-2_rc1.an23.x86_64

# cat /etc/os-release
NAME="Anolis OS"
VERSION="23"
ID="anolis"
VERSION_ID="23"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"

CPU information:
# lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          52 bits physical, 57 bits virtual
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         Alibaba Cloud
  Model name:             Intel(R) Xeon(R) Platinum 8475B
    BIOS Model name:      pc-q35-df-2.1  CPU @ 0.0GHz
    BIOS CPU family:      1
    CPU family:           6
    Model:                143
    Thread(s) per core:   2
    Core(s) per socket:   2
    Socket(s):            1
    Stepping:             8
    CPU(s) scaling MHz:   84%
    CPU max MHz:          3800.0000
    CPU min MHz:          800.0000
    BogoMIPS:             5400.00
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp
                          lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pdcm
                          pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ibrs_enhanced
                          fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd
                          sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd ida arat hwp hwp_notify hwp_act_window hwp_epp
                          hwp_pkg_req avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq
                          rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk amx_bf16 avx512_fp16 amx_tile amx_int8
                          arch_capabilities
Virtualization features:
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    96 KiB (2 instances)
  L1i:                    64 KiB (2 instances)
  L2:                     4 MiB (2 instances)
  L3:                     97.5 MiB (1 instance)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-3
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Unknown: No mitigations
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Vulnerable
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                  Not affected
  Tsx async abort:        Not affected

Memory information:
# free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       585Mi       6.9Gi       364Mi       7.6Gi        13Gi
Swap:             0B          0B          0B


[Steps to reproduce]:
yum install bcc-tools yum-utils rpm-build python3-pyroute2.noarch
git clone https://gitee.com/src-anolis-os/bcc.git --branch a23
cd bcc
yum-builddep -y bcc.spec
rpmbuild -D "_topdir $(pwd)" \
        -D "_sourcedir $(pwd)" \
        -D "_builddir $(pwd)" \
        -bp bcc.spec
cd bcc-0.27.0/tests/python
python test_clang.py

[Expected result]:
The test cases pass

[Actual result]:
The test cases fail
Comment 1 zhangxinyi 2024-04-26 14:32:28 UTC
## Error analysis:
# 9 errors occur when the BPF program is loaded: "Failed to compile BPF module"
test_char_array_probe
test_iosnoop
test_jump_table
test_probe_read3
test_probe_read4
test_probe_read_array_accesses8
test_probe_read_nested_member3
test_probe_read_whitelist1
test_probe_read_whitelist2

# test_unop_probe_read
One error is related to attaching the BPF program: Failed to attach BPF program b'kprobe__finish_task_switch' to kprobe b'finish_task_switch', it's not traceable (either non-existing, inlined, or marked as "notrace")
test_task_switch
# One error is an AssertionError
test_sscanf_string

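## Investigation
Not every BPF initialization fails, and simple use of BPF works, so BPF loading itself appears to be fine and no related module seems to be missing. The cause is probably the text passed in, so each case needs to be investigated individually.
Simple usage example: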
from bcc import BPF

BPF(text='int kprobe____x64_sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
Comment 2 xiangzao alibaba_cloud_group 2024-05-08 17:13:45 UTC
1. test_char_array_probe fails because the definition of struct request moved from linux/blkdev.h to linux/blk-mq.h, and struct request no longer has an rq_disk member. To print disk_name, the access has to be changed to req->q->disk->disk_name

2. test_iosnoop: same as above, it needs #include <linux/blk-mq.h>

3. test_jump_table: same as above, it needs #include <linux/blk-mq.h>

4. test_probe_read3, test_probe_read4: no problem was seen when running them

5. test_probe_read_array_accesses8 fails because rss_stat in struct mm_struct has changed from struct mm_rss_stat to struct percpu_counter rss_stat[NR_MM_COUNTERS]

6. test_probe_read_nested_member3 fails because the return type does not match; the following commit is needed
https://github.com/iovisor/bcc/commit/9533c2580f907ce2b185014b31524e9f36fc998e

7. test_task_switch fails because finish_task_switch is optimized into finish_task_switch.isra.0 by GCC 10 and later; see https://bugzilla.openanolis.cn/show_bug.cgi?id=8861

8. test_unop_probe_read: same as items 1, 2, and 3
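
As a rough illustration of the change described in item 1, the sketch below builds the BPF C text for either struct request layout. The helper name char_array_probe_text and the has_rq_disk flag are hypothetical, and the probe function name kprobe__blk_update_request is an assumption about the test body, which the traceback above does not show in full. It only produces the source text and does not load anything into the kernel.

```python
def char_array_probe_text(has_rq_disk: bool) -> bytes:
    """Build BPF C source that prints the disk name of a block request.

    has_rq_disk is a hypothetical flag: True for kernels where
    struct request still has rq_disk, False for the layout in item 1.
    """
    if has_rq_disk:
        # Older layout: struct request is defined in linux/blkdev.h.
        include = b"#include <linux/blkdev.h>"
        disk_name = b"req->rq_disk->disk_name"
    else:
        # Layout from item 1: definition lives in linux/blk-mq.h and the
        # disk is reached through the request queue.
        include = b"#include <linux/blk-mq.h>"
        disk_name = b"req->q->disk->disk_name"
    return b"\n".join([
        include,
        # Assumed probe name; not confirmed by the log in this report.
        b"int kprobe__blk_update_request(struct pt_regs *ctx, struct request *req) {",
        b'    bpf_trace_printk("%s\\n", ' + disk_name + b");",
        b"    return 0;",
        b"}",
        b"",
    ])
```

On a machine with bcc installed, the returned bytes could be passed as BPF(text=char_array_probe_text(False)), in the same way the simple usage example in Comment 1 passes its text.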

The problems are basically mismatches between the test cases and the kernel
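
For item 7, one possible way for a test to cope with the renamed symbol is to look up the actual name before attaching. The sketch below is only an assumption about how that could be done, not the fix used in the linked bug: it scans text in the /proc/kallsyms format for finish_task_switch or a ".isra.N" clone of it. The function name find_task_switch_symbol is hypothetical.

```python
import re

# Matches the plain symbol or a compiler-generated ".isra.N" clone of it.
_TASK_SWITCH = re.compile(r"^finish_task_switch(\.isra\.\d+)?$")


def find_task_switch_symbol(kallsyms_text: str):
    """Return the first matching symbol name, or None if there is none.

    kallsyms_text is expected to use the /proc/kallsyms layout:
    "<address> <type> <name> [module]" on each line.
    """
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and _TASK_SWITCH.match(fields[2]):
            return fields[2]
    return None


if __name__ == "__main__":
    # Reads the live symbol table; addresses may show as zeros without root.
    with open("/proc/kallsyms") as f:
        print(find_task_switch_symbol(f.read()))
```

The returned name could then be given to bcc's attach_kprobe as the event argument, with fn_name set to the BPF function, instead of relying on the kprobe__finish_task_switch auto-attach naming.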
Comment 3 小龙 admin 2024-05-15 15:36:27 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3182