Bug 3365 - [ANCK-5.10-13-rc1][Yitian][aarch64] perf_event_tests: huge_group_start fails, Unexpected error at 1021 Argument list too long
Summary: [ANCK-5.10-13-rc1][Yitian][aarch64] perf_event_tests: huge_group_start fails, Unexpected...
Status: RESOLVED INVALID
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: kernel - anck-5.10
Version: 8.6
Hardware: All
OS: Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: XueShuai
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Duplicates: 5604
Depends on:
Blocks:
 
Reported: 2022-12-06 15:21 UTC by yunhe123
Modified: 2023-06-26 11:51 UTC
CC List: 9 users

See Also:


Description yunhe123 alibaba_cloud_group 2022-12-06 15:21:51 UTC
Description of problem:
perf_event_tests: huge_group_start fails with "Unexpected error at 1021 Argument list too long". Log:

./tests/corner_cases/huge_group_start
data size=65544
Unexpected error at 1021 Argument list too long
Testing start of max event group...                        FAILED


Version-Release number of selected component (if applicable):
perf -v
perf version 5.10.134-13_rc1.an8.aarch64

OS version:
cat /etc/os-release
NAME="Anolis OS"
VERSION="8.6"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.6"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.6"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

Kernel version:
uname -r
5.10.134-13_rc1.an8.aarch64

CPU info:
lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           ARM
BIOS Vendor ID:      Alibaba Cloud
Model:               0
BIOS Model name:     virt-rhel7.6.0
Stepping:            r0p0
CPU max MHz:         2750.0000
CPU min MHz:         2750.0000
BogoMIPS:            100.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            1024K
L3 cache:            65536K
NUMA node0 CPU(s):   0-7
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh

Memory info:
free -h
              total        used        free      shared  buff/cache   available
Mem:           30Gi       295Mi        23Gi       0.0Ki       6.9Gi        29Gi
Swap:            0B          0B          0B


How reproducible:
Always


Steps to Reproduce:

git clone git://github.com/deater/perf_event_tests
cd perf_event_tests 
make && make install
cd /lkp/benchmarks/perf_event_tests/

##huge_group_start
./tests/corner_cases/huge_group_start
data size=65544
Unexpected error at 1021 Argument list too long
Testing start of max event group...                        FAILED


Actual results:
The test case fails.

Expected results:
The test case passes.

Additional info:
Comment 1 yunhe123 alibaba_cloud_group 2022-12-06 17:38:34 UTC
git clone git://github.com/deater/perf_event_tests
cd perf_event_tests
The download URL above is wrong; use the following URL instead:
http://gitlab-sp.alibaba-inc.com/AKTF/perf_event_tests.git
Comment 2 XueShuai alibaba_cloud_group 2023-01-04 17:15:10 UTC
Symptom:
./tests/corner_cases/huge_group_start
data size=65544
Unexpected error at 1021 Argument list too long
Testing start of max event group...  

Source code analysis:
// sys_perf_event_open:
	if (!perf_event_validate_size(event)) {
		err = -E2BIG;    
		goto err_locked;
	}

where E2BIG is defined as:
#define	E2BIG		 7	/* Argument list too long */

Looking at perf_event_validate_size in detail:
// perf_event_validate_size
        __perf_event_read_size(event, event->group_leader->nr_siblings + 1);
        //...
	/*
	 * Sum the lot; should not exceed the 64k limit we have on records.
	 * Conservative limit to allow for callchains and other variable fields.
	 */
	if (event->read_size + event->header_size +
	    event->id_header_size + sizeof(struct perf_event_header) >= 16*1024)
		return false;

So the limit is exceeded at event index 1021.

Output after adding debug prints:
[  667.626187] read_size=56, header_size=0, id_header_size=0, sizeof=8
[  667.626203] read_size=56, header_size=0, id_header_size=0, sizeof=8
[  667.626264] read_size=72, header_size=0, id_header_size=0, sizeof=8
[  667.626277] read_size=88, header_size=0, id_header_size=0, sizeof=8
[  667.626289] read_size=104, header_size=0, id_header_size=0, sizeof=8
...
[  667.634312] read_size=16312, header_size=0, id_header_size=0, sizeof=8
[  667.634326] read_size=16328, header_size=0, id_header_size=0, sizeof=8
[  667.634339] read_size=16344, header_size=0, id_header_size=0, sizeof=8
[  667.634350] read_size=16360, header_size=0, id_header_size=0, sizeof=8
[  667.634365] read_size=16376, header_size=0, id_header_size=0, sizeof=8


// __perf_event_read_size
static void __perf_event_read_size(struct perf_event *event, int nr_siblings)
{
	int entry = sizeof(u64); /* value */
	int size = 0;
	int nr = 1;

	if (event->attr.read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) // set by the test
		size += sizeof(u64);

	if (event->attr.read_format & PERF_FORMAT_TOTAL_TIME_RUNNING) // set by the test
		size += sizeof(u64);

	if (event->attr.read_format & PERF_FORMAT_ID) // set by the test
		entry += sizeof(u64);

	if (event->attr.read_format & PERF_FORMAT_LOST)
		entry += sizeof(u64);

	if (event->attr.read_format & PERF_FORMAT_GROUP) { // set by the test
		nr += nr_siblings;
		size += sizeof(u64);
	}

	size += entry * nr;
	event->read_size = size;
}

So read_size is computed as event->read_size = size + entry * nr, where size is the fixed part accumulated before the last step.

The test case sets:
pe.read_format=PERF_FORMAT_GROUP|PERF_FORMAT_ID|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_TOTAL_TIME_ENABLED;
https://github.com/deater/perf_event_tests/blob/master/tests/corner_cases/huge_group_start.c#L104


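size = 8 (TOTAL_TIME_ENABLED) + 8 (TOTAL_TIME_RUNNING) + 8 (GROUP nr field) = 24
entry = 8 (value) + 8 (ID) = 16
The nr_siblings argument (group_leader->nr_siblings + 1) starts at 1,
so nr starts at 1 + 1 = 2.

First event (the group leader, index 0): read_size = 24 + 16*2 = 56.
Event at index 1021 (the index reported by the test): read_size = 24 + 16*1022 = 16376. Adding sizeof(struct perf_event_header) = 8 gives
16376 + 8 = 16384 >= 16*1024, so perf_event_validate_size returns false. This is the expected behavior.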

Testing on an upstream Linux 6.0 kernel shows the same behavior.
Comment 3 XueShuai alibaba_cloud_group 2023-01-04 17:37:01 UTC
[root@localhost.localdomain /root]
#uname -a
Linux localhost.localdomain 5.10.112-11.an8.aarch64 #1 SMP Tue May 24 15:54:43 CST 2022 aarch64 aarch64 aarch64 GNU/Linux

[root@localhost.localdomain /root]
#./huge_group_start
data size=65544
Unexpected error at 1021 Argument list too long
Testing start of max event group...                        FAILED

5.10.112-11.an8.aarch64 has the same issue.
Comment 4 XueShuai alibaba_cloud_group 2023-01-04 18:32:40 UTC
(In reply to XueShuai from comment #2)


@yunhe123 reports that the test PASSES on the nightly machine (Kunpeng), with the following output:

#./huge_group_start
data size=65544
Ran out of file descriptors at 1021 Too many open files
Trying to start 1021 events!
Trying to read 65536 bytes into data
Read 16360 bytes
Testing start of max event group...                        PASSED

"Too many open files" shows that the test ran out of file descriptors, so it stopped at 1021 events, which happens to stay just under the size limit.

After lifting the fd limit with ulimit, the same failure reproduces there as well:

#ulimit -n 2048

#./huge_group_start
data size=65544
Unexpected error at 1021 Argument list too long
Testing start of max event group...                        FAILED


Suggestion: specify the maximum event count explicitly when running the test:
#./huge_group_start 1021
data size=65544
Trying to start 1021 events!
Trying to read 65536 bytes into data
Read 16360 bytes
Testing start of max event group...                        PASSED

Conclusion: per the analysis above, this behavior is expected; setting the bug to INVALID.
Comment 5 yunhe123 alibaba_cloud_group 2023-06-26 11:51:17 UTC
*** Bug 5604 has been marked as a duplicate of this bug. ***