Bug 2343 - High-concurrency performance tuning
Summary: High-concurrency performance tuning
Status: NEW
Alias: None
Product: Anolis OS 7
Classification: Anolis OS
Component: BaseOS Modules
Version: 7.9
Hardware: x86_64 Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: Jacob
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-08 17:49 UTC by lzs109
Modified: 2022-10-12 15:20 UTC (History)

See Also:


Attachments
Test results (230.12 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-10-08 17:49 UTC, lzs109
Description lzs109 2022-10-08 17:49:06 UTC
Created attachment 401 [details]
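Test results

We previously deployed our microservices on CentOS 7 and tuned them for high concurrency by editing limits.conf and sysctl.conf. After migrating to Anolis OS 7.9 and applying the same tuning on identically configured machines, both concurrency and stability are worse than on CentOS 7: performance is 15% to 50% lower and the results are less stable.

1. Problem description: the test measures application performance. The call flow is:
1.1. The requester sends encrypted data to the application service through an API call;
1.2. The application service decrypts the data and stores it in the cache.

2. Test environment (Anolis OS)
2.1. Architecture: JMeter on the test server ----> nginx load balancer ------> service cluster;
2.2. Test server: 1 VM, Windows 2012, 16 cores, 64 GB RAM, JMeter 5, concurrency 1000;
2.3. Nginx server: 1 VM, CentOS 7.6, 16 cores, 64 GB RAM;
2.4. Application servers: 5 VMs, AnolisOS 7.9, 16 cores, 64 GB RAM, 20 microservices deployed on each;
2.5. Network: internal network, 10 GbE NICs;
2.6. Test results
Concurrency: 1000
Min: 1 ms
Max: 8664 ms
Average: 36 ms
Throughput: 14532 TPS
Fluctuation: large, see attachment

3. Test environment (CentOS)
3.1. Architecture: JMeter on the test server ----> nginx load balancer ------> service cluster;
3.2. Test server: 1 VM, Windows 2012, 16 cores, 64 GB RAM, JMeter 5, concurrency 1000;
3.3. Nginx server: 1 VM, CentOS 7.6, 16 cores, 64 GB RAM;
3.4. Application servers: 5 VMs, CentOS 7.6, 16 cores, 64 GB RAM, 20 microservices deployed on each;
3.5. Network: internal network, 10 GbE NICs;
3.6. Test results
Concurrency: 1000
Min: 1 ms
Max: 3323 ms
Average: 19 ms
Throughput: 16605 TPS
Fluctuation: small, see attachment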
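
To put the two result sets side by side, the relative change can be worked out from the figures in 2.6 and 3.6. The helper below is only a sketch; `pct_change` is a name made up for this illustration, and CentOS is taken as the baseline.

```shell
#!/bin/sh
# pct_change BASELINE VALUE
# Prints the relative change of VALUE against BASELINE, in percent.
# (Hypothetical helper, not part of the original report.)
pct_change() {
    awk -v base="$1" -v val="$2" \
        'BEGIN { printf "%+.1f\n", (val - base) / base * 100 }'
}

# Figures copied from sections 2.6 (Anolis OS) and 3.6 (CentOS).
echo "TPS change:         $(pct_change 16605 14532)%"
echo "Avg latency change: $(pct_change 19 36)%"
echo "Max latency change: $(pct_change 3323 8664)%"
```

For this particular run the TPS drop works out to about 12.5%, while average latency is close to double.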
Comment 1 lzs109 2022-10-08 17:50:07 UTC
There are 10 application-server VMs.
Comment 2 muming alibaba_cloud_group 2022-10-08 18:02:51 UTC
(In reply to lzs109 from comment #1)
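> There are 10 application-server VMs.

1. Are the CentOS and Anolis OS VMs configured identically?
2. Is the network topology between nginx and the 5 CentOS application servers the same as for the 5 Anolis OS application servers?
3. Could you attach the network statistics (output of netstat -s) from the CentOS and Anolis OS application servers?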
Comment 3 lzs109 2022-10-08 18:10:13 UTC
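1. Are the CentOS and Anolis OS VMs configured identically?
Yes, the configuration is the same: 16 CPU cores, 64 GB RAM, 300 GB disk.

2. Is the network topology between nginx and the 5 CentOS application servers the same as for the 5 Anolis OS application servers?
These are 10 separate VM servers with the same network topology. The test machine and nginx are the same in both cases; the only difference is the operating system on the application servers.

3. Could you attach the network statistics (output of netstat -s) from the CentOS and Anolis OS application servers?

Anolis OS server: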
Ip:
    3571488 total packets received
    0 forwarded
    0 incoming packets discarded
    3571488 incoming packets delivered
    3622111 requests sent out
    4 outgoing packets dropped
Icmp:
    8 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 8
    8 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 8
IcmpMsg:
        InType3: 8
        OutType3: 8
Tcp:
    1325 active connections openings
    692643 passive connection openings
    83 failed connection attempts
    348 connection resets received
    122 connections established
    3568120 segments received
    3623321 segments send out
    74 segments retransmited
    0 bad segments received.
    103 resets sent
Udp:
    3348 packets received
    8 packets to unknown port received.
    0 packet receive errors
    649 packets sent
    0 receive buffer errors
    0 send buffer errors
UdpLite:
TcpExt:
    11 invalid SYN cookies received
    3 resets received for embryonic SYN_RECV sockets
    457322 TCP sockets finished time wait in fast timer
    26921 delayed acks sent
    420 delayed acks further delayed because of locked socket
    Quick ack mode was activated 59 times
    59 times the listen queue of a socket overflowed
    59 SYNs to LISTEN sockets dropped
    2026 packets directly queued to recvmsg prequeue.
    7183 bytes directly received in process context from prequeue
    31602 packet headers predicted
    1387314 acknowledgments not containing data payload received
    28479 predicted acknowledgments
    3 congestion windows recovered without slow start after partial ack
    61 other TCP timeouts
    11 times receiver scheduled too late for direct processing
    TCPSpuriousRTOs: 2
    TCPBacklogDrop: 74
    TCPTimeWaitOverflow: 191067
    TCPRcvCoalesce: 7871
    TCPOFOQueue: 8
    TCPChallengeACK: 3
    TCPSYNChallenge: 2
    TCPSpuriousRtxHostQueues: 2
    TCPAutoCorking: 3474
    TCPSynRetrans: 50
    TCPOrigDataSent: 1484632
    TCPHystartTrainDetect: 1
    TCPHystartTrainCwnd: 25
IpExt:
    InMcastPkts: 2879
    OutMcastPkts: 174
    InOctets: 769399718
    OutOctets: 274717772
    InMcastOctets: 296092
    OutMcastOctets: 18410
    InNoECTPkts: 3583813
    InECT0Pkts: 3



CentOS server:
Ip:
    169312834 total packets received
    0 forwarded
    0 incoming packets discarded
    169312833 incoming packets delivered
    170155592 requests sent out
    3 outgoing packets dropped
    10 dropped because of missing route
    1 fragments received ok
    2 fragments created
Icmp:
    14 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 14
    11 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 11
IcmpMsg:
        InType3: 14
        OutType3: 11
Tcp:
    50682 active connections openings
    33405317 passive connection openings
    19544 failed connection attempts
    1 connection resets received
    61 connections established
    169297994 segments received
    169887869 segments send out
    573966 segments retransmited
    1360 bad segments received.
    157234 resets sent
Udp:
    14810 packets received
    11 packets to unknown port received.
    0 packet receive errors
    15898 packets sent
    0 receive buffer errors
    0 send buffer errors
UdpLite:
TcpExt:
    108250 invalid SYN cookies received
    218 resets received for embryonic SYN_RECV sockets
    4002549 TCP sockets finished time wait in fast timer
    252718 delayed acks sent
    76 delayed acks further delayed because of locked socket
    Quick ack mode was activated 6338 times
    20457 packets directly queued to recvmsg prequeue.
    1176 bytes directly in process context from backlog
    1131178 bytes directly received in process context from prequeue
    410688 packet headers predicted
    3260 packets header predicted and directly queued to user
    66931466 acknowledgments not containing data payload received
    372800 predicted acknowledgments
    38 times recovered from packet loss due to fast retransmit
    10 congestion windows fully recovered without slow start
    1993 congestion windows recovered without slow start after partial ack
    1 timeouts after reno fast retransmit
    241 timeouts in loss state
    40 fast retransmits
    111 retransmits in slow start
    314059 other TCP timeouts
    1 classic Reno fast retransmits failed
    11 connections reset due to unexpected data
    1 connections reset due to early user close
    19640 connections aborted due to timeout
    TCPSpuriousRTOs: 1
    TCPTimeWaitOverflow: 26741357
    TCPRcvCoalesce: 78935
    TCPOFOQueue: 917
    TCPOFOMerge: 75
    TCPChallengeACK: 1431
    TCPSYNChallenge: 1431
    TCPAutoCorking: 4553
    TCPSynRetrans: 296522
    TCPOrigDataSent: 68541696
IpExt:
    InNoRoutes: 1
    InMcastPkts: 2
    InBcastPkts: 2
    InOctets: 35974997008
    OutOctets: 12737175184
    InMcastOctets: 72
    InBcastOctets: 192
    InNoECTPkts: 169312816
    InECT0Pkts: 18
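
The `netstat -s` counters are cumulative since boot, and the two hosts above have handled very different amounts of traffic, so raw values are hard to compare directly. One way to normalise them is to relate `TCPTimeWaitOverflow` to the number of passive connection openings. The function below is a sketch under that assumption; `tw_overflow_ratio` is a hypothetical name, and the sample input is the two relevant lines from the Anolis OS output.

```shell
#!/bin/sh
# tw_overflow_ratio: reads `netstat -s` output on stdin and prints
# "<passive opens> <TIME_WAIT overflows> <overflows per 100 passive opens>".
# (Hypothetical helper for illustration.)
tw_overflow_ratio() {
    awk '
        /passive connection openings/ { passive = $1 }
        /TCPTimeWaitOverflow:/        { overflow = $2 }
        END {
            if (passive > 0)
                printf "%d %d %.1f\n", passive, overflow, overflow / passive * 100
            else
                print "no passive opens found"
        }'
}

# Sample input: two lines copied from the Anolis OS output above.
tw_overflow_ratio <<'EOF'
    692643 passive connection openings
    TCPTimeWaitOverflow: 191067
EOF
```

On a live host the same function could be fed with `netstat -s | tw_overflow_ratio`.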
Comment 4 dust.li alibaba_cloud_group 2022-10-10 09:32:33 UTC
(In reply to lzs109 from comment #0)

Have you run the same comparison against CentOS 7.9?

Which kernel versions are the CentOS 7 and Anolis OS 7 machines running?

Could you check with uname -r?
Comment 5 feitian200603 2022-10-10 14:32:55 UTC
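The application does two things: decryption and writing to the cache. It would help to look at each separately to see which one has regressed.
1. Which decryption algorithm does your backend application use? You could compare the performance of that algorithm in the two environments.
2. In both test environments, use numactl to bind the application services to the same NUMA node and see whether that changes the result.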
Comment 6 dust.li alibaba_cloud_group 2022-10-12 15:17:23 UTC
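The cause of the performance shortfall has been identified. There are two parts:

## 1. Application server (Java) performance below target
First, the application servers in the test environment run Anolis OS 7.9 with kernel 3.10.0-1160.76.1.0.1.an7.x86_64. The CentOS 7 machines used for comparison run CentOS 7.6 with kernel 3.10.0-693.el7.x86_64.


The main cause of the poor performance is that nginx talks to the backends over short-lived connections. The large number of short-lived connections leaves a large number of connections in the TIME_WAIT state, which fills net.ipv4.tcp_max_tw_buckets and prevents new connections from being established.
The fix is to upgrade to the Anolis kernel 4.19.91-26.4.an7.x86_64. This kernel includes an Anolis-developed feature that lets TIME_WAIT connections expire early: setting sysctl -w net.ipv4.tcp_tw_timeout=3 lowers the TIME_WAIT timeout from 60 s to 3 s. After upgrading the kernel and applying this setting, the performance problem was resolved.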
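
Since `net.ipv4.tcp_tw_timeout` is described above as specific to the Anolis 4.19 kernel, a script that applies it may want to check that the setting exists first. The following is a minimal sketch; `set_tw_timeout` and its optional second argument are assumptions made for this illustration, and it relies only on the standard mapping of sysctl names to files under /proc/sys.

```shell
#!/bin/sh
# set_tw_timeout SECONDS [SYSCTL_ROOT]
# Writes SECONDS to net.ipv4.tcp_tw_timeout if the running kernel exposes it.
# SYSCTL_ROOT defaults to /proc/sys; it is a parameter only so the function
# can be tried against a scratch directory. (Hypothetical helper.)
set_tw_timeout() {
    seconds="$1"
    root="${2:-/proc/sys}"
    knob="$root/net/ipv4/tcp_tw_timeout"
    if [ -w "$knob" ]; then
        echo "$seconds" > "$knob" && echo "tcp_tw_timeout set to $seconds"
    else
        echo "tcp_tw_timeout not available on kernel $(uname -r)" >&2
        return 1
    fi
}
```

Run as root, `set_tw_timeout 3` would have the same effect as the `sysctl -w` command above on a kernel that supports it, and report an error on one that does not.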


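## 2. nginx server performance below target
When only the nginx server was switched and the application servers were left unchanged, performance was again below expectations.
A comparison showed that the Anolis OS server additionally had libvirt installed, and that libvirt had set up NAT rules. The large number of NAT rules severely hurt short-connection performance, which is why Anolis OS handled short-lived connections worse than CentOS.
Fix:
Uninstall libvirt: yum remove libvirt*
Remove the network interfaces created by libvirt and delete the corresponding NAT rules: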
ip link set dev virbr0 down
brctl delbr virbr0
ip link del virbr0-nic
iptables -F -t nat
iptables -F
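
Because `iptables -F` flushes every rule in the table, it can be useful to see how many rules are present, and to keep a copy, before running the commands above. The sketch below counts rule lines in `iptables-save` output; `count_rules` is a hypothetical helper and the sample ruleset is made up for illustration, not taken from the affected server.

```shell
#!/bin/sh
# count_rules: reads `iptables-save` style output on stdin and prints the
# number of rule lines (those starting with "-A"). (Hypothetical helper.)
count_rules() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -c '^-A ' || true
}

# Made-up sample in iptables-save format, for illustration only.
count_rules <<'EOF'
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
-A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
COMMIT
EOF
```

On a live host this could be used as `iptables-save -t nat | count_rules`, and `iptables-save > iptables.backup` (file name chosen arbitrarily here) would keep a copy that `iptables-restore` can load later.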
Comment 7 dust.li alibaba_cloud_group 2022-10-12 15:20:25 UTC
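Finally, to summarize:


When short-connection performance is poor, check two things:
1.  iptables rules, especially NAT rules. Make sure no such rules are present, otherwise they will hurt performance.
    ```
    iptables -L -t nat
    iptables -L
    ```

2. Whether the number of connections in the TIME_WAIT state has reached the limit
   sysctl -a | grep tcp_max_tw_buckets
   netstat -ant | grep -i time_wait | wc -l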
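
The two commands in item 2 can be combined into a single figure: the share of the `tcp_max_tw_buckets` limit currently in use. This is only a sketch; `tw_usage` is a hypothetical name, and it assumes the connection state is the sixth column of `netstat -ant` output.

```shell
#!/bin/sh
# tw_usage LIMIT
# Reads `netstat -ant` output on stdin, counts sockets in TIME_WAIT and
# prints "<count> <limit> <percent of limit>". (Hypothetical helper.)
tw_usage() {
    awk -v limit="$1" '
        $6 == "TIME_WAIT" { n++ }
        END {
            pct = (limit > 0) ? n / limit * 100 : 0
            printf "%d %d %.1f\n", n, limit, pct
        }'
}
```

On a live host it could be called as `netstat -ant | tw_usage "$(sysctl -n net.ipv4.tcp_max_tw_buckets)"`. A value close to 100 suggests the limit is being hit.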