Bug 1752 - kernel crash
Summary: kernel crash
Status: RESOLVED FIXED
Alias: None
Product: ANCK 4.19 Dev
Classification: ANCK
Component: drivers
Version: 4.19-026.x
Hardware: All Linux
Importance: P3-Medium S2-major
Target Milestone: ---
Assignee: maqiao
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-28 17:54 UTC by sunwuhao
Modified: 2022-08-11 16:03 UTC

See Also:


Attachments
vmcore-dmesg (129.42 KB, text/plain)
2022-07-28 17:54 UTC, sunwuhao
vmcore-dmesg (125.52 KB, text/plain)
2022-07-28 17:55 UTC, sunwuhao
bnxt-testko-v1 (2.24 MB, application/octet-stream)
2022-07-29 17:09 UTC, maqiao_mq
bnxt-testko-v2 (2.24 MB, application/octet-stream)
2022-07-29 17:45 UTC, maqiao_mq
vmcore-dmesg-8-2 (129.11 KB, text/plain)
2022-08-02 14:19 UTC, sunwuhao
vmcore-dmesg-8-2 (123.71 KB, text/plain)
2022-08-02 14:20 UTC, sunwuhao
url of 4.19.91-cbp.git.5b8703df0.an8.x86_64 (2.51 KB, text/csv)
2022-08-04 17:26 UTC, maqiao_mq
url(with md5sum) of 4.19.91-cbp.git.5b8703df0.an8.x86_64 (2.87 KB, text/csv)
2022-08-05 16:42 UTC, maqiao

Description sunwuhao 2022-07-28 17:54:49 UTC
Created attachment 341
vmcore-dmesg

The system crashes frequently.

OS version: Anolis OS release 8.4
Kernel version: 4.19.91-26.an8.x86_64

The vmcore-dmesg is attached.
Comment 1 sunwuhao 2022-07-28 17:55:26 UTC
Created attachment 342
vmcore-dmesg
Comment 2 xunlei alibaba_cloud_group 2022-07-28 18:08:04 UTC
[  187.572539] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
[  187.572940] Modules linked in: cls_bpf(E) sch_ingress(E) xt_TPROXY(E) nf_tproxy_ipv6(E) nf_tproxy_ipv4(E) xt_CT(E) veth(E) bpfilter(E) xt_socket(E) nf_socket_ipv4(E) nf_socket_ipv6(E) ip6table_filter(E) ip6table_raw(E) ip6table_mangle(E) ip6_tables(E) iptable_filter(E) iptable_raw(E) iptable_mangle(E) iptable_nat(E) nft_chain_route_ipv6(E) ip6t_MASQUERADE(E) nft_chain_route_ipv4(E) nft_chain_nat_ipv6(E) nf_nat_ipv6(E) ipt_MASQUERADE(E) xt_conntrack(E) xt_comment(E) nft_counter(E) xt_mark(E) nft_compat(E) nft_chain_nat_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_tables(E) nfnetlink(E) dm_mod(E) 8021q(E) sch_netem(E) garp(E) mrp(E) overlay(E) rpcrdma(E) intel_rapl_msr(E) intel_rapl_common(E) sunrpc(E) rdma_ucm(E) ib_uverbs(E) ib_srpt(E) ib_isert(E) isst_if_common(E) iscsi_target_mod(E) target_core_mod(E) ib_iser(E)
[  187.572961]  rdma_cm(E) skx_edac(E) iw_cm(E) ib_cm(E) nfit(E) libiscsi(E) scsi_transport_iscsi(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) ipmi_ssif(E) iTCO_wdt(E) iTCO_vendor_support(E) kvm_intel(E) bnxt_re(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ib_core(E) ghash_clmulni_intel(E) pcbc(E) joydev(E) mei_me(E) aesni_intel(E) glue_helper(E) pcspkr(E) mousedev(E) i2c_i801(E) lpc_ich(E) mei(E) ioatdma(E) dca(E) wmi(E) ipmi_si(E) ipmi_devintf(E) pcc_cpufreq(E) ipmi_msghandler(E) acpi_pad(E) acpi_power_meter(E) sch_fq_codel(E) bridge(E) stp(E) llc(E) toa(OE) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) sg(E) nvme(E) crc32c_intel(E) nvme_core(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E)
[  187.572985]  fb_sys_fops(E) ttm(E) ahci(E) drm(E) libahci(E) bnxt_en(E) i2c_core(E) libata(E)
[  187.572990] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G        W  OE     4.19.91-26.an8.x86_64 #1
[  187.572991] Hardware name: Inspur SA5212M5/YZMB-00882-10C, BIOS 4.1.8 05/08/2020
[  187.572992] RIP: 0010:__netdev_pick_tx+0x19c/0x220
[  187.572994] Code: 48 8d 04 83 0f b7 98 54 08 00 00 44 0f b7 a8 52 08 00 00 e9 e5 fe ff ff 83 e8 01 0f b7 d0 44 39 ea 73 07 01 d8 e9 08 ff ff ff <44> 29 ea 44 39 ea 73 f8 89 d0 01 d8 e9 f7 fe ff ff 4c 89 e6 4c 89
[  187.572995] RSP: 0018:ffff963cff403bf0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  187.572996] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[  187.572997] RDX: 0000000000000000 RSI: ffff963cf299a800 RDI: ffff963cf236e000
[  187.572997] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  187.572998] R10: ffff963cff403b30 R11: 0000000000000008 R12: ffff963cf299a800
[  187.572999] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff963cf299a800
[  187.572999] FS:  0000000000000000(0000) GS:ffff963cff400000(0000) knlGS:0000000000000000
[  187.573000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  187.573001] CR2: 0000561975b9e478 CR3: 0000006d4c20a003 CR4: 00000000007706f0
[  187.573001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  187.573002] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  187.573002] PKRU: 55555554
[  187.573002] Call Trace:
[  187.573004]  <IRQ>
[  187.573009]  netdev_pick_tx+0xa3/0xb0
[  187.573011]  __dev_queue_xmit+0x1f1/0x8a0
[  187.573014]  __bpf_redirect+0x97/0x2c0
[  187.573016]  __netif_receive_skb_core+0xca9/0xf90
[  187.573021]  ? ip_local_deliver+0x42/0xd0
[  187.573024]  ? inet_gro_receive+0x243/0x2a0
[  187.573025]  __netif_receive_skb_one_core+0x26/0x50
[  187.573027]  netif_receive_skb_internal+0x32/0xc0
[  187.573028]  napi_gro_receive+0xb8/0xe0
[  187.573034]  bnxt_rx_pkt+0xb14/0xfa0 [bnxt_en]
[  187.573039]  ? alloc_skb_with_frags+0xe0/0x1a0
[  187.573041]  bnxt_poll+0xf5/0x810 [bnxt_en]
[  187.573046]  ? tick_sched_handle.isra.5+0x60/0x60
[  187.573047]  net_rx_action+0x139/0x360
[  187.573052]  __do_softirq+0xd2/0x2b5
[  187.573056]  irq_exit+0xc8/0x100
[  187.573058]  do_IRQ+0x7f/0xe0
[  187.573060]  common_interrupt+0xf/0xf
[  187.573061]  </IRQ>
[  187.573065] RIP: 0010:cpuidle_enter_state+0xb9/0x320
[  187.573066] Code: e8 4c f6 9a ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 6e 2f a1 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f 48 39 c3
[  187.573066] RSP: 0018:ffffffff90203e88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd3
[  187.573067] RAX: ffff963cff422640 RBX: 00000026521cd89c RCX: 000000000000001f
[  187.573068] RDX: 00000026521cd89c RSI: 000000003158af9d RDI: 0000000000000000
[  187.573069] RBP: ffffcf22bf606270 R08: 0000000000000002 R09: 0000000000021e80
[  187.573069] R10: 0112b141c92c930e R11: ffff963cff421624 R12: 0000000000000001
[  187.573070] R13: ffffffff90396f78 R14: 0000000000000001 R15: 0000000000000000
[  187.573073]  ? cpuidle_enter_state+0x94/0x320
[  187.573077]  do_idle+0x210/0x250
[  187.573078]  cpu_startup_entry+0x5f/0x70
[  187.573083]  start_kernel+0x518/0x523
[  187.573088]  secondary_startup_64+0xb5/0xc0
[  187.573090] Kernel panic - not syncing: softlockup: hung tasks
[  187.573484] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G        W  OEL    4.19.91-26.an8.x86_64 #1
[  187.574184] Hardware name: Inspur SA5212M5/YZMB-00882-10C, BIOS 4.1.8 05/08/2020
[  187.574875] Call Trace:
[  187.575257]  <IRQ>
[  187.575639]  dump_stack+0x66/0x90
[  187.576027]  panic+0xf9/0x25c
[  187.576413]  ? startup_64+0x1/0x30
[  187.576798]  ? startup_64+0x30/0x30
[  187.577187]  watchdog_timer_fn.cold.2+0x16/0x16
[  187.577578]  ? report_softlockup+0x1a0/0x1a0
[  187.577968]  __hrtimer_run_queues+0xf0/0x260
[  187.578356]  hrtimer_interrupt+0x100/0x220
[  187.578745]  ? aperfmperf_snapshot_khz+0x67/0x90
[  187.579136]  smp_apic_timer_interrupt+0x6a/0x140
[  187.579526]  apic_timer_interrupt+0xf/0x20
[  187.579914] RIP: 0010:__netdev_pick_tx+0x19c/0x220
[  187.580305] Code: 48 8d 04 83 0f b7 98 54 08 00 00 44 0f b7 a8 52 08 00 00 e9 e5 fe ff ff 83 e8 01 0f b7 d0 44 39 ea 73 07 01 d8 e9 08 ff ff ff <44> 29 ea 44 39 ea 73 f8 89 d0 01 d8 e9 f7 fe ff ff 4c 89 e6 4c 89
[  187.581330] RSP: 0018:ffff963cff403bf0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  187.582020] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[  187.582419] RDX: 0000000000000000 RSI: ffff963cf299a800 RDI: ffff963cf236e000
[  187.582816] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  187.583214] R10: ffff963cff403b30 R11: 0000000000000008 R12: ffff963cf299a800
[  187.583612] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff963cf299a800
[  187.584011]  ? apic_timer_interrupt+0xa/0x20
[  187.584401]  netdev_pick_tx+0xa3/0xb0
[  187.584788]  __dev_queue_xmit+0x1f1/0x8a0
[  187.585177]  __bpf_redirect+0x97/0x2c0
[  187.585564]  __netif_receive_skb_core+0xca9/0xf90
[  187.585954]  ? ip_local_deliver+0x42/0xd0
[  187.586342]  ? inet_gro_receive+0x243/0x2a0
[  187.586731]  __netif_receive_skb_one_core+0x26/0x50
[  187.587121]  netif_receive_skb_internal+0x32/0xc0
[  187.587512]  napi_gro_receive+0xb8/0xe0
[  187.587900]  bnxt_rx_pkt+0xb14/0xfa0 [bnxt_en]
[  187.588291]  ? alloc_skb_with_frags+0xe0/0x1a0
[  187.588682]  bnxt_poll+0xf5/0x810 [bnxt_en]
[  187.589071]  ? tick_sched_handle.isra.5+0x60/0x60
[  187.589462]  net_rx_action+0x139/0x360
[  187.589850]  __do_softirq+0xd2/0x2b5
[  187.590237]  irq_exit+0xc8/0x100
[  187.590626]  do_IRQ+0x7f/0xe0
[  187.591011]  common_interrupt+0xf/0xf
[  187.591397]  </IRQ>
[  187.591777] RIP: 0010:cpuidle_enter_state+0xb9/0x320
[  187.592168] Code: e8 4c f6 9a ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 6e 2f a1 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f 48 39 c3
[  187.593190] RSP: 0018:ffffffff90203e88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd3
[  187.593882] RAX: ffff963cff422640 RBX: 00000026521cd89c RCX: 000000000000001f
[  187.594283] RDX: 00000026521cd89c RSI: 000000003158af9d RDI: 0000000000000000
[  187.594681] RBP: ffffcf22bf606270 R08: 0000000000000002 R09: 0000000000021e80
[  187.595079] R10: 0112b141c92c930e R11: ffff963cff421624 R12: 0000000000000001
[  187.595478] R13: ffffffff90396f78 R14: 0000000000000001 R15: 0000000000000000
[  187.595878]  ? cpuidle_enter_state+0x94/0x320
[  187.596268]  do_idle+0x210/0x250
[  187.596653]  cpu_startup_entry+0x5f/0x70
[  187.597040]  start_kernel+0x518/0x523
[  187.597430]  secondary_startup_64+0xb5/0xc0
[  187.703923] Kernel Offset: 0xe000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Comment 3 xunlei alibaba_cloud_group 2022-07-28 19:25:22 UTC
[  187.573009]  netdev_pick_tx+0xa3/0xb0
[  187.573011]  __dev_queue_xmit+0x1f1/0x8a0
[  187.573014]  __bpf_redirect+0x97/0x2c0  
[  187.573016]  __netif_receive_skb_core+0xca9/0xf90

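The RX softirq is transmitting packets here through __bpf_redirect(). From this trace it looks related to network TC. Is a TC-related bpf program configured? I suggest removing it and observing whether the crash still occurs.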
Comment 4 xunlei alibaba_cloud_group 2022-07-29 14:44:09 UTC
Both crashes hit the same location, netdev_pick_tx+0xa3/0xb0, which suggests lock contention or a similar situation at this point.
Comment 5 maqiao_mq alibaba_cloud_group 2022-07-29 17:09:28 UTC
Created attachment 343
bnxt-testko-v1

bnxt driver, test build v1
Comment 6 maqiao_mq alibaba_cloud_group 2022-07-29 17:19:16 UTC
According to the dmesg log:
> ...
> [  187.572992] RIP: 0010:__netdev_pick_tx+0x19c/0x220
> ...

Using kernel-debuginfo, this maps to net/core/dev.c:2858. The code logic is as follows:
>  static u16 skb_tx_hash(const struct net_device *dev,
>  		       const struct net_device *sb_dev,
>  		       struct sk_buff *skb)
>  {
>  	u32 hash;
>  	u16 qoffset = 0;
>  	u16 qcount = dev->real_num_tx_queues;
>  
>  	if (dev->num_tc) {
>  		u8 tc = netdev_get_prio_tc_map(dev, skb->priority);
>  
>  		qoffset = sb_dev->tc_to_txq[tc].offset;
>  		qcount = sb_dev->tc_to_txq[tc].count;
>  	}
>  
>  	if (skb_rx_queue_recorded(skb)) {
>  		hash = skb_get_rx_queue(skb);
>  		while (unlikely(hash >= qcount))
>  			hash -= qcount;
>  		return hash + qoffset;
>  	}
>  
>  	return (u16) reciprocal_scale(skb_get_hash(skb), qcount) + qoffset;
>  }

Line 2858 corresponds to the statement "hash -= qcount;".

My guess is that qcount is 0, so the function loops forever and eventually triggers the soft lockup.
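
The suspected behaviour can be modelled in userspace. The following is a minimal sketch, not kernel code: model_pick_queue() and MAX_STEPS are hypothetical names introduced for illustration, and the step cap exists only so the demonstration terminates (the kernel loop has no such cap).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap so this userspace model can give up; the kernel loop has none. */
#define MAX_STEPS 1000u

/*
 * Model of the rx-queue branch of skb_tx_hash().
 * Returns true and stores the chosen queue if the loop terminates,
 * false if it is still subtracting after MAX_STEPS iterations.
 */
static bool model_pick_queue(uint32_t hash, uint16_t qcount,
                             uint16_t qoffset, uint16_t *queue)
{
    unsigned int steps = 0;

    while (hash >= qcount) {
        if (steps++ == MAX_STEPS)
            return false;   /* with qcount == 0, hash never shrinks */
        hash -= qcount;
    }
    *queue = (uint16_t)(hash + qoffset);
    return true;
}
```

With a non-zero qcount the call returns quickly (for example hash 5 and qcount 4 give queue 1). With qcount 0 the condition hash >= qcount is always true and the subtraction changes nothing, so the model gives up and returns false.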

According to drivers/net/ethernet/broadcom/bnxt/bnxt.c:5857, that is, the logic of bnxt_setup_msix():
>  static void bnxt_setup_msix(struct bnxt *bp)
>  {
>  	  const int len = sizeof(bp->irq_tbl[0].name);
>  	  struct net_device *dev = bp->dev;
>  	  int tcs, i;
>    
>  	  tcs = netdev_get_num_tc(dev);
>  	  if (tcs > 1) {
>  	  	int i, off, count;
>    
>  	  	for (i = 0; i < tcs; i++) {
>  	  		count = bp->tx_nr_rings_per_tc;
>  	  		off = i * count;
>  	  		netdev_set_tc_queue(dev, i, count, off);
>  	  	}
>  	  }
>    ...
>  }

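When tcs is 1, sb_dev->tc_to_txq[tc].count is never set and keeps its default value of 0, which causes the infinite loop in skb_tx_hash().

For now, logic has been added by hand to bnxt_setup_msix() so that sb_dev->tc_to_txq[tc].count is also set to 1 when tcs is 1, that is: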
>  static void bnxt_setup_msix(struct bnxt *bp)
>  {
>  	  const int len = sizeof(bp->irq_tbl[0].name);
>  	  struct net_device *dev = bp->dev;
>  	  int tcs, i;
>    
>  	  tcs = netdev_get_num_tc(dev);
>  	  if (tcs > 1) {
>  	  	int i, off, count;
>    
>  	  	for (i = 0; i < tcs; i++) {
>  	  		count = bp->tx_nr_rings_per_tc;
>  	  		off = i * count;
>  	  		netdev_set_tc_queue(dev, i, count, off);
>  	  	}
>  	  } else if (tcs == 1) {
>  	  	netdev_set_tc_queue(dev, 0, 1, 0);
>  	  }
>    ...
>  }

The rebuilt ko has been uploaded as an attachment to this bug: bnxt-testko-v1
Please help verify it.
Comment 7 maqiao_mq alibaba_cloud_group 2022-07-29 17:45:19 UTC
Created attachment 344
bnxt-testko-v2
Comment 8 maqiao_mq alibaba_cloud_group 2022-07-29 17:51:32 UTC
A more reasonable change has been made and uploaded as v2; see attachment bnxt-testko-v2.
Please use this ko for testing.

The change is as follows:
>  static void bnxt_setup_msix(struct bnxt *bp)
>  {
>  	  const int len = sizeof(bp->irq_tbl[0].name);
>  	  struct net_device *dev = bp->dev;
>  	  int tcs, i;
>    
>  	  tcs = netdev_get_num_tc(dev);
>  	  if (tcs > 1) {
>  	  	int i, off, count;
>    
>  	  	for (i = 0; i < tcs; i++) {
>  	  		count = bp->tx_nr_rings_per_tc;
>  	  		off = i * count;
>  	  		netdev_set_tc_queue(dev, i, count, off);
>  	  	}
>  	  } else if (tcs == 1) {
>  	  	WARN_ONCE(bp->tx_nr_rings_per_tc == 0, "unexpected bp->tx_nr_rings_per_tc: 0 !!!");
>  	  	netdev_set_tc_queue(dev, 0, bp->tx_nr_rings_per_tc, 0);
>  	  }
>  ..
>  }
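
The difference between the original logic and the v2 change can be checked with a small userspace model. This is only a sketch under stated assumptions: struct model_dev, model_set_tc_queue(), model_setup_before() and model_setup_v2() are hypothetical stand-ins for the net_device fields, netdev_set_tc_queue() and bnxt_setup_msix(), and MODEL_MAX_TC is an arbitrary table size.

```c
#include <stdint.h>

#define MODEL_MAX_TC 16   /* hypothetical table size for this model */

struct model_txq {
    uint16_t count;
    uint16_t offset;
};

struct model_dev {
    int num_tc;                 /* what netdev_get_num_tc() would return */
    int tx_nr_rings_per_tc;
    struct model_txq tc_to_txq[MODEL_MAX_TC];
};

/* Stand-in for netdev_set_tc_queue(): record count/offset for one TC. */
static void model_set_tc_queue(struct model_dev *dev, int tc,
                               uint16_t count, uint16_t offset)
{
    if (tc < 0 || tc >= MODEL_MAX_TC)
        return;
    dev->tc_to_txq[tc].count = count;
    dev->tc_to_txq[tc].offset = offset;
}

/* Behaviour before the change: nothing is mapped unless num_tc > 1. */
static void model_setup_before(struct model_dev *dev)
{
    if (dev->num_tc > 1) {
        for (int i = 0; i < dev->num_tc; i++) {
            uint16_t count = (uint16_t)dev->tx_nr_rings_per_tc;

            model_set_tc_queue(dev, i, count, (uint16_t)(i * count));
        }
    }
}

/* Behaviour of the v2 change: the single-TC case is mapped as well. */
static void model_setup_v2(struct model_dev *dev)
{
    if (dev->num_tc > 1)
        model_setup_before(dev);
    else if (dev->num_tc == 1)
        model_set_tc_queue(dev, 0, (uint16_t)dev->tx_nr_rings_per_tc, 0);
}
```

For a zero-initialized device with num_tc = 1 and tx_nr_rings_per_tc = 8, model_setup_before() leaves tc_to_txq[0].count at 0, while model_setup_v2() sets it to 8.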
Comment 9 maqiao_mq alibaba_cloud_group 2022-07-29 17:54:53 UTC
Test procedure:
1. Download bnxt-testko-v2.tar.gz with a browser (please do not wget it directly; that caused problems when I tried it)
2. Upload the file to the test machine
3. Unpack it to get bnxt_en.ko: tar xf bnxt-testko-v2.tar.gz
4. Back up the original ko:
mv /usr/lib/modules/4.19.91-26.an8.x86_64/kernel/drivers/net/ethernet/broadcom/bnxt/bnxt_en.ko /usr/lib/modules/4.19.91-26.an8.x86_64/kernel/drivers/net/ethernet/broadcom/bnxt/bnxt_en.ko.bak

5. Overwrite the original ko:
mv bnxt_en.ko /usr/lib/modules/4.19.91-26.an8.x86_64/kernel/drivers/net/ethernet/broadcom/bnxt/bnxt_en.ko

6. Reboot and test
Comment 10 sunwuhao 2022-08-02 14:18:33 UTC
Still crashing. See the attached vmcore-dmesg files, which contain the logs from two separate crashes.
Comment 11 sunwuhao 2022-08-02 14:19:39 UTC
Created attachment 345
vmcore-dmesg-8-2
Comment 12 sunwuhao 2022-08-02 14:20:09 UTC
Created attachment 346
vmcore-dmesg-8-2
Comment 13 linsheng alibaba_cloud_group 2022-08-02 15:09:58 UTC
>  	if (skb_rx_queue_recorded(skb)) {
>  		hash = skb_get_rx_queue(skb);
>  		while (unlikely(hash >= qcount))
>  			hash -= qcount;
>  		return hash + qoffset;
>  	}
>  
Should we simply BUG_ON and crash immediately when qcount is 0?
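
As an alternative to crashing, the zero case could be detected and a usable queue range substituted. The following userspace sketch only illustrates that idea; model_pick_queue_guarded() and its parameters are hypothetical, the fallback to real_num_tx_queues is an assumption about what a reasonable default would be, and this is not the code of any actual kernel patch.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Model of the rx-queue branch with a guard in front of the loop.
 * If the per-TC qcount is 0, warn and fall back to the device-wide
 * queue count with offset 0 instead of spinning.
 */
static uint16_t model_pick_queue_guarded(uint32_t rx_queue, uint16_t qcount,
                                         uint16_t qoffset,
                                         uint16_t real_num_tx_queues)
{
    if (qcount == 0) {
        fprintf(stderr, "invalid qcount 0 (qoffset %u), using %u queues\n",
                (unsigned)qoffset, (unsigned)real_num_tx_queues);
        qoffset = 0;
        qcount = real_num_tx_queues;
    }
    if (qcount == 0)
        return 0;   /* no queues at all: nothing sensible to pick */

    /* qcount is non-zero here, so each pass strictly reduces rx_queue. */
    while (rx_queue >= qcount)
        rx_queue -= qcount;
    return (uint16_t)(rx_queue + qoffset);
}
```

A warning keeps the machine running and leaves a trace in the log, whereas BUG_ON would turn the hang into an immediate crash with a clearer backtrace; which is preferable depends on whether availability or early detection matters more.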
Comment 14 linsheng alibaba_cloud_group 2022-08-02 15:13:04 UTC
(In reply to maqiao_mq from comment #9)
> Test procedure:
> 1. Download bnxt-testko-v2.tar.gz with a browser (please do not wget it directly; that caused problems when I tried it)
> 2. Upload the file to the test machine
> 3. Unpack it to get bnxt_en.ko: tar xf bnxt-testko-v2.tar.gz
> 4. Back up the original ko:
> mv
> /usr/lib/modules/4.19.91-26.an8.x86_64/kernel/drivers/net/ethernet/broadcom/
> bnxt/bnxt_en.ko
> /usr/lib/modules/4.19.91-26.an8.x86_64/kernel/drivers/net/ethernet/broadcom/
> bnxt/bnxt_en.ko.bak
> 
> 5. Overwrite the original ko:
> mv bnxt_en.ko
> /usr/lib/modules/4.19.91-26.an8.x86_64/kernel/drivers/net/ethernet/broadcom/
> bnxt/bnxt_en.ko
> 
> 6. Reboot and test

I suggest changing the driver version string, or adding a print, to make sure the verification is actually done with the new driver.
Comment 15 maqiao_mq alibaba_cloud_group 2022-08-04 17:26:19 UTC
Created attachment 348
url of 4.19.91-cbp.git.5b8703df0.an8.x86_64
Comment 16 maqiao_mq alibaba_cloud_group 2022-08-04 17:30:13 UTC
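Analysis of the vmcore confirms that the crash is caused by qcount being 0.
The analysis also shows that on the machine that crashed, bnxt_en.ko was still the old one. The new ko had not been installed into the kernel, so the earlier code change had no effect. My guess is that dracut -f was not run to refresh the initramfs after the ko was replaced; that step was missing from the replacement procedure above.

We have backported two upstream patches that fix this problem and built kernel RPM packages. Please verify directly with these RPMs.
For the RPM download links, see attachment: url of 4.19.91-cbp.git.5b8703df0.an8.x86_64

Usage:
1. Download the following 3 packages:
- kernel-core-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
- kernel-modules-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
- kernel-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm

2. Run the install command:
rpm -ivh kernel-core-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm kernel-modules-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm kernel-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm --force

3. Reboot

Appendix: list of backported patches: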
- bnxt_en: Fix TC queue mapping.
- net: Prevent infinite while loop in skb_tx_hash()
Comment 17 jni_ni 2022-08-05 15:37:42 UTC
# rpm -ivh kernel-core-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm kernel-modules-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm kernel-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm --force
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-core-4.19.91-cbp.git.5b870################################# [ 33%]
   2:kernel-modules-4.19.91-cbp.git.5b################################# [ 67%]
   3:kernel-4.19.91-cbp.git.5b8703df0.################################# [100%]
Unable to decompress /boot/initramfs-4.19.91-cbp.git.5b8703df0.an8.x86_64.img: Unknown format
Comment 18 maqiao alibaba_cloud_group 2022-08-05 16:42:48 UTC
Created attachment 351
url(with md5sum) of 4.19.91-cbp.git.5b8703df0.an8.x86_64
Comment 19 maqiao alibaba_cloud_group 2022-08-05 16:59:05 UTC
(In reply to jni_ni from comment #17)
> # rpm -ivh kernel-core-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
> kernel-modules-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
> kernel-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm --force
> Verifying...                          #################################
> [100%]
> Preparing...                          #################################
> [100%]
> Updating / installing...
>    1:kernel-core-4.19.91-cbp.git.5b870################################# [
> 33%]
>    2:kernel-modules-4.19.91-cbp.git.5b################################# [
> 67%]
>    3:kernel-4.19.91-cbp.git.5b8703df0.#################################
> [100%]
> Unable to decompress
> /boot/initramfs-4.19.91-cbp.git.5b8703df0.an8.x86_64.img: Unknown format

[root@mq-an8 ~]# rpm -ivh kernel-core-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm kernel-modules-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm kernel-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm --force
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-core-4.19.91-cbp.git.5b870################################# [ 33%]
   2:kernel-modules-4.19.91-cbp.git.5b################################# [ 67%]
   3:kernel-4.19.91-cbp.git.5b8703df0.################################# [100%]
[root@mq-an8 ~]# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.4"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.4"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.4"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

[root@mq-an8 ~]# uname -a
Linux mq-an8 4.19.91-26.an8.x86_64 #1 SMP Tue May 24 13:10:09 CST 2022 x86_64 x86_64 x86_64 GNU/Linux

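The problem did not reproduce on ECS. The files may have been corrupted during download; please compare the md5 values.
For the md5 values, see attachment: url(with md5sum) of 4.19.91-cbp.git.5b8703df0.an8.x86_64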
Comment 20 jni_ni 2022-08-05 17:56:59 UTC
I rebooted the server, downloaded the files again, and checked the MD5 sums, but the install still reports
Unable to decompress /boot/initramfs-4.19.91-cbp.git.5b8703df0.an8.x86_64.img: Unknown format

 tmp]# md5sum kernel-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
98c6d43505a7960f80a9f24814bf16a8  kernel-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
 tmp]# md5sum kernel-core-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
95d4e85e0c26911062e09e4b0233d3f5  kernel-core-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
 tmp]# md5sum kernel-modules-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
29b2ec03399be4ba68904b8424901a9b  kernel-modules-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm


tmp]# rpm -ivh --force kernel-modules-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-modules-4.19.91-cbp.git.5b################################# [100%]

 tmp]# rpm -ivh --force kernel-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-4.19.91-cbp.git.5b8703df0.################################# [100%]

 tmp]# rpm -ivh --force kernel-core-4.19.91-cbp.git.5b8703df0.an8.x86_64.rpm
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-core-4.19.91-cbp.git.5b870################################# [100%]
Unable to decompress /boot/initramfs-4.19.91-cbp.git.5b8703df0.an8.x86_64.img: Unknown format
Comment 21 jni_ni 2022-08-05 18:04:45 UTC
tmp]# cat /etc/os-release 
NAME="Anolis OS"
VERSION="8.4"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.4"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.4"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

 tmp]# uname -a
Linux 10.72.46.4 4.19.91-26.an8.x86_64 #1 SMP Tue May 24 13:10:09 CST 2022 x86_64 x86_64 x86_64 GNU/Linux
Comment 22 maqiao alibaba_cloud_group 2022-08-05 18:29:06 UTC
(In reply to maqiao_mq from comment #9)
> Test procedure:
> 1. Download bnxt-testko-v2.tar.gz with a browser (please do not wget it directly; that caused problems when I tried it)
> 2. Upload the file to the test machine
> 3. Unpack it to get bnxt_en.ko: tar xf bnxt-testko-v2.tar.gz
> 4. Back up the original ko:
> mv
> /usr/lib/modules/4.19.91-26.an8.x86_64/kernel/drivers/net/ethernet/broadcom/
> bnxt/bnxt_en.ko
> /usr/lib/modules/4.19.91-26.an8.x86_64/kernel/drivers/net/ethernet/broadcom/
> bnxt/bnxt_en.ko.bak
> 
> 5. Overwrite the original ko:
> mv bnxt_en.ko
> /usr/lib/modules/4.19.91-26.an8.x86_64/kernel/drivers/net/ethernet/broadcom/
> bnxt/bnxt_en.ko
> 
> 6. Reboot and test

The RPM installation problem will take some time to track down, so please use the following workaround for now:
Between steps 5 and 6, run dracut -f to refresh the initramfs, then reboot and retry.
Comment 23 maqiao alibaba_cloud_group 2022-08-11 16:03:33 UTC
The fix has been merged.

https://gitee.com/anolis/cloud-kernel/pulls/624