Bug 9465 - iommu causes NULL pointer panic in intel_flush_iotlb_all()
Summary: iommu causes NULL pointer panic in intel_flush_iotlb_all()
Status: RESOLVED FIXED
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: general/others
Version: unspecified
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: zelin
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-02 09:34 UTC by zelin
Modified: 2024-07-02 16:09 UTC
CC List: 0 users

See Also:


Description zelin alibaba_cloud_group 2024-07-02 09:34:43 UTC
Description of problem:
The kernel panics with a NULL pointer dereference in intel_flush_iotlb_all().

[20036.479789] BUG: kernel NULL pointer dereference, address: 0000000000000304
[20036.479791] pci 0000:d2:00.2: Removing from iommu group 444
[20036.492765] #PF: supervisor read access in kernel mode
[20036.498102] #PF: error_code(0x0000) - not-present page
[20036.503438] PGD 0 P4D 0 
[20036.506176] Oops: 0000 [#1] SMP NOPTI
[20036.510041] CPU: 65 PID: 0 Comm: swapper/65 Kdump: loaded Tainted: P S         OE K   5.10.134-13.al8.x86_64 #1
[20036.520345] Hardware name: H3C R5500 G5/BS55M2C3SD, BIOS 5.58.75 04/20/2023
[20036.527758] RIP: 0010:intel_flush_iotlb_all+0x70/0x100
[20036.533139] Code: d2 48 89 ef ff d0 0f 1f 00 f6 45 18 80 75 30 48 8b 55 68 44 89 e8 0f b6 c4 48 8b 3c c2 48 85 ff 74 08 45 0f b6 ed 4a 8b 3c ef <80> bf 04 03 00 00 00 74 0c ba 34 00 00 00 31 f6 e8 0b fd ff ff 48
[20036.552423] RSP: 0018:ffffa1ebd9b8ce78 EFLAGS: 00010286
[20036.557908] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffff8f5380053888
[20036.565307] RDX: ffff905304676000 RSI: 0000000000000246 RDI: 0000000000000000
[20036.572710] RBP: ffff8f538005f600 R08: 0000000000000020 R09: 0000000000000000
[20036.580101] R10: 0000000000000010 R11: 0000000000000010 R12: ffff9054f3ab1600
[20036.587493] R13: 0000000000000021 R14: 2000000000000000 R15: ffff9054f3ab0b40
[20036.594890] FS:  0000000000000000(0000) GS:ffff904effa40000(0000) knlGS:0000000000000000
[20036.603247] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20036.609273] CR2: 0000000000000304 CR3: 0000010208ad6003 CR4: 0000000000770ee0
[20036.616696] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20036.624116] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[20036.631544] PKRU: 55555554
[20036.634558] Call Trace:
[20036.637305]  <IRQ>
[20036.639624]  iova_domain_flush+0x17/0x30
[20036.643853]  fq_flush_timeout+0x2e/0xa0
[20036.648001]  ? fq_ring_free+0xf0/0xf0
[20036.651976]  call_timer_fn+0x27/0x100
[20036.655955]  __run_timers.part.0+0x19a/0x210
[20036.660543]  ? ktime_get+0x35/0xa0
[20036.664272]  ? clockevents_program_event+0x8a/0xf0
[20036.669395]  run_timer_softirq+0x26/0x50
[20036.673912]  __do_softirq+0xc1/0x280
[20036.677833]  asm_call_irq_on_stack+0xf/0x20
[20036.682371]  </IRQ>
[20036.684820]  do_softirq_own_stack+0x37/0x50
[20036.689358]  irq_exit_rcu+0xc4/0x100
[20036.693297]  sysvec_apic_timer_interrupt+0x36/0x80
[20036.698451]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[20036.703966] RIP: 0010:cpuidle_enter_state+0xd2/0x330
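
One way to read the dump above: the marked bytes `<80> bf 04 03 00 00 00` decode to a one-byte compare against offset 0x304 from RDI, and RDI is 0, which matches CR2 = 0x304. The bytes just before it look like a two-level table lookup keyed by R13 (0x21): the high byte selects a first-level slot, which is NULL-checked, and the low byte selects a second-level slot, whose result is used without a check. The following is a minimal userspace sketch of that pattern, not the kernel source and not the change in the linked PR. All type, field, and function names are hypothetical, and the 0x304 offset is an assumption taken from CR2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the structure read at the faulting
 * instruction. Only the layout matters: a one-byte field at offset
 * 0x304 (assumed from CR2, not taken from the kernel definition).
 */
struct fake_domain {
    unsigned char pad[0x304];
    unsigned char flag;
};

/* Second-level table: 256 slots, indexed by the low byte of the id. */
struct fake_level2 {
    struct fake_domain *slot[256];
};

/* First-level table: 256 slots, indexed by the high byte of the id. */
struct fake_level1 {
    struct fake_level2 *slot[256];
};

/*
 * Two-level lookup shaped like the decoded instructions: the first
 * level is NULL-checked, the second-level entry is returned as is
 * and may therefore be NULL.
 */
static struct fake_domain *lookup(const struct fake_level1 *l1, uint16_t id)
{
    assert(l1 != NULL);
    struct fake_level2 *l2 = l1->slot[id >> 8];
    if (l2 == NULL)
        return NULL;
    return l2->slot[id & 0xff];
}

/* Address that a read of `flag` would touch for a given base pointer. */
static uintptr_t flag_address(uintptr_t base)
{
    return base + offsetof(struct fake_domain, flag);
}

/* Guarded read: treats a missing entry as "flag not set". */
static int flag_is_set(const struct fake_domain *d)
{
    return d != NULL && d->flag != 0;
}
```

With an empty second-level slot, `lookup(&l1, 0x21)` returns NULL and `flag_address(0)` evaluates to 0x304, the address reported in the oops. `flag_is_set()` shows the kind of check that avoids this class of fault; whether the actual fix takes this form is not stated in this report.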

Version-Release number of selected component (if applicable):
5.10.134-13.al8.x86_64 (kernel version shown in the oops output)


How reproducible:
Low probability

Steps to Reproduce:
1. Have network traffic arriving on the mlx5 NIC.
2. Enable SR-IOV on the mlx5 NIC so that network traffic reaches a VF.
3. Disable SR-IOV on the mlx5 NIC, which ends up calling intel_flush_iotlb_all().
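
Because the panic is hit only occasionally, the enable/disable cycle in steps 2 and 3 may need to be repeated many times. The report does not say how SR-IOV was toggled; the sketch below assumes the standard PCI sysfs attribute `sriov_numvfs`, and the helper names and VF count are hypothetical. Traffic generation (step 1) is not covered.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a VF count to an sriov_numvfs-style attribute file.
 * Returns 0 on success, -1 on failure.
 */
static int set_numvfs(const char *path, int count)
{
    assert(path != NULL && count >= 0);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%d\n", count) > 0;
    /* Write errors may only surface when the buffer is flushed on close. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* One enable/disable cycle: create `count` VFs, then remove them all. */
static int toggle_sriov(const char *path, int count)
{
    if (set_numvfs(path, count) != 0)
        return -1;
    return set_numvfs(path, 0);
}
```

A caller would pass a path such as `/sys/class/net/<iface>/device/sriov_numvfs`, where `<iface>` is a placeholder for the mlx5 interface name, and call `toggle_sriov()` in a loop as root while traffic is flowing.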

Actual results:
panic


Expected results:
No panic; SR-IOV should be disabled successfully.

Additional info:
Comment 1 小龙 admin 2024-07-02 09:51:05 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3454
Comment 2 zelin alibaba_cloud_group 2024-07-02 16:09:03 UTC
Fixed; see the PR link in comment 1.