Bug 21731 - access rq_hang leads to crash if corresponding device is removing at the same time
Status: NEW
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: block/storage
Version: 5.10.y-12
Hardware: All Linux
Importance: P2-High S2-major
Target Milestone: ---
Assignee: Ferry Meng
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-06-11 14:41 UTC by cyxhlc
Modified: 2025-06-13 14:08 UTC (History)
0 users

See Also:


Attachments

Description cyxhlc 2025-06-11 14:41:34 UTC
Description of problem:
We hit a kernel NULL pointer dereference in a production environment:
[  415.974569] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  415.991350] #PF: supervisor read access in kernel mode
[  415.991838] #PF: error_code(0x0000) - not-present page
[  415.992314] PGD 0 P4D 0
[  415.992566] Oops: 0000 [#1] SMP NOPTI
[  415.992910] CPU: 7 PID: 3560 Comm: cat Kdump: loaded Not tainted 5.10.134-14.zncgsl6.x86_64 #1
[  415.993703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[  415.994755] RIP: 0010:blk_mq_queue_tag_busy_iter+0x276/0x2a0
[  415.995275] Code: 89 ef d3 e6 01 c6 e8 c9 f6 ff ff 84 c0 75 ca 4c 8b 64 24 08 4c 8b 74 24 20 8b 6c 24 2c e9 15 fe ff ff 48 8b 04 24 48 8b 50 18 <48> 8b 02 48 85 c0 74 11 48 8d 48 01 f0 48 0f b1 0a 0f 84 a4 fd ff
[  415.996976] RSP: 0018:ffffa9f588dd7d40 EFLAGS: 00010206
[  415.997471] RAX: ffff9b95de0c8970 RBX: 0000000000000000 RCX: 000000000000452d
[  415.998125] RDX: 0000000000000000 RSI: ffffffffa15668a0 RDI: ffff9b95de0c8970
[  415.998785] RBP: ffff9b95c9a8bb58 R08: 0000000000001000 R09: ffff9b95de96a5f8
[  415.999440] R10: ffffa9f588dd7ee0 R11: 0000000000000001 R12: ffff9b95d41f6d00
[  416.000098] R13: 0000000000000001 R14: 0000000000000001 R15: ffff9b95c9a8bb30
[  416.000762] FS:  00007f7a502a7680(0000) GS:ffff9b98e8180000(0000) knlGS:0000000000000000
[  416.001512] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  416.002041] CR2: 0000000000000000 CR3: 000000011b42e002 CR4: 0000000000770ee0
[  416.002706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  416.003356] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  416.004044] PKRU: 55555554
[  416.004318] Call Trace:
[  416.004584]  ? hctx_tags_show+0x60/0x60
[  416.004968]  ? __kmalloc_node+0x59/0x550
[  416.005340]  ? seq_read_iter+0x330/0x3f0
[  416.005715]  queue_rq_hang_show+0x14/0x20
[  416.006092]  seq_read_iter+0x199/0x3f0
[  416.006450]  ? page_add_new_anon_rmap+0x9e/0x1f0
[  416.006885]  seq_read+0xf6/0x130
[  416.007195]  full_proxy_read+0x50/0x80
[  416.007556]  vfs_read+0x95/0x180
[  416.007880]  ksys_read+0x49/0xc0
[  416.008200]  do_syscall_64+0x30/0x40
[  416.008557]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[  416.009044] RIP: 0033:0x7f7a4fd1f995
[  416.009382] Code: fe ff ff 50 48 8d 3d a2 ca 06 00 e8 c5 ed 01 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 a5 4d 2a 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89

Version-Release number of selected component (if applicable):
5.10.134

How reproducible:
1. On the host, run: while true; do ./attach_detach.sh; done
   Contents of attach_detach.sh:
   virsh attach-device 25 disk1.xml
   sleep 0.1
   virsh detach-device 25 disk1.xml

2. In the virtual machine, at the same time, run: while true; do cat /sys/kernel/debug/block/vdb/rq_hang; done


Actual results:
The OS crashes.

Expected results:
The OS does not crash.

Additional info:
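I propose a simple fix: take a reference on the queue before iterating over its requests, return -ENODEV if the reference cannot be taken, and drop the reference afterwards: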

static int queue_rq_hang_show(void *data, struct seq_file *m)
{
    struct request_queue *q = data;
+   if (!blk_get_queue(q))  
+       return -ENODEV;     

    blk_mq_queue_tag_busy_iter(q, blk_mq_check_rq_hang, m);
+   blk_put_queue(q);  
    return 0;
}
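
To make the intent of the guard easier to follow, here is a minimal userspace sketch of the same get/check/put pattern. It is only a model, not kernel code: struct model_queue and the model_* functions are hypothetical stand-ins for struct request_queue, blk_get_queue(), blk_put_queue() and blk_mq_queue_tag_busy_iter(), and the "dying" flag is an assumption about when taking a reference fails. It says nothing about whether the patch closes every window of the race.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct request_queue. */
struct model_queue {
    int refcount;   /* models the references held on the queue */
    bool dying;     /* models "the device is being removed" */
    int iterations; /* counts how many times the iterator ran */
};

/* Models blk_get_queue(): refuse a new reference once the queue is dying. */
bool model_get_queue(struct model_queue *q)
{
    if (q->dying)
        return false;
    q->refcount++;
    return true;
}

/* Models blk_put_queue(): drop the reference taken above. */
void model_put_queue(struct model_queue *q)
{
    q->refcount--;
}

/* Models blk_mq_queue_tag_busy_iter(): only records that it was called. */
void model_busy_iter(struct model_queue *q)
{
    q->iterations++;
}

/* Same shape as the patched queue_rq_hang_show(). */
int model_rq_hang_show(struct model_queue *q)
{
    if (!model_get_queue(q))
        return -ENODEV; /* queue is going away: do not iterate */

    model_busy_iter(q);
    model_put_queue(q);
    return 0;
}
```

In this model, a read on a live queue runs the iterator once and leaves the reference count unchanged, while a read on a dying queue returns -ENODEV without touching the iterator.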