Bug 28271 - A memory leak occurs in cgroup wmark reclaim when MGLRU is enabled
Status: NEW
Alias: None
Product: ANCK 6.6 Dev
Classification: ANCK
Component: mm
Version: 6.6.88-4
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: baolinwang
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-12-18 20:22 UTC by houwenwu
Modified: 2025-12-19 11:31 UTC

Description houwenwu 2025-12-18 20:22:35 UTC
Description of problem:

A memory leak occurs in cgroup wmark reclaim when MGLRU is enabled.

call stack:
    kmalloc_trace+124
    set_mm_walk+100
    try_to_inc_max_seq.isra.0+82
    try_to_shrink_lruvec+553
    lru_gen_shrink_lruvec+71
    shrink_node_memcgs+386
    shrink_node+390
    shrink_zones.constprop.0+133
    do_try_to_free_pages+157
    try_to_free_mem_cgroup_pages+263
    wmark_work_func+153
    process_one_work+397
    worker_thread+631
    kthread+228
    ret_from_fork+48
    ret_from_fork_asm+27

The walk allocated by `walk = kzalloc(sizeof(*walk) ...` is never freed.
In direct reclaim, MGLRU allocates the walk in set_mm_walk() and frees it in clear_mm_walk(). In kswapd, however, MGLRU uses a statically allocated walk, so no free is needed.

cgroup wmark reclaim calls direct reclaim from a kswapd context, which leads to an inconsistency: set_mm_walk() hits `VM_WARN_ON_ONCE(current_is_kswapd())` but still calls kzalloc(), while clear_mm_walk() frees the walk only if current is not kswapd.

This leak consumes about 60 GB of kmalloc-256 slab memory per day in our production cluster.


Version-Release number of selected component (if applicable):


How reproducible:
In a cgroup v1 environment, this can be reproduced as follows:
```
ubuntu2404 :: ~ » slabtop -o | grep kmalloc-256
  1792   1792 100%    0.25K     56       32       448K kmalloc-256
     0      0   0%    0.25K      0       32         0K dma-kmalloc-256
ubuntu2404 :: ~ » echo y > /sys/kernel/mm/lru_gen/enabled
ubuntu2404 :: ~ » mkdir -p /sys/fs/cgroup/memory/mytest
ubuntu2404 :: ~ » echo 200M > /sys/fs/cgroup/memory/mytest/memory.limit_in_bytes
ubuntu2404 :: ~ » echo 50 > /sys/fs/cgroup/memory/mytest/memory.wmark_ratio
ubuntu2404 :: ~ » echo $$ > /sys/fs/cgroup/memory/mytest/cgroup.procs
ubuntu2404 :: ~ » python3 -c 'import numpy as np;a = np.random.rand(1024, 1024, 20)'
ubuntu2404 :: ~ » slabtop -o | grep kmalloc-256
  4128   4128 100%    0.25K    129       32      1032K kmalloc-256
     0      0   0%    0.25K      0       32         0K dma-kmalloc-256
ubuntu2404 :: ~ » python3 -c 'import numpy as np;a = np.random.rand(1024, 1024, 20)'
ubuntu2404 :: ~ » slabtop -o | grep kmalloc-256
  6432   6432 100%    0.25K    201       32      1608K kmalloc-256
     0      0   0%    0.25K      0       32         0K dma-kmalloc-256
```

kmalloc-256 keeps growing and is never freed.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 小龙 admin 2025-12-19 11:31:56 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/6224