Description of problem:
Currently there is one lru_lock per node, pgdat->lru_lock, which guards the LRU lists. However, the LRU lists were moved into memcg a long time ago. Continuing to use a per-node lru_lock clearly does not scale: pages in different memcgs have to compete with each other for a single lru_lock. This patchset uses a per-lruvec/memcg lru_lock to replace the per-node lru_lock for guarding the LRU lists, which makes it scalable across memcgs and yields a performance gain.

The per-cgroup lock is the upstream solution:
https://lore.kernel.org/all/1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com/T/#m5510a411124f4e1f21e3585c6d1db28dcd13bce3

Upstream reports a 62% improvement on a modified readtwice case on the author's 2P * 10 core * 2 HT Broadwell box with v18, which does not differ much from this v20. It can also give a 10% performance improvement in our container migration solution using DSA, because that path takes the per-cgroup lock rather than the pgdat lock.
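
The contention difference between a per-node lock and a per-lruvec lock can be illustrated with a minimal userspace sketch. This is not kernel code: struct toy_node, struct toy_lruvec and both helper functions are hypothetical stand-ins, and pthread mutexes take the place of the kernel spinlocks. Each helper models "memcg 0 holds its lock; can memcg 1 take the lock it needs right now?".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's structures. */
struct toy_lruvec {
    pthread_mutex_t lru_lock;      /* per-lruvec lock (scheme in this patchset) */
    int nr_pages;                  /* pretend LRU list length */
};

struct toy_node {
    pthread_mutex_t lru_lock;      /* single per-node lock (old scheme) */
    struct toy_lruvec lruvecs[2];  /* one lruvec per memcg on this node */
};

void toy_node_init(struct toy_node *node)
{
    int ret = pthread_mutex_init(&node->lru_lock, NULL);
    assert(ret == 0);
    for (int i = 0; i < 2; i++) {
        ret = pthread_mutex_init(&node->lruvecs[i].lru_lock, NULL);
        assert(ret == 0);
        node->lruvecs[i].nr_pages = 0;
    }
    (void)ret;
}

/* Old scheme: every memcg on the node needs the same node-wide lock. */
bool other_memcg_can_proceed_per_node(struct toy_node *node)
{
    /* memcg 0 is updating its LRU list under the node lock. */
    pthread_mutex_lock(&node->lru_lock);

    /* memcg 1 needs the very same lock, so trylock reports it as busy. */
    bool ok = (pthread_mutex_trylock(&node->lru_lock) == 0);
    if (ok)
        pthread_mutex_unlock(&node->lru_lock);

    pthread_mutex_unlock(&node->lru_lock);
    return ok;
}

/* New scheme: each lruvec carries its own lock. */
bool other_memcg_can_proceed_per_lruvec(struct toy_node *node)
{
    struct toy_lruvec *a = &node->lruvecs[0];
    struct toy_lruvec *b = &node->lruvecs[1];

    /* memcg 0 is updating its LRU list under its own lruvec lock. */
    pthread_mutex_lock(&a->lru_lock);
    a->nr_pages++;

    /* memcg 1 takes a different lock, so it is not blocked by memcg 0. */
    bool ok = (pthread_mutex_trylock(&b->lru_lock) == 0);
    if (ok) {
        b->nr_pages++;
        pthread_mutex_unlock(&b->lru_lock);
    }

    pthread_mutex_unlock(&a->lru_lock);
    return ok;
}
```

After toy_node_init(), the per-node helper reports that the second memcg cannot proceed while the first holds the lock, whereas the per-lruvec helper lets both update their own lists. The sketch only shows lock independence; it says nothing about the measured gains quoted above.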