The Zhaoxin KH-40000 platform does not preserve PCIe transaction ordering for DMA writes issued by the same device when their target addresses are located on different NUMA nodes. Work around this by flushing the pending DMA writes with a subsequent PCIe configuration space read. For the streaming DMA ops callbacks .unmap_page/.unmap_sg and .sync_single_for_cpu/.sync_sg_for_cpu, add a PCIe configuration space read to flush the target DMA writes. For coherent DMA mappings, restrict the DMA buffer that the device driver allocates to the same NUMA node as the device.
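The control flow of the workaround can be sketched as a small user-space model. This is not the kernel code from the patch: struct mock_pci_dev, the mock_* functions, and the needs_order_fixup flag are all hypothetical names, and the configuration read is simulated by a counter. Skipping the flush for DMA_TO_DEVICE-style mappings is also an assumption, made here because only device writes to memory are affected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCIe device; NOT the kernel's struct pci_dev. */
struct mock_pci_dev {
    int numa_node;           /* NUMA node the device is attached to, -1 if unknown */
    unsigned int cfg_reads;  /* number of simulated configuration space reads */
    bool needs_order_fixup;  /* set for devices on an affected platform */
};

/* Hypothetical mirror of the DMA direction of a streaming mapping. */
enum mock_dma_dir {
    MOCK_DMA_TO_DEVICE,
    MOCK_DMA_FROM_DEVICE,
    MOCK_DMA_BIDIRECTIONAL
};

/*
 * Simulated configuration space read. On real hardware this is a
 * non-posted request, so its completion is returned only after the
 * device's earlier posted writes have been pushed ahead of it.
 * The returned value is a placeholder.
 */
uint16_t mock_cfg_read(struct mock_pci_dev *dev)
{
    assert(dev != NULL);
    dev->cfg_reads++;
    return 0x1234;
}

/* Flush pending device writes if the device and direction call for it. */
void mock_flush_dma_writes(struct mock_pci_dev *dev, enum mock_dma_dir dir)
{
    if (!dev->needs_order_fixup)
        return;
    /* Assumption: only mappings the device may write to need the flush. */
    if (dir == MOCK_DMA_FROM_DEVICE || dir == MOCK_DMA_BIDIRECTIONAL)
        (void)mock_cfg_read(dev);
}

/* Model of a streaming .unmap_page callback: flush, then unmap. */
void mock_unmap_page(struct mock_pci_dev *dev, enum mock_dma_dir dir)
{
    mock_flush_dma_writes(dev, dir);
    /* ... the regular unmap work would follow here ... */
}

/* Model of a .sync_single_for_cpu callback: flush, then sync. */
void mock_sync_single_for_cpu(struct mock_pci_dev *dev, enum mock_dma_dir dir)
{
    mock_flush_dma_writes(dev, dir);
    /* ... the regular sync work would follow here ... */
}

/*
 * Model of the coherent path: pick the NUMA node for the buffer.
 * On affected devices the buffer is pinned to the device's own node;
 * otherwise the caller's preferred node is kept.
 */
int mock_coherent_alloc_node(const struct mock_pci_dev *dev, int preferred_node)
{
    if (dev->needs_order_fixup && dev->numa_node >= 0)
        return dev->numa_node;
    return preferred_node;
}
```

In this model, unmapping or syncing a buffer the device may have written to costs one extra configuration read, while the coherent path avoids the ordering problem altogether by keeping the buffer on a single node.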
PR link: https://gitee.com/anolis/cloud-kernel/pulls/2966