Memory bandwidth counter width and overflow issue - steps to reproduce:

1. Run a memory bandwidth stress workload (e.g., STREAM) on multiple CPUs to
   generate memory bandwidth:

   # taskset -c 0-7 ./stream_c.exe
   # pstree -lp | grep stream_c.exe
   | |-sshd(4118)---bash(4184)---stream_c.exe(4388)-+-{stream_c.exe}(4389)
   | |                                              |-{stream_c.exe}(4390)
   | |                                              |-{stream_c.exe}(4391)
   | |                                              |-{stream_c.exe}(4392)
   | |                                              |-{stream_c.exe}(4393)
   | |                                              |-{stream_c.exe}(4394)
   | |                                              `-{stream_c.exe}(4395)

2. Mount the resctrl filesystem, create a control group "p1", and add the STREAM
   workload's PID and TIDs to the p1 group:

   # mount -t resctrl resctrl /sys/fs/resctrl
   # mkdir /sys/fs/resctrl/p1
   # ./add_tasks.sh
-------------------------------------------------------------------------------------
#!/usr/bin/bash
# add_tasks.sh: write every STREAM thread ID (LWP column of "ps -efL") to p1/tasks
ID=$(ps -efL | grep stream_c.exe | grep -v "grep" | awk '{print $4}')
for id in $ID
do
    echo $id > /sys/fs/resctrl/p1/tasks
done
-------------------------------------------------------------------------------------

3. Read the memory bandwidth counter at 1-second intervals and observe the
   sporadic, unexpectedly large values caused by incorrect counter overflow
   handling:

   # ./mbm_total_verbose.sh
-------------------------------------------------------------------------------------
#!/usr/bin/bash
# mbm_total_verbose.sh: print the per-second change of p1's mbm_total_bytes
total_1=$(cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_total_bytes)
sleep 1
while true
do
    total_2=$(cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_total_bytes)
    echo "total b/w (bytes/s):" $((total_2 - total_1)) "($total_2 - $total_1)"
    total_1=$total_2
    sleep 1
done
-------------------------------------------------------------------------------------
total b/w (bytes/s): 21663042304 (222340959424 - 200677917120)
total b/w (bytes/s): 21951040640 (244292000064 - 222340959424)
total b/w (bytes/s): 21491510336 (265783510400 - 244292000064)
total b/w (bytes/s): 1125646660758656 (1125912444269056 - 265783510400)  <== unexpected value due to incorrect counter overflow handling
total b/w (bytes/s): 21957204096 (1125934401473152 - 1125912444269056)
total b/w (bytes/s): 21750773184 (1125956152246336 - 1125934401473152)
total b/w (bytes/s): 21397818496 (1125977550064832 - 1125956152246336)
total b/w (bytes/s): 21933403264 (1125999483468096 - 1125977550064832)
total b/w (bytes/s): 21947796032 (1126021431264128 - 1125999483468096)
total b/w (bytes/s): 21271977152 (1126042703241280 - 1126021431264128)
total b/w (bytes/s): 21854536000 (1126064557777280 - 1126042703241280)
total b/w (bytes/s): 21957381568 (1126086515158848 - 1126064557777280)
total b/w (bytes/s): 21358693184 (1126107873852032 - 1126086515158848)
total b/w (bytes/s): 21771448640 (1126129645300672 - 1126107873852032)
total b/w (bytes/s): 21960641920 (1126151605942592 - 1126129645300672)
total b/w (bytes/s): 21463460352 (1126173069402944 - 1126151605942592)
total b/w (bytes/s): 1125646683192192 (2251819752595136 - 1126173069402944)  <== unexpected value due to incorrect counter overflow handling
total b/w (bytes/s): 21958503872 (2251841711099008 - 2251819752595136)
total b/w (bytes/s): 21731181440 (2251863442280448 - 2251841711099008)
total b/w (bytes/s): 21410335936 (2251884852616384 - 2251863442280448)
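The two spikes in the step 3 log fit a simple arithmetic pattern. Each one equals 2^50 bytes minus roughly 253 GB (1125646660758656 = 2^50 - 253246083968, and 1125646683192192 = 2^50 - 253223650432). This is what a wrap-aware subtraction produces when the raw count drops because the hardware counter wrapped, but the delta is taken modulo 2^50 bytes. Taking the same negative difference modulo 2^38 bytes instead gives 21631822976 and 21654256512, which match the steady ~21.5 GB/s samples. The sketch below is a minimal illustration under that assumption. Both wrap points are inferred only from the logged numbers, not from the kernel or hardware documentation, and any counter-to-byte scaling is folded into them (one possible combination is a 32-bit counter of 64-byte units, which would wrap at 2^38 bytes). `wrap_delta` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: wrap-aware delta for a counter that wraps at 2^width bytes.
# ASSUMPTION: the wrap widths 50 and 38 are inferred from the log above,
# not taken from kernel sources or hardware documentation.

# wrap_delta PREV CUR WIDTH -> prints (CUR - PREV) mod 2^WIDTH
wrap_delta() {
    local prev=$1 cur=$2 width=$3
    # Masking with 2^width - 1 turns a negative difference into its
    # modulo-2^width value (bash arithmetic is two's-complement 64-bit).
    echo $(( (cur - prev) & ((1 << width) - 1) ))
}

# Raw byte difference across the first spike: the hardware count went
# backwards by this amount (2^50 - 1125646660758656) because it wrapped.
raw=-253246083968

echo "delta assuming a 2^50-byte wrap: $(wrap_delta 0 "$raw" 50)"
echo "delta assuming a 2^38-byte wrap: $(wrap_delta 0 "$raw" 38)"
```

Running it prints 1125646660758656, the logged spike, for the 2^50 case and 21631822976, in line with the neighbouring samples, for the 2^38 case.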
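When reproducing over a long run, the loop from mbm_total_verbose.sh can mark suspicious samples itself so they do not have to be found by eye. This is a minimal sketch. `classify`, `monitor`, the file name mbm_check.sh, and the `MAX_BW` bound of 1 TB/s are all hypothetical. The bound is an arbitrary plausibility threshold, not a hardware limit. Any delta that is negative or above the bound is tagged SUSPECT.

```shell
#!/usr/bin/env bash
# mbm_check.sh (hypothetical name): variant of mbm_total_verbose.sh that
# tags implausible per-second deltas. Source it, then call "monitor".

# ASSUMPTION: 1 TB/s is an arbitrary plausibility bound for illustration.
MAX_BW=${MAX_BW:-1000000000000}

# classify PREV CUR -> prints the delta, followed by SUSPECT if implausible
classify() {
    local delta=$(( $2 - $1 ))
    if (( delta < 0 || delta > MAX_BW )); then
        echo "$delta SUSPECT"
    else
        echo "$delta"
    fi
}

# monitor [FILE] -> samples FILE once per second, same output as step 3
monitor() {
    local file=${1:-/sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_total_bytes}
    local prev cur
    prev=$(<"$file")
    while sleep 1; do
        cur=$(<"$file")
        echo "total b/w (bytes/s): $(classify "$prev" "$cur") ($cur - $prev)"
        prev=$cur
    done
}
```

Usage: `source ./mbm_check.sh && monitor`. Applied to the step 3 log, only the two 1125646... lines would be tagged.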