Processing logs: summing values with arbitrary precision
Sample log file:
I0802 21:03:05.976434 15241 embedding_merger.cc:804, FillTensorWithRawSignEmbeddingsExp] open_dirty_mask is:0, memset_dur(ms):0.000137, func_dur(ms):0.006606, memset/func(ration):0.0207387, pre_flat_tensor.num_bytes:512, in MiB:0.000488281
I0802 21:03:05.976545 15269 embedding_merger.cc:804, FillTensorWithRawSignEmbeddingsExp] open_dirty_mask is:0, memset_dur(ms):8.2e-05, func_dur(ms):0.00523, memset/func(ration):0.0156788, pre_flat_tensor.num_bytes:512, in MiB:0.000488281
Arbitrary-precision approach
Some values use scientific notation, such as 8.2e-05, which bc does not accept. Convert them to plain decimals first with awk -F'e' 'BEGIN{OFMT="%10.10f"} {print $1*(10^$2)}', then join the values with + and let bc do the sum:
cat log/rat.txt | cut -d, -f3 | cut -d':' -f2 | awk -F'e' 'BEGIN{OFMT="%10.10f"} {print $1*(10^$2)}' | paste -sd+ | bc -l
The cut -d approach comes from: https://stackoverflow.com/questions/21277631/awk-sum-of-large-integers
Scientific notation to decimal: https://stackoverflow.com/questions/13826237/convert-scientific-notation-to-decimal-in-bash
If the values fit within awk's numeric range
cat log/rat.txt | awk -F, '{print $3 $4}' | sed 's/:/ /g' | awk '{sum1+=$2; sum2+=$4} END {print "tot1:", sum1, "tot2:", sum2}'
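The pipeline above relies on the two durations being the 3rd and 4th comma-separated fields. A single-awk sketch that looks the values up by key name instead may be a little more robust if the log format shifts. The function name sum_durations and the fixed six-decimal output are assumptions made for this example.

```shell
# Sum memset_dur(ms) and func_dur(ms) by key name in one awk pass.
# sum_durations is a hypothetical helper; it reads files given as
# arguments, or stdin when none are given.
sum_durations() {
    awk -F'[,:]' '
        {
            # Splitting on "," and ":" puts each key right before its value.
            for (i = 1; i < NF; i++) {
                key = $i
                sub(/^ +/, "", key)          # trim leading spaces
                if (key == "memset_dur(ms)") memset_sum += $(i + 1)
                if (key == "func_dur(ms)")   func_sum   += $(i + 1)
            }
        }
        END { printf "tot1: %.6f tot2: %.6f\n", memset_sum, func_sum }
    ' "$@"
}
```

Usage would look like: sum_durations log/rat.txt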
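Collecting nvidia-smi results for summing
# Sample GPU utilization and used memory every 0.1 s, one CSV row per sample
while true
do
    line=""
    # Keep only the nvidia-smi table rows that contain a utilization percentage
    readarray -t arr2 < <(nvidia-smi | grep %)
    for oneline in "${arr2[@]}"; do
        # 4th '|' field holds GPU-Util; keep the number before '%'
        u1=$(echo "${oneline}" | awk -F'|' '{print $4}' | awk -F'%' '{print $1}' | sed 's, ,,g')
        # 3rd '|' field holds "used / total" memory; keep the used part
        m1=$(echo "${oneline}" | cut -d'|' -f3 | cut -d'/' -f1 | sed 's, ,,g')
        line="${line},${u1},${m1}"
    done
    echo "$line" >>/tmp/gpu-util-mem.txt
    sleep 0.1
done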
Sample output
,53,21491MiB,46,21487MiB
,63,21491MiB,52,21487MiB
,59,21491MiB,53,21487MiB
,52,21491MiB,56,21487MiB
,54,21491MiB,42,21487MiB
,60,21491MiB,63,21487MiB
,41,21491MiB,39,21487MiB
,53,21491MiB,62,21487MiB
,55,21491MiB,50,21487MiB
,64,21491MiB,55,21487MiB
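Rows in this format can be summarized with awk. The sketch below assumes every row looks like the sample above (an empty first field, then a utilization and memory pair per GPU) and reports average utilization and peak memory for each GPU. The function name summarize_gpu_log and the output format are made up for this example.

```shell
# Summarize rows such as ",53,21491MiB,46,21487MiB".
# summarize_gpu_log is a hypothetical helper; it reads files given as
# arguments, or stdin when none are given.
summarize_gpu_log() {
    awk -F, '
        NF < 3 { next }                      # skip blank or truncated rows
        {
            for (i = 2; i < NF; i += 2) {
                g = i / 2 - 1                # GPU position within the row
                util[g] += $i
                mem = $(i + 1) + 0           # "21491MiB" -> 21491
                if (mem > peak[g]) peak[g] = mem
                if (g + 1 > ngpu) ngpu = g + 1
            }
            n++
        }
        END {
            for (g = 0; g < ngpu; g++)
                printf "gpu%d avg_util=%.1f%% peak_mem=%dMiB\n", g, util[g] / n, peak[g]
        }
    ' "$@"
}
```

Usage would look like: summarize_gpu_log /tmp/gpu-util-mem.txt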
A better nvidia-smi approach
while true
do
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv >>/tmp/gpu-util-mem.txt
    sleep 1
done
Output
index, utilization.gpu [%], memory.used [MiB]
0, 36 %, 78947 MiB
1, 0 %, 78001 MiB
index, utilization.gpu [%], memory.used [MiB]
0, 40 %, 78947 MiB
1, 30 %, 78001 MiB
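Because the header line repeats with every sample, a summary script has to skip it. A minimal sketch, assuming the exact CSV layout shown above, that averages utilization and tracks peak memory per GPU index (summarize_gpu_csv is a hypothetical name):

```shell
# Summarize CSV rows such as "0, 36 %, 78947 MiB", ignoring header lines.
# summarize_gpu_csv is a hypothetical helper; it reads files given as
# arguments, or stdin when none are given.
summarize_gpu_csv() {
    awk -F', *' '
        $1 == "index" { next }               # repeated CSV header
        NF >= 3 {
            idx = $1
            n[idx]++
            util[idx] += $2                  # "36 %" -> 36
            mem = $3 + 0                     # "78947 MiB" -> 78947
            if (mem > peak[idx]) peak[idx] = mem
        }
        END {
            for (idx in n)
                printf "gpu%s avg_util=%.1f%% peak_mem=%dMiB\n", idx, util[idx] / n[idx], peak[idx]
        }
    ' "$@" | sort                            # "for (idx in n)" is unordered
}
```

Usage would look like: summarize_gpu_csv /tmp/gpu-util-mem.txt. Passing --format=csv,noheader,nounits to nvidia-smi should make the header and unit handling unnecessary.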
Locking NVIDIA GPU clocks
# persistence mode on; lock memory clock; lock GPU clock; set power limit
nvidia-smi -pm 1 ; nvidia-smi -lmc 6251; nvidia-smi -lgc 1500; nvidia-smi -pl 150
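These commands change hardware state and normally need root, so it can help to preview them first and to keep the matching reset commands nearby. The sketch below wraps the same four commands in a dry-run helper and adds nvidia-smi -rmc and -rgc to undo the clock locks. The names gpu_run, lock_gpu_clocks, unlock_gpu_clocks and the APPLY variable are all hypothetical, and whether a given clock value is accepted depends on the GPU and driver.

```shell
# Print each command by default; run it only when APPLY=1 is set.
gpu_run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# usage: lock_gpu_clocks MEM_MHZ GPU_MHZ POWER_W
lock_gpu_clocks() {
    gpu_run nvidia-smi -pm 1        # persistence mode on
    gpu_run nvidia-smi -lmc "$1"    # lock memory clock (MHz)
    gpu_run nvidia-smi -lgc "$2"    # lock GPU clock (MHz)
    gpu_run nvidia-smi -pl "$3"     # power limit (W)
}

unlock_gpu_clocks() {
    gpu_run nvidia-smi -rmc         # reset the memory clock lock
    gpu_run nvidia-smi -rgc         # reset the GPU clock lock
}
```

Usage would look like: lock_gpu_clocks 6251 1500 150 to preview, then APPLY=1 lock_gpu_clocks 6251 1500 150 to apply. The power limit is not reset by unlock_gpu_clocks; it would have to be set back with -pl.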
Monitoring NVIDIA GPU clocks
watch -n 5 "nvidia-smi --query-gpu=power.max_limit,clocks.current.sm,power.draw,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max,clocks_throttle_reasons.active,power.limit,power.min_limit --format=csv"