(In reply to feitian200603 from comment #0)
> With Auto Mixed Precision mode enabled in intel_extension_for_tensorflow, setting
> the environment variable ITEX_AUTO_MIXED_PRECISION_DATA_TYPE does not work the way
> the official documentation describes. The official documentation,
> https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/guide/aamp_tune.md#tuning-performance-example-by-advanced-amp-configure-list-manually,
> says the auto mixed precision optimization can be enabled in two ways:
> 1. Python API
> Basic (Default configuration)
> import intel_extension_for_tensorflow as itex
>
> auto_mixed_precision_options = itex.AutoMixedPrecisionOptions()
> auto_mixed_precision_options.data_type = itex.BFLOAT16 #itex.FLOAT16
>
> graph_options = itex.GraphOptions()
> graph_options.auto_mixed_precision_options=auto_mixed_precision_options
> graph_options.auto_mixed_precision = itex.ON
>
> config = itex.ConfigProto(graph_options=graph_options)
> itex.set_config(config)
> 2. Environment Variable
> export ITEX_AUTO_MIXED_PRECISION=1
> export ITEX_AUTO_MIXED_PRECISION_DATA_TYPE="BFLOAT16" #"FLOAT16"
>
> Actual test result:
> Node: 'resnet50/Cast'
> No registered '_ITEXCast' OpKernel for 'CPU' devices compatible with node {{node resnet50/Cast}}
> (OpKernel was found, but attributes didn't match) Requested Attributes: DstT=DT_BFLOAT16, SrcT=DT_BFLOAT16, T=DT_BFLOAT16, Truncate=false, _XlaHasReferenceVars=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"
> . Registered: device='CPU'; SrcT in [DT_HALF]; DstT in [DT_BFLOAT16]
> device='CPU'; SrcT in [DT_HALF]; DstT in [DT_FLOAT]
> device='CPU'; SrcT in [DT_BFLOAT16]; DstT in [DT_HALF]
> device='CPU'; SrcT in [DT_BFLOAT16]; DstT in [DT_FLOAT]
> device='CPU'; SrcT in [DT_FLOAT]; DstT in [DT_HALF]
> device='CPU'; SrcT in [DT_FLOAT]; DstT in [DT_BFLOAT16]
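The key detail in the log is the "Registered" list. Every CPU kernel for `_ITEXCast` converts between two different types, so a request with SrcT=DT_BFLOAT16 and DstT=DT_BFLOAT16 has nothing to match. The pure-Python sketch below copies the six pairs from the log into a set and performs the same lookup. `REGISTERED_CPU_CASTS` and `has_cast_kernel` are hypothetical names used only for illustration. The sketch does not consult ITEX's real kernel registry.

```python
# (SrcT, DstT) pairs registered for '_ITEXCast' on CPU, copied from the error log.
REGISTERED_CPU_CASTS = {
    ("DT_HALF", "DT_BFLOAT16"),
    ("DT_HALF", "DT_FLOAT"),
    ("DT_BFLOAT16", "DT_HALF"),
    ("DT_BFLOAT16", "DT_FLOAT"),
    ("DT_FLOAT", "DT_HALF"),
    ("DT_FLOAT", "DT_BFLOAT16"),
}


def has_cast_kernel(src_t: str, dst_t: str) -> bool:
    """Return True if the (SrcT, DstT) pair appears in the list copied above.

    Illustrative only: this does not query ITEX's actual kernel registry.
    """
    return (src_t, dst_t) in REGISTERED_CPU_CASTS


if __name__ == "__main__":
    # Attributes requested by node 'resnet50/Cast' in the log.
    print(has_cast_kernel("DT_BFLOAT16", "DT_BFLOAT16"))  # False: no same-type cast listed
    # A float32 -> bfloat16 cast, by contrast, is registered.
    print(has_cast_kernel("DT_FLOAT", "DT_BFLOAT16"))  # True
```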
Steps to reproduce: run inference on the model at bf16 precision.

# Imports assumed; they were not shown in the original excerpt.
import time
import tensorflow as tf
from tensorflow.keras import mixed_precision

for inputs, labels in data_iterator:
    cur += 1
    if cur * batch_size > iterations:
        break
    print("inference dataset batch_size cur", cur)
    # Set the precision policy
    policy = mixed_precision.Policy('mixed_bfloat16')
    mixed_precision.set_global_policy(policy)
    # Cast the input data to the target precision
    inputs = tf.dtypes.cast(inputs, tf.bfloat16)
    tick = time.time()
    #inputs = inputs.to(input_device)
    #labels = labels.to(input_device)
    # Run inference
    with strategy.scope():
        outputs = model.predict(inputs)
    tock = time.time()
    times.append(tock - tick)
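The timing logic of the reproduction loop can be separated from TensorFlow and ITEX. Below is a minimal pure-Python sketch of that logic. It takes a generic `predict` callable in place of `model.predict`. `time_inference` and `max_samples` are hypothetical names. The sketch keeps the loop's stopping rule: batch number `cur` runs only while `cur * batch_size <= iterations`. It assumes `cur` starts at 0, which the excerpt does not show. Only the model call is timed, and `time.perf_counter` replaces `time.time` because it is the standard-library clock intended for measuring intervals.

```python
import time
from typing import Callable, Iterable, List, Sequence


def time_inference(
    predict: Callable[[Sequence[float]], object],
    batches: Iterable[Sequence[float]],
    batch_size: int,
    max_samples: int,
) -> List[float]:
    """Return per-batch latencies in seconds.

    Same stopping rule as the reproduction loop: batch number `cur`
    (counting from 1) is processed only while cur * batch_size <= max_samples.
    """
    times: List[float] = []
    for cur, batch in enumerate(batches, start=1):
        if cur * batch_size > max_samples:
            break
        # Any dtype conversion or policy setup should happen before this
        # point, so that only the model call is measured.
        tick = time.perf_counter()
        predict(batch)
        times.append(time.perf_counter() - tick)
    return times


if __name__ == "__main__":
    # Stand-in for model.predict: five batches of four samples each.
    fake_batches = [[1.0] * 4 for _ in range(5)]
    latencies = time_inference(sum, fake_batches, batch_size=4, max_samples=10)
    print(len(latencies), "batches timed")  # 2 batches timed
```

The same separation applies to the precision setup. A Keras layer's dtype policy is fixed when the layer is created, so `set_global_policy` affects only layers built after the call. Calling it inside the loop, after the model already exists, therefore does not change that model's layers.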