2. Problems
2.1 Help: TensorFlow is told to use GPUs 2 and 3 but automatically occupies GPU 0
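Problem: os.environ["CUDA_VISIBLE_DEVICES"] = "1" was set to select GPU 1, yet GPU 0 was always used. The cause is the tf.test.is_gpu_available() call in the print statement. No TensorFlow function that touches the GPU may be called before the GPU is selected: such a call initializes CUDA while all devices are still visible, so environment variables set afterwards no longer change which GPUs TensorFlow sees.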
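import os
import tensorflow as tf

gpu_flag = "1"  # assumed value; gpu_flag is defined elsewhere in the original script
# Path used to save the checkpoint and log files after training
train_log_file = 'miniImageNet-better-cs.ckpt'
# Bug: this call initializes the GPUs before the environment variables below are set
print("GPU status:", tf.test.is_gpu_available())
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # takes effect, but the run result is wrong
os.environ["CUDA_VISIBLE_DEVICES"] = gpu_flag
#os.environ["CUDA_VISIBLE_DEVICES"] = "1"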
Run log before the fix:
2020-07-27 12:18:22.763620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2020-07-27 12:18:22.772153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-27 12:18:22.772171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0 1 2 3 4 5 6 7
2020-07-27 12:18:22.772181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N N N N N N N N
2020-07-27 12:18:22.772188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: N N N N N N N N
2020-07-27 12:18:22.772195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 2: N N N N N N N N
2020-07-27 12:18:22.772202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 3: N N N N N N N N
2020-07-27 12:18:22.772209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 4: N N N N N N N N
2020-07-27 12:18:22.772216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 5: N N N N N N N N
2020-07-27 12:18:22.772222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 6: N N N N N N N N
The fix is simply to move the tf.test.is_gpu_available() statement so that it runs after the environment variables are set:
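import os

gpu_flag = "1"  # assumed value; gpu_flag is defined elsewhere in the original script
# Path used to save the checkpoint and log files after training
train_log_file = 'miniImageNet-better-cs.ckpt'
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = gpu_flag
#os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf  # imported only after the GPU selection above
print("GPU status:", tf.test.is_gpu_available())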
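The ordering rule (set the environment variables first, touch TensorFlow afterwards) could also be wrapped in a small helper. The sketch below is only an illustration: the name select_gpus and the warning text are hypothetical and not part of the original script. The check on sys.modules is a conservative guard, because importing TensorFlow does not by itself initialize the GPUs, but any GPU query made after the import would.

```python
import os
import sys
import warnings


def select_gpus(gpu_ids, order="PCI_BUS_ID"):
    """Restrict CUDA to the given physical GPU IDs (hypothetical helper).

    Call this before TensorFlow initializes the GPUs; the safest place
    is before `import tensorflow`.
    """
    if "tensorflow" in sys.modules:
        # Conservative guard: TensorFlow may already have initialized CUDA,
        # in which case the variables set below may have no effect.
        warnings.warn("tensorflow is already imported; GPU selection may be ignored")
    os.environ["CUDA_DEVICE_ORDER"] = order
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


visible = select_gpus([2, 3])
print("CUDA_VISIBLE_DEVICES =", visible)

# Only now import TensorFlow and query the GPUs, for example:
# import tensorflow as tf
# print("GPU status:", tf.test.is_gpu_available())
```

Passing a list instead of a preformatted string is a design choice of this sketch; it avoids stray spaces or trailing commas in the environment variable.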
Run log after the fix:
2020-07-27 12:25:40.070721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2020-07-27 12:25:40.070792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-27 12:25:40.070801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2020-07-27 12:25:40.070810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2020-07-27 12:25:40.070915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10238 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:88:00.0, compute capability: 7.5)
2020-07-27 12:25:42.117883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2020-07-27 12:25:42.117947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-27 12:25:42.117956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2020-07-27 12:25:42.117964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2020-07-27 12:25:42.118029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/device:GPU:0 with 10238 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:88:00.0, compute capability: 7.5)
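Note that the log above lists only device 0 even though a different physical GPU was requested. This is expected: CUDA renumbers the visible devices starting from 0, so inside the process the first entry of CUDA_VISIBLE_DEVICES always appears as GPU:0. The sketch below illustrates that renumbering; the function name logical_to_physical is hypothetical and only models the mapping, it does not query the driver.

```python
def logical_to_physical(cuda_visible_devices):
    """Map the renumbered in-process indices (GPU:0, GPU:1, ...) to the
    physical IDs listed in a CUDA_VISIBLE_DEVICES string."""
    ids = [part.strip() for part in cuda_visible_devices.split(",") if part.strip()]
    return {logical: physical for logical, physical in enumerate(ids)}


print(logical_to_physical("1"))    # {0: '1'}
print(logical_to_physical("2,3"))  # {0: '2', 1: '3'}
```

With CUDA_DEVICE_ORDER set to PCI_BUS_ID, the physical IDs on the right-hand side follow PCI bus order, which is also the order nvidia-smi uses.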
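References
[1] Is TensorFlow running on the GPU or the CPU
[2] Different ways to specify GPU or CPU for TensorFlow training
[3] Specifying which GPU TensorFlow uses to run a program
[4] Help: TensorFlow is told to use GPUs 2 and 3 but automatically occupies GPU 0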