The GPU build of TensorFlow depends heavily on the system's CUDA environment.
Check the CUDA version:
(venv)$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
A summary of TF GPU versions that work with CUDA 8.0 (restated as a small lookup sketch after this list):
- TF 1.2 requires libcudnn.so.5.
- TF 1.3 requires libcudnn.so.6.
- TF 1.4 requires libcudnn.so.7, plus the CUDA_DEVICE_ORDER and CUDA_VISIBLE_DEVICES environment variables.
- TF 1.5 and above require CUDA 9.0.
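The list above can be restated as a small lookup table; this is only an illustrative sketch (the REQUIRED_CUDNN table and the helper function are hypothetical, not part of TensorFlow):
# Illustrative sketch of the compatibility list above (hypothetical helper, not a TF API).
REQUIRED_CUDNN = {
    "1.2": "libcudnn.so.5",   # TF 1.2 + CUDA 8.0
    "1.3": "libcudnn.so.6",   # TF 1.3 + CUDA 8.0
    "1.4": "libcudnn.so.7",   # TF 1.4 + CUDA 8.0, plus CUDA_* environment variables
}

def required_cudnn(tf_version):
    """Return the cuDNN soname for a TF 1.x version, or None for 1.5+ (which needs CUDA 9.0)."""
    minor = ".".join(tf_version.split(".")[:2])
    return REQUIRED_CUDNN.get(minor)

print(required_cudnn("1.4.1"))  # -> libcudnn.so.7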
Error reported when running TF 1.5+ on a machine without CUDA 9.0:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
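To fail with a clearer hint instead of the raw loader error, the import can be wrapped; a minimal sketch (the hint text is only a suggestion):
import sys

# Sketch: catch the shared-library ImportError and print a friendlier hint.
try:
    import tensorflow as tf
except ImportError as e:
    if "libcublas.so.9.0" in str(e):
        sys.stderr.write("This TensorFlow build expects CUDA 9.0; "
                         "on a CUDA 8.0 machine install tensorflow-gpu<=1.4.\n")
    raise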
Values of the CUDA environment variables used for TF 1.4:
(venv) $ echo $CUDA_DEVICE_ORDER
PCI_BUS_ID
(venv) $ echo $CUDA_VISIBLE_DEVICES
0,1,2,3
Commands to export the CUDA environment variables:
export CUDA_DEVICE_ORDER="PCI_BUS_ID"
export CUDA_VISIBLE_DEVICES="0,1,2,3"
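The same variables can also be set from inside Python, as long as this happens before TensorFlow is imported; a minimal sketch:
import os

# Must run before `import tensorflow`; TF reads these when it enumerates devices.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import tensorflow as tf  # imported after the environment is configured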
About the cuDNN packages for Linux:
- cudnn-8.0-linux-x64-v6.0.tgz contains libcudnn.so.6;
- cudnn-8.0-linux-x64-v7.tgz contains libcudnn.so.7;
i.e. the naming scheme is cudnn-[CUDA version]-[OS]-[x64]-[cuDNN version].
If the NVIDIA developer site will not let you log in, be patient... or download the archive from CSDN.
Download:
wget http://developer.download.nvidia.com/compute/redist/cudnn/v6.0/cudnn-8.0-linux-x64-v6.0.tgz
wget http://developer.download.nvidia.com/compute/redist/cudnn/v7.0.5/cudnn-8.0-linux-x64-v7.tgz
Set the LD_LIBRARY_PATH variable:
echo $LD_LIBRARY_PATH
/usr/local/cuda-9.0/lib64:/usr/local/cuda-8.0/lib:/usr/local/cuda/lib:/usr/local/cuda-8.0/lib64/:/usr/local/cuda/lib64/:
export LD_LIBRARY_PATH="/usr/local/cuda-9.0/lib64:/usr/local/cuda-8.0/lib:/usr/local/cuda/lib:/usr/local/cuda-8.0/lib64/:/usr/local/cuda/lib64/:"
export CUDA_DEVICE_ORDER="PCI_BUS_ID"
export CUDA_VISIBLE_DEVICES="0,1,2,3"
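A quick way to sanity-check the value is to confirm that every entry in LD_LIBRARY_PATH actually exists on disk; a small sketch:
import os

# Report LD_LIBRARY_PATH entries that do not exist (e.g. typos or missing ':' separators).
for entry in os.environ.get("LD_LIBRARY_PATH", "").split(":"):
    if entry and not os.path.isdir(entry):
        print("missing: %s" % entry)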
Layout of the cuDNN package:
The cuda/ directory extracted from the tarball contains two folders:
- include: the cuDNN header file (cudnn.h), essentially the same across versions;
- lib64: the compiled cuDNN libraries;
- lib64/libcudnn_static.a: the static library, essentially the same across versions;
- lib64/libcudnn.so.6.0.21: the shared library, the core runtime library for TF GPU;
- libcudnn.so.6 and libcudnn.so are symlinks that point to the concrete version, e.g. 6.0.21.
Run the following commands; writing under /usr/local/cuda requires sudo privileges:
tar -xzvf cudnn-8.0-linux-x64-v6.0.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
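After copying, it is worth verifying that the symlinks resolve and that the dynamic loader can find the library; a minimal sketch, assuming the v6.0 package was installed:
import ctypes
import os

# Follow the symlink chain, e.g. libcudnn.so -> libcudnn.so.6 -> libcudnn.so.6.0.21
print(os.path.realpath("/usr/local/cuda/lib64/libcudnn.so"))

# Load through the dynamic loader (honors LD_LIBRARY_PATH); raises OSError if not found.
ctypes.CDLL("libcudnn.so.6")
print("libcudnn.so.6 loaded")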
Python script to test the GPUs:
from tensorflow.python.client import device_lib


def get_available_gpus():
    """
    Check the GPUs from the shell: nvidia-smi
    Check which process occupies a GPU: ps aux | grep PID
    :return: None; prints all local devices and the GPU devices
    """
    local_device_protos = device_lib.list_local_devices()
    print("all: %s" % [x.name for x in local_device_protos])
    print("gpu: %s" % [x.name for x in local_device_protos if x.device_type == 'GPU'])


get_available_gpus()
It prints the list of GPU device names visible to TensorFlow.
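The TF 1.x API also offers a shorter check, shown here as an alternative to the script above:
import tensorflow as tf

# Returns True if TensorFlow can see a usable CUDA GPU (TF 1.x API).
print(tf.test.is_gpu_available())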
So if the machine has CUDA 8.0, the highest TF GPU version it can run is 1.4; don't waste time hoping for anything newer!
CUDA and cuDNN make up the GPU environment for neural network training. They are tied to the machine's GPU hardware and driver, and different CUDA versions support different machine learning libraries, so you need to determine the server's CUDA version before installing the matching library.
Query the CUDA version, e.g. 8.0.44:
wcl1@BJYS-AMAXGPU-34-1:~$ cat /usr/local/cuda/version.txt
CUDA Version 8.0.44
or
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
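The version string can also be read programmatically; a small sketch using the same sources queried above:
import subprocess

# Same information as above, collected from a script.
with open("/usr/local/cuda/version.txt") as f:
    print(f.read().strip())                      # e.g. CUDA Version 8.0.44
print(subprocess.check_output(["nvcc", "--version"]).decode())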
Query the cuDNN version, e.g. 5.1.5:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 5
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 5
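The grep above can be reproduced in Python to assemble the three #define values into one version string; a small sketch:
import re

# Parse CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL out of cudnn.h.
version = {}
with open("/usr/local/cuda/include/cudnn.h") as f:
    for line in f:
        m = re.match(r"#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)", line)
        if m:
            version[m.group(1)] = m.group(2)
print("%(MAJOR)s.%(MINOR)s.%(PATCHLEVEL)s" % version)  # e.g. 5.1.5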
Query the GPU information, e.g. a machine with 4 GPUs:
nvidia-smi
Wed Mar 28 12:32:01 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.26 Driver Version: 387.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:02:00.0 Off | N/A |
| 23% 22C P8 16W / 250W | 289MiB / 11172MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:03:00.0 Off | N/A |
| 23% 23C P8 8W / 250W | 10MiB / 11172MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:82:00.0 Off | N/A |
| 23% 18C P8 8W / 250W | 10MiB / 11172MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:83:00.0 Off | N/A |
| 23% 19C P8 8W / 250W | 10MiB / 11172MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 20009 C python 279MiB |
+-----------------------------------------------------------------------------+
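For scripting, nvidia-smi also has a CSV query mode that is easier to parse than the table above; a small sketch (field names as listed by nvidia-smi --help-query-gpu):
import subprocess

# One CSV line per GPU: index, name, used/total memory, utilization.
out = subprocess.check_output([
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total,utilization.gpu",
    "--format=csv,noheader",
]).decode()
for line in out.strip().splitlines():
    print(line)   # e.g. 0, GeForce GTX 1080 Ti, 289 MiB, 11172 MiB, 0 %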