在Linux上配置CUDA环境

https://www.tensorflow.org/install/install_linux

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu-installation

https://developer.nvidia.com/cudnn

pre-installation

https://developer.nvidia.com/cuda-downloads 选择下载安装包,以ubuntu使用网络安装为例

Installation Instructions:

安装完成后重启,在启动脚本里面添加

export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
#END_EXAMPLE

验证安装成功可以运行命令

#+BEGIN_EXAMPLE
root@testnn:~# nvidia-smi
Thu Mar 22 09:54:25 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:08.0 Off |                    0 |
| N/A   31C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

可能现在TF只支持CUDA9.0, 所以安装9.1会出现链接错误找不到.so的问题。为了验证tensorflow的确使用了GPU,可以运行下面脚本

import tensorflow as tf sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

输出大概是下面这样的

2018-03-22 11:01:18.699187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-03-22 11:01:18.699367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 570 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:08.0, compute capability: 6.0)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:08.0, compute capability: 6.0
2018-03-22 11:01:18.699531: I tensorflow/core/common_runtime/direct_session.cc:297] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:08.0, compute capability: 6.0