
[Share Experiences] deepin 20, NVIDIA 450 + CUDA 11 + cuDNN 8, deep learning with PyTorch and TensorFlow


monkeycc

deepin

2021-01-06 23:16

Author

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   54C    P8     4W /  N/A |   1142MiB /  5934MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      9566      C   python                            829MiB |
|    0   N/A  N/A     12113      C   python                             81MiB |
|    0   N/A  N/A     14126      C   python                            229MiB |
+-----------------------------------------------------------------------------+
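A note on reading the table above: the "CUDA Version" field in the nvidia-smi header is the highest CUDA version the installed driver supports, not necessarily the toolkit that is installed. As a minimal sketch (the helper name `parse_smi_header` is made up for illustration), the two version numbers can be pulled out of the header line like this:

```python
import re

# Header line copied from the nvidia-smi output above.
HEADER = "| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |"


def parse_smi_header(line):
    """Return (driver_version, cuda_version) as strings from an nvidia-smi header line."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", line)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", line)
    if not driver or not cuda:
        raise ValueError("not an nvidia-smi header line")
    return driver.group(1), cuda.group(1)


driver_version, cuda_version = parse_smi_header(HEADER)
print(driver_version, cuda_version)  # prints: 450.66 11.0
```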

Environment:

deepin 20.1 (1010)

anaconda3

Goodbye Ubuntu, hello deepin (Debian)

PyTorch 1.7.1

conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch

>>> import torch
>>> print(torch.cuda.is_available())
True

TensorFlow 2 (tensorflow_gpu-2.3.0)

Use pip install with the package that matches your Python version.

>>> import tensorflow as tf
2021-01-06 14:45:20.989379: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-01-06 14:45:36.576268: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-06 14:45:36.603810: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2599990000 Hz
2021-01-06 14:45:36.604273: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b46983bee0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-06 14:45:36.604289: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-06 14:45:36.606366: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-01-06 14:45:36.679400: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-06 14:45:36.679812: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b4698bdc20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-06 14:45:36.679830: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
2021-01-06 14:45:36.680037: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-06 14:45:36.680277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.335GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
2021-01-06 14:45:36.680304: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-01-06 14:45:36.682137: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-01-06 14:45:36.682889: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-01-06 14:45:36.683132: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-01-06 14:45:36.689812: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-01-06 14:45:36.691749: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-01-06 14:45:36.691987: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-01-06 14:45:36.692241: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-06 14:45:36.692886: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-06 14:45:36.693187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-01-06 14:45:36.693549: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-01-06 14:45:37.287921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-06 14:45:37.287954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-01-06 14:45:37.287962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-01-06 14:45:37.288581: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-06 14:45:37.288894: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-06 14:45:37.289145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/device:GPU:0 with 4494 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
True
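One detail worth noticing in the log above: TensorFlow reports opening libcudart.so.10.1 and libcudnn.so.7, which suggests this tensorflow_gpu-2.3.0 build is running against CUDA 10.1 and cuDNN 7 libraries rather than CUDA 11 and cuDNN 8. A rough sketch for listing what was loaded (the helper `loaded_libraries` is hypothetical, and the sample lines are shortened copies of the log lines above):

```python
import re

# Shortened copies of "Successfully opened dynamic library" lines from the log above.
LOG_LINES = [
    "I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1",
    "I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1",
    "I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10",
    "I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7",
    "I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0",
]

PATTERN = re.compile(r"Successfully opened dynamic library (lib\w+)\.so\.([\d.]+)")


def loaded_libraries(lines):
    """Map each library name to the version suffix TensorFlow reported opening."""
    found = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


libs = loaded_libraries(LOG_LINES)
for name, version in sorted(libs.items()):
    print(name, version)
```

To run it on a real session, the log text could be captured to a file and its lines passed in instead of `LOG_LINES`.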

PaddlePaddle 2.0.0rc1

python -m pip install paddlepaddle-gpu==2.0.0rc1.post110 -f https://paddlepaddle.org.cn/whl/stable.html

>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0106 14:13:25.657202  9566 device_context.cc:320] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.0, Runtime API Version: 11.0
W0106 14:13:25.664103  9566 device_context.cc:330] device: 0, cuDNN Version: 8.0.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

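It took me two days to get the NVIDIA driver, CUDA, and cuDNN working on deepin 20.

I tested PyTorch, TensorFlow, and PaddlePaddle.

Non-root users can use everything normally too

(copy the relevant lines from root's .bashrc into the non-root user's .bashrc).

I had always assumed NVIDIA was unfriendly to Debian.

It turns out it is not.

deepin 20 no longer offers graphical root login, which made the installation very time-consuming.

That makes it harder to set up a deep learning environment, especially the graphical interface that reinforcement learning needs.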

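Installation approach:

Install everything in the root environment, and remember to run the install commands with sudo (which lets the system administrator allow ordinary users to run them).

Then edit root's .bashrc (and copy the relevant lines from root's .bashrc into the non-root user's .bashrc).

Here I went straight to cuda_11.0.

On Ubuntu 20 I had always used cuda_10.2.

Whichever CUDA version you install, just install the deep learning framework build that matches it.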
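As a rough illustration of that matching rule: a framework build generally needs a CUDA version no newer than the one the driver reports in nvidia-smi. The sketch below is only an assumption-laden check, not an official compatibility table; the helper names are made up, and the version numbers are the ones that appear elsewhere in this post (cudatoolkit=11.0 for PyTorch, libcudart.so.10.1 in the TensorFlow log, post110 for PaddlePaddle).

```python
def version_tuple(text):
    """Turn a dotted version string such as '11.0' into a comparable tuple (11, 0)."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(required_cuda, driver_max_cuda):
    """True if the CUDA version a framework build needs is not newer than the driver's maximum."""
    return version_tuple(required_cuda) <= version_tuple(driver_max_cuda)


# "CUDA Version" reported by nvidia-smi in this post.
DRIVER_MAX_CUDA = "11.0"

# CUDA versions the framework builds in this post appear to use (read off the post, not verified).
FRAMEWORK_BUILDS = {
    "PyTorch 1.7.1 (cudatoolkit=11.0)": "11.0",
    "tensorflow_gpu-2.3.0 (libcudart.so.10.1)": "10.1",
    "paddlepaddle-gpu 2.0.0rc1.post110": "11.0",
}

for name, required in FRAMEWORK_BUILDS.items():
    status = "ok" if driver_supports(required, DRIVER_MAX_CUDA) else "driver too old"
    print(name, "->", status)
```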

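Installation steps:

Open a terminal and set the root password:
sudo passwd root

Run all of the following commands while logged in as root:

Uninstall the NVIDIA open-source driver
sudo apt autoremove 'nvidia-*'

Reboot

Install the NVIDIA proprietary driver
sudo apt install nvidia-driver

Reboot

sudo apt update -y && sudo apt install nvidia-smi -y

Reboot

Go to the official NVIDIA site and install the matching CUDA version (note: the tmp space must have at least 2 GB free, 3 GB recommended, and don't forget to use sudo!)
(The official site is slow and laggy; I recommend downloading with Xunlei.)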

--------------------------------------------------------------
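(After the installation finishes, don't forget to set up the environment as the installer prompts. The paths below are for CUDA 11.0; adjust the version in the path to the one you installed. CUDA_HOME must be a single directory, not a colon-separated list.)

export PATH="/usr/local/cuda-11.0/bin:$PATH"

export LD_LIBRARY_PATH="/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH"

export CUDA_HOME="/usr/local/cuda-11.0"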
------------------------------------------------------------------------------
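A quick way to sanity-check these variables after editing .bashrc is sketched below. It assumes that CUDA_HOME is meant to hold exactly one directory and that the toolkit lives under a versioned directory such as /usr/local/cuda-11.0 (the 11.0 directory name is an assumption based on the CUDA version this post says it installed); `check_cuda_env` is a hypothetical helper, not part of any NVIDIA tool.

```python
import os


def check_cuda_env(env, cuda_root):
    """Return a list of human-readable problems found in the given environment mapping."""
    problems = []
    if os.path.join(cuda_root, "bin") not in env.get("PATH", "").split(":"):
        problems.append("PATH does not contain " + os.path.join(cuda_root, "bin"))
    if os.path.join(cuda_root, "lib64") not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append("LD_LIBRARY_PATH does not contain " + os.path.join(cuda_root, "lib64"))
    if env.get("CUDA_HOME", "") != cuda_root:
        problems.append("CUDA_HOME should be exactly " + cuda_root)
    return problems


CUDA_ROOT = "/usr/local/cuda-11.0"  # assumed install directory

good_env = {
    "PATH": CUDA_ROOT + "/bin:/usr/bin:/bin",
    "LD_LIBRARY_PATH": CUDA_ROOT + "/lib64",
    "CUDA_HOME": CUDA_ROOT,
}

# What CUDA_HOME looks like if it was unset and then exported as "$CUDA_HOME:<path>".
bad_env = dict(good_env, CUDA_HOME=":" + CUDA_ROOT)

print(check_cuda_env(good_env, CUDA_ROOT))
print(check_cuda_env(bad_env, CUDA_ROOT))
```

To check the live shell instead of the sample dictionaries, `os.environ` can be passed as `env`.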

sudo apt install nvidia-cuda-toolkit 

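CUDA installation is done.

As for cuDNN,
download the matching Linux build from the official site.
(The official site is slow and laggy; I recommend downloading with Xunlei.)

-----------------------
(Mind the path of the extracted cuda directory. cuDNN 8 ships several cudnn*.h headers, so copy all of them, not just cudnn.h.)
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include/
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
-----------------------
cuDNN is done.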
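To confirm which cuDNN version ended up installed, the version macros in the header can be read back. The sketch below is only illustrative: in cuDNN 8 these macros live in cudnn_version.h (older releases kept them in cudnn.h), so it assumes that file was copied alongside cudnn.h. The sample text and its patch level are made up for the example, and `parse_cudnn_version` is a hypothetical helper.

```python
import re

# Illustrative header excerpt; the patch level here is a made-up sample value.
SAMPLE_HEADER = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) read from the CUDNN_* macros in the header text."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            raise ValueError("macro not found: " + name)
        parts.append(int(match.group(1)))
    return tuple(parts)


print(parse_cudnn_version(SAMPLE_HEADER))  # prints: (8, 0, 5)
```

On a real system, the contents of /usr/local/cuda/include/cudnn_version.h would be read with `open(...).read()` and passed in place of `SAMPLE_HEADER`.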

Original forum post: https://bbs.deepin.org/post/209407


All Replies

忘记、过去

deepin

2021-01-06 23:37

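As far as I remember, graphical root login is still supported; it's just that the default password is no longer the password of the user you created during installation...

You need to run sudo passwd root to change the root password, then log in from a tty and run startx to launch the graphical session... really annoying... and I think a freshly installed system doesn't even ship the startx command by default... and if the NVIDIA card isn't configured properly, startx won't get you in either XD...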


monkeycc

deepin

2021-01-06 23:57

忘记、过去：

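As far as I remember, graphical root login is still supported; it's just that the default password is no longer the password of the user you created during installation...

That's why I didn't consider getting in through startx as root.

The deepin team has gone out of its way to block graphical root login and changed a lot of things in the environment to do so.

If you force your way into a graphical root session anyway,

you would have to reinstall and rebuild all kinds of environments for root. Just thinking about it is a headache and a lot of effort.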


lcw0268

deepin

2021-01-07 03:19

After reading the original post, I'm confused about the NVIDIA open-source driver.

nvidia-* also matches nvidia-driver. Is the open-source driver xserver-xorg-video-nvidia?

Also, what is cuDNN for?


忘记、过去

deepin

2021-01-07 05:58

lcw0268：

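After reading the original post, I'm confused about the NVIDIA open-source driver.

nvidia-* also matches nvidia-driver. Is the open-source driver xserver-xorg-video-nvidia?

Also, what is cuDNN for?

The open-source driver is nouveau; it's just that the related packages also start with nvidia-*.

nvidia-driver is the latest proprietary driver package in the repository; xserver-xorg-video-nvidia is probably one part of it? It gets pulled in automatically as a dependency when you install nvidia-driver.

cuDNN is a component that goes along with a CUDA setup; you need it for running deep learning and the like.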


lcw0268

deepin

2021-01-07 08:17

忘记、过去：

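The open-source driver is nouveau; it's just that the related packages also start with nvidia-*.

nvidia-driver is the latest proprietary driver package in the repository; xserver-xorg-video-nvidia is probably one part of it? It gets pulled in automatically as a dependency when you install nvidia-driver.

cuDNN is a component that goes along with a CUDA setup; you need it for running deep learning and the like.

Thanks for clearing that up.

Installing nvidia-cuda-toolkit takes more than 4 GB; after installing it, the X errors seem to have gone away.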

