Is the RTX 3060 slower than the GTX 1060 for deep learning?


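My workplace recently got a machine with an RTX 3060 card. I ran my existing project code on it and found it was painfully slow!

For example: video object-detection inference used to reach 60 FPS, but this thing only manages 12 FPS! (TensorFlow 1)

I then switched frameworks (TensorRT + PyCUDA) and, after a lot of fiddling, found that the RTX 3060 was 4x slower than the 1060 in my laptop!

That was a real eye-opener, so I wrote a small demo in TensorFlow: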

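import numpy as np
import time
import tensorflow as tf

a = np.random.rand(100, 100)  # float64 inputs, 100x100
b = np.random.rand(100, 100)
c = tf.matmul(a, b)           # graph node; nothing is computed yet

with tf.Session() as sess:    # TensorFlow 1.x session API
    for i in range(10):
        t0 = time.time()
        sess.run(c)
        # elapsed wall-clock time of this call, in milliseconds
        print('time cost:{:.4f}'.format((time.time() - t0) * 1000))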
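One caveat about the demo above: the first `sess.run` call also pays for one-time setup, and a 100x100 product is small enough that each timing is likely dominated by per-call overhead rather than GPU arithmetic. Below is a minimal sketch of a timing helper that keeps warm-up calls out of the measurement. The names `benchmark`, `warmup`, `repeats` and `dummy_workload` are hypothetical and not from the original script, and the workload is a pure-Python stand-in so the snippet runs without TensorFlow.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=20):
    """Return per-call timings in milliseconds, excluding warm-up calls."""
    for _ in range(warmup):
        fn()  # result discarded: absorbs one-time initialisation cost
    timings_ms = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - t0) * 1000)
    return timings_ms


def dummy_workload():
    # Stand-in for sess.run(c); swap in the real call when measuring.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    timings = benchmark(dummy_workload)
    print('median: {:.4f} ms, min: {:.4f} ms'.format(
        statistics.median(timings), min(timings)))
```

With the original script, this could be called as `benchmark(lambda: sess.run(c))` inside the `with tf.Session()` block. Trying larger matrices as well would show whether the gap persists once there is more real work per call.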

Results on the 3060 machine:

(AI) root@face-ai:~$ nvidia-smi
Thu Jul 15 10:48:43 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3060    Off  | 00000000:02:00.0 Off |                  N/A |
| 42%   49C    P2    43W / 170W |    849MiB / 12051MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1139      G   /usr/bin/gnome-shell                4MiB |
|    0   N/A  N/A      6905      C   python3                           841MiB |
+-----------------------------------------------------------------------------+
(AI) root@face-ai:~$ python3 test.py 
2021-07-15 10:48:50.362846: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From test.py:9: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2021-07-15 10:48:58.212358: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-07-15 10:48:58.249094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties: 
name: GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.837
pciBusID: 0000:02:00.0
2021-07-15 10:48:58.249440: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:48:58.282163: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-15 10:48:58.288839: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-15 10:48:58.290773: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-15 10:48:58.319544: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-07-15 10:48:58.323162: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-15 10:48:58.326224: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-15 10:48:58.331603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-07-15 10:48:58.421741: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3499825000 Hz
2021-07-15 10:48:58.423567: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c8c5fdcc20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-15 10:48:58.423802: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-07-15 10:48:58.919241: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c8c606faf0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-15 10:48:58.919997: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 3060, Compute Capability 8.6
2021-07-15 10:48:58.923105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties: 
name: GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.837
pciBusID: 0000:02:00.0
2021-07-15 10:48:58.934999: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:48:58.935367: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-07-15 10:48:58.935458: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-07-15 10:48:58.935535: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-07-15 10:48:58.935604: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-07-15 10:48:58.935679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-07-15 10:48:58.935753: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-07-15 10:48:58.937903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2021-07-15 10:48:58.938317: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-07-15 10:49:01.153241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 10:49:01.154207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212]      0 
2021-07-15 10:49:01.154511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0:   N 
2021-07-15 10:49:01.162712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9454 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3060, pci bus id: 0000:02:00.0, compute capability: 8.6)
time cost:600.3177
time cost:17.2832
time cost:3.6066
time cost:2.5594
time cost:1.3814
time cost:1.4493
time cost:1.7078
time cost:2.7463
time cost:16.8326
time cost:3.1228

Results on the 1060 laptop:

a@a-G3-3579:/media/a$ nvidia-smi
Thu Jul 15 10:50:50 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   59C    P0    24W /  N/A |    494MiB /  6078MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4574      G   /usr/lib/xorg/Xorg                224MiB |
|    0   N/A  N/A      4777      G   /usr/bin/gnome-shell              212MiB |
|    0   N/A  N/A      5165      G   fcitx-qimpanel                     40MiB |
|    0   N/A  N/A      6374      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      6445      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      6488      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A      7201      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A     13756      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A     13799      G   /usr/lib/firefox/firefox            1MiB |
|    0   N/A  N/A     13944      G   /usr/lib/firefox/firefox            1MiB |
+-----------------------------------------------------------------------------+
a@a-G3-3579:/media/a$ python3 test.py 
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/a/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2021-07-15 10:50:56.135547: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-07-15 10:50:56.229574: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-15 10:50:56.230025: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2063ff0 executing computations on platform CUDA. Devices:
2021-07-15 10:50:56.230041: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1060 with Max-Q Design, Compute Capability 6.1
2021-07-15 10:50:56.231739: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-07-15 10:50:56.232615: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x27288f0 executing computations on platform Host. Devices:
2021-07-15 10:50:56.232631: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2021-07-15 10:50:56.232716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1060 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.3415
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 5.39GiB
2021-07-15 10:50:56.232747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2021-07-15 10:50:56.233196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 10:50:56.233207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2021-07-15 10:50:56.233234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2021-07-15 10:50:56.233302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5220 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
time cost:58.0266
time cost:0.4869
time cost:0.3860
time cost:0.3378
time cost:0.3417
time cost:0.3548
time cost:0.2599
time cost:0.2871
time cost:0.2599
time cost:0.2649
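
To put a number on the gap, the timings printed above can be summarised while leaving out the first call on each machine, which includes start-up cost. This is only arithmetic on the ten values already listed for each card. The variable names are illustrative, and the median is used so that the two roughly 17 ms spikes on the 3060 do not dominate.

```python
import statistics

# Per-call times in ms copied from the two runs above (first call included).
rtx3060_ms = [600.3177, 17.2832, 3.6066, 2.5594, 1.3814,
              1.4493, 1.7078, 2.7463, 16.8326, 3.1228]
gtx1060_ms = [58.0266, 0.4869, 0.3860, 0.3378, 0.3417,
              0.3548, 0.2599, 0.2871, 0.2599, 0.2649]

# Drop the first call, which includes one-time initialisation.
median_3060 = statistics.median(rtx3060_ms[1:])
median_1060 = statistics.median(gtx1060_ms[1:])
ratio = median_3060 / median_1060

print('3060 median: {:.4f} ms'.format(median_3060))
print('1060 median: {:.4f} ms'.format(median_1060))
print('3060 / 1060: {:.1f}x'.format(ratio))
```

On these numbers the steady-state median is about 2.75 ms on the 3060 and about 0.34 ms on the 1060, roughly an 8x gap for this micro-benchmark.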

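This speed is simply absurd!

Maybe I have something misconfigured somewhere. If anyone knows how to fix or optimize this, pointers are very welcome.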


http://www.ppmy.cn/news/170806.html
