torch_xla reports missing CUDA libraries at runtime (caused by a mismatched torchvision version)

news/2024/11/8 6:05:51/

Environment

  • pytorch 2.0.0(+cuda)
  • cuda 11.7
  • torch-xla 2.0.0
  • tensorflow 2.11.1

Error message

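All of the CUDA libraries are clearly present, yet the loader reports that it cannot load the dynamic libraries. A closer look at the error output shows that every dlerror string names torchvision's image.so, and that the failure is triggered by this one missing symbol: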

torch::jit::parseSchemaOrName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
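
One way to check whether a given shared library actually exports that symbol is to ask the dynamic loader directly. The snippet below is a minimal sketch using Python's standard `ctypes` module; the helper name `exports_symbol` is made up for illustration, and which libtorch file you should point it at depends on your installation (that path is not given in this post, so treat it as something to look up yourself).

```python
import ctypes


def exports_symbol(library_path, symbol):
    """Return True if the shared library exports `symbol`.

    library_path=None inspects the symbols already visible in the
    current process (this works on Linux and macOS).
    Note: ctypes.CDLL raises OSError if the library itself cannot be loaded.
    """
    lib = ctypes.CDLL(library_path)
    try:
        getattr(lib, symbol)  # ctypes looks the name up via dlsym()
    except AttributeError:
        return False
    return True


# Mangled name copied verbatim from the warning in the log.
MISSING = (
    "_ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_string"
    "IcSt11char_traitsIcESaIcEEE"
)
```

Calling `exports_symbol("<path to your libtorch .so>", MISSING)` should return `False` when the installed torch does not provide the symbol that torchvision's `image.so` was built against, which would be consistent with the mismatch described here.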

(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]# python resnet50_infer.py
/mnt/data/jack/anaconda3/envs/py3.10/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/mnt/data/jack/anaconda3/envs/py3.10/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2023-12-16 14:20:57.896978: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: /mnt/data/jack/anaconda3/envs/py3.10/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
2023-12-16 14:20:57.897105: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: /mnt/data/jack/anaconda3/envs/py3.10/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
2023-12-16 14:20:57.897166: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: /mnt/data/jack/anaconda3/envs/py3.10/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
2023-12-16 14:20:57.897237: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: /mnt/data/jack/anaconda3/envs/py3.10/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
2023-12-16 14:20:57.898062: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: /mnt/data/jack/anaconda3/envs/py3.10/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
2023-12-16 14:20:57.898158: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: /mnt/data/jack/anaconda3/envs/py3.10/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
2023-12-16 14:20:57.898173: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2023-12-16 14:20:57.996329: F tensorflow/tsl/platform/default/env.cc:74] Check failed: ret == 0 (11 vs. 0)Thread GrpcWorkerEnvPool creation via pthread_create() failed.
Aborted (core dumped)
(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]# c++filt _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
torch::jit::parseSchemaOrName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]# pip list | grep torch
torch              2.0.0a0+gitec54f40 /mnt/data/jack/workspace/pytorch
torch-xla          2.0.0              /mnt/data/jack/workspace/pytorch/torch_xla
torchvision        0.15.2a0
(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]#
(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]#
(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]# ls /usr/local/cuda-11.7/lib64
cmake                           libcudnn_cnn_train.so           libcufile.so.0               libnppial.so.11.7.4.75   libnppitc.so.11.7.4.75
libaccinj64.so                  libcudnn_cnn_train.so.8         libcufile.so.1.3.1           libnppial_static.a       libnppitc_static.a
libaccinj64.so.11.7             libcudnn_cnn_train.so.8.9.7     libcufile_static.a           libnppicc.so             libnpps.so
libaccinj64.so.11.7.101         libcudnn_cnn_train_static.a     libcufilt.a                  libnppicc.so.11          libnpps.so.11
libcublasLt.so                  libcudnn_cnn_train_static_v8.a  libcuinj64.so                libnppicc.so.11.7.4.75   libnpps.so.11.7.4.75
libcublasLt.so.11               libcudnn_ops_infer.so           libcuinj64.so.11.7           libnppicc_static.a       libnpps_static.a
libcublasLt.so.11.10.3.66       libcudnn_ops_infer.so.8         libcuinj64.so.11.7.101       libnppidei.so            libnvblas.so
libcublasLt_static.a            libcudnn_ops_infer.so.8.9.7     libculibos.a                 libnppidei.so.11         libnvblas.so.11
libcublas.so                    libcudnn_ops_infer_static.a     libcurand.so                 libnppidei.so.11.7.4.75  libnvblas.so.11.10.3.66
libcublas.so.11                 libcudnn_ops_infer_static_v8.a  libcurand.so.10              libnppidei_static.a      libnvjpeg.so
libcublas.so.11.10.3.66         libcudnn_ops_train.so           libcurand.so.10.2.10.91      libnppif.so              libnvjpeg.so.11
libcublas_static.a              libcudnn_ops_train.so.8         libcurand_static.a           libnppif.so.11           libnvjpeg.so.11.8.0.2
libcudadevrt.a                  libcudnn_ops_train.so.8.9.7     libcusolver_lapack_static.a  libnppif.so.11.7.4.75    libnvjpeg_static.a
libcudart.so                    libcudnn_ops_train_static.a     libcusolverMg.so             libnppif_static.a        libnvptxcompiler_static.a
libcudart.so.11.0               libcudnn_ops_train_static_v8.a  libcusolverMg.so.11          libnppig.so              libnvrtc-builtins.so
libcudart.so.11.7.99            libcudnn.so                     libcusolverMg.so.11.4.0.1    libnppig.so.11           libnvrtc-builtins.so.11.7
libcudart_static.a              libcudnn.so.8                   libcusolver.so               libnppig.so.11.7.4.75    libnvrtc-builtins.so.11.7.99
libcudnn_adv_infer.so           libcudnn.so.8.9.7               libcusolver.so.11            libnppig_static.a        libnvrtc-builtins_static.a
libcudnn_adv_infer.so.8         libcufft.so                     libcusolver.so.11.4.0.1      libnppim.so              libnvrtc.so
libcudnn_adv_infer.so.8.9.7     libcufft.so.10                  libcusolver_static.a         libnppim.so.11           libnvrtc.so.11.2
libcudnn_adv_infer_static.a     libcufft.so.10.7.2.91           libcusparse.so               libnppim.so.11.7.4.75    libnvrtc.so.11.7.99
libcudnn_adv_infer_static_v8.a  libcufft_static.a               libcusparse.so.11            libnppim_static.a        libnvrtc_static.a
libcudnn_adv_train.so           libcufft_static_nocallback.a    libcusparse.so.11.7.4.91     libnppist.so             libnvToolsExt.so
libcudnn_adv_train.so.8         libcufftw.so                    libcusparse_static.a         libnppist.so.11          libnvToolsExt.so.1
libcudnn_adv_train.so.8.9.7     libcufftw.so.10                 liblapack_static.a           libnppist.so.11.7.4.75   libnvToolsExt.so.1.0.0
libcudnn_adv_train_static.a     libcufftw.so.10.7.2.91          libmetis_static.a            libnppist_static.a       libOpenCL.so
libcudnn_adv_train_static_v8.a  libcufftw_static.a              libnppc.so                   libnppisu.so             libOpenCL.so.1
libcudnn_cnn_infer.so           libcufile_rdma.so               libnppc.so.11                libnppisu.so.11          libOpenCL.so.1.0
libcudnn_cnn_infer.so.8         libcufile_rdma.so.1             libnppc.so.11.7.4.75         libnppisu.so.11.7.4.75   libOpenCL.so.1.0.0
libcudnn_cnn_infer.so.8.9.7     libcufile_rdma.so.1.3.1         libnppc_static.a             libnppisu_static.a       stubs
libcudnn_cnn_infer_static.a     libcufile_rdma_static.a         libnppial.so                 libnppitc.so
libcudnn_cnn_infer_static_v8.a  libcufile.so                    libnppial.so.11              libnppitc.so.11
(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]#
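
The manual check above (reading the library names out of the warnings and confirming with `ls` that they exist) can be automated. The following is a rough sketch, assuming the warning lines keep the `Could not load dynamic library '<name>'` wording seen in this log; the function names are hypothetical helpers, not part of TensorFlow or torch_xla.

```python
import os
import re

# Matches the library name inside TensorFlow's dso_loader warning lines.
_LIB_RE = re.compile(r"Could not load dynamic library '([^']+)'")


def libraries_in_warnings(log_text):
    """Return the library names mentioned in the warnings, in order, without duplicates."""
    seen = []
    for name in _LIB_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def locate(names, search_dirs):
    """Map each library name to the first directory that contains it, or None."""
    found = {}
    for name in names:
        found[name] = next(
            (d for d in search_dirs if os.path.exists(os.path.join(d, name))),
            None,
        )
    return found
```

For this log, `locate(libraries_in_warnings(log_text), ["/usr/local/cuda-11.7/lib64"])` would be expected to find every library, which points away from a missing-file problem and toward the undefined symbol reported in the dlerror text.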

Switching the torchvision version

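torchvision stood out in the output. Comparing against a working environment inside Docker showed that the torchvision version was slightly different, so my bold guess was that torchvision was the culprit. I went straight ahead and swapped it out: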

(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]# pip install torchvision==0.15.0
Collecting torchvision==0.15.0
  Using cached torchvision-0.15.0-cp310-cp310-manylinux1_x86_64.whl (6.0 MB)
Requirement already satisfied: numpy in /mnt/data/jack/anaconda3/envs/py3.10/lib/python3.10/site-packages (from torchvision==0.15.0) (1.26.2)
Requirement already satisfied: requests in /mnt/data/jack/anaconda3/envs/py3.10/lib/python3.10/site-packages (from torchvision==0.15.0) (2.31.0)
INFO: pip is looking at multiple versions of torchvision to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement torch==2.0.0+cu117 (from torchvision) (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2)
ERROR: No matching distribution found for torch==2.0.0+cu117
(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]# pip install --no-deps torchvision==0.15.1
Collecting torchvision==0.15.1
  Using cached torchvision-0.15.1-cp310-cp310-manylinux1_x86_64.whl (6.0 MB)
Installing collected packages: torchvision
Successfully installed torchvision-0.15.1
(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]#
(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]# pip list | grep torch
torch              2.0.0a0+gitec54f40 /mnt/data/jack/workspace/pytorch
torch-xla          2.0.0              /mnt/data/jack/workspace/pytorch/torch_xla
torchvision        0.15.1
(py3.10) [jack@td09 /mnt/data/jack/workspace/pytorch/torch_xla]#
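
The `pip list | grep torch` check can also be done from Python without importing torch itself, which is handy when the import is exactly what fails. This is a small sketch built on the standard `importlib.metadata` module; `installed_versions` and `local_build_tag` are illustrative names, not an existing API.

```python
from importlib import metadata


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def local_build_tag(version):
    """Return the part after '+' in a version string (e.g. 'cu117'), or None."""
    _, sep, tag = version.partition("+")
    return tag if sep else None
```

In the environment shown above, `installed_versions(["torch", "torch-xla", "torchvision"])` would report the same three versions as `pip list`, and `local_build_tag` separates a source build tag such as `gitec54f40` from a wheel tag such as `cu117`, the distinction that made pip reject `torchvision==0.15.0` without `--no-deps`.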

Reference links

Refer here for torch==2.0.0+cu117

