
[Experience] Installing bitsandbytes: debugging the LLaVA-1.5 codebase

Hardware: H100 GPU

OS: Ubuntu 20.04

CUDA: 12.1, 12.6

The problem

===================================BUG REPORT===================================
/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes


  warn(msg)
================================================================================
/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /data2/toky/CondaEnvs/llava-med did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda-12.6/1ib64')}
/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /usr/local/cuda-12.6/1ib64:/usr/local/cuda/lib64: did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/10510,unix/user-Rack-Server'), PosixPath('local/user-Rack-Server')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Session0')}
The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Seat0')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/gnome/Terminal/screen/affc481c_d625_4146_8f6c_5df0b196f137')}
The following directories listed in your path were found to be non-existent: {PosixPath('unix'), PosixPath('path=/run/user/1001/bus,guid=9c7c4e2848096e531282f19d684e975d')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=126, Highest Compute Capability: 9.0.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda126.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=126
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/data2/toky/Projects/LLAVA-1.5/llava/train/train_mem.py", line 7, in <module>
    import bitsandbytes as bnb
  File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError:
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

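In short, the failure comes from the installed bitsandbytes library. LLaVA-1.5 is code from two years ago and needs bitsandbytes 0.41.0. The torch I had installed was too new, probably a CUDA 12.6 build, and 12.6 needs some system libraries (xxx.so files) that ship with Ubuntu 22.04.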
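The log above also reports `/usr/local/cuda-12.6/1ib64` as a non-existent directory, which looks like `1ib64` (digit one) typed in place of `lib64` in the library search path. A minimal sketch for catching this kind of entry, assuming the paths come from `LD_LIBRARY_PATH`; the helper name `missing_library_dirs` is made up for illustration:

```python
import os
from pathlib import Path
from typing import List


def missing_library_dirs(search_path: str) -> List[str]:
    """Return the entries of a colon-separated search path that are not existing directories."""
    entries = [entry for entry in search_path.split(":") if entry]
    return [entry for entry in entries if not Path(entry).is_dir()]


if __name__ == "__main__":
    # Report every LD_LIBRARY_PATH entry that does not exist on disk.
    for entry in missing_library_dirs(os.environ.get("LD_LIBRARY_PATH", "")):
        print(f"not a directory: {entry}")
```

Running it in the same shell that launches training shows which entries the dynamic loader will silently skip.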

What I tried:

Downgrade CUDA to 11.8

Reference: "CUDA and torch versions for an H100 GPU environment" (CSDN blog)

pip install torch==2.0.0+cu118 torchaudio==2.0.0+cu118 torchvision==0.15.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
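After the reinstall it is worth confirming which CUDA build of torch is actually active, since the bitsandbytes log derives the library name it looks for (`libbitsandbytes_cuda126.so`) from the PyTorch CUDA version. A small sketch: `cuda_tag` is a hypothetical helper that reads the `+cuXXX` suffix from a version string, and the `__main__` part prints what torch itself reports if torch is importable.

```python
import re
from typing import Optional


def cuda_tag(torch_version: str) -> Optional[str]:
    """Extract the CUDA tag from a version string such as '2.0.0+cu118'."""
    match = re.search(r"\+cu(\d+)$", torch_version)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("torch version:", torch.__version__)
        print("CUDA tag in version string:", cuda_tag(torch.__version__))
        print("torch.version.cuda:", torch.version.cuda)
        print("GPU visible:", torch.cuda.is_available())
```

For the command above, the expected tag is `118`; a different value suggests an older torch is still being picked up.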

Reinstall torch with the command above, then edit the environment files and reload the shell:

~/.bashrc

/etc/profile

source ~/.bashrc

Still a problem:

flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv

https://github.com/Dao-AILab/flash-attention/releases?page=1

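The prebuilt flash-attn wheels are on this releases page. Note that you need the wheel that matches your CUDA, torch, and Python versions. Finding it is tedious, but it makes up for the time a plain pip install would cost you.

I downloaded flash_attn-2.6.1+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl directly: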

pip install flash_attn-2.6.1+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
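The undefined symbol `_ZNK3c105Error4whatEv` demangles to `c10::Error::what() const`, a symbol from torch's libc10, which is consistent with a flash-attn binary built against a different torch build. The sketch below splits a wheel filename into its parts so they can be compared with the local environment. The pattern is inferred only from the filename used in this post, so treat it as an assumption; other releases may name their wheels differently, and `parse_flash_attn_wheel` is a made-up helper name.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the single filename in this post (an assumption).
WHEEL_PATTERN = re.compile(
    r"flash_attn-(?P<version>[\d.]+)"
    r"\+cu(?P<cuda>\d+)"
    r"torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<cxx11abi>TRUE|FALSE)"
    r"-(?P<python>cp\d+)-cp\d+"
    r"-(?P<platform>.+)\.whl"
)


def parse_flash_attn_wheel(filename: str) -> Optional[Dict[str, str]]:
    """Return the version fields encoded in the wheel filename, or None if it does not match."""
    match = WHEEL_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    name = "flash_attn-2.6.1+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
    for key, value in parse_flash_attn_wheel(name).items():
        print(f"{key}: {value}")
```

Here the fields should line up with torch 2.0.0+cu118 and Python 3.10 (`cp310`) from the environment above.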
python -c "import flash_attn_2_cuda; print('FlashAttention installed successfully!')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: libc10.so: cannot open shared object file: No such file or directory

This error means the system cannot find libc10.so, PyTorch's core library, so FlashAttention fails to load. It is usually caused by misconfigured environment variables or an incomplete PyTorch install. To fix it, first locate the library:

find / -name "libc10.so" 2>/dev/null
find $(python -c "import torch; print(torch.__path__[0])") -name "libc10.so" 2>/dev/null

(/data2/toky/CondaEnvs/llava-med) toky@user-Rack-Server:/data2/toky/Softwares/Flash-attention$ find / -name "libc10.so" 2>/dev/null
/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/torch/lib/libc10.so

According to the output, libc10.so is in the Conda environment directory /data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/torch/lib/. Next, add this path to the library search path and test again:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/torch/lib/
python -c "import flash_attn_2_cuda; print('FlashAttention installed successfully!')"

Alternative: use a symlink (for system-wide access)
If the library has to be reachable system-wide (not recommended, since it can cause version conflicts):

sudo ln -s /data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/torch/lib/libc10.so /usr/lib/
sudo ldconfig

Then another problem appeared:

python -c "import flash_attn_2_cuda; print('FlashAttention installed successfully!')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

Create a symlink
In most cases the runtime libraries of the CUDA 11.x series are compatible with each other, so a symlink can point libcudart.so.11.0 at the version that is already installed:

sudo ln -s /usr/local/cuda-11.8/lib64/libcudart.so.11.8 /usr/local/cuda-11.8/lib64/libcudart.so.11.0
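Both ImportErrors above are cases of the dynamic loader failing to resolve a shared library. A minimal sketch for checking this before importing flash-attn, using `ctypes.CDLL` from the standard library; `can_load` is a hypothetical helper name. A failure here can also mean that one of the library's own dependencies is missing, so it only narrows the search.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # The two libraries named in the ImportError messages above.
    for name in ("libc10.so", "libcudart.so.11.0"):
        status = "loadable" if can_load(name) else "NOT found by the dynamic loader"
        print(f"{name}: {status}")
```

Because `LD_LIBRARY_PATH` is read when a process starts, run the check in a fresh shell after exporting the variable.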
