[Experience] Installing bitsandbytes: debugging the LLAVA-1.5 code base
Hardware: H100 GPU
OS: Ubuntu 20.04
CUDA: 12.1, 12.6
The problem
===================================BUG REPORT===================================
/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please runpython -m bitsandbytes
warn(msg)
================================================================================
/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /data2/toky/CondaEnvs/llava-med did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda-12.6/1ib64')}
/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:166: UserWarning: /usr/local/cuda-12.6/1ib64:/usr/local/cuda/lib64: did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/10510,unix/user-Rack-Server'), PosixPath('local/user-Rack-Server')}
The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Session0')}
The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/freedesktop/DisplayManager/Seat0')}
The following directories listed in your path were found to be non-existent: {PosixPath('/org/gnome/Terminal/screen/affc481c_d625_4146_8f6c_5df0b196f137')}
The following directories listed in your path were found to be non-existent: {PosixPath('unix'), PosixPath('path=/run/user/1001/bus,guid=9c7c4e2848096e531282f19d684e975d')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=126, Highest Compute Capability: 9.0.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda126.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=126
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
File "/data2/toky/Projects/LLAVA-1.5/llava/train/train_mem.py", line 7, in <module>
import bitsandbytes as bnb
File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
from . import cuda_setup, utils, research
File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
from . import nn
File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
from .modules import LinearFP8Mixed, LinearFP8Global
File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
from bitsandbytes.optim import GlobalOptimManager
File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
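In short: the failure is in the bitsandbytes library. LLaVA-1.5 is two-year-old code and needs bitsandbytes 0.41.0. The torch I had installed was too new (the log reports CUDA_VERSION=126, i.e. a CUDA 12.6 build), and the installed bitsandbytes has no libbitsandbytes_cuda126.so, so it falls back to the CPU library and then fails. As far as I could tell, 12.6 also needs some system-level build packages (xxx.so) that come with Ubuntu 22.04. The log also reports /usr/local/cuda-12.6/1ib64 as non-existent; that looks like a typo for lib64 (digit 1 instead of the letter l) in LD_LIBRARY_PATH and is worth correcting as well.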
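To confirm this diagnosis without reading the whole log, the check can be sketched in Python. This is a minimal sketch, not part of bitsandbytes: the helper names expected_binary and shipped_binaries are made up for illustration, and the file-name pattern is taken from the log above (libbitsandbytes_cuda126.so), so it is an assumption that other releases follow the same naming.

```python
import importlib.util
from pathlib import Path
from typing import List


def expected_binary(cuda_version: str) -> str:
    """Map a CUDA version such as '12.6' to the library name seen in the log."""
    major, minor = cuda_version.split(".")[:2]
    return f"libbitsandbytes_cuda{major}{minor}.so"


def shipped_binaries(package_dir: Path) -> List[str]:
    """List the prebuilt libraries present in a bitsandbytes install directory."""
    return sorted(p.name for p in Path(package_dir).glob("libbitsandbytes_*.so"))


if __name__ == "__main__":
    # find_spec only locates the packages; bitsandbytes itself is never imported,
    # so the failing CUDA setup is not triggered again.
    bnb_spec = importlib.util.find_spec("bitsandbytes")
    if importlib.util.find_spec("torch") is None or bnb_spec is None or not bnb_spec.origin:
        print("torch or bitsandbytes is not installed in this environment")
    else:
        import torch

        shipped = shipped_binaries(Path(bnb_spec.origin).parent)
        print("shipped:", shipped)
        if torch.version.cuda is None:
            print("this torch build has no CUDA support")
        else:
            wanted = expected_binary(torch.version.cuda)
            print("wanted:", wanted, "->", "found" if wanted in shipped else "missing")
```

If the wanted file is reported as missing, the options are the ones the log itself lists: use a torch build whose CUDA version has a shipped binary, override with BNB_CUDA_VERSION, or compile from source.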
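What I tried:
Downgrade CUDA to 11.8
Reference: CSDN blog post "H100显卡环境对应的cuda和torch版本" (CUDA and torch versions for an H100 environment)
Reinstall torch:
pip install torch==2.0.0+cu118 torchaudio==2.0.0+cu118 torchvision==0.15.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
Edit ~/.bashrc and /etc/profile (the CUDA environment variables), then reload:
source ~/.bashrc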
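After a reinstall like this, it helps to confirm what torch actually reports before moving on. The following is a small sketch under the assumption that torch is importable in the active environment; split_build_tag is a hypothetical helper written for this note, while the torch attributes used (torch.__version__, torch.version.cuda, torch.cuda.*) are standard.

```python
import importlib.util
from typing import Tuple


def split_build_tag(version: str) -> Tuple[str, str]:
    """Split '2.0.0+cu118' into ('2.0.0', 'cu118'); the tag is '' if absent."""
    base, _, tag = version.partition("+")
    return base, tag


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is None:
        print("torch is not installed in this environment")
    else:
        import torch

        base, tag = split_build_tag(torch.__version__)
        print("torch:", base, "build tag:", tag or "(none)")
        print("compiled against CUDA:", torch.version.cuda)
        if torch.cuda.is_available():
            # An H100 reports compute capability (9, 0), matching the 9.0 in the log.
            print("device:", torch.cuda.get_device_name(0))
            print("compute capability:", torch.cuda.get_device_capability(0))
        else:
            print("CUDA is not available to torch")
```

For the pip command above, the build tag should come out as cu118; anything else suggests an older torch is still being picked up.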
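It still failed, this time in flash-attn:
flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv
The symbol demangles to c10::Error::what() const, which lives in torch's libc10, so this points to a flash-attn binary that was built against a different torch than the one now installed.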
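https://github.com/Dao-AILab/flash-attention/releases?page=1
The prebuilt flash-attn packages are listed here. Note that you need the flash-attn build that matches your CUDA, torch, and Python versions. Finding it is tedious, but it more than makes up for the time a pip install would take.
I downloaded flash_attn-2.6.1+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl directly and installed it: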
pip install flash_attn-2.6.1+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
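Because the wheel name encodes the CUDA, torch, and Python versions it was built for, the matching can be checked mechanically. Below is a rough sketch: the regular expression is inferred from the one file name used in this post, so treat the pattern (and the helper names parse_wheel_name and mismatches) as assumptions rather than an official naming specification.

```python
import re
import sys
from typing import Dict, List, Tuple

WHEEL = "flash_attn-2.6.1+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"

# Pattern inferred from the file name above.
_PATTERN = re.compile(
    r"flash_attn-(?P<version>[^+]+)\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)-(?P<python>cp\d+)-"
)


def parse_wheel_name(name: str) -> Dict[str, str]:
    """Extract the version tags encoded in a flash-attn wheel file name."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised wheel name: {name}")
    return match.groupdict()


def mismatches(tags: Dict[str, str], torch_version: str, cuda_version: str,
               python_version: Tuple[int, int]) -> List[str]:
    """Return the names of the tags that do not match the given environment."""
    torch_mm = ".".join(torch_version.split("+")[0].split(".")[:2])
    actual = {
        "cuda": cuda_version.replace(".", ""),
        "torch": torch_mm,
        "python": f"cp{python_version[0]}{python_version[1]}",
    }
    return [key for key in ("cuda", "torch", "python") if tags[key] != actual[key]]


if __name__ == "__main__":
    tags = parse_wheel_name(WHEEL)
    print(tags)
    # Compare against the environment described in this post.
    print("mismatches:", mismatches(tags, "2.0.0+cu118", "11.8", sys.version_info[:2]))
```

An empty mismatch list only says the tags line up; the cxx11abi flag is parsed but not compared here, since checking it needs the installed torch.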
Check the install:
python -c "import flash_attn_2_cuda; print('FlashAttention installed successfully!')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: libc10.so: cannot open shared object file: No such file or directory
This error means the dynamic loader cannot find libc10.so, PyTorch's core library, so FlashAttention cannot be loaded. It is usually caused by a misconfigured environment variable or an incomplete PyTorch install. To fix it, first locate the library:
find / -name "libc10.so" 2>/dev/null
find $(python -c "import torch; print(torch.__path__[0])") -name "libc10.so" 2>/dev/null
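The same lookup can be done from Python, which also makes it easy to print a ready-made export line for LD_LIBRARY_PATH. This is only a sketch: torch_lib_dir and export_line are hypothetical helpers, and it assumes the usual pip/conda layout where the shared libraries sit in a lib directory inside the torch package.

```python
import importlib.util
import os
from pathlib import Path
from typing import Optional


def torch_lib_dir() -> Optional[Path]:
    """Return <site-packages>/torch/lib without importing torch, or None."""
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.origin:
        return None
    return Path(spec.origin).parent / "lib"


def export_line(lib_dir, current: str = "") -> str:
    """Build a shell line that appends lib_dir to LD_LIBRARY_PATH (no duplicates)."""
    parts = [p for p in current.split(":") if p]
    if str(lib_dir) not in parts:
        parts.append(str(lib_dir))
    return "export LD_LIBRARY_PATH=" + ":".join(parts)


if __name__ == "__main__":
    lib_dir = torch_lib_dir()
    if lib_dir is None:
        print("torch is not installed in this environment")
    else:
        print("libc10.so present:", (lib_dir / "libc10.so").exists())
        print(export_line(lib_dir, os.environ.get("LD_LIBRARY_PATH", "")))
```

One more thing that may be worth trying: importing torch first loads libc10.so into the process, so a check written as python -c "import torch; import flash_attn_2_cuda" can succeed even where the bare import of flash_attn_2_cuda fails.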
(/data2/toky/CondaEnvs/llava-med) toky@user-Rack-Server:/data2/toky/Softwares/Flash-attention$ find / -name "libc10.so" 2>/dev/null
/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/torch/lib/libc10.so
According to the output, libc10.so is in the Conda environment directory /data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/torch/lib/. Next, add this path to the library search path:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/torch/lib/
python -c "import flash_attn_2_cuda; print('FlashAttention installed successfully!')"
Alternative: use a symlink (for system-wide access)
If the library must be visible system-wide (not recommended, it can cause version conflicts):
sudo ln -s /data2/toky/CondaEnvs/llava-med/lib/python3.10/site-packages/torch/lib/libc10.so /usr/lib/
sudo ldconfig
Then another problem appeared:
python -c "import flash_attn_2_cuda; print('FlashAttention installed successfully!')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
Suggested fix: create a symlink
In most cases the CUDA 11.x runtime libraries are compatible with one another, so a symlink can point libcudart.so.11.0 at the version that is installed:
sudo ln -s /usr/local/cuda-11.8/lib64/libcudart.so.11.8 /usr/local/cuda-11.8/lib64/libcudart.so.11.0
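Before creating a link like this, it is worth listing which libcudart files actually exist, since the exact file names vary between installs (the bitsandbytes log earlier already lists /usr/local/cuda/lib64/libcudart.so.11.0). A minimal sketch follows; list_cudart and can_load are hypothetical helpers, and the directory /usr/local/cuda-11.8/lib64 is simply the one from the command above.

```python
import ctypes
import os
from pathlib import Path
from typing import Dict


def list_cudart(lib_dir: Path) -> Dict[str, str]:
    """Map each libcudart.so* entry to its symlink target ('' for a regular file)."""
    found = {}
    for path in sorted(Path(lib_dir).glob("libcudart.so*")):
        found[path.name] = os.readlink(path) if path.is_symlink() else ""
    return found


def can_load(name: str) -> bool:
    """Report whether the dynamic loader can resolve the given library name."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Directory taken from the ln command above; adjust to your install.
    for name, target in list_cudart(Path("/usr/local/cuda-11.8/lib64")).items():
        print(name, "->", target or "(regular file)")
    print("libcudart.so.11.0 loadable:", can_load("libcudart.so.11.0"))
```

If libcudart.so.11.0 is already present in the directory but still not loadable, the likelier cause is that the directory is missing from LD_LIBRARY_PATH, in which case no new symlink is needed.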