[Solved] DockerError: Container exited with code 1: Error loading model: GPU sm_86 is not supported by the compiled kernel

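Contents

[Solved] DockerError: Container exited with code 1: Error loading model: GPU sm_86 is not supported by the compiled kernel
- Problem description
- Cause
- Solutions
  - Solution 1: Use a container image that supports multiple GPU architectures
  - Solution 2: Recompile the kernels inside the container
  - Solution 3: Specify the GPU architecture when running the container
  - Solution 4: Update the host GPU driver
  - Solution 5: Use a different quantization method
  - Solution 6: Build a custom container image
    - Example Dockerfile
  - Solution 7: Check and update the CUDA version
- Example code
  - Complete Docker container configuration and run example
    - 1. Build a container image that supports sm_86
    - 2. Create the application `app.py`
    - 3. Build and run the container
- FAQ
  - Q: What is a GPU architecture code?
  - Q: How do I determine my GPU architecture?
  - Q: Why don't the kernels in the container support my GPU architecture?
  - Q: How long does recompiling the kernels take?
  - Q: Are there alternatives to recompiling the kernels?
- Summary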

Problem description

Running a model inside a Docker container fails with the following error:

DockerError: Container exited with code 1: Error loading model: GPU sm_86 is not supported by the compiled kernel
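
When the message is buried in a long container log, a small regular expression can pick out the architecture code. The sketch below assumes the log uses exactly the wording of the error above; the function name `unsupported_arch` is hypothetical.

```python
import re
from typing import Optional

# Matches the wording of the error shown above, e.g. "GPU sm_86 is not supported ...".
_ARCH_ERROR = re.compile(r"GPU (sm_\d+) is not supported by the compiled kernel")


def unsupported_arch(log_text: str) -> Optional[str]:
    """Return the architecture code named in the error, or None if it is absent."""
    match = _ARCH_ERROR.search(log_text)
    return match.group(1) if match else None


sample = (
    "DockerError: Container exited with code 1: Error loading model: "
    "GPU sm_86 is not supported by the compiled kernel"
)
print(unsupported_arch(sample))  # sm_86
```

The returned code tells you which architecture the kernels in the image need to be built for.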

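Cause

This error usually has one or more of the following causes:

- Unsupported GPU architecture: the kernels compiled into the container do not support the architecture of the current GPU (sm_86 corresponds to the NVIDIA Ampere architecture, for example the RTX 30 series).
- CUDA version mismatch: the CUDA version in the container is too old for the GPU architecture (compiling for sm_86 requires CUDA 11.1 or later).
- Kernel build settings: the kernels in the container were compiled for one specific GPU architecture and do not cover others.
- Driver: the GPU driver on the host is too old to support the current GPU architecture.
- Container image: the image was not configured to support multiple GPU architectures.
- Quantization library: the quantization library in use (such as GPTQ or AWQ) was compiled without support for the current GPU architecture.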

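Solutions

Solution 1: Use a container image that supports multiple GPU architectures

Choose an image that supports multiple GPU architectures, ideally one that ships kernels precompiled for each of them:

    # Use the official PyTorch image, which supports multiple GPU architectures
    docker pull pytorch/pytorch:latest
    # Or use an image that bundles quantization libraries
    docker pull thebloke/cuda11.8.0-runtime-ubuntu20.04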

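Solution 2: Recompile the kernels inside the container

Enter the container and rebuild the kernels for the current GPU architecture. auto-gptq compiles its CUDA kernels when it is installed from source, so the rebuild is a source install with the target architecture set. This needs git and the CUDA compiler (nvcc), so the container must be based on a CUDA devel image rather than a runtime image:

    # Open a shell in the running container
    docker exec -it container_name bash
    # Inside the container: rebuild auto-gptq from source for sm_86
    TORCH_CUDA_ARCH_LIST="8.6" pip install -v --no-build-isolation \
        --force-reinstall --no-deps git+https://github.com/AutoGPTQ/AutoGPTQ.git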

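Solution 3: Specify the GPU architecture when running the container

Pass the architecture of the current GPU to the container at run time. `TORCH_CUDA_ARCH_LIST` is read when PyTorch CUDA extensions are compiled, so this only helps if the image builds its kernels when the container starts; it does not change kernels that were already compiled into the image:

    # Run the container with the GPU architecture set as an environment variable
    docker run --gpus all -e TORCH_CUDA_ARCH_LIST="8.6" your-image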
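
If one image has to serve several GPU models, the variable can list more than one compute capability, and a `+PTX` suffix on an entry also embeds PTX that newer GPUs can JIT-compile. A minimal sketch of building such a value follows; the helper name `torch_cuda_arch_list` is hypothetical.

```python
from typing import Iterable, Tuple


def torch_cuda_arch_list(capabilities: Iterable[Tuple[int, int]], with_ptx: bool = True) -> str:
    """Build a TORCH_CUDA_ARCH_LIST value such as "7.5 8.6+PTX"."""
    unique = sorted(set(capabilities))
    if not unique:
        raise ValueError("at least one compute capability is required")
    entries = [f"{major}.{minor}" for major, minor in unique]
    if with_ptx:
        # Embed PTX for the newest architecture so later GPUs can still JIT-compile it.
        entries[-1] += "+PTX"
    return " ".join(entries)


print(torch_cuda_arch_list([(8, 6)], with_ptx=False))  # 8.6
print(torch_cuda_arch_list([(8, 6), (7, 5)]))          # 7.5 8.6+PTX
```

The resulting string can be passed with `-e TORCH_CUDA_ARCH_LIST=...` or set with `ENV` in a Dockerfile before the kernels are compiled.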

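Solution 4: Update the host GPU driver

Make sure the GPU driver on the host supports the current GPU architecture:

    # Check the current driver version
    nvidia-smi
    # Download the latest driver from the NVIDIA website
    # https://www.nvidia.com/Download/index.aspx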
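
To turn the driver check into a scriptable comparison, the version string (which `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints on its own) can be compared against a required minimum. The sketch below is illustrative: the function names are hypothetical, and the minimum `455.23` is an assumed value that should be confirmed against the NVIDIA release notes for your GPU.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "535.104.05" into (535, 104, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(installed: str, minimum: str) -> bool:
    """Compare two dotted driver versions numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


# Assumed minimum for illustration only; verify it for your GPU and CUDA version.
ASSUMED_MINIMUM = "455.23"

print(driver_is_new_enough("535.104.05", ASSUMED_MINIMUM))  # True
print(driver_is_new_enough("450.80.02", ASSUMED_MINIMUM))   # False
```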

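Solution 5: Use a different quantization method

If the quantization library is the problem, try a different quantization method or skip quantization:

    import torch
    from transformers import AutoModelForCausalLM

    # Option A: GPTQ quantization (the GPTQ kernels must support sm_86)
    model = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7B-GPTQ",
        device_map="auto",
        trust_remote_code=True,
    )

    # Option B: load the FP16 model instead of a quantized one
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",
        device_map="auto",
        torch_dtype=torch.float16,
    )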

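Solution 6: Build a custom container image

Build a custom image whose kernels are compiled for the current GPU architecture:

Example Dockerfile

    # The devel image includes nvcc, which is needed to compile CUDA kernels
    FROM pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel

    # Target architecture for compiled kernels (no GPU is visible during docker build)
    ENV TORCH_CUDA_ARCH_LIST="8.6"

    # Install the required dependencies
    RUN apt-get update && apt-get install -y --no-install-recommends git \
        && rm -rf /var/lib/apt/lists/*
    RUN pip install --upgrade pip && \
        pip install transformers

    # Build auto-gptq from source so that its kernels target sm_86
    RUN pip install -v --no-build-isolation git+https://github.com/AutoGPTQ/AutoGPTQ.git

    # Set the working directory
    WORKDIR /app

    # Copy the application
    COPY . /app

    # Default command
    CMD ["python", "app.py"]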

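Solution 7: Check and update the CUDA version

Make sure the CUDA version in the container supports the current GPU architecture; sm_86 requires CUDA 11.1 or later:

    # Check the CUDA toolkit version in the container (nvcc is present only in devel images)
    docker exec container_name nvcc --version
    # Check the compute capability of the GPU
    docker exec container_name nvidia-smi --query-gpu=compute_cap --format=csv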
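
The version check can be automated by parsing the `release X.Y` field that `nvcc --version` prints and comparing it with the first CUDA release that can compile for the target architecture. This is a sketch under stated assumptions: the function names are hypothetical, and the table covers only a few architectures and should be checked against the CUDA release notes.

```python
import re
from typing import Dict, Optional, Tuple

# First CUDA toolkit release that can compile for each architecture (partial table).
MIN_CUDA: Dict[str, Tuple[int, int]] = {
    "sm_75": (10, 0),
    "sm_80": (11, 0),
    "sm_86": (11, 1),
    "sm_89": (11, 8),
    "sm_90": (11, 8),
}


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' field of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_supports(arch: str, nvcc_output: str) -> bool:
    """Return True if the toolkit described by nvcc_output can compile for arch."""
    release = parse_nvcc_release(nvcc_output)
    if release is None:
        raise ValueError("no CUDA release found in nvcc output")
    if arch not in MIN_CUDA:
        raise ValueError(f"architecture {arch} is not in the table")
    return release >= MIN_CUDA[arch]


sample = "Cuda compilation tools, release 11.7, V11.7.99"
print(toolkit_supports("sm_86", sample))  # True
```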

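Example code

Complete Docker container configuration and run example

1. Build a container image that supports sm_86

Create a Dockerfile:

    # The devel image includes nvcc, which is needed to compile CUDA kernels
    FROM pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel

    # Install system dependencies
    RUN apt-get update && apt-get install -y --no-install-recommends \
        git \
        && rm -rf /var/lib/apt/lists/*

    # Set environment variables before anything is compiled
    ENV TORCH_CUDA_ARCH_LIST="8.6"
    ENV PYTHONUNBUFFERED=1

    # Install Python dependencies
    RUN pip install --upgrade pip && \
        pip install \
        transformers \
        accelerate \
        datasets

    # Build auto-gptq from source so that its kernels support sm_86
    RUN pip install -v --no-build-isolation \
        "auto-gptq[triton] @ git+https://github.com/AutoGPTQ/AutoGPTQ.git"

    # Set the working directory
    WORKDIR /app

    # Copy the application
    COPY app.py /app/

    # Run the application
    CMD ["python", "app.py"]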

2. Create the application `app.py`

    import torch
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoTokenizer

    print("=== Initializing ===")
    print(f"CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"CUDA version: {torch.version.cuda}")
        print(f"GPU: {torch.cuda.get_device_name(0)}")
        print(f"GPU arch: {torch.cuda.get_device_capability(0)}")

    print("\n=== Loading model ===")
    # Load the GPTQ-quantized model
    tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
    model = AutoGPTQForCausalLM.from_quantized(
        "TheBloke/Llama-2-7B-GPTQ",
        device_map="auto",
        use_safetensors=True,
        trust_remote_code=True,
    )
    print("\n=== Model loaded successfully ===")
    print(f"Model type: {type(model)}")

    print("\n=== Testing model ===")
    # Test generation; do_sample=True is required for temperature and top_p to take effect
    prompt = "Hello, how are you?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=50,
            do_sample=True,
            temperature=0.7,
            top_p=0.95,
        )
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Prompt: {prompt}")
    print(f"Generated: {generated_text}")
    print("\n=== Done ===")

3. Build and run the container

    # Build the image
    docker build -t gptq-sm86 .
    # Run the container
    docker run --gpus all gptq-sm86
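
For scripted runs, the same `docker run` call can be wrapped in Python so that the exit code and output are captured for inspection. This is a minimal sketch; the helper names are hypothetical, and the `--rm` flag (which removes the container after it exits) is an addition not used in the commands above.

```python
import subprocess
from typing import Dict, List, Optional, Tuple


def docker_run_argv(image: str, env: Optional[Dict[str, str]] = None) -> List[str]:
    """Build the argument list for `docker run --gpus all [-e KEY=VALUE ...] IMAGE`."""
    argv = ["docker", "run", "--rm", "--gpus", "all"]
    for key, value in (env or {}).items():
        argv += ["-e", f"{key}={value}"]
    argv.append(image)
    return argv


def run_container(image: str, env: Optional[Dict[str, str]] = None) -> Tuple[int, str]:
    """Run the container and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(docker_run_argv(image, env), capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


print(docker_run_argv("gptq-sm86"))
# ['docker', 'run', '--rm', '--gpus', 'all', 'gptq-sm86']
```

Calling `run_container("gptq-sm86")` on a host with Docker and the NVIDIA container runtime returns a non-zero exit code when model loading fails, and the captured output contains the error text.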

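FAQ

Q: What is a GPU architecture code?

A: A GPU architecture code (such as sm_86) is NVIDIA's identifier for a GPU's compute capability. sm_86 is compute capability 8.6, part of the NVIDIA Ampere architecture, and is used by GPUs such as the RTX 30 series, the A10, and the A40. The A100 is also Ampere but uses sm_80.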
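
The relationship between an architecture code, its compute capability, and its architecture family can be written as a small lookup. The sketch below covers only a few common codes, and the function name `describe_arch` is hypothetical.

```python
from typing import Dict, Tuple

# Architecture family for a few common codes (partial table).
FAMILY: Dict[str, str] = {
    "sm_70": "Volta",
    "sm_75": "Turing",
    "sm_80": "Ampere",
    "sm_86": "Ampere",
    "sm_89": "Ada Lovelace",
    "sm_90": "Hopper",
}


def describe_arch(code: str) -> Tuple[str, str]:
    """Return (compute capability, family), e.g. ("8.6", "Ampere") for "sm_86"."""
    if not code.startswith("sm_") or not code[3:].isdigit() or len(code) < 5:
        raise ValueError(f"not an architecture code: {code!r}")
    digits = code[3:]
    # The last digit is the minor version; everything before it is the major version.
    capability = f"{digits[:-1]}.{digits[-1]}"
    return capability, FAMILY.get(code, "unknown")


print(describe_arch("sm_86"))  # ('8.6', 'Ampere')
```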

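Q: How do I determine my GPU architecture?

A: Run `nvidia-smi --query-gpu=compute_cap --format=csv` to print the compute capability of the GPU, then map it to the architecture code by dropping the dot: compute capability 8.6 corresponds to sm_86.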
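
The mapping from the command's CSV output to architecture codes is mechanical. A minimal sketch, assuming the output consists of a `compute_cap` header line followed by one value per GPU; the function name is hypothetical.

```python
from typing import List


def sm_codes_from_csv(csv_text: str) -> List[str]:
    """Convert `nvidia-smi --query-gpu=compute_cap --format=csv` output to sm codes."""
    codes = []
    for line in csv_text.splitlines():
        value = line.strip()
        if not value or value == "compute_cap":  # skip blank lines and the header
            continue
        major, minor = value.split(".")
        codes.append(f"sm_{major}{minor}")
    return codes


sample = "compute_cap\n8.6\n8.6\n"
print(sm_codes_from_csv(sample))  # ['sm_86', 'sm_86']
```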

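Q: Why don't the kernels in the container support my GPU architecture?

A: Kernels in a container are usually compiled for specific architectures to improve performance and keep the binaries small. Compiled GPU binaries only run on the architecture generation they were built for, so an image built only for an older architecture (such as sm_75) does not run on a newer one (such as sm_86) unless it also embeds PTX that the driver can JIT-compile for the newer GPU.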
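
The compatibility rule can be expressed as a short check: an `sm_XY` binary runs on a device with the same major version and an equal or higher minor version, while a `compute_XY` (PTX) entry can be JIT-compiled for any device of equal or higher capability. The sketch below is a simplification with hypothetical function names; it ignores architecture-specific variants such as `sm_90a`, and individual libraries may apply stricter checks of their own.

```python
import re
from typing import Iterable, Tuple


def _parse(entry: str) -> Tuple[str, int, int]:
    """Split "sm_86" or "compute_86" into (kind, major, minor)."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", entry)
    if match is None:
        raise ValueError(f"unrecognized architecture entry: {entry!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def can_run(device: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if code built for arch_list can run on a device of this capability."""
    major, minor = device
    for entry in arch_list:
        kind, a_major, a_minor = _parse(entry)
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # binary-compatible within the same major version
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX can be JIT-compiled for a newer device
    return False


print(can_run((8, 6), ["sm_75"]))                # False
print(can_run((8, 6), ["sm_75", "compute_75"]))  # True
print(can_run((8, 6), ["sm_80"]))                # True
```

For PyTorch itself, `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()` supply the two inputs.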

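Q: How long does recompiling the kernels take?

A: Recompiling usually takes a few minutes, depending on the CPU and memory available to the container and on how many architectures are being targeted.

Q: Are there alternatives to recompiling the kernels?

A: You can try a different container image, or run the model in FP16 instead of using quantization. FP16 inference does not depend on the quantization library's architecture-specific kernels.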

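Summary

When you hit the error `DockerError: Container exited with code 1: Error loading model: GPU sm_86 is not supported by the compiled kernel`, the main options are:

- Use a container image that supports multiple GPU architectures.
- Recompile the kernels inside the container for the current GPU architecture.
- Specify the GPU architecture when running the container.
- Update the host GPU driver.
- Try a different quantization method or precision.
- Build a custom container image that supports the specific GPU architecture.
- Check that the CUDA version in the container supports the GPU architecture.

In most cases, these solutions resolve the unsupported GPU architecture problem so that the model runs in the Docker container.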
