Step-by-Step Tutorial: Building a Chinese-English Bilingual Multi-Turn Chatbot with GLM-4v-9b - Developer Community


1. Why choose GLM-4v-9b for a multimodal chat system

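Multimodal AI applications are spreading quickly, and many developers now need a Chinese-English bilingual chatbot that understands both text and images. GLM-4v-9b is a 9-billion-parameter vision-language model that Zhipu AI open-sourced in 2024, and it fits this need well.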

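GLM-4v-9b has three notable strengths over other mainstream multimodal models. First, it natively accepts 1120×1120 high-resolution input. Images with small text, complex tables, or detailed screenshots therefore keep more of their detail than they would at the lower input resolutions many comparable models use. Second, its Chinese OCR and chart understanding have been specifically optimized, and this shows in practical work such as parsing Chinese documents and analyzing financial statements. Third, it is easy to deploy. A single RTX 4090 runs it at full speed, and after INT4 quantization it needs only about 9 GB of VRAM, so individual developers can run it too.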

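More importantly, GLM-4v-9b does not simply concatenate an image with some text. It builds on the GLM-4-9B language model, adds a vision encoder through end-to-end training, and uses image-text cross-attention for deep alignment. Because of this design, the model can do more than answer "what is in this picture?". It can also reason across modalities, for example: "based on this sales chart, predict next quarter's growth trend."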

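This tutorial starts from zero and walks you, step by step, to a working Chinese-English bilingual multi-turn chatbot, avoiding common deployment pitfalls along the way. It is meant both for newcomers to multimodal AI and for engineers who want to validate a business idea quickly, and either group should finish with a solution they can put to use right away.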

2. Environment setup and model deployment

2.1 Hardware and software requirements

GLM-4v-9b's hardware requirements are relatively modest. For the best experience, we recommend the following configuration:

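GPU: NVIDIA RTX 4090 (24 GB VRAM) or better. With the INT4-quantized version, an RTX 3090 (24 GB) also covers basic needs
CPU: Intel i7-10700K or AMD Ryzen 7 5800X or better
RAM: 32 GB DDR4 or more
Storage: at least 50 GB free (about 18 GB for the model weights, plus room for caches and temporary files)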
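As a rough sanity check on these storage and VRAM figures, weight memory is approximately the parameter count times the bytes per parameter. The sketch below assumes exactly 9 × 10⁹ parameters and counts the weights only. Activations, the KV cache, and the CUDA context all come on top of that, which presumably explains why the INT4 figure in section 1 (about 9 GB) is higher than the weight-only result. The function name is illustrative.

```python
# Back-of-the-envelope estimate of memory needed for model WEIGHTS only.
# Assumption: 9e9 parameters (the "9B" in the name); runtime overhead such as
# activations, KV cache and the CUDA context is NOT included.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,  # same storage size as fp16
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("bf16", "int8", "int4"):
        print(f"{dtype}: {weight_memory_gb(9e9, dtype):.1f} GB")
```

The bf16 result (18 GB) lines up with the weight size listed above.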

On the software side, install the following components:

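```bash
# Create a dedicated conda environment (recommended)
conda create -n glm4v-env python=3.10
conda activate glm4v-env

# Install core dependencies (PyTorch wheels built for CUDA 11.8)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install transformers accelerate bitsandbytes pillow requests gradio streamlit
# GLM-4's tokenizer code imports tiktoken
pip install tiktoken

# Optional, only for high-performance inference. vLLM pins its own torch
# version and may replace the one installed above, so consider a separate env
pip install vllm
```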
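Before you download roughly 18 GB of weights, it is worth confirming that the environment is complete. Below is a minimal sketch; the script and function names are illustrative. It checks the Python version and whether each package above can be found. It uses only the standard library, so it still runs when an installation step has failed. Note that Pillow is imported under the name `PIL`.

```python
# Pre-flight check: confirm Python version and that packages are findable,
# without actually importing them (importing torch is slow).
import importlib.util
import sys

REQUIRED = ["torch", "torchvision", "transformers", "accelerate",
            "bitsandbytes", "PIL", "requests", "gradio"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 10)):
    """True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python >= 3.10:", python_ok())
    missing = missing_packages(REQUIRED)
    print("Missing packages:", missing or "none")
```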

2.2 Downloading the model

You can get GLM-4v-9b's weights directly from Hugging Face, but plain downloads can run into network problems. We recommend two more reliable approaches:

Option 1: the huggingface-cli command-line tool

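```bash
# Install the Hugging Face Hub client, which provides huggingface-cli
pip install huggingface-hub
# Log in to your Hugging Face account (if you have not already)
huggingface-cli login
# Download the model (sharding and caching are handled automatically)
huggingface-cli download THUDM/glm-4v-9b --local-dir ./glm-4v-9b-model --revision main
```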

Option 2: download manually and verify integrity

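```bash
# Create the model directory
mkdir -p ./glm-4v-9b-model
# Download with a browser or wget from
#   https://huggingface.co/THUDM/glm-4v-9b/tree/main
# into ./glm-4v-9b-model. Fetch every file in the repository, not just the weights:
#   - config.json and the tokenizer files (tokenizer.model, tokenizer_config.json, ...)
#   - the *.py files: the custom model/tokenizer code that trust_remote_code loads
#   - the weights, which are split into model-*.safetensors shards
#     plus model.safetensors.index.json
# Verify integrity (optional): compare with the SHA-256 shown on each file's Hub page
sha256sum ./glm-4v-9b-model/*.safetensors
```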
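If `sha256sum` is not available (on Windows, for example), you can run the same check in Python. This sketch uses only `hashlib` and `pathlib`, and `hash_shards` is a hypothetical helper name. It reads files in chunks so that multi-GB shards never sit fully in memory. The expected values still have to come from the file pages on the Hugging Face Hub.

```python
# Compute SHA-256 digests of model shards in chunks (memory-friendly).
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=8 * 1024 * 1024):
    """Hash a file 8 MiB at a time and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_shards(model_dir, pattern="*.safetensors"):
    """Return {filename: sha256} for every matching file, sorted by name."""
    return {p.name: sha256_of_file(p) for p in sorted(Path(model_dir).glob(pattern))}


if __name__ == "__main__":
    for name, value in hash_shards("./glm-4v-9b-model").items():
        print(value, name)
```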

2.3 Comparing three deployment options

Depending on your use case, there are three deployment options:

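| Option | Use case | Memory footprint (approx.) | Startup | Inference speed | Notes |
|---|---|---|---|---|---|
| Transformers (native) | Quick validation, debugging, development | ~18 GB (FP16) | Medium | Medium | Best compatibility; supports all features |
| vLLM | Production, high concurrency | ~9 GB (INT4) | Fairly fast | Fastest | Requires installing vLLM; some multimodal features unsupported |
| llama.cpp GGUF | CPU inference, edge devices | ~9 GB (Q4_K_M) | Slow | Slow | For machines without a GPU; limited multimodal support |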

For this tutorial we recommend starting with native Transformers. It supports all of GLM-4v-9b's multimodal features and is the easiest of the three to debug.

3. Core implementation: building the chatbot from scratch

3.1 Basic text chat

Let's start with plain-text chat and build a bot that handles both Chinese and English. Here is the core code:

```python
# chat_bot_basic.py
import torch
from transformers import AutoTokenizer, AutoModel


class GLM4vChatBot:
    def __init__(self, model_path="./glm-4v-9b-model"):
        """Load the GLM-4v-9b tokenizer and model."""
        print("Loading GLM-4v-9b...")
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_path, trust_remote_code=True, encode_special_tokens=True
        )
        self.model = AutoModel.from_pretrained(
            model_path,
            trust_remote_code=True,
            device_map="auto",
            torch_dtype=torch.bfloat16,
        ).eval()
        print("Model loaded.")

    def chat(self, user_input, history=None, max_new_tokens=1024,
             temperature=0.6, top_p=0.8):
        """Answer user_input, given earlier (user, assistant) pairs in history."""
        history = history or []
        # Convert the (user, assistant) pairs into role-based messages
        messages = []
        for user_msg, assistant_msg in history:
            messages.append({"role": "user", "content": user_msg})
            if assistant_msg:
                messages.append({"role": "assistant", "content": assistant_msg})
        # Append the current user turn
        messages.append({"role": "user", "content": user_input})

        # Apply the model's chat template; returns a tensor of input ids
        input_ids = self.tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, tokenize=True, return_tensors="pt"
        ).to(self.model.device)

        # Inference only: no gradient tracking
        with torch.inference_mode():
            outputs = self.model.generate(
                input_ids=input_ids,
                max_new_tokens=max_new_tokens,
                do_sample=True,
                temperature=temperature,
                top_p=top_p,
                repetition_penalty=1.2,
                eos_token_id=self.model.config.eos_token_id,
            )
        # Decode only the newly generated tokens, not the prompt
        response = self.tokenizer.decode(
            outputs[0][input_ids.shape[1]:], skip_special_tokens=True
        )
        return response.strip()


# Usage example
if __name__ == "__main__":
    bot = GLM4vChatBot()
    history = []

    # Chinese turn
    question = "你好，今天天气怎么样？"
    response_zh = bot.chat(question, history=history)
    history.append((question, response_zh))
    print(f"Chinese response: {response_zh}")

    # English turn; the previous exchange is visible through history
    response_en = bot.chat("Hello, how's the weather today?", history=history)
    print(f"English response: {response_en}")
```

This code implements basic chat with GLM-4v-9b and switches seamlessly between Chinese and English. A few key points:

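trust_remote_code=True is required because GLM models ship their own custom model-architecture code
torch_dtype=torch.bfloat16 halves memory use compared with FP32, with little loss of accuracy
device_map="auto" lets Hugging Face (via Accelerate) place the model on the available devices automatically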
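Multi-turn context comes entirely from the `history` argument, because `chat()` keeps no state of its own. The helper below pulls out the conversion step from `chat()` so you can inspect it without loading the model. `build_messages` is an illustrative name, not part of Transformers.

```python
# Convert (user, assistant) history pairs into role/content chat messages,
# mirroring the loop inside chat().
def build_messages(history, user_input):
    messages = []
    for user_msg, assistant_msg in history or []:
        messages.append({"role": "user", "content": user_msg})
        if assistant_msg:  # skip turns that never got an answer
            messages.append({"role": "assistant", "content": assistant_msg})
    messages.append({"role": "user", "content": user_input})
    return messages


if __name__ == "__main__":
    history = [("你好", "你好！有什么可以帮你？")]
    for m in build_messages(history, "Please answer in English from now on."):
        print(m["role"], ":", m["content"])
```

To carry context across turns, append each `(question, reply)` pair to `history` after calling `bot.chat(...)`, and pass the same list on the next call.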

3.2 Adding multimodal chat

Next, we give the bot image understanding so that it becomes a true multimodal chat system:

```python
# chat_bot_multimodal.py
from io import BytesIO

import requests
import torch
from PIL import Image
from transformers import AutoTokenizer, AutoModel


class GLM4vMultimodalBot:
    def __init__(self, model_path="./glm-4v-9b-model"):
        """Load the tokenizer (which also preprocesses images) and the model."""
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_path, trust_remote_code=True, encode_special_tokens=True
        )
        self.model = AutoModel.from_pretrained(
            model_path, trust_remote_code=True, device_map="auto",
            torch_dtype=torch.bfloat16,
        ).eval()

    def load_image(self, image_source):
        """Load an image from a local path, an http(s) URL, or a PIL Image."""
        if isinstance(image_source, Image.Image):
            return image_source.convert("RGB")
        if isinstance(image_source, str):
            if image_source.startswith(("http://", "https://")):
                response = requests.get(image_source, timeout=30)
                response.raise_for_status()  # fail loudly on 404 etc.
                return Image.open(BytesIO(response.content)).convert("RGB")
            return Image.open(image_source).convert("RGB")
        raise ValueError(f"Unsupported image source type: {type(image_source)!r}")

    def multimodal_chat(self, user_input, image_source=None, history=None,
                        max_new_tokens=1024, temperature=0.6, top_p=0.8):
        """One chat turn, optionally with an image attached to the current message."""
        messages = []
        for user_msg, assistant_msg in history or []:
            messages.append({"role": "user", "content": user_msg})
            if assistant_msg:
                messages.append({"role": "assistant", "content": assistant_msg})

        current = {"role": "user", "content": user_input}
        if image_source is not None:
            # GLM-4v's chat template reads the image from the "image" key
            current["image"] = self.load_image(image_source)
        messages.append(current)

        # return_dict=True returns a dict-like batch (input_ids, attention_mask,
        # and the preprocessed image when one is given) that generate(**...) accepts
        model_inputs = self.tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, tokenize=True,
            return_tensors="pt", return_dict=True,
        ).to(self.model.device)

        with torch.inference_mode():
            outputs = self.model.generate(
                **model_inputs,
                max_new_tokens=max_new_tokens,
                do_sample=True,
                temperature=temperature,
                top_p=top_p,
                repetition_penalty=1.2,
                eos_token_id=self.model.config.eos_token_id,
            )
        # Decode only the tokens generated after the prompt
        prompt_len = model_inputs["input_ids"].shape[1]
        response = self.tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
        return response.strip()


# Usage example
if __name__ == "__main__":
    bot = GLM4vMultimodalBot()
    # Text-only chat
    text_response = bot.multimodal_chat("请用中文简要介绍人工智能的发展历程")
    print(f"Text response: {text_response}")
    # Image chat (prepare an image first)
    # image_response = bot.multimodal_chat(
    #     "这张图展示了什么内容？请用中英文分别描述",
    #     image_source="./sample.jpg",
    # )
    # print(f"Image response: {image_response}")
```

The key improvements in this multimodal version:

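Several image inputs are supported: local file, web URL, or PIL Image object
Images are converted to RGB automatically
The image is attached to the current user message under the "image" key, where the chat template expects it
The API stays the same as for text-only chat, which keeps integration simple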
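Phone photos are often much larger than the 1120×1120 input resolution mentioned in section 1. The model's own preprocessing is expected to do the final resize. The optional helper sketched below (the name is illustrative) therefore only computes a smaller size with the same aspect ratio. Shrinking an image to that size before it reaches `load_image` can cut upload time and host memory.

```python
# Compute a downscaled (width, height) whose longer side is at most max_side,
# preserving aspect ratio; images already small enough are left unchanged.
def fit_within(width, height, max_side=1120):
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(4032, 3024))  # typical phone photo -> (1120, 840)
```

With Pillow, apply it as `image = image.resize(fit_within(*image.size))`.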

3.3 A quick web UI with Gradio

To make the bot easier to use, we build a simple web interface with Gradio:

```python
# web_interface.py
import gradio as gr

from chat_bot_multimodal import GLM4vMultimodalBot

# Load the model once at startup
bot = GLM4vMultimodalBot()


def chat_with_image(user_input, image_input, history):
    """Handle one message, with or without an uploaded image."""
    history = history or []
    if not user_input or not user_input.strip():
        return history, "", image_input  # ignore empty submissions
    try:
        # gr.Image(type="pil") already delivers a PIL image (or None),
        # so it can be passed straight to the bot
        response = bot.multimodal_chat(
            user_input, image_source=image_input, history=history
        )
    except Exception as e:
        response = f"Error: {e}"
    history.append((user_input, response))
    # Clear the textbox and the image box after each message
    return history, "", None


def clear_history():
    """Reset the conversation and the uploaded image."""
    return [], None


with gr.Blocks(title="GLM-4v-9b Multimodal Chatbot") as demo:
    gr.Markdown("# GLM-4v-9b Chinese-English Multimodal Chatbot")
    gr.Markdown("Text chat and image understanding, with native 1120×1120 high-resolution input")
    with gr.Row():
        with gr.Column(scale=2):
            chatbot = gr.Chatbot(height=400, label="Conversation")
            msg = gr.Textbox(
                label="Your question",
                placeholder="e.g. What is in this picture? Or: write a business email in English...",
            )
            with gr.Row():
                submit_btn = gr.Button("Send", variant="primary")
                clear_btn = gr.Button("Clear history")
        with gr.Column(scale=1):
            image_input = gr.Image(type="pil", label="Upload an image (optional)", height=300)
            gr.Markdown("Tip: after uploading an image, you can ask questions about it")

    # Event wiring: Send button and Enter key share the same handler
    submit_btn.click(chat_with_image, inputs=[msg, image_input, chatbot],
                     outputs=[chatbot, msg, image_input])
    msg.submit(chat_with_image, inputs=[msg, image_input, chatbot],
               outputs=[chatbot, msg, image_input])
    clear_btn.click(clear_history, outputs=[chatbot, image_input])

if __name__ == "__main__":
    # share=True would also open a temporary public gradio.live link;
    # leave it off unless the model should be reachable from the internet
    demo.launch(server_name="0.0.0.0", server_port=7860, share=False)
```

This web UI offers:

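A two-column layout, with the conversation on the left and image upload on the right
Seamless switching between text-only and multimodal chat
A responsive layout that adapts to different screen sizes
Error handling that shows the user a readable error message instead of crashing
Enter-to-send for faster input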
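In the web UI above, the image box is cleared after every message, and `multimodal_chat` attaches an image only to the current turn. A follow-up such as "what is in the top-left corner?" therefore reaches the model without the picture. One possible remedy is to remember the last uploaded image per session and re-send it until a new one arrives or the history is cleared. This is sketched below as an assumption, not as part of the original design, and `ImageSession` is a hypothetical class.

```python
# Hypothetical per-session helper: remembers the most recent upload so that
# follow-up questions are sent with the same image.
class ImageSession:
    def __init__(self):
        self.last_image = None

    def resolve(self, new_image):
        """Return the image for this turn: a new upload wins, else the last one."""
        if new_image is not None:
            self.last_image = new_image
        return self.last_image

    def reset(self):
        """Forget the stored image (call when the chat history is cleared)."""
        self.last_image = None


if __name__ == "__main__":
    session = ImageSession()
    print(session.resolve("photo_1"))  # new upload -> photo_1
    print(session.resolve(None))       # follow-up without upload -> photo_1
    session.reset()
    print(session.resolve(None))       # after reset -> None
```

In Gradio, you could keep one instance per browser session in a `gr.State` component and call `reset()` from the clear button's handler.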

4. Advanced features: multi-turn dialogue and context management

4.1 Managing multi-turn conversation state

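A real chatbot has to maintain conversation state and remember what was said earlier. GLM-4v-9b supports long contexts, but the history still needs managing, because every retained turn lengthens the prompt and so increases latency and memory use: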

```python
# conversation_manager.py
import json
import time
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ConversationTurn:
    """One user/assistant exchange."""
    user_input: str
    assistant_response: str
    timestamp: float
    image_info: Optional[Dict[str, Any]] = None


class ConversationManager:
    """Keeps a bounded conversation history."""

    def __init__(self, max_history_length: int = 10):
        self.history: List[ConversationTurn] = []
        self.max_history_length = max_history_length

    def add_turn(self, user_input: str, assistant_response: str,
                 image_info: Optional[Dict[str, Any]] = None):
        """Record a turn, dropping the oldest turns beyond max_history_length."""
        self.history.append(ConversationTurn(
            user_input=user_input,
            assistant_response=assistant_response,
            timestamp=time.time(),
            image_info=image_info,
        ))
        if len(self.history) > self.max_history_length:
            self.history = self.history[-self.max_history_length:]

    def get_formatted_history(self) -> List[Dict[str, str]]:
        """History as role/content messages, for use with a chat template."""
        formatted = []
        for turn in self.history:
            formatted.append({"role": "user", "content": turn.user_input})
            formatted.append({"role": "assistant", "content": turn.assistant_response})
        return formatted

    def get_history_pairs(self):
        """History as (user, assistant) tuples, the format the section 3 bots take."""
        return [(t.user_input, t.assistant_response) for t in self.history]

    def get_summary(self) -> str:
        """Naive summary: up to 50 characters of each of the last 3 turns."""
        if not self.history:
            return "No conversation history yet"

        def clip(text, n=50):
            return text if len(text) <= n else text[:n] + "..."

        parts = []
        for turn in self.history[-3:]:
            parts.append(f"User: {clip(turn.user_input)}")
            parts.append(f"Assistant: {clip(turn.assistant_response)}")
        return " | ".join(parts)

    def export_history(self, filename: str):
        """Write the history to a UTF-8 JSON file, keeping Chinese readable."""
        with open(filename, "w", encoding="utf-8") as f:
            json.dump([asdict(turn) for turn in self.history], f,
                      ensure_ascii=False, indent=2)
        print(f"History exported to {filename}")

    def load_history(self, filename: str):
        """Replace the history with turns loaded from a JSON file."""
        try:
            with open(filename, "r", encoding="utf-8") as f:
                data = json.load(f)
            self.history = [
                ConversationTurn(
                    user_input=item["user_input"],
                    assistant_response=item["assistant_response"],
                    timestamp=item["timestamp"],
                    image_info=item.get("image_info"),
                )
                for item in data
            ]
            print(f"Loaded {len(self.history)} turns from {filename}")
        except FileNotFoundError:
            print(f"File {filename} does not exist")
        except Exception as e:
            print(f"Error while loading history: {e}")


# Usage example
if __name__ == "__main__":
    manager = ConversationManager(max_history_length=5)
    # Simulate a few turns
    manager.add_turn("你好！", "你好！我是GLM-4v-9b多模态对话机器人。")
    manager.add_turn("你能做什么？", "我可以理解文本和图像，支持中英双语对话，还能分析图表和文档。")
    manager.add_turn("请用英文描述这张图", "当然可以，请上传图片。")
    print("Summary:", manager.get_summary())
    print("Formatted history:", manager.get_formatted_history())
```
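`max_history_length` counts turns, but latency and memory depend on prompt length, and one long answer can outweigh many short ones. The sketch below trims by an approximate size budget instead. It uses characters as a stand-in for tokens, which is a simplifying assumption; the tokenizer's own count would be exact. `trim_to_budget` is a hypothetical helper that works on `(user, assistant)` pairs.

```python
# Keep the most recent (user, assistant) pairs whose combined character
# length fits within max_chars (characters approximate tokens).
def trim_to_budget(pairs, max_chars=4000):
    kept, used = [], 0
    for user_msg, assistant_msg in reversed(pairs):
        size = len(user_msg) + len(assistant_msg or "")
        # Always keep at least the newest turn, even if it alone is too long
        if kept and used + size > max_chars:
            break
        kept.append((user_msg, assistant_msg))
        used += size
    return list(reversed(kept))
```

Because the newest turn is always kept, the model never loses the immediately preceding exchange, even when that exchange alone exceeds the budget.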

4.2 Context-aware prompt engineering

To improve GLM-4v-9b's answers, we design prompts deliberately (prompt engineering). Here are strategies tailored to different scenarios:

```python
# prompt_engineering.py

# Readable names for language codes, used in the inline language hint
LANGUAGE_NAMES = {"zh": "中文", "en": "英文"}


class SmartPromptEngine:
    """Prompt-building helpers."""

    @staticmethod
    def generate_system_prompt(language: str = "zh", context: str = "") -> str:
        """Build a system prompt in the given language."""
        if language == "zh":
            base_prompt = "你是一个专业的中英双语多模态AI助手，基于GLM-4v-9b模型。"
            if context:
                base_prompt += f"当前对话背景: {context}。"
            base_prompt += "请用中文回答，除非用户明确要求使用英文。"
        else:
            base_prompt = ("You are a professional bilingual multimodal AI assistant "
                           "based on the GLM-4v-9b model.")
            if context:
                base_prompt += f" Current conversation context: {context}."
            base_prompt += " Please respond in English unless the user explicitly requests Chinese."
        return base_prompt

    @staticmethod
    def enhance_user_input(user_input: str, image_available: bool = False,
                           language_preference: str = "auto") -> str:
        """Append bracketed hints about the image and the answer language."""
        enhanced = user_input
        if image_available:
            # "Image provided; answer using its content"
            enhanced += " [图像已提供，请结合图像内容回答]"
        if language_preference != "auto":
            # Map "zh"/"en" to a word so the hint reads e.g. "[请用中文回答]"
            name = LANGUAGE_NAMES.get(language_preference, language_preference)
            enhanced += f" [请用{name}回答]"
        return enhanced

    @staticmethod
    def generate_multimodal_prompt(user_input: str, image_description: str = "") -> str:
        """Structured prompt for image questions: question, image info, answer rules."""
        prompt = f"用户问题: {user_input}\n"
        if image_description:
            prompt += f"图像描述: {image_description}\n"
            prompt += "请结合上述图像描述和用户问题给出详细回答。\n"
        else:
            prompt += "图像已提供，请直接分析图像内容。\n"
        # Rules: describe key image elements, reason step by step, stay concise
        prompt += "回答要求:\n"
        prompt += "- 如果问题涉及图像，请详细描述图像中的关键元素\n"
        prompt += "- 如果问题需要推理，请分步骤说明推理过程\n"
        prompt += "- 保持回答简洁明了，重点突出\n"
        return prompt

    @staticmethod
    def detect_language(text: str) -> str:
        """Crude detection by counting CJK vs. Latin letters: "zh", "en" or "auto"."""
        # In a real project, use a dedicated library such as langdetect
        chinese_chars = sum(1 for c in text if "\u4e00" <= c <= "\u9fff")
        english_chars = sum(1 for c in text if "a" <= c <= "z" or "A" <= c <= "Z")
        if chinese_chars > english_chars * 2:
            return "zh"
        if english_chars > chinese_chars * 2:
            return "en"
        return "auto"


# Usage example
if __name__ == "__main__":
    engine = SmartPromptEngine()
    system_prompt = engine.generate_system_prompt("zh", "电商客服场景")
    print("System prompt:", system_prompt)
    enhanced_input = engine.enhance_user_input(
        "这个产品怎么样？", image_available=True, language_preference="zh"
    )
    print("Enhanced input:", enhanced_input)
    multimodal_prompt = engine.generate_multimodal_prompt(
        "请分析这张销售数据图表的趋势", "图表显示2023年各季度销售额，柱状图形式"
    )
    print("Multimodal prompt:", multimodal_prompt)
```
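`generate_system_prompt` returns a string, but nothing in the bot classes so far sends that string to the model. One option is to prepend it as a `system` message before calling `apply_chat_template`. This assumes the chat template bundled with your GLM-4v-9b checkpoint accepts the `system` role, so check the model card before relying on it. `with_system_prompt` is an illustrative helper.

```python
# Put a system prompt at the front of a role/content message list,
# replacing any earlier system message so there is exactly one.
def with_system_prompt(messages, system_prompt):
    if not system_prompt:
        return list(messages)
    rest = [m for m in messages if m.get("role") != "system"]
    return [{"role": "system", "content": system_prompt}] + rest
```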

4.3 Case study: an e-commerce customer-service chatbot

Now we combine everything so far into a practical e-commerce customer-service bot:

```python
# ecommerce_bot.py
import re

from chat_bot_multimodal import GLM4vMultimodalBot
from conversation_manager import ConversationManager
from prompt_engineering import SmartPromptEngine


class EcommerceChatBot:
    """Customer-service bot for an online store."""

    def __init__(self, model_path="./glm-4v-9b-model"):
        self.bot = GLM4vMultimodalBot(model_path)
        self.conversation_manager = ConversationManager(max_history_length=8)
        self.prompt_engine = SmartPromptEngine()
        self.product_knowledge = self._load_product_knowledge()

    def _load_product_knowledge(self) -> dict:
        """Product knowledge base (load it from a database in a real project)."""
        return {
            "SKU-12345": {"name": "智能手表Pro", "price": 899,
                          "features": ["心率监测", "GPS定位", "50米防水", "7天续航"],
                          "stock": "有货"},
            "SKU-67890": {"name": "无线降噪耳机", "price": 1299,
                          "features": ["主动降噪", "30小时续航", "空间音频", "IPX4防水"],
                          "stock": "缺货"},
        }

    def _get_product_info(self, sku: str) -> str:
        """Format the knowledge-base entry for a SKU."""
        product = self.product_knowledge.get(sku)
        if product:
            return (f"{product['name']}，价格{product['price']}元，"
                    f"特性：{', '.join(product['features'])}，库存：{product['stock']}")
        return "未找到该产品信息"

    def handle_ecommerce_query(self, user_input: str, image_source=None) -> str:
        """Answer from the knowledge base when possible, otherwise ask the model."""
        intent = self._detect_intent(user_input)
        # Price/feature/stock questions naming a known SKU are answered directly
        if intent in ("price_inquiry", "feature_inquiry", "stock_inquiry"):
            sku = self._extract_sku(user_input)
            if sku in self.product_knowledge:
                return f"关于您咨询的产品：{self._get_product_info(sku)}。还有其他问题吗？"

        # Fall back to multimodal chat. The bot expects history as
        # (user, assistant) tuples, not role/content dicts.
        history = [(t.user_input, t.assistant_response)
                   for t in self.conversation_manager.history]
        return self.bot.multimodal_chat(user_input, image_source=image_source, history=history)

    def _detect_intent(self, text: str) -> str:
        """Keyword-based intent detection (simplified)."""
        text_lower = text.lower()
        if any(w in text_lower for w in ["多少钱", "价格", "贵吗", "便宜"]):
            return "price_inquiry"
        if any(w in text_lower for w in ["怎么用", "功能", "特性", "有什么"]):
            return "feature_inquiry"
        if any(w in text_lower for w in ["有货", "缺货", "库存", "能买"]):
            return "stock_inquiry"
        return "general_conversation"

    def _extract_sku(self, text: str) -> str:
        """Find "SKU-12345" or a bare "12345"; return it as a knowledge-base key."""
        match = re.search(r"(SKU-)?(\d{5,})", text.upper())
        return f"SKU-{match.group(2)}" if match else ""

    def chat(self, user_input: str, image_source=None) -> str:
        """Main entry point: answer one message and record it in the history."""
        language = self.prompt_engine.detect_language(user_input)
        # multimodal_chat() takes no system prompt, so this prompt is built
        # here but not yet sent to the model
        system_prompt = self.prompt_engine.generate_system_prompt(language, "电商客服")
        response = self.handle_ecommerce_query(user_input, image_source)
        self.conversation_manager.add_turn(user_input, response)
        return response


# Usage example
if __name__ == "__main__":
    bot = EcommerceChatBot()
    print("=== E-commerce bot test ===")
    # Product question naming a SKU (answered from the knowledge base)
    q1 = "SKU-12345的价格是多少？"
    print(f"User: {q1}\nAssistant: {bot.chat(q1)}\n")
    # Feature question without a SKU (answered by the model, with history)
    q2 = "这款手表有什么功能？"
    print(f"User: {q2}\nAssistant: {bot.chat(q2)}\n")
    # Multimodal question (assumes a product photo exists)
    # q3 = "请分析这张产品图"
    # print(f"User: {q3}\nAssistant: {bot.chat(q3, image_source='./watch.jpg')}\n")
```

What this e-commerce bot offers:

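A built-in product knowledge base with SKU lookup
Simple but effective keyword-based intent detection
GLM-4v-9b's multimodal abilities for handling product images
Conversation state management for multi-turn interaction
An extensible structure that makes it easy to add more e-commerce features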
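The knowledge base is keyed by strings such as `SKU-12345`. Whatever form a customer types (`sku12345`, `SKU-12345`, or a bare `12345`) therefore has to be normalized to that key before the lookup. Below is a minimal standalone sketch of that normalization. The pattern and function names are illustrative, and a real store would also validate SKUs against its catalog.

```python
# Normalize SKU mentions to the "SKU-<digits>" key form, then look them up.
import re

SKU_PATTERN = re.compile(r"(?:SKU-?)?(\d{5,})", re.IGNORECASE)


def normalize_sku(text):
    """Return 'SKU-<digits>' for the first SKU-like number in text, or None."""
    match = SKU_PATTERN.search(text)
    return f"SKU-{match.group(1)}" if match else None


def lookup(text, knowledge):
    """Return the knowledge-base entry for the SKU mentioned in text, if any."""
    sku = normalize_sku(text)
    return knowledge.get(sku) if sku else None
```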

5. Performance optimization and production deployment

5.1 Reducing VRAM usage

GLM-4v-9b has 9 billion parameters, yet the following techniques can cut its VRAM usage substantially:

```python
# optimization_tips.py
import torch
from transformers import AutoTokenizer, AutoModel, BitsAndBytesConfig
from transformers.utils import is_flash_attn_2_available


def create_optimized_model(model_path: str, use_quantization: bool = True):
    """Load GLM-4v-9b with optional 4-bit quantization and FlashAttention-2."""
    # 1. Use FlashAttention-2 when the flash-attn package is installed.
    #    Whether GLM-4v's remote code honors attn_implementation may depend on
    #    the checkpoint revision; drop this argument if loading fails.
    attn_implementation = "flash_attention_2" if is_flash_attn_2_available() else "eager"

    # 2. 4-bit NF4 quantization via bitsandbytes (double quant saves a bit more)
    quantization_config = None
    if use_quantization:
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
        )

    # 3. Load the model
    model = AutoModel.from_pretrained(
        model_path,
        trust_remote_code=True,
        device_map="auto",
        torch_dtype=torch.bfloat16,
        quantization_config=quantization_config,
        attn_implementation=attn_implementation,
    ).eval()

    # 4. Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        model_path, trust_remote_code=True, encode_special_tokens=True
    )
    return model, tokenizer


def optimize_inference_parameters():
    """Suggested generation settings."""
    return {
        "max_new_tokens": 1024,      # cap answer length
        "do_sample": True,           # sample for more varied answers
        "temperature": 0.6,          # lower = more deterministic
        "top_p": 0.9,                # nucleus-sampling threshold
        "repetition_penalty": 1.15,  # discourage repetition
        # GLM-4 end-of-turn token ids; re-check against the
        # model's generation_config.json
        "eos_token_id": [151329, 151336, 151338],
    }


# Usage example
if __name__ == "__main__":
    model, tokenizer = create_optimized_model("./glm-4v-9b-model")
    inference_params = optimize_inference_parameters()
    print("Inference parameters:", inference_params)
    # Memory held by PyTorch tensors on the current GPU only
    print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
```
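Hardcoding `eos_token_id` ties the code to one tokenizer version. As a sketch, you can read the ids from the checkpoint's own `generation_config.json` instead, falling back to `config.json`. In Transformers, `GenerationConfig.from_pretrained(model_path).eos_token_id` should return the same information. `load_eos_token_ids` is a hypothetical helper name.

```python
# Read eos_token_id from the checkpoint instead of hardcoding it.
import json
from pathlib import Path


def load_eos_token_ids(model_dir):
    """Return eos_token_id as a list (the field may be an int or a list)."""
    for name in ("generation_config.json", "config.json"):
        path = Path(model_dir) / name
        if path.is_file():
            value = json.loads(path.read_text(encoding="utf-8")).get("eos_token_id")
            if value is not None:
                return value if isinstance(value, list) else [value]
    return []  # nothing found; caller should fall back to a default


if __name__ == "__main__":
    print(load_eos_token_ids("./glm-4v-9b-model"))
```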

5.2 High-performance inference with vLLM

For production, we recommend serving the model with vLLM:

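```bash
# vllm_deployment.sh
# Install vLLM (requires CUDA 11.8+; GLM-4V support depends on the vLLM version)
pip install vllm

# Start vLLM's OpenAI-compatible server, which provides /v1/chat/completions.
# --trust-remote-code is needed for GLM's custom code; --served-model-name sets
# the name clients must put in the "model" field of each request.
python -m vllm.entrypoints.openai.api_server \
    --model THUDM/glm-4v-9b \
    --served-model-name glm-4v-9b \
    --trust-remote-code \
    --tensor-parallel-size 1 \
    --dtype bfloat16 \
    --gpu-memory-utilization 0.9 \
    --enforce-eager \
    --port 8000 \
    --host 0.0.0.0
```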

Then create a client that calls the vLLM service:

```python
# vllm_client.py
import requests


class VLLMClient:
    """Minimal client for vLLM's OpenAI-compatible chat API."""

    def __init__(self, base_url: str = "http://localhost:8000", model: str = "glm-4v-9b"):
        self.base_url = base_url.rstrip("/")
        self.model = model  # must match the model name the server exposes

    def chat_completion(self, messages: list, max_tokens: int = 1024,
                        temperature: float = 0.6, top_p: float = 0.9):
        """Call /v1/chat/completions and return the reply text."""
        url = f"{self.base_url}/v1/chat/completions"
        payload = {
            "model": self.model,
            "messages": messages,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
            "stream": False,
        }
        try:
            # json= also sets the Content-Type header; generation can be slow
            response = requests.post(url, json=payload, timeout=120)
            response.raise_for_status()
            return response.json()["choices"][0]["message"]["content"]
        except requests.exceptions.RequestException as e:
            return f"API call failed: {e}"

    def multimodal_chat(self, user_input: str, image_url: str = None):
        """Multimodal chat using OpenAI-style content parts.

        Works only if the running vLLM version supports GLM-4v-9b's vision input.
        """
        content = [{"type": "text", "text": user_input}]
        if image_url:
            content.append({"type": "image_url", "image_url": {"url": image_url}})
        messages = [
            {"role": "system", "content": "你是一个多模态AI助手"},
            {"role": "user", "content": content},
        ]
        return self.chat_completion(messages)


# Usage example
if __name__ == "__main__":
    client = VLLMClient()
    response = client.chat_completion([
        {"role": "user", "content": "你好，今天过得怎么样？"}
    ])
    print("vLLM response:", response)
```
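Local files have no public URL, but the OpenAI-style `image_url` content part also accepts base64 `data:` URLs. The sketch below builds such a message from raw bytes, and `image_message` is an illustrative helper name. Whether the server actually passes the image to GLM-4v-9b depends on your vLLM version supporting the model's vision input, which is an assumption you should verify.

```python
# Build one OpenAI-style user message carrying text plus an inline image.
import base64


def image_message(text, image_bytes, mime="image/jpeg"):
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }
```

For example: `client.chat_completion([image_message("这张图里有什么？", Path("sample.jpg").read_bytes())])`, with `Path` imported from `pathlib`.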

5.3 Docker deployment

To keep environments consistent, here is a Docker setup:

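```dockerfile
# Dockerfile
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

# Environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# System dependencies (Ubuntu 22.04 ships Python 3.10)
RUN apt-get update && apt-get install -y \
    python3.10 \
    python3.10-venv \
    python3-pip \
    git \
    && rm -rf /var/lib/apt/lists/*

# Working directory
WORKDIR /app

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy the application code (list the weights directory in .dockerignore)
COPY . .

# The scripts load weights from ./glm-4v-9b-model, i.e. /app/glm-4v-9b-model
# in the container; mount the weights here at run time instead of baking
# them into the image
RUN mkdir -p /app/glm-4v-9b-model

# Gradio port
EXPOSE 7860

# Start command
CMD ["python3", "web_interface.py"]
```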

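```text
# requirements.txt
# PyTorch's +cu118 builds are hosted on PyTorch's own package index
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.1.0+cu118
torchvision==0.16.0+cu118
# Note: transformers 4.35.0 (Nov 2023) predates GLM-4v-9b (June 2024);
# check the model card for the version its remote code requires
transformers==4.35.0
accelerate==0.24.1
bitsandbytes==0.41.3
pillow==10.1.0
requests==2.31.0
gradio==4.20.0
numpy==1.24.3
# Imported by GLM-4's tokenizer code
tiktoken
```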

```bash
# build_and_run.sh
# Build the Docker image
docker build -t glm4v-bot .
# Run…
```
