2026/3/31 9:16:34

【Qwen】train() Function Documentation

张小明

train() function documentation

train(attn_implementation='flash_attention_2')

Runs the main training loop for Qwen VL (Qwen2-VL, Qwen2.5-VL, Qwen3-VL, or Qwen3-VL-MoE) instruction tuning.
Parses command-line arguments for model, data, and training config; loads the appropriate model class and processor; optionally applies LoRA or configures which modules to tune (vision encoder, MLP merger, LLM); builds the supervised data module and Hugging Face Trainer; runs training (with optional resume); then saves the final model and processor to output_dir.
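The argument-parsing step can be approximated with the standard library. A minimal sketch, assuming illustrative field sets (the repo's real dataclasses have more fields, and HfArgumentParser performs this dataclass-to-flag mapping automatically):

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class ModelArguments:
    # Illustrative subset of the repo's ModelArguments fields.
    model_name_or_path: str = "Qwen/Qwen2.5-VL-3B-Instruct"
    tune_mm_llm: bool = True
    tune_mm_mlp: bool = True
    tune_mm_vision: bool = False

def parse_into(dc, argv):
    # Mirror what HfArgumentParser does: every dataclass field becomes a --flag.
    p = argparse.ArgumentParser()
    for f in fields(dc):
        if f.type is bool:
            # Accept explicit "True"/"False" strings, as the shell scripts pass them.
            p.add_argument(f"--{f.name}", type=lambda s: s == "True", default=f.default)
        else:
            p.add_argument(f"--{f.name}", type=f.type, default=f.default)
    return dc(**vars(p.parse_args(argv)))

model_args = parse_into(
    ModelArguments,
    ["--model_name_or_path", "Qwen/Qwen3-VL-8B-Instruct", "--tune_mm_vision", "True"],
)
print(model_args)
```

In the real code, one HfArgumentParser instance parses ModelArguments, DataArguments, and TrainingArguments from sys.argv in a single pass.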

Parameters

Name                 Type  Default               Description
attn_implementation  str   "flash_attention_2"   Attention implementation passed to the model (e.g. "flash_attention_2" for Flash Attention 2).

Command-line arguments (parsed via HfArgumentParser)

  • ModelArguments

    • model_name_or_path (str) – Hugging Face model id or path (e.g. Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen3-VL-8B-Instruct). Used to select the model class (Qwen2-VL, Qwen2.5-VL, Qwen3-VL, or Qwen3-VL-MoE).
    • tune_mm_llm (bool) – Whether to train the language model (and lm_head).
    • tune_mm_mlp (bool) – Whether to train the vision merger (MLP).
    • tune_mm_vision (bool) – Whether to train the vision encoder.

  • DataArguments

    • dataset_use (str) – Comma-separated dataset names (with optional %N sampling, e.g. dataset1%50).
    • data_flatten (bool) – Whether to flatten/concat batch sequences.
    • data_packing (bool) – Whether to use packed data (requires preprocessing with pack_data.py).
    • max_pixels (int) – Max image pixels (default 28*28*576).
    • min_pixels (int) – Min image pixels (default 28*28*16).
    • video_max_frames, video_min_frames, video_max_pixels, video_min_pixels, video_fps – Video sampling and resolution settings.
  • TrainingArguments (extends transformers.TrainingArguments)

    • cache_dir (str, optional) – Cache directory for model/processor.
    • model_max_length (int) – Maximum sequence length for the tokenizer.
    • lora_enable (bool) – If True, apply LoRA and ignore tune_mm_* for the base model.
    • lora_r, lora_alpha, lora_dropout – LoRA rank, alpha, and dropout.
    • mm_projector_lr, vision_tower_lr – Optional learning rates for the projector and vision tower.
    • Plus standard Trainer args: output_dir, bf16, per_device_train_batch_size, gradient_accumulation_steps, learning_rate, num_train_epochs, save_steps, gradient_checkpointing, deepspeed, etc.
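The "%N" convention for dataset_use can be sketched as follows. parse_dataset_use and sample_pct are hypothetical helpers illustrating the comma-separated names with optional percentage sampling; the repo's actual parsing code may differ:

```python
import random

def parse_dataset_use(spec: str):
    """Split 'a,b%50' into [('a', 100), ('b', 50)] — dataset name plus sample percent."""
    out = []
    for item in spec.split(","):
        name, _, pct = item.partition("%")
        out.append((name.strip(), int(pct) if pct else 100))
    return out

def sample_pct(records, pct, seed=0):
    """Keep roughly pct percent of records, deterministic for a fixed seed."""
    k = max(1, len(records) * pct // 100)
    return random.Random(seed).sample(records, k)

print(parse_dataset_use("my_dataset,aux_data%50"))
```

So "--dataset_use my_dataset,aux_data%50" would train on all of my_dataset plus a 50% subsample of aux_data.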

Returns

None. The model and processor are saved under training_args.output_dir.

Notes

  • If output_dir already contains checkpoint-* directories, training resumes with resume_from_checkpoint=True.
  • When data_flatten or data_packing is enabled, the Qwen2 VL attention class is replaced for compatibility.
  • Qwen3-VL MoE models use Qwen3VLMoeForConditionalGeneration; other Qwen3-VL models use Qwen3VLForConditionalGeneration; Qwen2.5-VL and Qwen2-VL use the corresponding classes inferred from model_name_or_path.
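The resume and class-selection notes can be sketched as follows. should_resume and pick_model_class are illustrative helpers (class names are returned as strings to avoid importing transformers, and the exact matching logic in the repo may differ):

```python
import pathlib

def should_resume(output_dir: str) -> bool:
    """Resume iff output_dir already holds checkpoint-* directories."""
    return any(pathlib.Path(output_dir).glob("checkpoint-*"))

def pick_model_class(model_name_or_path: str) -> str:
    """Infer the model class name from the checkpoint name, per the notes above."""
    name = model_name_or_path.lower()
    if "qwen3-vl" in name:
        return ("Qwen3VLMoeForConditionalGeneration" if "moe" in name
                else "Qwen3VLForConditionalGeneration")
    if "qwen2.5-vl" in name:
        return "Qwen2_5_VLForConditionalGeneration"
    return "Qwen2VLForConditionalGeneration"

print(pick_model_class("Qwen/Qwen3-VL-8B-Instruct"))
```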

Example

# Typical usage: arguments are passed via the command line (e.g. from scripts/sft_qwen3_4b.sh)
torchrun --nproc_per_node=4 qwenvl/train/train_qwen.py \
    --model_name_or_path Qwen/Qwen3-VL-8B-Instruct \
    --dataset_use my_dataset \
    --data_flatten True \
    --tune_mm_vision False --tune_mm_mlp True --tune_mm_llm True \
    --output_dir ./output \
    --bf16 --per_device_train_batch_size 4 --gradient_accumulation_steps 4 \
    --learning_rate 1e-5 --num_train_epochs 0.5

# Programmatic call (still requires sys.argv or explicit parsing for HfArgumentParser)
from qwenvl.train.train_qwen import train
train(attn_implementation="flash_attention_2")