Converting the VisDrone Dataset to YOLO Format: From UAV Small-Object Detection Data to YOLOv8 Training Data - Developer Community

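1. Project Background

VisDrone is one of the most widely used datasets for research on small-object detection in UAV (drone) scenes. It contains a large number of aerial images with object classes such as pedestrians, vehicles, bicycles, tricycles, and buses. This makes it well suited for studying dense small-object detection, occluded-object detection, and lightweight detectors for deployment on UAV edge devices.

However, the original VisDrone annotation format cannot be used directly for YOLOv8 training. YOLOv8 expects a fixed directory layout and label format, so the original annotations must be converted to YOLO format before training.

This article walks through the complete approach and code for converting the VisDrone2019-DET dataset to YOLO format.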

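2. VisDrone Original Annotation Format

In VisDrone, each image has a corresponding .txt annotation file. Each line of the file describes one bounding box in the following format: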

x, y, w, h, score, category, truncation, occlusion

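The fields are:

Field	Meaning
x	x coordinate of the box's top-left corner (pixels)
y	y coordinate of the box's top-left corner (pixels)
w	Box width (pixels)
h	Box height (pixels)
score	In ground-truth files: 1 if the box is used in evaluation, 0 if it is ignored
category	Object category ID
truncation	Truncation level (0 = none, 1 = partial)
occlusion	Occlusion level (0 = none, 1 = partial, 2 = heavy)

The most important fields are x, y, w, h, and category, which determine the object's location and class.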
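To make the field layout concrete, here is a minimal parser sketch for a single annotation line. `VisDroneBox` and `parse_visdrone_line` are hypothetical names, not part of any VisDrone toolkit, and tolerating a trailing comma is an assumption about how defensive the parser should be.

```python
from typing import NamedTuple, Optional


class VisDroneBox(NamedTuple):
    x: float          # top-left x in pixels
    y: float          # top-left y in pixels
    w: float          # box width in pixels
    h: float          # box height in pixels
    score: int        # ground truth: 1 = used in evaluation, 0 = ignored
    category: int     # 0-11, see the class list in Section 4
    truncation: int
    occlusion: int


def parse_visdrone_line(line: str) -> Optional[VisDroneBox]:
    """Parse one 'x,y,w,h,score,category,truncation,occlusion' line.

    Returns None for lines with fewer than 8 fields or non-numeric values.
    """
    # Assumption: a trailing comma at the end of the line should be tolerated.
    parts = line.strip().rstrip(",").split(",")
    if len(parts) < 8:
        return None
    try:
        x, y, w, h = (float(p) for p in parts[:4])
        score, category, truncation, occlusion = (int(p) for p in parts[4:8])
    except ValueError:
        return None
    return VisDroneBox(x, y, w, h, score, category, truncation, occlusion)


# A made-up example line: a car (category 4) at (100, 200), 50x40 pixels.
box = parse_visdrone_line("100,200,50,40,1,4,0,1")
print(box.category, box.w, box.h)  # 4 50.0 40.0
```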

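3. YOLO Label Format

YOLO labels use the following format:

class_id x_center y_center width height

with the following requirements:

Class IDs start from 0;
All coordinates are normalized to the range [0, 1];
Each image has a .txt label file with the same base name;
Label files and image files are organized in a fixed directory structure.

For example: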

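3 0.521233 0.618293 0.032111 0.041225
0 0.217833 0.348901 0.012655 0.028411

Each line describes one object. In the first line:

3 is the class ID (car);
0.521233 is the x coordinate of the box center, as a fraction of the image width;
0.618293 is the y coordinate of the box center, as a fraction of the image height;
0.032111 is the box width, as a fraction of the image width;
0.041225 is the box height, as a fraction of the image height.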
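The requirements above can be checked line by line. The sketch below is a hypothetical `validate_yolo_line` helper that assumes the 10-class VisDrone mapping from Section 4; it only checks the field count, the class range, and that every value lies in [0, 1], not whether the box actually matches the image content.

```python
def validate_yolo_line(line: str, num_classes: int = 10) -> bool:
    """Return True if 'class_id x_center y_center width height' follows the YOLO rules."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        cls = int(parts[0])
        coords = [float(v) for v in parts[1:]]
    except ValueError:
        return False
    # Class IDs start at 0; num_classes=10 assumes the VisDrone mapping in Section 4.
    if not 0 <= cls < num_classes:
        return False
    # All four coordinates must be normalized to [0, 1].
    return all(0.0 <= v <= 1.0 for v in coords)


print(validate_yolo_line("3 0.521233 0.618293 0.032111 0.041225"))  # True
print(validate_yolo_line("3 521.2 618.3 32.1 41.2"))                # False: pixel units
```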

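4. Class ID Mapping

The original VisDrone categories are:

0: ignored regions
1: pedestrian
2: people
3: bicycle
4: car
5: van
6: truck
7: tricycle
8: awning-tricycle
9: bus
10: motor
11: others

For object detection, we usually keep only the 10 valid classes 1 to 10 and drop 0 (ignored regions) and 11 (others).

Since YOLO class IDs must start from 0, the following mapping is applied:

VisDrone class	YOLO class	Class name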
1	0	pedestrian
2	1	people
3	2	bicycle
4	3	car
5	4	van
6	5	truck
7	6	tricycle
8	7	awning-tricycle
9	8	bus
10	9	motor

The code filters classes and converts the IDs with the following statements:

VALID_CLASSES = set(range(1, 11))

# Inside the loop over annotation lines:
if category not in VALID_CLASSES:
    continue
cls = category - 1
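One way to keep the label IDs and the class names in sync is to derive both from a single ordered list. This is a sketch under that assumption; `CLASS_NAMES` and `visdrone_to_yolo_class` are hypothetical names, and the mapping reproduces the `category - 1` rule above.

```python
from typing import Optional

# YOLO class names in index order: YOLO ID = VisDrone category - 1.
CLASS_NAMES = [
    "pedestrian", "people", "bicycle", "car", "van",
    "truck", "tricycle", "awning-tricycle", "bus", "motor",
]


def visdrone_to_yolo_class(category: int) -> Optional[int]:
    """Map VisDrone 1-10 to YOLO 0-9; None for 0 (ignored regions) and 11 (others)."""
    if 1 <= category <= len(CLASS_NAMES):
        return category - 1
    return None


for cat in (0, 1, 4, 10, 11):
    yolo_id = visdrone_to_yolo_class(cat)
    name = CLASS_NAMES[yolo_id] if yolo_id is not None else "(dropped)"
    print(cat, yolo_id, name)
# 0 None (dropped)
# 1 0 pedestrian
# 4 3 car
# 10 9 motor
# 11 None (dropped)
```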

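5. Coordinate Conversion

VisDrone uses the top-left-corner format:

x, y, w, h

where:

x, y are the coordinates of the box's top-left corner;
w, h are the box width and height.

YOLO uses the center-point format:

x_center, y_center, width, height

and the values must be normalized.

The conversion formulas are: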

x_center = (x1 + new_w / 2) / img_w
y_center = (y1 + new_h / 2) / img_h
box_w = new_w / img_w
box_h = new_h / img_h

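where:

img_w is the image width;
img_h is the image height;
x1, y1 are the coordinates of the top-left corner after boundary clipping (see Section 6);
new_w is the box width after clipping;
new_h is the box height after clipping.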
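A small worked example of the formulas above, wrapped in a hypothetical `to_yolo` function: a 50x40 box whose top-left corner is at (100, 200) in a 1000x800 image. The numbers are made up, and the box is assumed to be already clipped.

```python
from typing import Tuple


def to_yolo(x1: float, y1: float, new_w: float, new_h: float,
            img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Top-left box (x1, y1, new_w, new_h) in pixels -> normalized YOLO (xc, yc, w, h)."""
    x_center = (x1 + new_w / 2) / img_w   # (100 + 25) / 1000
    y_center = (y1 + new_h / 2) / img_h   # (200 + 20) / 800
    box_w = new_w / img_w                 # 50 / 1000
    box_h = new_h / img_h                 # 40 / 800
    return x_center, y_center, box_w, box_h


xc, yc, bw, bh = to_yolo(100, 200, 50, 40, 1000, 800)
print(f"{xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")  # 0.125000 0.275000 0.050000 0.050000
```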

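6. Why Clip Boxes to the Image Boundary?

In real-world datasets, some boxes extend beyond the image: for example, past the left or top edge, or with the bottom-right corner outside the image size.

If these abnormal boxes are left unhandled, they can cause errors during YOLO training or hurt training stability. For example, Ultralytics' label check rejects label files containing negative or greater-than-1 normalized values and skips the affected image as corrupt.

The code therefore clips each box as follows: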

x1 = max(0, x)
y1 = max(0, y)
x2 = min(img_w, x + w)
y2 = min(img_h, y + h)

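This guarantees that the box coordinates always lie within the image.

The width and height are then recomputed from the clipped corners: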

new_w = x2 - x1
new_h = y2 - y1

If the clipped box is too small (1 pixel or less in width or height), it is skipped:

if new_w <= 1 or new_h <= 1:
    continue
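The clipping and size filter can be isolated into one function so the edge cases are easy to test. This is a sketch with a hypothetical `clip_box` helper; the `min_size=1.0` default reproduces the `<= 1` threshold used above.

```python
from typing import Optional, Tuple


def clip_box(x: float, y: float, w: float, h: float,
             img_w: int, img_h: int,
             min_size: float = 1.0) -> Optional[Tuple[float, float, float, float]]:
    """Clip a top-left (x, y, w, h) box to the image; None if it ends up too small."""
    x, y, w, h = float(x), float(y), float(w), float(h)
    x1 = max(0.0, x)
    y1 = max(0.0, y)
    x2 = min(img_w, x + w)
    y2 = min(img_h, y + h)
    new_w = x2 - x1
    new_h = y2 - y1
    # Drops boxes of min_size pixels or less, including boxes entirely
    # outside the image (their clipped width or height becomes negative).
    if new_w <= min_size or new_h <= min_size:
        return None
    return x1, y1, new_w, new_h


print(clip_box(-10, 5, 30, 20, 100, 100))   # (0.0, 5.0, 20.0, 20.0)
print(clip_box(95, 95, 10, 10, 100, 100))   # (95.0, 95.0, 5.0, 5.0)
print(clip_box(120, 10, 10, 10, 100, 100))  # None
```

Returning `None` instead of a zero-size box keeps the caller's loop simple: it can `continue` whenever the result is `None`.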

7. Project Directory Structure

The original data directory before conversion:

datasets/VisDrone/raw/
├── VisDrone2019-DET-train/
│   ├── images/
│   └── annotations/
├── VisDrone2019-DET-val/
│   ├── images/
│   └── annotations/
└── VisDrone2019-DET-test-dev/
    ├── images/
    └── annotations/

The YOLO data directory generated by the conversion:

datasets/VisDrone/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/

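During training, YOLO locates each image's label file from the image path: the images directory is replaced with labels and the image extension with .txt.
For example: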

datasets/VisDrone/images/train/000001.jpg
datasets/VisDrone/labels/train/000001.txt
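The path rule above can be made concrete with `pathlib`. This is a simplified re-implementation of the convention for illustration, not Ultralytics' own code; `image_to_label_path` is a hypothetical helper, and replacing the last `images` directory (rather than the first) mirrors how the convention handles paths that contain `images` more than once.

```python
from pathlib import PurePosixPath


def image_to_label_path(img_path: str) -> str:
    """Swap the last 'images' directory for 'labels' and the extension for '.txt'."""
    parts = list(PurePosixPath(img_path).parts)
    # Only directory components are candidates, never the file name itself.
    candidates = [i for i, p in enumerate(parts[:-1]) if p == "images"]
    if not candidates:
        raise ValueError(f"no 'images' directory in {img_path!r}")
    parts[candidates[-1]] = "labels"
    return str(PurePosixPath(*parts).with_suffix(".txt"))


print(image_to_label_path("datasets/VisDrone/images/train/000001.jpg"))
# datasets/VisDrone/labels/train/000001.txt
```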

8. Complete Conversion Code

from pathlib import Path
import shutil

from PIL import Image
from tqdm import tqdm

# Original VisDrone categories:
# 0: ignored regions
# 1: pedestrian
# 2: people
# 3: bicycle
# 4: car
# 5: van
# 6: truck
# 7: tricycle
# 8: awning-tricycle
# 9: bus
# 10: motor
# 11: others
#
# YOLO class IDs must start at 0:
# VisDrone 1-10 -> YOLO 0-9
VALID_CLASSES = set(range(1, 11))


def convert_annotation(txt_path: Path, img_path: Path, save_path: Path):
    # Read the image size; the context manager closes the file handle.
    with Image.open(img_path) as img:
        img_w, img_h = img.size

    yolo_lines = []

    with open(txt_path, "r", encoding="utf-8") as f:
        lines = f.readlines()

    for line in lines:
        parts = line.strip().split(",")
        if len(parts) < 8:
            continue

        x, y, w, h = map(float, parts[:4])
        # score, truncation and occlusion (parts[4], parts[6], parts[7]) are not used here.
        category = int(parts[5])

        # Skip ignored regions (0) and others (11); keep only classes 1-10
        if category not in VALID_CLASSES:
            continue

        # VisDrone classes 1-10 -> YOLO classes 0-9
        cls = category - 1

        # Skip boxes with non-positive width or height
        if w <= 0 or h <= 0:
            continue

        # Clip the box to the image boundary
        x1 = max(0, x)
        y1 = max(0, y)
        x2 = min(img_w, x + w)
        y2 = min(img_h, y + h)

        new_w = x2 - x1
        new_h = y2 - y1

        if new_w <= 1 or new_h <= 1:
            continue

        # Convert to YOLO format: class x_center y_center width height
        x_center = (x1 + new_w / 2) / img_w
        y_center = (y1 + new_h / 2) / img_h
        box_w = new_w / img_w
        box_h = new_h / img_h

        yolo_lines.append(
            f"{cls} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}\n"
        )

    with open(save_path, "w", encoding="utf-8") as f:
        f.writelines(yolo_lines)


def process_split(root: Path, split_name: str, raw_folder: str):
    raw_dir = root / "raw" / raw_folder
    raw_img_dir = raw_dir / "images"
    raw_ann_dir = raw_dir / "annotations"

    out_img_dir = root / "images" / split_name
    out_label_dir = root / "labels" / split_name
    out_img_dir.mkdir(parents=True, exist_ok=True)
    out_label_dir.mkdir(parents=True, exist_ok=True)

    img_paths = sorted(raw_img_dir.glob("*.jpg"))

    print(f"\nProcessing {split_name}: {len(img_paths)} images")
    print(f"Image dir: {raw_img_dir}")
    print(f"Annotation dir: {raw_ann_dir}")

    for img_path in tqdm(img_paths):
        ann_path = raw_ann_dir / f"{img_path.stem}.txt"
        out_img_path = out_img_dir / img_path.name
        out_label_path = out_label_dir / f"{img_path.stem}.txt"

        # Copy the image into the YOLO directory layout
        shutil.copy2(img_path, out_img_path)

        if ann_path.exists():
            convert_annotation(ann_path, img_path, out_label_path)
        else:
            # No annotation file: write an empty label (image without objects)
            out_label_path.write_text("", encoding="utf-8")


def main():
    root = Path("datasets/VisDrone")

    process_split(root, "train", "VisDrone2019-DET-train")
    process_split(root, "val", "VisDrone2019-DET-val")
    process_split(root, "test", "VisDrone2019-DET-test-dev")

    print("\nConversion finished.")
    print("YOLO images saved to: datasets/VisDrone/images")
    print("YOLO labels saved to: datasets/VisDrone/labels")


if __name__ == "__main__":
    main()
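A quick way to spot-check the script's output is to invert the formula from Section 5 and compare a few boxes against the original annotation (or draw them on the image). This is a sketch with a hypothetical `yolo_to_pixels` helper; it assumes well-formed 5-field lines such as those written by `convert_annotation`, and it recovers the clipped box, not the raw VisDrone box.

```python
from typing import List, Tuple


def yolo_to_pixels(label_text: str, img_w: int,
                   img_h: int) -> List[Tuple[int, float, float, float, float]]:
    """Invert the conversion: YOLO lines -> (cls, x1, y1, w, h) in pixels."""
    boxes = []
    for line in label_text.splitlines():
        if not line.strip():
            continue  # skip blank lines (empty label files produce none)
        cls, xc, yc, bw, bh = line.split()
        w = float(bw) * img_w
        h = float(bh) * img_h
        x1 = float(xc) * img_w - w / 2   # center minus half width
        y1 = float(yc) * img_h - h / 2   # center minus half height
        boxes.append((int(cls), x1, y1, w, h))
    return boxes


# Should give back roughly (100, 200, 50, 40) on a 1000x800 image.
for cls, x1, y1, w, h in yolo_to_pixels("3 0.125000 0.275000 0.050000 0.050000\n", 1000, 800):
    print(cls, round(x1, 2), round(y1, 2), round(w, 2), round(h, 2))  # 3 100.0 200.0 50.0 40.0
```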

9. Running the Conversion Script

From the project root, run:

python scripts/convert_visdrone_to_yolo.py

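If everything works, you should see output similar to the following (the per-split directory lines and tqdm progress bars are omitted):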

Processing train: 6471 images
Processing val: 548 images
Processing test: 1610 images
Conversion finished.

After conversion, you can compare the number of images and labels (PowerShell):

(Get-ChildItem datasets\VisDrone\images\train -Filter *.jpg).Count
(Get-ChildItem datasets\VisDrone\labels\train -Filter *.txt).Count
(Get-ChildItem datasets\VisDrone\images\val -Filter *.jpg).Count
(Get-ChildItem datasets\VisDrone\labels\val -Filter *.txt).Count

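If the image and label counts match, the conversion has basically succeeded. Note that the script writes an empty label file for any image without an annotation file, so matching counts confirm that every image has a label file, not that the boxes themselves are correct.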
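The PowerShell commands only run on Windows and only compare counts. A cross-platform alternative is to compare file stems per split, which also shows which files are unmatched. This is a sketch; `check_split` is a hypothetical helper, and the `*.jpg`/`*.txt` patterns assume the layout from Section 7.

```python
from pathlib import Path


def check_split(root: Path, split: str) -> bool:
    """Compare image and label file stems for one split; print any mismatch."""
    img_stems = {p.stem for p in (root / "images" / split).glob("*.jpg")}
    lbl_stems = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    missing = sorted(img_stems - lbl_stems)   # images without a label file
    extra = sorted(lbl_stems - img_stems)     # label files without an image
    print(f"{split}: {len(img_stems)} images, {len(lbl_stems)} labels")
    if missing:
        print(f"  missing labels, e.g. {missing[:5]}")
    if extra:
        print(f"  orphan labels, e.g. {extra[:5]}")
    return not missing and not extra


if __name__ == "__main__":
    root = Path("datasets/VisDrone")
    for split in ("train", "val", "test"):
        check_split(root, split)
```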

10. YOLO Dataset Configuration File

After conversion, you also need a YOLO dataset configuration file:

path: ./datasets/VisDrone
train: images/train
val: images/val
test: images/test
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor

Save it as:

configs/visdrone.yaml
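Typing the `names` mapping by hand is easy to get wrong. One option is to render the config from the same ordered class list used during conversion. This is a sketch with a hypothetical `build_data_yaml` helper; it uses plain string formatting so no YAML library is needed, and the fixed `images/train`-style split paths assume the layout from Section 7.

```python
from typing import List

# Same order as the mapping table in Section 4 (YOLO ID = list index).
CLASS_NAMES = [
    "pedestrian", "people", "bicycle", "car", "van",
    "truck", "tricycle", "awning-tricycle", "bus", "motor",
]


def build_data_yaml(root: str, names: List[str]) -> str:
    """Render the dataset config text from a dataset root and class names."""
    lines = [
        f"path: {root}",
        "train: images/train",
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    # One "  <id>: <name>" entry per class, IDs starting at 0.
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"


print(build_data_yaml("./datasets/VisDrone", CLASS_NAMES), end="")
```

To produce the file, write the returned string to `configs/visdrone.yaml`, for example with `Path.write_text(..., encoding="utf-8")`.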

YOLOv8 can then be trained with:

yolo detect train model=yolov8n.pt data=configs/visdrone.yaml imgsz=640 epochs=50 batch=8 device=0

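11. Summary

This article converted the VisDrone2019-DET dataset to YOLO format. The core steps are:

Read the original VisDrone annotation files;
Filter out invalid classes;
Map class IDs from 1-10 to 0-9;
Clip boxes that extend beyond the image;
Convert top-left-corner boxes to the YOLO center-point format;
Normalize the coordinates;
Save images and labels in the standard YOLO directory layout.

Once this step is complete, YOLOv8 can read and train on the VisDrone dataset.
This conversion is also the foundation for later experiments on dense small-object detection in UAV scenes, lightweight detection network design, and edge deployment.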
