Stop Just Downloading COCO! A Hands-On Guide to Annotating with Labelme and Converting to a COCO Instance Segmentation Dataset (Full Code Included)

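Building a Custom Instance Segmentation Dataset from Scratch: A Practical Guide to Labelme Annotation and COCO Format Conversion

In computer vision, the COCO dataset has become an industry benchmark thanks to its rich annotations and standardized format. Real projects, however, often need a dedicated dataset for a specific scenario such as industrial quality inspection, medical imaging, or retail product recognition. This article walks through annotating images with the open-source tool Labelme and writing a Python script that converts the results to the standard COCO format, producing an instance segmentation dataset that can be used directly for model training.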

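1. Tool Selection and Annotation Environment Setup

1.1 Comparing Annotation Tools

Unlike downloading a ready-made dataset, building your own starts with choosing an annotation tool. The mainstream open-source options compare as follows: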

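Tool	Annotation types	Installation complexity	Export format	Interaction
Labelme	Polygon / rectangle / keypoint	★★☆☆☆	JSON (Labelme's own schema)	Excellent
CVAT	All types	★★★★☆	COCO, XML, and others	Professional
LabelImg	Rectangle	★★☆☆☆	VOC / YOLO	Average
VGG Image Annotator	Polygon / point	★★☆☆☆	JSON	Good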

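For instance segmentation, Labelme is the first choice here because it is lightweight and offers flexible polygon annotation. Installation takes a single command: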

pip install labelme

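1.2 Defining Annotation Guidelines

Settle the guidelines before annotation starts:

Category scheme: define every category to be annotated up front (for example "scratch" and "dent" in an industrial setting)
Annotation precision: decide on polygon vertex density (a common suggestion is one point every 3-5 pixels along the object edge)
Occlusion handling: agree on how occluded objects are annotated (full outline or visible part only)
Annotation verification: set up cross-checking by multiple annotators

Tip: write the guidelines up as an annotation manual and save it as a PDF for the whole team to follow.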
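One way to make the category scheme enforceable is a small check run over each annotation file. The sketch below is illustrative and not part of Labelme: `ALLOWED_LABELS` and `find_label_problems` are hypothetical names, and the input is assumed to be a Labelme-style dict with a `shapes` list.

```python
# Hypothetical class list copied from the team's annotation manual.
ALLOWED_LABELS = {"scratch", "dent"}


def find_label_problems(labelme_doc, allowed=ALLOWED_LABELS, min_points=3):
    """Return readable descriptions of guideline violations in one annotation dict."""
    problems = []
    for index, shape in enumerate(labelme_doc.get("shapes", [])):
        label = shape.get("label")
        # Labels are compared exactly, so "Scratch" and "scratch" differ.
        if label not in allowed:
            problems.append(f"shape {index}: unknown label {label!r}")
        # A polygon needs at least three vertices to enclose an area.
        is_polygon = shape.get("shape_type") == "polygon"
        if is_polygon and len(shape.get("points", [])) < min_points:
            problems.append(f"shape {index}: polygon has fewer than {min_points} points")
    return problems


example = {
    "shapes": [
        {"label": "scratch", "points": [[0, 0], [4, 0], [4, 3]], "shape_type": "polygon"},
        {"label": "Scratch", "points": [[1, 1], [2, 2]], "shape_type": "polygon"},
    ]
}
problems = find_label_problems(example)
for line in problems:
    print(line)
```

Running a check like this before conversion catches typos in label names early, when they are still cheap to fix in the annotation tool.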

2. The Labelme Annotation Workflow in Detail

2.1 Annotation in Practice

A typical Labelme workflow looks like this:

Create the project directory structure:

/dataset
    /raw_images      # original images
    /annotations     # saved JSON annotation files

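Run the annotation command from inside the dataset directory (--nodata keeps the encoded image data out of the JSON files):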
```
labelme --output annotations --nodata
```
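Key points in the annotation interface:
- Use the "Create Polygons" tool to place points along the object edge
- Click the first point again or press Enter to close the current polygon (Esc cancels the polygon in progress)
- Right-click an annotated object to edit its shape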

Example annotation result (JSON excerpt):

{
  "version": "5.1.1",
  "flags": {},
  "shapes": [
    {
      "label": "defect",
      "points": [[102,54],[115,49],[126,61],...],
      "shape_type": "polygon"
    }
  ],
  "imagePath": "IMG_001.jpg"
}
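To make the structure concrete, here is a minimal sketch that parses such a file and summarizes its shapes. The JSON text is a shortened, made-up example with only three vertices; the `imageWidth` and `imageHeight` fields are included because the conversion script later in this article reads them. `summarize_labelme` is a hypothetical helper name.

```python
import json

# A tiny, self-contained Labelme-style document (example values only).
LABELME_TEXT = """
{
  "version": "5.1.1",
  "flags": {},
  "shapes": [
    {"label": "defect",
     "points": [[102, 54], [115, 49], [126, 61]],
     "shape_type": "polygon"}
  ],
  "imagePath": "IMG_001.jpg",
  "imageHeight": 480,
  "imageWidth": 640
}
"""


def summarize_labelme(doc):
    """Return the image file name and a (label, shape_type, vertex count) tuple per shape."""
    shapes = [(s["label"], s["shape_type"], len(s["points"])) for s in doc["shapes"]]
    return doc["imagePath"], shapes


doc = json.loads(LABELME_TEXT)
image_name, shapes = summarize_labelme(doc)
print(image_name, shapes)
```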

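2.2 Annotation Quality Control Tips

Zoom in: hold Ctrl and use the scroll wheel to zoom to pixel level for fine annotation
Shortcuts:
- Ctrl+Z: undo
- Del: delete the selected annotation
- Ctrl+S: save
Batch review: use preview mode to browse through all annotation results quickly

Solutions to common problems:

Blurry edges: follow the direction of the object's texture to locate the boundary
Specular reflections: annotate the actual object, not its reflection
Partial occlusion: annotate the complete outline of the visible part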

3. The COCO Data Structure in Depth

3.1 Core Fields

A COCO-format JSON file has five key parts:

info: dataset metadata

"info": {
  "description": "Custom Industrial Defect Dataset",
  "year": 2023,
  "contributor": "Your Team"
}

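licenses: license information (optional; may be left as an empty list)

images: basic image information (the # comments are explanatory only and must not appear in the real file, since JSON has no comment syntax)

"images": [{
  "id": 1,                       # unique identifier
  "width": 640,
  "height": 480,
  "file_name": "IMG_001.jpg"
}]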

categories: category definitions

"categories": [{
  "id": 1,
  "name": "scratch",
  "supercategory": "defect"
}]

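annotations: the instance annotations themselves

"annotations": [{
  "id": 1,                               # annotation ID
  "image_id": 1,                         # ID of the image this annotation belongs to
  "category_id": 1,                      # category ID
  "segmentation": [[x1,y1,x2,y2,...]],   # polygon coordinates
  "area": 245.76,                        # area in pixels
  "bbox": [x,y,w,h],                     # bounding box (top-left x, top-left y, width, height)
  "iscrowd": 0                           # whether this is a crowd annotation
}]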

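3.2 Key Data Structure Conversions

The core logic for converting Labelme to COCO:

Coordinate conversion:
- Labelme stores polygon vertices as absolute pixel coordinates, as a list of [x, y] pairs
- COCO polygons also use absolute pixel coordinates, flattened into [x1, y1, x2, y2, ...]; RLE is a mask encoding used mainly for iscrowd=1 regions, not a relative coordinate system
ID mapping:
- Build a dictionary mapping image_name to image_id
- Build the correspondence from category_name to category_id

Area calculation: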

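import cv2
import numpy as np

def calculate_area(points):
    # cv2.contourArea expects an int32 or float32 array, so cast explicitly
    contour = np.array(points, dtype=np.float32).reshape((-1, 1, 2))
    return cv2.contourArea(contour)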
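The three per-shape conversions (flattening the coordinates, deriving the bounding box, and computing the area) can also be sketched without OpenCV. The example below swaps in the shoelace formula for the area; `shape_to_coco_fields` and `polygon_area` are hypothetical helper names, and the input is assumed to be a simple (non-self-intersecting) polygon given as [x, y] pairs.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon given as [(x, y), ...]."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def shape_to_coco_fields(points):
    """Turn Labelme [x, y] pairs into COCO segmentation, bbox and area values."""
    xs = [float(x) for x, _ in points]
    ys = [float(y) for _, y in points]
    # COCO stores one flat list [x1, y1, x2, y2, ...] per polygon.
    segmentation = [[coord for pair in zip(xs, ys) for coord in pair]]
    x_min, y_min = min(xs), min(ys)
    # COCO bbox is [top-left x, top-left y, width, height].
    bbox = [x_min, y_min, max(xs) - x_min, max(ys) - y_min]
    return segmentation, bbox, polygon_area(list(zip(xs, ys)))


segmentation, bbox, area = shape_to_coco_fields([[102, 54], [115, 49], [126, 61]])
print(segmentation, bbox, area)
```

Note that the polygon area (105.5 here) is smaller than the bounding-box area (24 × 12 = 288), so the two should not be used interchangeably for the `area` field.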

4. Complete Conversion Code

4.1 Core Logic of the Conversion Script

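import json
import os

import numpy as np
from tqdm import tqdm

# Category names in ID order (example values; replace with your own class list)
CLASSES = ["scratch", "dent"]

def polygon_area(xs, ys):
    # Shoelace formula for the area enclosed by the polygon
    return 0.5 * abs(np.dot(xs, np.roll(ys, -1)) - np.dot(ys, np.roll(xs, -1)))

def labelme2coco(labelme_dir, output_file):
    # Initialize the COCO structure
    coco = {"info": {}, "licenses": [], "images": [], "annotations": [], "categories": []}

    # Build the category dictionary
    categories = {}
    for i, cls in enumerate(CLASSES):
        categories[cls] = i + 1
        coco["categories"].append({"id": i + 1, "name": cls, "supercategory": "object"})

    # Iterate over the annotation files (sorted so that image IDs are reproducible)
    json_files = sorted(f for f in os.listdir(labelme_dir) if f.endswith(".json"))
    ann_id = 1
    for img_id, filename in enumerate(tqdm(json_files), start=1):
        # Parse the Labelme file
        with open(os.path.join(labelme_dir, filename), encoding="utf-8") as f:
            labelme = json.load(f)

        # Add the image information
        coco["images"].append({
            "id": img_id,
            "file_name": labelme["imagePath"],
            "width": labelme["imageWidth"],
            "height": labelme["imageHeight"],
        })

        # Convert the annotations
        for shape in labelme["shapes"]:
            # Only polygons with at least 3 vertices are converted here
            if shape.get("shape_type", "polygon") != "polygon" or len(shape["points"]) < 3:
                continue
            pts = np.asarray(shape["points"], dtype=float)
            xs, ys = pts[:, 0], pts[:, 1]
            # Compute the bounding box
            x_min, y_min = float(xs.min()), float(ys.min())
            width, height = float(xs.max()) - x_min, float(ys.max()) - y_min
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": categories[shape["label"]],
                "segmentation": [pts.flatten().tolist()],
                "area": float(polygon_area(xs, ys)),  # polygon area, not bbox area
                "bbox": [x_min, y_min, width, height],
                "iscrowd": 0,
            })
            ann_id += 1

    # Save the result
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(coco, f, indent=2)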

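4.2 Debugging Common Problems

Out-of-bounds coordinates:

# Clip coordinates to the image bounds; pts must be an (N, 2) array of [x, y],
# and img_w / img_h are the image width and height (not the bbox size)
pts = np.clip(np.asarray(shape["points"], dtype=float), 0, [img_w - 1, img_h - 1])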

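Invalid polygons:

# A polygon needs at least 3 vertices (6 values in the flattened list)
if len(points) < 6:
    continue
# Check that the polygon is closed (optional: pycocotools closes polygons implicitly)
if points[0] != points[-2] or points[1] != points[-1]:
    points.extend(points[:2])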

Missing category mapping:

# Add new categories dynamically
if shape["label"] not in categories:
    new_id = len(categories) + 1
    categories[shape["label"]] = new_id
    coco["categories"].append(...)
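The dynamic registration above can be wrapped in a small helper so the lookup and the bookkeeping stay in one place. This is a sketch under assumptions: `get_or_add_category` is a hypothetical name, and the appended record simply mirrors the `id` / `name` / `supercategory` fields shown in section 3.1, with "object" as a placeholder supercategory.

```python
def get_or_add_category(label, categories, coco_categories, supercategory="object"):
    """Return the category ID for label, registering the label first if it is new."""
    if label not in categories:
        new_id = len(categories) + 1  # IDs start at 1 and grow in order of first appearance
        categories[label] = new_id
        coco_categories.append({"id": new_id, "name": label, "supercategory": supercategory})
    return categories[label]


categories = {"scratch": 1}
coco_categories = [{"id": 1, "name": "scratch", "supercategory": "object"}]

first = get_or_add_category("dent", categories, coco_categories)
second = get_or_add_category("dent", categories, coco_categories)  # already registered
print(first, second, coco_categories)
```

With this approach the IDs depend on the order in which labels are first seen, so a fixed class list is the safer choice when the train and validation files are converted separately and must share the same IDs.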

5. Dataset Validation and Optimization

5.1 Visual Verification

Use pycocotools to check the dataset's integrity:

from pycocotools.coco import COCO
import matplotlib.pyplot as plt

coco = COCO("annotations.json")
img_ids = coco.getImgIds()
for img_id in img_ids[:3]:
    img = coco.loadImgs(img_id)[0]
    ann_ids = coco.getAnnIds(imgIds=img_id)
    anns = coco.loadAnns(ann_ids)
    # Show the image with its annotations overlaid
    # (file_name is resolved relative to the current working directory)
    plt.imshow(plt.imread(img["file_name"]))
    coco.showAnns(anns)
    plt.show()
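Visual inspection covers only a few images, so a structural check over the whole file is a useful complement. The sketch below needs no third-party packages and works on an already loaded COCO dict; `check_coco` is a hypothetical helper and the sample data is made up, with the second annotation broken on purpose.

```python
def check_coco(coco):
    """Return a list of structural problems found in a COCO-style dict."""
    errors = []
    image_ids = [img["id"] for img in coco["images"]]
    ann_ids = [ann["id"] for ann in coco["annotations"]]
    category_ids = {cat["id"] for cat in coco["categories"]}
    if len(image_ids) != len(set(image_ids)):
        errors.append("duplicate image ids")
    if len(ann_ids) != len(set(ann_ids)):
        errors.append("duplicate annotation ids")
    known_images = set(image_ids)
    for ann in coco["annotations"]:
        prefix = f"annotation {ann['id']}"
        if ann["image_id"] not in known_images:
            errors.append(f"{prefix}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            errors.append(f"{prefix}: unknown category_id {ann['category_id']}")
        if ann["bbox"][2] <= 0 or ann["bbox"][3] <= 0:
            errors.append(f"{prefix}: bbox has non-positive width or height")
        # Polygon segmentations are lists of flat lists; RLE dicts are skipped here.
        if isinstance(ann["segmentation"], list):
            if any(len(poly) < 6 for poly in ann["segmentation"]):
                errors.append(f"{prefix}: polygon with fewer than 3 vertices")
    return errors


sample = {
    "images": [{"id": 1, "width": 640, "height": 480, "file_name": "IMG_001.jpg"}],
    "categories": [{"id": 1, "name": "scratch", "supercategory": "defect"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "iscrowd": 0, "area": 105.5,
         "segmentation": [[102, 54, 115, 49, 126, 61]], "bbox": [102, 49, 24, 12]},
        {"id": 2, "image_id": 7, "category_id": 1, "iscrowd": 0, "area": 0,
         "segmentation": [[0, 0, 1, 0]], "bbox": [0, 0, 1, 0]},
    ],
}
errors = check_coco(sample)
for line in errors:
    print(line)
```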