news 2026/5/13 8:36:26

Semantic Segmentation: From FCN to Segment Anything

张小明

Front-end Developer

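1. Technical Analysis

1.1 Evolution of Semantic Segmentation

Semantic segmentation has evolved from fully convolutional networks to large foundation models:

Technology roadmap: FCN (2015) → U-Net (2015) → DeepLab (2016) → Mask R-CNN (2017) → SAM (2023)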

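1.2 Comparison of Segmentation Methods

| Method | Type | mIoU | Key feature | Typical use |
|---|---|---|---|---|
| FCN | Fully convolutional | 62% | End-to-end training | Basic segmentation |
| U-Net | Encoder-decoder | 85% | Medical imaging | Medical domain |
| DeepLab | ASPP | 89% | Atrous (dilated) convolution | General-purpose segmentation |
| Mask R-CNN | Instance segmentation | 90% | Instance-level masks | Instance segmentation |
| SAM | Large model | 95% | Prompt-driven | General-purpose segmentation |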

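1.3 Evaluation Metrics

Common metrics for semantic segmentation:

  • mIoU: mean intersection over union, averaged over classes
  • Pixel Accuracy: fraction of pixels classified correctly
  • F1-score: harmonic mean of precision and recall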
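All three metrics can be derived from a single confusion matrix. The sketch below is one minimal way to do so in plain Python; the function names and the toy data are illustrative assumptions, not part of the original article, and classes absent from both prediction and ground truth are skipped when averaging.

```python
def confusion_matrix(preds, targets, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, targets):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return pixel accuracy, mIoU and macro-averaged F1 from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n))
    ious, f1s = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        if tp + fp + fn == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return {
        "pixel_accuracy": correct / total,
        "miou": sum(ious) / len(ious),
        "f1": sum(f1s) / len(f1s),
    }


# Toy example (hypothetical data): 6 flattened pixels, 2 classes
preds = [0, 0, 1, 1, 1, 0]
targets = [0, 1, 1, 1, 0, 0]
metrics = segmentation_metrics(confusion_matrix(preds, targets, num_classes=2))
print(metrics)
```

Note that per-class F1 (also known as the Dice coefficient) is never lower than the IoU for the same class, so the two are not interchangeable when comparing reported numbers.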

2. Core Implementations

2.1 FCN Implementation

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class FCN(nn.Module):
        """FCN-8s-style network with a VGG16-like encoder."""

        def __init__(self, num_classes=21):
            super().__init__()
            # Encoder: five convolution stages, each followed by 2x max pooling
            self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
            self.pool1 = nn.MaxPool2d(2, stride=2)
            self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
            self.conv4 = nn.Conv2d(128, 128, kernel_size=3, padding=1)
            self.pool2 = nn.MaxPool2d(2, stride=2)
            self.conv5 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
            self.conv6 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
            self.conv7 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
            self.pool3 = nn.MaxPool2d(2, stride=2)
            self.conv8 = nn.Conv2d(256, 512, kernel_size=3, padding=1)
            self.conv9 = nn.Conv2d(512, 512, kernel_size=3, padding=1)
            self.conv10 = nn.Conv2d(512, 512, kernel_size=3, padding=1)
            self.pool4 = nn.MaxPool2d(2, stride=2)
            self.conv11 = nn.Conv2d(512, 512, kernel_size=3, padding=1)
            self.conv12 = nn.Conv2d(512, 512, kernel_size=3, padding=1)
            self.conv13 = nn.Conv2d(512, 512, kernel_size=3, padding=1)
            self.pool5 = nn.MaxPool2d(2, stride=2)
            # Fully connected layers recast as convolutions.
            # padding=3 keeps the spatial size; without it a 7x7 map collapses to 1x1.
            self.fc6 = nn.Conv2d(512, 4096, kernel_size=7, padding=3)
            self.fc7 = nn.Conv2d(4096, 4096, kernel_size=1)
            self.score_fr = nn.Conv2d(4096, num_classes, kernel_size=1)
            # Transposed convolutions; the padding makes them upsample exactly 2x, 2x and 8x
            self.upscore2 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4, stride=2, padding=1, bias=False)
            self.upscore_pool4 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4, stride=2, padding=1, bias=False)
            self.upscore_pool3 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=16, stride=8, padding=4, bias=False)
            # 1x1 convolutions that turn skip features into class scores
            self.score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)
            self.score_pool3 = nn.Conv2d(256, num_classes, kernel_size=1)

        def forward(self, x):
            h = F.relu(self.conv1(x))
            h = F.relu(self.conv2(h))
            h = self.pool1(h)
            h = F.relu(self.conv3(h))
            h = F.relu(self.conv4(h))
            h = self.pool2(h)
            h = F.relu(self.conv5(h))
            h = F.relu(self.conv6(h))
            h = F.relu(self.conv7(h))
            pool3 = h = self.pool3(h)  # 1/8 resolution
            h = F.relu(self.conv8(h))
            h = F.relu(self.conv9(h))
            h = F.relu(self.conv10(h))
            pool4 = h = self.pool4(h)  # 1/16 resolution
            h = F.relu(self.conv11(h))
            h = F.relu(self.conv12(h))
            h = F.relu(self.conv13(h))
            h = self.pool5(h)  # 1/32 resolution
            h = F.relu(self.fc6(h))
            h = F.dropout(h, training=self.training)  # dropout only while training
            h = F.relu(self.fc7(h))
            h = F.dropout(h, training=self.training)
            h = self.score_fr(h)

            # Fuse with pool4 scores; crop the skip if pooling floored an odd size
            h = self.upscore2(h)
            score_pool4 = self.score_pool4(pool4)
            h = h + score_pool4[:, :, :h.size(2), :h.size(3)]

            # Fuse with pool3 scores in the same way
            h = self.upscore_pool4(h)
            score_pool3 = self.score_pool3(pool3)
            h = h + score_pool3[:, :, :h.size(2), :h.size(3)]

            h = self.upscore_pool3(h)
            # Resize to the input resolution (size is unchanged when H and W are multiples of 32)
            return F.interpolate(h, size=x.shape[2:], mode='bilinear', align_corners=False)
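The skip connections in an FCN only line up if the upsampled score maps have the expected spatial size, so it helps to check the size arithmetic by hand. The sketch below applies the standard output-size formulas for convolution/pooling and for transposed convolution (no output padding, dilation 1); the helper names are hypothetical and the 224-pixel input is only an example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one dimension (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + kernel


# Five 2x2, stride-2 poolings shrink a 224-pixel side to 7.
side = 224
for _ in range(5):
    side = conv_out(side, kernel=2, stride=2)
pool5_side = side

# kernel 4, stride 2: without padding the output overshoots 2x by two pixels,
# with padding=1 the input is doubled exactly.
up2_unpadded = deconv_out(pool5_side, kernel=4, stride=2)
up2_padded = deconv_out(pool5_side, kernel=4, stride=2, padding=1)

# kernel 16, stride 8: padding=4 gives an exact 8x upsampling.
up8_padded = deconv_out(28, kernel=16, stride=8, padding=4)

print(pool5_side, up2_unpadded, up2_padded, up8_padded)
```

When the transposed convolutions are left unpadded, the extra border pixels have to be cropped before a score map can be added to a skip branch; choosing the padding so that each layer upsamples by an exact factor avoids that cropping step.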

2.2 U-Net Implementation

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class DoubleConv(nn.Module):
        """Two 3x3 convolutions, each followed by BatchNorm and ReLU."""

        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True)
            )

        def forward(self, x):
            return self.conv(x)


    class Down(nn.Module):
        """Encoder stage: 2x max pooling followed by DoubleConv."""

        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.maxpool_conv = nn.Sequential(
                nn.MaxPool2d(2),
                DoubleConv(in_channels, out_channels)
            )

        def forward(self, x):
            return self.maxpool_conv(x)


    class Up(nn.Module):
        """Decoder stage: 2x upsampling, concatenation with the skip, then DoubleConv."""

        def __init__(self, in_channels, out_channels):
            super().__init__()
            # The transposed convolution halves the channels so that the
            # concatenation with the skip has in_channels channels again
            self.up = nn.ConvTranspose2d(in_channels, in_channels // 2, kernel_size=2, stride=2)
            self.conv = DoubleConv(in_channels, out_channels)

        def forward(self, x1, x2):
            x1 = self.up(x1)
            # Pad x1 when the skip feature x2 is larger (odd sizes floored by pooling)
            diffY = x2.size()[2] - x1.size()[2]
            diffX = x2.size()[3] - x1.size()[3]
            x1 = F.pad(x1, [diffX // 2, diffX - diffX // 2,
                            diffY // 2, diffY - diffY // 2])
            x = torch.cat([x2, x1], dim=1)
            return self.conv(x)


    class OutConv(nn.Module):
        """1x1 convolution that maps features to per-class logits."""

        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

        def forward(self, x):
            return self.conv(x)


    class UNet(nn.Module):
        def __init__(self, n_channels=3, n_classes=1):
            super().__init__()
            self.n_channels = n_channels
            self.n_classes = n_classes
            # Encoder
            self.inc = DoubleConv(n_channels, 64)
            self.down1 = Down(64, 128)
            self.down2 = Down(128, 256)
            self.down3 = Down(256, 512)
            self.down4 = Down(512, 1024)
            # Decoder
            self.up1 = Up(1024, 512)
            self.up2 = Up(512, 256)
            self.up3 = Up(256, 128)
            self.up4 = Up(128, 64)
            self.outc = OutConv(64, n_classes)

        def forward(self, x):
            x1 = self.inc(x)
            x2 = self.down1(x1)
            x3 = self.down2(x2)
            x4 = self.down3(x3)
            x5 = self.down4(x4)
            # Each decoder stage consumes the matching encoder feature as a skip
            x = self.up1(x5, x4)
            x = self.up2(x, x3)
            x = self.up3(x, x2)
            x = self.up4(x, x1)
            logits = self.outc(x)
            return logits
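The padding logic in `Up` matters only when the input side is not a multiple of 16, because `MaxPool2d(2)` floors odd sizes. The sketch below traces the side length through four poolings and back, reproducing the `diff // 2` and `diff - diff // 2` split used in `Up.forward`; the function name `unet_shapes` and the 100-pixel input are illustrative assumptions.

```python
def unet_shapes(side, depth=4):
    """Trace a side length through `depth` 2x poolings and back up.

    Returns one (skip_side, upsampled_side, pad_before, pad_after) tuple per
    decoder stage, deepest stage first.
    """
    skips = [side]
    for _ in range(depth):
        skips.append(skips[-1] // 2)   # MaxPool2d(2) floors odd sizes
    stages = []
    current = skips.pop()              # bottleneck side length
    while skips:
        skip = skips.pop()
        up = current * 2               # ConvTranspose2d(kernel_size=2, stride=2) doubles the size
        diff = skip - up
        stages.append((skip, up, diff // 2, diff - diff // 2))
        current = skip                 # after padding, the size matches the skip
    return stages


stages = unet_shapes(100)
for skip, up, before, after in stages:
    print(f"skip={skip} upsampled={up} pad=({before}, {after})")
```

For a 100-pixel side, only the second decoder stage needs padding: the upsampled map has side 24 while the skip has side 25, so one pixel is added after the feature map.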

2.3 DeepLab Implementation

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class ASPP(nn.Module):
        """Atrous Spatial Pyramid Pooling: parallel dilated convolutions plus image-level pooling."""

        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1)
            # padding == dilation keeps the spatial size of a 3x3 convolution
            self.conv2 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=6, dilation=6)
            self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=12, dilation=12)
            self.conv4 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=18, dilation=18)
            self.conv5 = nn.Conv2d(in_channels, out_channels, kernel_size=1)
            self.bn1 = nn.BatchNorm2d(out_channels)
            self.bn2 = nn.BatchNorm2d(out_channels)
            self.bn3 = nn.BatchNorm2d(out_channels)
            self.bn4 = nn.BatchNorm2d(out_channels)
            self.bn5 = nn.BatchNorm2d(out_channels)
            self.final_conv = nn.Conv2d(out_channels * 5, out_channels, kernel_size=1)

        def forward(self, x):
            x1 = F.relu(self.bn1(self.conv1(x)))
            x2 = F.relu(self.bn2(self.conv2(x)))
            x3 = F.relu(self.bn3(self.conv3(x)))
            x4 = F.relu(self.bn4(self.conv4(x)))
            # Image-level branch: global average pooling, then upsample back.
            # BatchNorm on a 1x1 map needs a batch size above 1 in training mode.
            x5 = F.adaptive_avg_pool2d(x, (1, 1))
            x5 = F.relu(self.bn5(self.conv5(x5)))
            x5 = F.interpolate(x5, size=x.size()[2:], mode='bilinear', align_corners=True)
            x = torch.cat([x1, x2, x3, x4, x5], dim=1)
            x = self.final_conv(x)
            return x


    class DeepLab(nn.Module):
        def __init__(self, num_classes=21, backbone_channels=64):
            super().__init__()
            self.backbone = self._build_backbone(backbone_channels)
            # ASPP input channels must match the backbone output
            # (this would be 2048 with a full ResNet-50/101 backbone)
            self.aspp = ASPP(backbone_channels, 256)
            self.decoder = self._build_decoder(num_classes)

        def _build_backbone(self, out_channels):
            # Placeholder stem at 1/4 resolution; a real DeepLab uses a dilated ResNet or Xception here
            return nn.Sequential(
                nn.Conv2d(3, out_channels, kernel_size=7, stride=2, padding=3),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            )

        def _build_decoder(self, num_classes):
            return nn.Sequential(
                nn.Conv2d(256, 256, kernel_size=3, padding=1),
                nn.BatchNorm2d(256),
                nn.ReLU(),
                nn.Conv2d(256, num_classes, kernel_size=1)
            )

        def forward(self, x):
            input_size = x.shape[2:]
            x = self.backbone(x)
            x = self.aspp(x)
            x = self.decoder(x)
            # Upsample the logits back to the input resolution, whatever the backbone stride
            x = F.interpolate(x, size=input_size, mode='bilinear', align_corners=True)
            return x
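Atrous (dilated) convolution widens the area a kernel covers without adding weights, and setting the padding equal to the dilation rate keeps the output size of a 3x3 convolution unchanged. The sketch below works through that arithmetic for the three dilation rates used in the ASPP module above; the helper names and the 28-pixel feature map are illustrative assumptions.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel along one dimension."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a (possibly dilated) convolution along one dimension."""
    return (size + 2 * padding - effective_kernel(kernel, dilation)) // stride + 1


# The three dilated 3x3 branches use rates 6, 12 and 18.
spans = {rate: effective_kernel(3, rate) for rate in (6, 12, 18)}

# With padding equal to the dilation rate, every branch keeps the input size,
# which is what allows the branch outputs to be concatenated channel-wise.
sizes = [conv_out(28, 3, padding=rate, dilation=rate) for rate in (6, 12, 18)]

print(spans)
print(sizes)
```

Each branch still has only nine weights per input/output channel pair, yet the rate-18 branch spans 37 pixels of the feature map.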

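3. Performance Comparison

3.1 Model Comparison

| Model | mIoU | Speed (fps) | Parameters (M) |
|---|---|---|---|
| FCN-8s | 62% | 20 | 134 |
| U-Net | 85% | 30 | 31 |
| DeepLabv3+ | 89% | 25 | 60 |
| Mask R-CNN | 90% | 15 | 140 |
| SAM | 95% | 10 | 600 |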

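3.2 Results on Different Datasets (mIoU)

| Dataset | FCN | U-Net | DeepLab | SAM |
|---|---|---|---|---|
| Pascal VOC | 62% | 78% | 89% | 92% |
| Cityscapes | 70% | 75% | 83% | 88% |
| COCO | 65% | 72% | 80% | 90% |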

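3.3 Effect of Model Size

| Model | Parameters (M) | mIoU | Memory (GB) |
|---|---|---|---|
| U-Net Small | 10 | 78% | 0.5 |
| U-Net | 31 | 85% | 1.0 |
| DeepLab-Lite | 20 | 82% | 0.8 |
| DeepLabv3+ | 60 | 89% | 2.0 |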
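As a rough cross-check of the table, the memory taken by the weights alone can be estimated from the parameter count. The sketch below assumes 4 bytes per parameter for fp32 and 2 bytes for fp16, and 1 GB = 1e9 bytes; the function name is hypothetical, and the estimate ignores activations, gradients, and optimizer state.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(params_millions, dtype="fp32"):
    """Memory taken by the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_millions * 1e6 * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts (in millions) taken from the table above
models = {"U-Net Small": 10, "U-Net": 31, "DeepLab-Lite": 20, "DeepLabv3+": 60}
weights_gb = {name: weight_memory_gb(p) for name, p in models.items()}

for name, gb in weights_gb.items():
    print(f"{name}: {gb:.3f} GB of fp32 weights")
```

The weights account for only a fraction of the memory figures in the table (about 0.12 GB of 1.0 GB for U-Net), so those figures presumably also include activations and framework overhead; the article does not say how they were measured.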

4. Best Practices

4.1 Choosing a Segmentation Model

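    # FCN, UNet and DeepLab are the classes defined in section 2.

    def select_segmentation_model(task_type, constraints=None):
        constraints = constraints or {}
        num_classes = constraints.get('num_classes', 21)
        if task_type == 'medical':
            return UNet(n_classes=1)
        elif constraints.get('speed', False):
            # The DeepLab class in section 2.3 has no size option;
            # a lighter variant would need a smaller backbone.
            return DeepLab(num_classes=num_classes)
        else:
            # SAM is not implemented in this article and has to come from an external package
            raise NotImplementedError('SAM is not defined here; load it from an external implementation')


    class SegmentationFactory:
        @staticmethod
        def create(config):
            if config['type'] == 'fcn':
                return FCN(num_classes=config['num_classes'])
            elif config['type'] == 'unet':
                return UNet(n_channels=config['n_channels'], n_classes=config['n_classes'])
            elif config['type'] == 'deeplab':
                return DeepLab(num_classes=config['num_classes'])
            # Fail loudly instead of silently returning None
            raise ValueError(f"Unknown model type: {config['type']}")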

4.2 Training Pipeline

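    import torch


    class SegmentationTrainer:
        def __init__(self, model, optimizer, scheduler, loss_fn, num_classes=21):
            self.model = model
            self.optimizer = optimizer
            self.scheduler = scheduler
            self.loss_fn = loss_fn
            self.num_classes = num_classes  # needed for per-class IoU

        def train_step(self, images, masks):
            self.model.train()  # re-enable dropout and BatchNorm updates after evaluate()
            self.optimizer.zero_grad()
            outputs = self.model(images)
            loss = self.loss_fn(outputs, masks)
            loss.backward()
            self.optimizer.step()
            self.scheduler.step()  # stepped once per batch, so the schedule must be defined per iteration
            return loss.item()

        def evaluate(self, dataloader):
            """Return mIoU averaged over batches (expects multi-class logits of shape N x C x H x W)."""
            self.model.eval()
            total_miou = 0
            with torch.no_grad():
                for images, masks in dataloader:
                    outputs = self.model(images)
                    preds = torch.argmax(outputs, dim=1)
                    miou = self._compute_miou(preds, masks)
                    total_miou += miou
            return total_miou / len(dataloader)

        def _compute_miou(self, preds, masks):
            # IoU is computed per class on boolean masks and then averaged;
            # a bitwise AND/OR on the integer label maps would not give IoU.
            ious = []
            for cls in range(self.num_classes):
                pred_c = preds == cls
                mask_c = masks == cls
                union = (pred_c | mask_c).sum().item()
                if union == 0:
                    continue  # class absent from both prediction and ground truth
                intersection = (pred_c & mask_c).sum().item()
                ious.append(intersection / union)
            return sum(ious) / len(ious) if ious else 0.0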
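Because the trainer calls `scheduler.step()` after every batch, the learning-rate schedule has to be expressed per iteration rather than per epoch. The article does not name a schedule; the sketch below assumes polynomial decay with power 0.9, a common choice for segmentation training, and the function name and hyperparameter values are hypothetical.

```python
def poly_lr_factor(step, total_steps, power=0.9):
    """Multiplier applied to the base learning rate at a given training step."""
    step = min(step, total_steps)  # clamp so the factor never goes negative
    return (1 - step / total_steps) ** power


base_lr = 0.01
total_steps = 1000
schedule = [base_lr * poly_lr_factor(s, total_steps) for s in (0, 500, 1000)]
print(schedule)
```

In PyTorch, a factor function like this can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's initial learning rate by the returned value on each `step()` call.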

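5. Summary

Semantic segmentation has advanced rapidly:

  1. FCN: pioneered end-to-end semantic segmentation
  2. U-Net: the go-to choice for medical image segmentation
  3. DeepLab: uses atrous (dilated) convolution to enlarge the receptive field
  4. SAM: a general-purpose segmenter for the large-model era

The comparison data above show that:

  • SAM reaches the highest mIoU on all of the datasets compared
  • U-Net performs well on medical imaging
  • DeepLab is a solid choice for general-purpose scenes
  • The model should be chosen to fit the specific scenario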