1. The Concept of Pre-training
Pre-training is a core transfer-learning technique in deep learning: a model's weights are first trained on a large general-purpose dataset, and those weights are then carried over to the target task, rather than training from randomly initialized parameters.
Core principles
Generic feature extraction: pre-training lets a model learn general low-level visual or linguistic features. In image tasks, the model learns to recognize basic features such as edges, textures, and shapes; in NLP tasks, it learns regularities of syntax and semantics. Take the ResNet18 pre-trained weights used here as an example: they were trained on ImageNet (1000 classes, over a million images), so the model already has strong general-purpose image feature extraction ability.
Transfer and adaptation to the target task: when working on your target task (say, a 10-class classification problem), there is no need to retrain the entire model:
- Keep most of the pre-trained model's convolutional layers (these extract the generic features).
- Replace only the final fully connected layer so it matches the target number of classes (e.g. from 1000 classes down to 10).
- Setting `model.eval()` in your code without any fine-tuning means running inference directly on the pre-trained feature extractor.
Core advantages
- Lower data requirements: the target task no longer needs massive labeled data; even a small dataset can produce a decent model.
- Lower training cost: you skip the long process of training from scratch, saving compute and time.
- Better performance: pre-trained weights provide a far better initialization than random, which helps mitigate overfitting and improves generalization.
Common use cases
- Vision: ResNet, VGG, EfficientNet, etc. load ImageNet pre-trained weights for classification, detection, and segmentation tasks.
- NLP: BERT, GPT, etc. are pre-trained on large text corpora and then adapted to downstream classification, translation, and question-answering tasks.
2. Classic Pre-trained Models
2.1 CNN-architecture pre-trained models
2.2 Transformer-based pre-trained models
2.3 Self-supervised pre-trained models
3. Common Pre-trained Models for Classification
Key points in the evolution of these architectures
1. Depth breakthrough: from LeNet's 7 layers to ResNet-152's 152, with residual connections solving the training difficulties of very deep networks. ---- If you didn't take the CV part of my course, go look up what a residual connection is on your own; it matters a lot!
2. Computational efficiency: GoogLeNet (Inception) and MobileNet cut parameter counts drastically through structural optimization while preserving accuracy.
3. Feature reuse: DenseNet's dense connectivity lets the model make better use of shallow features, which suits small datasets.
4. Automated design: EfficientNet uses neural architecture search (NAS) to find optimal network configurations automatically, pioneering AutoML in CNNs.
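Since residual connections are stressed above, here is a minimal sketch of the idea (an illustrative block, not torchvision's exact implementation): the convolutional layers learn a residual F(x), and the block outputs F(x) + x, so gradients can always flow back through the identity shortcut.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal residual block: output = relu(F(x) + x).
    Illustrative only; torchvision's ResNet also handles stride/downsampling."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                          # shortcut (identity) path
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                  # residual connection: F(x) + x
        return self.relu(out)

block = BasicResidualBlock(64)
x = torch.randn(2, 64, 32, 32)
y = block(x)
print(y.shape)  # same shape as the input, so the skip addition is valid
```

Because the shortcut requires F(x) and x to have matching shapes, the convolutions here preserve channel count and spatial size; real ResNets add a 1x1 projection on the shortcut when shapes differ.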
Summary: the arc of CNN architecture development
1. Early exploration (1990s-2010s): LeNet proved CNNs feasible but was limited by compute and data.
2. Deep learning revival (2012-2015): AlexNet, VGGNet, and GoogLeNet broke through via deeper networks and structural innovation.
3. The very-deep era (2015 onward): ResNet solved the degradation problem and established the residual-connection paradigm; later models optimized for efficiency (MobileNet), feature reuse (DenseNet), multi-branch structure (Inception), and other directions.
3.1 Training strategies for pre-trained models
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

# Chinese font support for the plots
plt.rcParams["font.family"] = ["SimHei"]
plt.rcParams['axes.unicode_minus'] = False  # fix minus-sign rendering

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 1. Preprocessing (augmentation for the training set, normalization only for the test set)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

# 2. Load CIFAR-10 (root is the folder containing the archive -- not the archive itself)
train_dataset = datasets.CIFAR10(
    root=r"D:\PythonStudy",
    train=True,
    download=True,  # if the archive is already in the folder it is only extracted, not re-downloaded
    transform=train_transform
)
test_dataset = datasets.CIFAR10(
    root=r"D:\PythonStudy",  # same root as the training set
    train=False,
    transform=test_transform  # the test set must use test_transform (no augmentation)
)

# 3. Data loaders (batch_size is adjustable)
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# 4. Training function (supports a learning-rate scheduler)
def train(model, train_loader, test_loader, criterion, optimizer, scheduler, device, epochs):
    train_loss_history, test_loss_history = [], []
    train_acc_history, test_acc_history = [], []
    all_iter_losses, iter_indices = [], []

    for epoch in range(epochs):
        model.train()  # training mode (must be reset each epoch, since eval mode is set below)
        running_loss = 0.0
        correct_train = 0
        total_train = 0
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            # Record per-iteration loss
            iter_loss = loss.item()
            all_iter_losses.append(iter_loss)
            iter_indices.append(epoch * len(train_loader) + batch_idx + 1)

            # Training statistics
            running_loss += iter_loss
            _, predicted = output.max(1)
            total_train += target.size(0)
            correct_train += predicted.eq(target).sum().item()

            # Print progress every 100 batches
            if (batch_idx + 1) % 100 == 0:
                print(f"Epoch {epoch+1}/{epochs} | Batch {batch_idx+1}/{len(train_loader)} "
                      f"| batch loss: {iter_loss:.4f}")

        # Epoch-level training metrics
        epoch_train_loss = running_loss / len(train_loader)
        epoch_train_acc = 100. * correct_train / total_train

        # Evaluation phase
        model.eval()
        correct_test = 0
        total_test = 0
        test_loss = 0.0
        with torch.no_grad():
            for data, target in test_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                test_loss += criterion(output, target).item()
                _, predicted = output.max(1)
                total_test += target.size(0)
                correct_test += predicted.eq(target).sum().item()
        epoch_test_loss = test_loss / len(test_loader)
        epoch_test_acc = 100. * correct_test / total_test

        # Record history
        train_loss_history.append(epoch_train_loss)
        test_loss_history.append(epoch_test_loss)
        train_acc_history.append(epoch_train_acc)
        test_acc_history.append(epoch_test_acc)

        # Step the LR scheduler on the test loss
        if scheduler is not None:
            scheduler.step(epoch_test_loss)

        print(f"Epoch {epoch+1} done | train loss: {epoch_train_loss:.4f} "
              f"| train acc: {epoch_train_acc:.2f}% | test acc: {epoch_test_acc:.2f}%")

    # Plot loss and accuracy curves
    plot_iter_losses(all_iter_losses, iter_indices)
    plot_epoch_metrics(train_acc_history, test_acc_history, train_loss_history, test_loss_history)
    return epoch_test_acc  # final test accuracy

# 5. Per-iteration loss curve
def plot_iter_losses(losses, indices):
    plt.figure(figsize=(10, 4))
    plt.plot(indices, losses, 'b-', alpha=0.7)
    plt.xlabel('Iteration (batch index)')
    plt.ylabel('Loss')
    plt.title('Per-iteration training loss')
    plt.grid(True)
    plt.show()

# 6. Epoch-level metric curves
def plot_epoch_metrics(train_acc, test_acc, train_loss, test_loss):
    epochs = range(1, len(train_acc) + 1)
    plt.figure(figsize=(12, 5))
    # Accuracy curves
    plt.subplot(1, 2, 1)
    plt.plot(epochs, train_acc, 'b-', label='train accuracy')
    plt.plot(epochs, test_acc, 'r-', label='test accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy (%)')
    plt.title('Accuracy vs. epoch')
    plt.legend()
    plt.grid(True)
    # Loss curves
    plt.subplot(1, 2, 2)
    plt.plot(epochs, train_loss, 'b-', label='train loss')
    plt.plot(epochs, test_loss, 'r-', label='test loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Loss vs. epoch')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# Import the ResNet model
from torchvision.models import resnet18

# Build ResNet18 (optionally loading pre-trained weights)
def create_resnet18(pretrained=True, num_classes=10):
    # Load ImageNet weights; newer torchvision prefers resnet18(weights='IMAGENET1K_V1')
    model = resnet18(pretrained=pretrained)
    # Replace the final fully connected layer for CIFAR-10's 10 classes
    in_features = model.fc.in_features
    model.fc = nn.Linear(in_features, num_classes)
    # Move the model to the target device (CPU/GPU)
    model = model.to(device)
    return model

# Create the model (ImageNet pre-trained weights, no fine-tuning)
model = create_resnet18(pretrained=True, num_classes=10)
model.eval()  # inference mode

# Test on a single image (example)
from torchvision import utils

# Fetch one image from the test set
dataiter = iter(test_loader)
images, labels = next(dataiter)  # next(dataiter) replaces the removed dataiter.next()
images = images[:1].to(device)   # take the first image

# Forward pass
with torch.no_grad():
    outputs = model(images)
    _, predicted = torch.max(outputs.data, 1)

# Show the image and its predicted label
plt.imshow(utils.make_grid(images.cpu(), normalize=True).permute(1, 2, 0))
plt.title(f"Predicted class: {predicted.item()}")
plt.axis('off')
plt.show()
```

In the CIFAR-10 dataset, the labels are a fixed set of 10 classes:
| Label (index) | Class name | Notes |
|---|---|---|
| 0 | airplane | |
| 1 | automobile | sedans, SUVs, etc. (pickup trucks belong to neither class) |
| 2 | bird | |
| 3 | cat | |
| 4 | deer | |
| 5 | dog | |
| 6 | frog | |
| 7 | horse | |
| 8 | ship | |
| 9 | truck | big trucks only |
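The prediction above prints only a numeric label; the table can be turned into a lookup with a plain list (torchvision's `datasets.CIFAR10` also exposes the same list as `test_dataset.classes`):

```python
# CIFAR-10 class names, in label order (identical to test_dataset.classes)
CIFAR10_CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']

def label_to_name(label: int) -> str:
    """Map a CIFAR-10 numeric label to its class name."""
    return CIFAR10_CLASSES[label]

print(label_to_name(3))  # cat
print(label_to_name(9))  # truck
```

In the single-image test, `plt.title(f"Predicted class: {label_to_name(predicted.item())}")` would then show a readable name instead of a number.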
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import os

# Chinese font support for the plots
plt.rcParams["font.family"] = ["SimHei"]
plt.rcParams['axes.unicode_minus'] = False  # fix minus-sign rendering

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 1. Preprocessing (augmentation for training, normalization only for testing)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

# 2. Load CIFAR-10 (root is the folder containing the archive, not the archive itself)
train_dataset = datasets.CIFAR10(
    root=r"D:\PythonStudy",
    train=True,
    download=True,  # if the archive is already present it is only extracted
    transform=train_transform
)
test_dataset = datasets.CIFAR10(
    root=r"D:\PythonStudy",
    train=False,
    transform=test_transform  # the test set must use test_transform (no augmentation)
)

# 3. Data loaders
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# 4. ResNet18 with a replaced classification head
def create_resnet18(pretrained=True, num_classes=10):
    model = models.resnet18(pretrained=pretrained)
    # Replace the final fully connected layer
    in_features = model.fc.in_features
    model.fc = nn.Linear(in_features, num_classes)
    return model.to(device)

# 5. Freeze/unfreeze helper
def freeze_model(model, freeze=True):
    """Freeze or unfreeze the convolutional layers (everything except fc)."""
    for name, param in model.named_parameters():
        if 'fc' not in name:
            param.requires_grad = not freeze
    # Report the freeze state
    frozen_params = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    total_params = sum(p.numel() for p in model.parameters())
    if freeze:
        print(f"Froze convolutional layers ({frozen_params}/{total_params} parameters)")
    else:
        print(f"Unfroze all layers ({total_params}/{total_params} parameters trainable)")
    return model

# 6. Training function with a staged (freeze/unfreeze) schedule
def train_with_freeze_schedule(model, train_loader, test_loader, criterion, optimizer,
                               scheduler, device, epochs, freeze_epochs=5):
    """Freeze the convolutional layers for the first freeze_epochs epochs, then unfreeze all layers."""
    train_loss_history, test_loss_history = [], []
    train_acc_history, test_acc_history = [], []
    all_iter_losses, iter_indices = [], []

    # Start with the convolutional layers frozen
    if freeze_epochs > 0:
        model = freeze_model(model, freeze=True)

    for epoch in range(epochs):
        # Unfreeze all layers at the scheduled epoch
        if epoch == freeze_epochs:
            model = freeze_model(model, freeze=False)
            # Optionally adjust the optimizer after unfreezing:
            optimizer.param_groups[0]['lr'] = 1e-4  # lower the LR to prevent overfitting

        model.train()
        running_loss = 0.0
        correct_train = 0
        total_train = 0
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            # Record per-iteration loss
            iter_loss = loss.item()
            all_iter_losses.append(iter_loss)
            iter_indices.append(epoch * len(train_loader) + batch_idx + 1)

            # Training statistics
            running_loss += iter_loss
            _, predicted = output.max(1)
            total_train += target.size(0)
            correct_train += predicted.eq(target).sum().item()

            # Print progress every 100 batches
            if (batch_idx + 1) % 100 == 0:
                print(f"Epoch {epoch+1}/{epochs} | Batch {batch_idx+1}/{len(train_loader)} "
                      f"| batch loss: {iter_loss:.4f}")

        # Epoch-level training metrics
        epoch_train_loss = running_loss / len(train_loader)
        epoch_train_acc = 100. * correct_train / total_train

        # Evaluation phase
        model.eval()
        correct_test = 0
        total_test = 0
        test_loss = 0.0
        with torch.no_grad():
            for data, target in test_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                test_loss += criterion(output, target).item()
                _, predicted = output.max(1)
                total_test += target.size(0)
                correct_test += predicted.eq(target).sum().item()
        epoch_test_loss = test_loss / len(test_loader)
        epoch_test_acc = 100. * correct_test / total_test

        # Record history
        train_loss_history.append(epoch_train_loss)
        test_loss_history.append(epoch_test_loss)
        train_acc_history.append(epoch_train_acc)
        test_acc_history.append(epoch_test_acc)

        # Step the LR scheduler on the test loss
        if scheduler is not None:
            scheduler.step(epoch_test_loss)

        print(f"Epoch {epoch+1} done | train loss: {epoch_train_loss:.4f} "
              f"| train acc: {epoch_train_acc:.2f}% | test acc: {epoch_test_acc:.2f}%")

    # Plot loss and accuracy curves
    plot_iter_losses(all_iter_losses, iter_indices)
    plot_epoch_metrics(train_acc_history, test_acc_history, train_loss_history, test_loss_history)
    return epoch_test_acc  # final test accuracy

# 7. Per-iteration loss curve
def plot_iter_losses(losses, indices):
    plt.figure(figsize=(10, 4))
    plt.plot(indices, losses, 'b-', alpha=0.7)
    plt.xlabel('Iteration (batch index)')
    plt.ylabel('Loss')
    plt.title('Per-iteration training loss')
    plt.grid(True)
    plt.show()

# 8. Epoch-level metric curves
def plot_epoch_metrics(train_acc, test_acc, train_loss, test_loss):
    epochs = range(1, len(train_acc) + 1)
    plt.figure(figsize=(12, 5))
    # Accuracy curves
    plt.subplot(1, 2, 1)
    plt.plot(epochs, train_acc, 'b-', label='train accuracy')
    plt.plot(epochs, test_acc, 'r-', label='test accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy (%)')
    plt.title('Accuracy vs. epoch')
    plt.legend()
    plt.grid(True)
    # Loss curves
    plt.subplot(1, 2, 2)
    plt.plot(epochs, train_loss, 'b-', label='train loss')
    plt.plot(epochs, test_loss, 'r-', label='test loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Loss vs. epoch')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# Main: train the model
def main():
    # Hyperparameters
    epochs = 40          # total epochs
    freeze_epochs = 5    # epochs with frozen convolutional layers
    learning_rate = 1e-3 # initial learning rate
    weight_decay = 1e-4  # weight decay

    # Create ResNet18 with pre-trained weights
    model = create_resnet18(pretrained=True, num_classes=10)

    # Optimizer and loss
    optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()

    # Learning-rate scheduler
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=2, verbose=True
    )

    # Train (first 5 epochs with frozen conv layers, then unfreeze)
    final_accuracy = train_with_freeze_schedule(
        model=model,
        train_loader=train_loader,
        test_loader=test_loader,
        criterion=criterion,
        optimizer=optimizer,
        scheduler=scheduler,
        device=device,
        epochs=epochs,
        freeze_epochs=freeze_epochs
    )
    print(f"Training done! Final test accuracy: {final_accuracy:.2f}%")

    # # Save the model
    # torch.save(model.state_dict(), 'resnet18_cifar10_finetuned.pth')
    # print("Model saved to: resnet18_cifar10_finetuned.pth")

if __name__ == "__main__":
    main()
```

A few notable observations:
1. Within a few epochs after unfreezing, the model matches what the earlier from-scratch CNN needed 20 epochs to reach. This is the advantage of pre-training.
2. Because the training set goes through RandomCrop, RandomHorizontalFlip, ColorJitter, and other augmentations, the model sees more "disturbed" or deformed images during training. A car photo, for instance, may be cropped to show only part of the car with shifted colors, making the training task harder; the test set is standard, unaugmented images, so prediction is comparatively easy. This is why training accuracy can temporarily sit below test accuracy, especially early on, when the augmentations affect the model most. As training progresses and the model adapts to the augmentations, the gap closes.
3. After convergence, the final accuracy clearly surpasses the roughly 80% reached by the non-pretrained model, a substantial improvement.
4. The MobileNetV3 Model
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import os

# Chinese font support
plt.rcParams["font.family"] = ["SimHei"]
plt.rcParams['axes.unicode_minus'] = False

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 1. Preprocessing (same as the original script)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

# 2. Load CIFAR-10 (same as the original script)
train_dataset = datasets.CIFAR10(
    root=r"D:\PythonStudy", train=True, download=True, transform=train_transform
)
test_dataset = datasets.CIFAR10(
    root=r"D:\PythonStudy", train=False, transform=test_transform
)

# 3. Data loaders (same as the original script)
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# 4. MobileNetV3 model (the key replacement)
def create_mobilenetv3(pretrained=True, num_classes=10):
    # MobileNetV3 comes in small/large variants; small is chosen here (better for mobile)
    model = models.mobilenet_v3_small(pretrained=pretrained)
    # MobileNetV3's classification head is model.classifier; replace its last layer
    in_features = model.classifier[-1].in_features
    model.classifier[-1] = nn.Linear(in_features, num_classes)
    return model.to(device)

# 5. Freeze/unfreeze helper (adapted to MobileNetV3's classifier layers)
def freeze_model(model, freeze=True):
    for name, param in model.named_parameters():
        if 'classifier' not in name:  # only the classification head stays trainable
            param.requires_grad = not freeze
    frozen_params = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    total_params = sum(p.numel() for p in model.parameters())
    if freeze:
        print(f"Froze feature-extraction layers ({frozen_params}/{total_params} parameters)")
    else:
        print(f"Unfroze all layers ({total_params}/{total_params} parameters trainable)")
    return model

# 6-8. Training and plotting functions (reused from the original script)
def train_with_freeze_schedule(model, train_loader, test_loader, criterion, optimizer,
                               scheduler, device, epochs, freeze_epochs=5):
    train_loss_history, test_loss_history = [], []
    train_acc_history, test_acc_history = [], []
    all_iter_losses, iter_indices = [], []

    if freeze_epochs > 0:
        model = freeze_model(model, freeze=True)

    for epoch in range(epochs):
        if epoch == freeze_epochs:
            model = freeze_model(model, freeze=False)
            optimizer.param_groups[0]['lr'] = 1e-4  # lower the LR after unfreezing

        model.train()
        running_loss = 0.0
        correct_train = 0
        total_train = 0
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            iter_loss = loss.item()
            all_iter_losses.append(iter_loss)
            iter_indices.append(epoch * len(train_loader) + batch_idx + 1)

            running_loss += iter_loss
            _, predicted = output.max(1)
            total_train += target.size(0)
            correct_train += predicted.eq(target).sum().item()

            if (batch_idx + 1) % 100 == 0:
                print(f"Epoch {epoch+1}/{epochs} | Batch {batch_idx+1}/{len(train_loader)} "
                      f"| batch loss: {iter_loss:.4f}")

        epoch_train_loss = running_loss / len(train_loader)
        epoch_train_acc = 100. * correct_train / total_train

        model.eval()
        correct_test = 0
        total_test = 0
        test_loss = 0.0
        with torch.no_grad():
            for data, target in test_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                test_loss += criterion(output, target).item()
                _, predicted = output.max(1)
                total_test += target.size(0)
                correct_test += predicted.eq(target).sum().item()
        epoch_test_loss = test_loss / len(test_loader)
        epoch_test_acc = 100. * correct_test / total_test

        train_loss_history.append(epoch_train_loss)
        test_loss_history.append(epoch_test_loss)
        train_acc_history.append(epoch_train_acc)
        test_acc_history.append(epoch_test_acc)

        if scheduler is not None:
            scheduler.step(epoch_test_loss)

        print(f"Epoch {epoch+1} done | train loss: {epoch_train_loss:.4f} "
              f"| train acc: {epoch_train_acc:.2f}% | test acc: {epoch_test_acc:.2f}%")

    plot_iter_losses(all_iter_losses, iter_indices)
    plot_epoch_metrics(train_acc_history, test_acc_history, train_loss_history, test_loss_history)
    return epoch_test_acc

def plot_iter_losses(losses, indices):
    plt.figure(figsize=(10, 4))
    plt.plot(indices, losses, 'b-', alpha=0.7)
    plt.xlabel('Iteration (batch index)')
    plt.ylabel('Loss')
    plt.title('Per-iteration training loss')
    plt.grid(True)
    plt.show()

def plot_epoch_metrics(train_acc, test_acc, train_loss, test_loss):
    epochs = range(1, len(train_acc) + 1)
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(epochs, train_acc, 'b-', label='train accuracy')
    plt.plot(epochs, test_acc, 'r-', label='test accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy (%)')
    plt.title('Accuracy vs. epoch')
    plt.legend()
    plt.grid(True)
    plt.subplot(1, 2, 2)
    plt.plot(epochs, train_loss, 'b-', label='train loss')
    plt.plot(epochs, test_loss, 'r-', label='test loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Loss vs. epoch')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# Main
def main():
    epochs = 40
    freeze_epochs = 5
    learning_rate = 1e-3
    weight_decay = 1e-4

    # Swap in MobileNetV3
    model = create_mobilenetv3(pretrained=True, num_classes=10)
    optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=2, verbose=True
    )
    final_accuracy = train_with_freeze_schedule(
        model=model,
        train_loader=train_loader,
        test_loader=test_loader,
        criterion=criterion,
        optimizer=optimizer,
        scheduler=scheduler,
        device=device,
        epochs=epochs,
        freeze_epochs=freeze_epochs
    )
    print(f"Training done! Final test accuracy: {final_accuracy:.2f}%")

if __name__ == "__main__":
    main()
```

Results:
1. MobileNetV3 results
The MobileNetV3 training curves show:
Iteration loss: the loss drops quickly from around 3.0 and ends up oscillating around 1.0. The overall trend converges, but the late-stage fluctuations are relatively large, meaning gradient updates are not very stable. This is common for lightweight models: fewer parameters mean weaker robustness to noise.
Accuracy and loss per epoch:
- Final train accuracy ~70%, test accuracy ~70%;
- Final train loss ~0.9, test loss ~0.8;
- Train and test metrics nearly coincide, with no visible overfitting, so MobileNetV3's lightweight structure generalizes well on CIFAR-10, but its accuracy ceiling is modest (limited by its parameter count).
2. ResNet18 results
The ResNet18 training curves show:
Iteration loss: the loss drops quickly from around 2.5 and settles near 0.5, with far smaller late-stage oscillation than MobileNetV3, indicating that the residual structure keeps gradients flowing smoothly and training stable.
Accuracy and loss per epoch:
- Final train accuracy ~90%, test accuracy ~88%;
- Final train loss ~0.25, test loss ~0.45;
- Train accuracy sits slightly above test accuracy, i.e. mild overfitting, but overall accuracy is far above MobileNetV3's, reflecting ResNet18's stronger feature extraction (and larger parameter count).
3. Head-to-head comparison
| Dimension | MobileNetV3 (lightweight) | ResNet18 (residual) | Takeaway |
|---|---|---|---|
| Final test accuracy | ~70% | ~88% | ResNet18 leads by ~18 percentage points |
| Training stability | larger loss oscillation | smaller loss oscillation | ResNet18 is more stable |
| Overfitting | none | mild | MobileNetV3 generalizes more "conservatively" |
| Accuracy-efficiency trade-off | fast, few parameters (~2.5M for the small variant) | higher accuracy, moderate parameters (~11M) | MobileNetV3 suits mobile deployment; ResNet18 suits accuracy-critical scenarios |
Key conclusions
- MobileNetV3's strengths are its light weight and absence of overfitting, but its limited parameter budget caps its accuracy on CIFAR-10;
- ResNet18's strengths are higher accuracy and stable training, with the residual structure boosting feature extraction, at the cost of more parameters and mild overfitting;
- The difference between the two is, at bottom, the **trade-off between "lightweight (speed)" and "accuracy"**: MobileNetV3 fits resource-constrained settings, ResNet18 fits scenarios with higher accuracy demands.
Day 44 of braving Python @浙大疏锦行