基于TensorFlow 2的猴痘病识别系统开发实践-开发者社区

1. 项目概述

在医疗影像诊断领域，深度学习技术正发挥着越来越重要的作用。本文将分享一个基于TensorFlow 2实现的猴痘病识别系统开发全过程。这个项目使用卷积神经网络(CNN)对2142张皮肤病变图像进行分类，准确区分猴痘(Monkeypox)与其他皮肤病症(Others)。最终模型在验证集上达到88.78%的准确率，为医疗辅助诊断提供了一个可行的技术方案。

提示：本项目完整代码已开源，建议配合Jupyter Notebook边阅读边实践。需要RTX 3080 Ti及以上级别GPU以获得最佳训练效率。

2. 环境配置与数据准备

2.1 GPU环境设置

现代深度学习模型训练强烈建议使用GPU加速。以下是正确配置TensorFlow GPU环境的专业做法：

import tensorflow as tf gpus = tf.config.list_physical_devices("GPU") if gpus: # 如果有多个GPU，通常选择第一个进行处理 gpu0 = gpus[0] # 启用GPU显存动态增长，避免一次性占用全部显存 tf.config.experimental.set_memory_growth(gpu0, True) # 明确指定使用的GPU设备 tf.config.set_visible_devices([gpu0],"GPU") print(gpus)

这段代码实现了三个关键功能：

检测可用GPU设备
设置显存按需分配（避免OOM错误）
指定使用的GPU设备

常见问题：如果遇到"Could not create cudnn handle"错误，通常是因为CUDA与cuDNN版本不匹配。建议使用TensorFlow官方推荐的版本组合。

2.2 数据导入与检查

我们使用的数据集包含2142张jpg格式的皮肤病变图像，分为两个类别：

Monkeypox：猴痘病例图像
Others：其他皮肤病症图像

import os, PIL, pathlib import matplotlib.pyplot as plt import numpy as np from tensorflow import keras from tensorflow.keras import layers, models # 设置数据路径（建议使用绝对路径） data_dir = './data/day04/' data_dir = pathlib.Path(data_dir) # 获取类别名称 classNames = [path.name for path in data_dir.glob('*')] print("类别列表：", classNames) # 统计图像数量 image_count = len(list(data_dir.glob('*/*.jpg'))) print("图像总数：", image_count)

执行后会输出：

类别列表： ['Others', 'Monkeypox'] 图像总数： 2142

2.3 数据可视化分析

在正式训练前，先观察样本数据特征：

# 随机查看20张训练图片 plt.figure(figsize=(20, 10)) for images, labels in train_ds.take(1): for i in range(20): ax = plt.subplot(5, 10, i + 1) plt.imshow(images[i].numpy().astype("uint8")) plt.title(class_names[labels[i]]) plt.axis("off")

通过可视化可以发现：

猴痘病变通常表现为集中分布的脓疱
其他皮肤病则呈现更多样化的形态特征
图像拍摄角度、光照条件存在差异

这些观察将指导我们后续的数据增强策略设计。

3. 数据预处理流程

3.1 数据集划分与加载

使用TensorFlow的image_dataset_from_directory方法可以高效加载图像数据：

batch_size = 32 img_height = 224 img_width = 224 # 训练集（80%数据） train_ds = tf.keras.preprocessing.image_dataset_from_directory( data_dir, validation_split=0.2, subset="training", seed=123, # 固定随机种子确保可复现 image_size=(img_height, img_width), batch_size=batch_size) # 验证集（20%数据） val_ds = tf.keras.preprocessing.image_dataset_from_directory( data_dir, validation_split=0.2, subset="validation", seed=123, image_size=(img_height, img_width), batch_size=batch_size) class_names = train_ds.class_names print("类别标签：", class_names)

关键参数说明：

validation_split=0.2：保留20%数据作为验证集
seed=123：固定随机种子确保每次划分一致
image_size=(224,224)：统一调整图像尺寸，符合CNN输入要求

3.2 数据集性能优化

使用以下方法显著提升数据加载效率：

AUTOTUNE = tf.data.AUTOTUNE train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE) val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

这三步优化分别实现：

.cache()：将数据集缓存到内存中，避免每个epoch重复磁盘IO
.shuffle(1000)：打乱数据顺序，增强模型泛化能力
.prefetch()：后台预加载数据，减少GPU等待时间

实测显示，经过优化后训练速度可提升2-3倍，特别是当使用机械硬盘存储数据时。

4. CNN模型构建与训练

4.1 网络架构设计

我们的CNN模型采用经典"卷积-池化-全连接"结构：

num_classes = 2 model = models.Sequential([ # 归一化层：将像素值缩放到0-1范围 layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)), # 第一卷积块 layers.Conv2D(16, (3,3), activation='relu', input_shape=(img_height, img_width, 3)), layers.AveragePooling2D((2,2)), # 第二卷积块 layers.Conv2D(32, (3,3), activation='relu'), layers.AveragePooling2D((2,2)), layers.Dropout(0.3), # 随机丢弃30%神经元，防止过拟合 # 第三卷积块 layers.Conv2D(64, (3,3), activation='relu'), layers.Dropout(0.3), # 分类头 layers.Flatten(), layers.Dense(128, activation='relu'), layers.Dense(num_classes) ])

架构特点分析：

使用3×3小卷积核，平衡特征提取能力与参数数量
采用平均池化而非最大池化，更适合医学图像处理
添加Dropout层有效控制过拟合
最终全连接层使用128个神经元，足够捕获高级特征

打印网络结构：

model.summary()

输出显示模型共有22,175,138个可训练参数，适合中等规模数据集。

4.2 模型编译配置

选择Adam优化器并配置适当的学习率：

opt = tf.keras.optimizers.Adam(learning_rate=1e-4) model.compile( optimizer=opt, loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'] )

参数选择考量：

初始学习率1e-4：小学习率适合医学图像精细特征学习
SparseCategoricalCrossentropy：适用于整数标签分类任务
from_logits=True：模型最后一层未使用softmax激活

4.3 模型训练过程

实施带checkpoint的模型训练：

from tensorflow.keras.callbacks import ModelCheckpoint epochs = 50 # 设置模型检查点，只保存验证准确率最高的模型 checkpointer = ModelCheckpoint('best_model.h5', monitor='val_accuracy', verbose=1, save_best_only=True, save_weights_only=True) history = model.fit( train_ds, validation_data=val_ds, epochs=epochs, callbacks=[checkpointer] )

训练过程显示：

初期训练准确率快速上升，验证准确率稳步提高
约30个epoch后验证指标趋于稳定
最佳验证准确率达到88.78%

专业建议：当验证准确率连续5个epoch不再提升时，可考虑提前终止训练以节省计算资源。

5. 模型评估与预测

5.1 训练曲线分析

绘制训练过程中的准确率和损失曲线：

acc = history.history['accuracy'] val_acc = history.history['val_accuracy'] loss = history.history['loss'] val_loss = history.history['val_loss'] epochs_range = range(epochs) plt.figure(figsize=(12, 4)) plt.subplot(1, 2, 1) plt.plot(epochs_range, acc, label='Training Accuracy') plt.plot(epochs_range, val_acc, label='Validation Accuracy') plt.legend(loc='lower right') plt.title('Training and Validation Accuracy') plt.subplot(1, 2, 2) plt.plot(epochs_range, loss, label='Training Loss') plt.plot(epochs_range, val_loss, label='Validation Loss') plt.legend(loc='upper right') plt.title('Training and Validation Loss') plt.show()

曲线显示：

训练准确率最终达到99%，存在轻微过拟合
验证准确率稳定在87-89%之间
损失曲线收敛平稳，没有剧烈波动

5.2 单图像预测测试

加载最佳模型进行单张图像预测：

# 加载保存的最佳模型权重 model.load_weights('best_model.h5') from PIL import Image import numpy as np # 选择测试图像路径 img = Image.open("./data/day04/Others/NM15_02_11.jpg") image = tf.image.resize(img, [img_height, img_width]) # 扩展维度匹配模型输入要求 img_array = tf.expand_dims(image, 0) # 执行预测 predictions = model.predict(img_array) print("预测结果为：", class_names[np.argmax(predictions)])

输出显示模型正确将测试图像分类为"Others"。

6. 优化建议与改进方向

在实际部署中，可以考虑以下优化措施：

数据增强：添加旋转、翻转等增强操作，提升模型泛化能力

data_augmentation = keras.Sequential([ layers.RandomFlip("horizontal"), layers.RandomRotation(0.1), ])

迁移学习：使用预训练的EfficientNet等模型作为特征提取器
```
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
```
类别平衡：检查数据集类别分布，必要时采用过采样/欠采样
超参数调优：系统调整学习率、批大小、网络深度等参数
测试集评估：保留部分数据作为最终测试集，避免开发过程过拟合

这个项目展示了如何使用TensorFlow 2构建实用的医学图像分类系统。虽然当前模型已经表现不错，但在实际医疗应用中还需要更严格的验证和临床测试。建议感兴趣的读者可以尝试增加数据量、改进模型架构，或者将该方法应用到其他医学影像分类任务中。