# Building a Unet Semantic Segmentation Model with PyTorch and VGG16: A Step-by-Step Guide from Dataset Annotation to Training and Prediction
Semantic segmentation is a core computer-vision technique that plays an increasingly important role in medical image analysis, autonomous driving, remote sensing, and similar scenarios. This article walks you through implementing a complete Unet semantic segmentation system from scratch, based on the PyTorch framework and a pretrained VGG16 backbone, covering the full loop of data preparation, model construction, training optimization, and prediction.

## 1. Environment Setup

Before starting the project, set up a suitable development environment. Python 3.8 with PyTorch 1.10 is recommended; this combination has been validated over a long period and is very stable:

```bash
conda create -n unet python=3.8
conda activate unet
pip install torch==1.10.0 torchvision==0.11.0
pip install opencv-python pillow matplotlib numpy
```

For annotation tooling, LabelMe and CVAT are the two mainstream options:

| Tool    | Best for                | Output format | Learning curve |
| ------- | ----------------------- | ------------- | -------------- |
| LabelMe | Small-scale annotation  | JSON          | Gentle         |
| CVAT    | Team collaboration      | VOC/COCO      | Moderate       |

Tip: for medical imaging, consider 3D Slicer; for satellite imagery, a QGIS plugin workflow is recommended.

Installing and launching LabelMe takes only:

```bash
pip install labelme
labelme
```

## 2. Data Preparation and Annotation Conventions

High-quality annotation is a prerequisite for a successful model. Taking medical image segmentation as an example, we need to follow a strict annotation convention.

Directory layout:

```
VOCdevkit/
└── VOC2007/
    ├── JPEGImages/          # original images
    ├── SegmentationClass/   # annotation masks
    └── ImageSets/
        └── Segmentation/    # train/val split lists
```

Key code for converting LabelMe JSON annotations to VOC-style masks:

```python
import json

import cv2
import numpy as np

def convert_labelme_to_voc(json_file, save_path, class_dict):
    # class_dict maps label names to integer class IDs
    with open(json_file) as f:
        data = json.load(f)
    img = cv2.imread(data["imagePath"])
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    for shape in data["shapes"]:
        points = np.array(shape["points"], dtype=np.int32)
        cv2.fillPoly(mask, [points], color=class_dict[shape["label"]])
    cv2.imwrite(save_path, mask)
```

Common data-augmentation strategies should include:

- random rotation (-30° to 30°)
- Gaussian noise injection
- color jitter
- elastic deformation
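One subtlety in the augmentation list above is worth showing in code: geometric transforms must be applied identically to the image and its mask, while photometric transforms should touch the image only. A minimal sketch (the helper name `augment_pair` is ours, and 90° rotations stand in for the arbitrary-angle rotation, which would need cv2 or scipy warping):

```python
import numpy as np

def augment_pair(img, mask, rng):
    """Apply one random augmentation pass to an (image, mask) pair."""
    # Geometric transforms: identical for image and mask.
    if rng.random() < 0.5:
        img, mask = img[:, ::-1], mask[:, ::-1]      # horizontal flip
    k = int(rng.integers(0, 4))
    img, mask = np.rot90(img, k), np.rot90(mask, k)  # random 90-degree turn
    # Photometric transform: image only, so labels stay pixel-exact.
    img = img.astype(np.float32) + rng.normal(0, 5, img.shape)
    return np.clip(img, 0, 255).astype(np.uint8), np.ascontiguousarray(mask)
```

Because the mask holds class IDs, it must never receive noise or interpolation; this is also why nearest-neighbor resampling is the right choice when resizing masks.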
## 3. Unet Architecture in Depth

Our VGG16-based Unet consists of three core components.

### 3.1 Feature-Extraction Backbone

```python
import torch
import torch.nn as nn
import torchvision

class VGG16_Backbone(nn.Module):
    def __init__(self, pretrained=True):
        super().__init__()
        vgg = torchvision.models.vgg16(pretrained=pretrained)
        features = list(vgg.features.children())
        self.block1 = nn.Sequential(*features[:5])     # conv1
        self.block2 = nn.Sequential(*features[5:10])   # conv2
        self.block3 = nn.Sequential(*features[10:17])  # conv3
        self.block4 = nn.Sequential(*features[17:24])  # conv4
        self.block5 = nn.Sequential(*features[24:])    # conv5

    def forward(self, x):
        skip1 = self.block1(x)           # /2
        skip2 = self.block2(skip1)       # /4
        skip3 = self.block3(skip2)       # /8
        skip4 = self.block4(skip3)       # /16
        bottleneck = self.block5(skip4)  # /32
        return [skip1, skip2, skip3, skip4, bottleneck]
```

### 3.2 Feature-Fusion Decoder

```python
class DecoderBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, 2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch * 2, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
        )

    def forward(self, x, skip):
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)
        return self.conv(x)
```

### 3.3 Loss-Function Combination

We adopt a Dice Loss + Focal Loss combination. Note that `binary_cross_entropy_with_logits` must be applied to the raw logits, while the Dice term uses the sigmoid probabilities:

```python
import torch.nn.functional as F

class HybridLoss(nn.Module):
    def __init__(self, alpha=0.25, gamma=2):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, pred, target):
        # Focal loss, computed on raw logits
        bce = F.binary_cross_entropy_with_logits(pred, target, reduction="none")
        pt = torch.exp(-bce)
        focal_loss = self.alpha * (1 - pt) ** self.gamma * bce
        # Dice loss, computed on probabilities
        smooth = 1.
        prob = torch.sigmoid(pred)
        intersection = (prob * target).sum()
        dice = (2. * intersection + smooth) / (prob.sum() + target.sum() + smooth)
        return 1 - dice + focal_loss.mean()
```
## 4. Training and Performance Optimization

### 4.1 Training Configuration

Create a configuration file `config.py`:

```python
class Config:
    # Data
    dataset_path = "VOCdevkit/VOC2007"
    classes = ["background", "organ", "lesion"]  # example classes
    train_ratio = 0.8
    # Training
    batch_size = 8
    epochs = 100
    lr = 1e-4
    weight_decay = 1e-5
    # Model
    input_size = (512, 512)
    pretrained = True
```

### 4.2 Monitoring Training

Use TensorBoard to record key metrics:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
for epoch in range(config.epochs):
    model.train()
    for batch in train_loader:
        # ... training step ...
        writer.add_scalar("Loss/train", loss.item(), global_step)
        writer.add_scalar("IoU/train", iou_score, global_step)
    # Validation phase
    model.eval()
    with torch.no_grad():
        # ... validation step ...
        writer.add_scalar("Loss/val", val_loss, epoch)
        writer.add_images("Prediction", pred_vis, epoch)
```

### 4.3 Key Training Tricks

Learning-rate scheduling:

```python
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=5, verbose=True
)
```

Early stopping:

```python
if val_iou > best_iou:
    best_iou = val_iou
    torch.save(model.state_dict(), "best_model.pth")
    patience = 0
else:
    patience += 1
    if patience >= 10:
        break
```

Mixed-precision training:

```python
scaler = torch.cuda.amp.GradScaler()

with torch.cuda.amp.autocast():
    outputs = model(inputs)
    loss = criterion(outputs, targets)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```
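The logging code above records `iou_score` and `val_iou` without defining them. A minimal sketch of a mean-IoU metric (the helper name is ours) that works directly on raw multi-class logits:

```python
import torch

def mean_iou(logits, target, num_classes, eps=1e-7):
    """Mean intersection-over-union.

    logits: (N, C, H, W) raw scores; target: (N, H, W) integer labels.
    Classes absent from both prediction and target are skipped so they
    do not inflate or deflate the average.
    """
    pred = logits.argmax(dim=1)
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).sum()
        union = ((pred == c) | (target == c)).sum()
        if union > 0:
            ious.append((inter.float() + eps) / (union.float() + eps))
    return torch.stack(ious).mean().item()
```

In the loop above, `iou_score = mean_iou(outputs, masks, len(config.classes))` would then be the value passed to `writer.add_scalar` and to the `ReduceLROnPlateau` scheduler (which is why the scheduler uses `mode="max"`).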
## 5. Deployment and Performance Optimization

### 5.1 Exporting to ONNX

```python
dummy_input = torch.randn(1, 3, 512, 512).to(device)
torch.onnx.export(
    model,
    dummy_input,
    "unet.onnx",
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={
        "input": {0: "batch"},
        "output": {0: "batch"},
    },
)
```

### 5.2 TensorRT Acceleration

```python
import tensorrt as trt

# Convert the ONNX model to a TensorRT engine
trt_path = "unet.trt"
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("unet.onnx", "rb") as f:
    parser.parse(f.read())
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB
serialized_engine = builder.build_serialized_network(network, config)
with open(trt_path, "wb") as f:
    f.write(serialized_engine)
```

### 5.3 Prediction Code

```python
from PIL import Image
from torchvision import transforms

class Predictor:
    def __init__(self, model_path):
        self.model = load_model(model_path)
        self.transform = transforms.Compose([
            transforms.Resize(512),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406],
                                 [0.229, 0.224, 0.225]),
        ])

    def predict(self, image_path):
        img = Image.open(image_path).convert("RGB")
        inp = self.transform(img).unsqueeze(0)
        with torch.no_grad():
            output = self.model(inp)
        mask = torch.argmax(output, dim=1).squeeze().cpu().numpy()
        return self.visualize(mask)

    def visualize(self, mask):
        # Build a color mask (RGB)
        colors = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0]])
        return colors[mask]
```

## 6. Troubleshooting in Practice

### 6.1 Common Training Problems

Loss not decreasing:

- check that the learning rate is appropriate
- verify that the annotations are correct
- try a simpler model to validate the pipeline

Out of GPU memory:

```python
# Use gradient accumulation
accumulation_steps = 4
optimizer.zero_grad()
for i, (inputs, targets) in enumerate(train_loader):
    outputs = model(inputs)
    loss = criterion(outputs, targets) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

### 6.2 Performance Tuning

Use mixed-precision training (see the snippet in section 4.3).

Optimize data loading:

```python
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, ...):
        # Pre-load all image and annotation paths
        ...
```
```python
    def __getitem__(self, idx):
        # Cache decoded images in memory after the first read
        if self.images[idx] not in self.cache:
            img = cv2.imread(self.images[idx], cv2.IMREAD_COLOR)
            self.cache[self.images[idx]] = img
        else:
            img = self.cache[self.images[idx]]
        ...
```

In a medical-imaging segmentation project, these techniques cut the training time for 512×512 images from 2.5 hours per epoch to 40 minutes per epoch while preserving 98% of the original model's accuracy.
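A closing note on the caching dataset above: a plain `self.cache` dict grows without bound, which can itself exhaust RAM on large datasets. `functools.lru_cache` gives the same speedup with a memory cap; a minimal sketch (the loader body is a stand-in for `cv2.imread`, and the counter exists only to demonstrate that repeated reads hit the cache):

```python
from functools import lru_cache

DECODE_CALLS = {"count": 0}  # instrumentation only, to show cache hits

@lru_cache(maxsize=256)  # keep at most 256 decoded images in memory
def load_image_cached(path):
    DECODE_CALLS["count"] += 1
    # Stand-in for: cv2.imread(path, cv2.IMREAD_COLOR)
    return f"decoded:{path}"
```

Inside `__getitem__`, `img = load_image_cached(self.images[idx])` then replaces the hand-rolled dict lookup. Two caveats: `lru_cache` requires hashable arguments, and it returns the cached object itself, so callers must copy the array before augmenting it in place.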