Step-by-Step Tutorial: Preparing the VisDrone and CARPK Datasets for YOLOv5 Training with Python Scripts

张建站

2026/4/27 11:18:52

10 min read

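Processing Drone Vision Datasets from Scratch: A Hands-On YOLOv5 Guide to VisDrone and CARPK

Drone imagery, with its distinctive viewpoint and rich scene content, has become an active area of computer vision research. VisDrone and CARPK are two widely used drone datasets and a valuable source of training data. This article explains what characterizes each dataset and walks through the full conversion from raw data to a YOLOv5-ready format.

1. Understanding the Datasets and Getting Set Up

Before processing any data, it helps to know the basics of both datasets:

- VisDrone contains 10 classes (pedestrian, people, bicycle, car, van, truck, tricycle, awning-tricycle, bus, motor). Annotation format: [x_min, y_min, width, height, score, class_id, truncation, occlusion].
- CARPK focuses on vehicle detection in parking lots and contains a single class, car. Annotation format: [x_min, y_min, x_max, y_max, class_id].

Tip: create the following directory structure before you start, to keep the project organized.

```
dataset_preparation/
├── raw_data/
│   ├── VisDrone/
│   └── CARPK/
├── processed/
│   ├── images/
│   └── labels/
└── scripts/
```

Install the required Python libraries:

```
pip install opencv-python numpy tqdm Pillow
```

2. Processing the VisDrone Dataset in Depth

2.1 Format Conversion and Coordinate Normalization

VisDrone annotations give the top-left corner and box size in absolute pixels, while YOLO expects the box center and size as fractions of the image width and height. The core conversion code is:

```python
from pathlib import Path
from PIL import Image
from tqdm import tqdm
import os


def convert_visdrone_to_yolo(anno_path, img_dir, output_dir):
    """Convert VisDrone annotations to YOLO format."""
    # Accept plain strings as well as Path objects
    anno_path, img_dir, output_dir = Path(anno_path), Path(img_dir), Path(output_dir)
    os.makedirs(output_dir, exist_ok=True)

    for anno_file in tqdm(list(anno_path.glob('*.txt'))):
        img_file = img_dir / anno_file.name.replace('.txt', '.jpg')
        # Open the image only to read its actual size
        with Image.open(img_file) as img:
            img_width, img_height = img.size

        with open(anno_file) as f:
            lines = [line.strip().split(',') for line in f if line.strip()]

        yolo_lines = []
        for line in lines:
            if line[4] == '0':  # score 0 marks a region to ignore
                continue

            class_id = int(line[5]) - 1  # VisDrone class IDs start at 1
            if not 0 <= class_id <= 9:
                # Defensive check: skip anything outside the 10 classes,
                # such as category 11 ("others")
                continue

            x_min, y_min = int(line[0]), int(line[1])
            width, height = int(line[2]), int(line[3])

            # Convert to YOLO format: normalized center and size
            x_center = (x_min + width / 2) / img_width
            y_center = (y_min + height / 2) / img_height
            norm_width = width / img_width
            norm_height = height / img_height

            yolo_lines.append(
                f"{class_id} {x_center:.6f} {y_center:.6f} "
                f"{norm_width:.6f} {norm_height:.6f}\n"
            )

        output_file = output_dir / anno_file.name
        with open(output_file, 'w') as f:
            f.writelines(yolo_lines)
```

2.2 Class Merging and Filtering Strategy

VisDrone has more classes than many applications need, so it is common to merge them into a smaller set. The keys below are the zero-based class IDs produced by the conversion in 2.1; any class not listed in the mapping is dropped.

```python
# Class mapping (zero-based YOLO IDs from step 2.1 -> merged IDs)
CLASS_MAPPING = {
    # Merge vehicle classes
    3: 0,  # car → 0
    4: 0,  # van → 0
    5: 0,  # truck → 0
    8: 0,  # bus → 0
    # Merge person classes
    0: 1,  # pedestrian → 1
    1: 1,  # people → 1
}


def filter_and_merge_classes(input_dir, output_dir):
    """Filter and merge classes."""
    os.makedirs(output_dir, exist_ok=True)

    for label_file in tqdm(os.listdir(input_dir)):
        with open(os.path.join(input_dir, label_file)) as f:
            lines = [line.strip().split() for line in f if line.strip()]

        filtered_lines = []
        for line in lines:
            original_class = int(line[0])
            if original_class in CLASS_MAPPING:
                new_class = CLASS_MAPPING[original_class]
                # Keep the four coordinate fields unchanged
                filtered_lines.append(f"{new_class} {' '.join(line[1:])}\n")

        output_file = os.path.join(output_dir, label_file)
        with open(output_file, 'w') as f:
            f.writelines(filtered_lines)
```

3. Processing the CARPK Dataset

CARPK is simpler to process, but the coordinates still need converting: each box is given by two corners rather than a corner plus a size.

```python
def convert_carpk_to_yolo(anno_dir, output_dir, img_width=1280, img_height=720):
    """Convert CARPK annotations to YOLO format."""
    # The defaults assume 1280x720 images; pass the real size if yours differ
    os.makedirs(output_dir, exist_ok=True)

    for anno_file in tqdm(os.listdir(anno_dir)):
        with open(os.path.join(anno_dir, anno_file)) as f:
            lines = [line.strip().split() for line in f if line.strip()]

        yolo_lines = []
        for line in lines:
            x_min, y_min, x_max, y_max = map(int, line[:4])

            # Convert to YOLO format: normalized center and size
            x_center = ((x_min + x_max) / 2) / img_width
            y_center = ((y_min + y_max) / 2) / img_height
            width = (x_max - x_min) / img_width
            height = (y_max - y_min) / img_height

            # CARPK has a single class, so the class ID is always 0
            yolo_lines.append(
                f"0 {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n"
            )

        output_file = os.path.join(output_dir, anno_file)
        with open(output_file, 'w') as f:
            f.writelines(yolo_lines)
```

4. Visual Verification and Quality Checks

After conversion, verify the data by drawing the labels back onto the images:

```python
import cv2
import random


def visualize_yolo_labels(img_dir, label_dir, output_dir, class_names):
    """Draw YOLO labels on images and save the results."""
    os.makedirs(output_dir, exist_ok=True)
    # One random color per class
    colors = [(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
              for _ in range(len(class_names))]

    for img_file in tqdm(os.listdir(img_dir)):
        if not img_file.lower().endswith(('.jpg', '.png')):
            continue

        img_path = os.path.join(img_dir, img_file)
        label_path = os.path.join(label_dir, os.path.splitext(img_file)[0] + '.txt')

        img = cv2.imread(img_path)
        if img is None:  # unreadable or corrupt image
            continue

        height, width = img.shape[:2]

        if os.path.exists(label_path):
            with open(label_path) as f:
                lines = [line.strip().split() for line in f if line.strip()]

            for line in lines:
                class_id, x_center, y_center, w, h = map(float, line)
                class_id = int(class_id)

                # Convert back to absolute pixel coordinates
                x_center *= width
                y_center *= height
                w *= width
                h *= height

                x_min = int(x_center - w / 2)
                y_min = int(y_center - h / 2)
                x_max = int(x_center + w / 2)
                y_max = int(y_center + h / 2)

                # Draw the bounding box and class label
                cv2.rectangle(img, (x_min, y_min), (x_max, y_max), colors[class_id], 2)
                cv2.putText(img, class_names[class_id], (x_min, y_min - 5),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, colors[class_id], 2)

        output_path = os.path.join(output_dir, img_file)
        cv2.imwrite(output_path, img)
```

Usage example:

```python
class_names = ['vehicle', 'person']
visualize_yolo_labels('processed/images', 'processed/labels', 'visualization', class_names)
```

5. Common Problems and Solutions

You may run into the following typical problems during processing:

| Symptom | Likely cause | Solution |
| --- | --- | --- |
| Boxes are shifted after conversion | Image size mismatch | Use the actual image dimensions rather than preset values |
| Class IDs are mixed up | Misreading the original dataset's class indices | Check the dataset documentation and confirm the class numbering |
| Boxes look wrong in the visualization | Coordinate normalization error | Check that the normalized values fall within [0, 1] |
| Processing is slow | Large numbers of files handled in a single thread | Use multiprocessing or optimize I/O |

Performance tips:

Use multiprocessing to speed things up:

```python
from multiprocessing import Pool


def process_single_file(args):
    """Wrapper so each worker receives one argument tuple."""
    file, input_dir, output_dir = args
    # Process a single file...


if __name__ == '__main__':
    # input_dir and output_dir must be set to your own paths before this runs
    files = [(f, input_dir, output_dir) for f in os.listdir(input_dir)]
    with Pool(processes=4) as pool:
        pool.map(process_single_file, files)
```

Memory optimization: for very large images, use streaming processing rather than loading everything at once.

When working with drone datasets, pay particular attention to the characteristics that come with a high-altitude viewpoint:

- Object size varies widely (nearby objects are large, distant ones small)
- Occlusion is complex
- Lighting conditions change frequently

In real projects we usually keep about 10% of the raw data unfiltered for possible later model tuning. With CARPK, since every object is a vehicle, it is reasonable to add more data augmentation, such as rotation and scale changes, to improve model robustness.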
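
The troubleshooting advice above recommends checking that normalized coordinates stay within [0, 1] and that class IDs are consistent. The following is a minimal sketch of such a check, assuming one YOLO label file per image in a single directory. The function name `find_invalid_yolo_labels` and its parameters are hypothetical and not part of the original scripts.

```python
from pathlib import Path


def find_invalid_yolo_labels(label_dir, num_classes):
    """Return (file name, line number, reason) for every suspicious label line.

    Hypothetical helper: assumes each line is 'class x_center y_center width height'.
    """
    problems = []
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        with open(label_file) as f:
            for line_no, raw in enumerate(f, start=1):
                parts = raw.split()
                if not parts:
                    continue  # blank lines are harmless
                if len(parts) != 5:
                    problems.append((label_file.name, line_no, "expected 5 fields"))
                    continue
                try:
                    class_id = int(parts[0])
                    x_c, y_c, w, h = map(float, parts[1:])
                except ValueError:
                    problems.append((label_file.name, line_no, "non-numeric field"))
                    continue
                if not 0 <= class_id < num_classes:
                    problems.append((label_file.name, line_no, "class id out of range"))
                elif not all(0.0 <= v <= 1.0 for v in (x_c, y_c, w, h)):
                    problems.append((label_file.name, line_no, "coordinate outside [0, 1]"))
                elif w <= 0 or h <= 0:
                    problems.append((label_file.name, line_no, "empty box"))
    return problems
```

Running it on the merged two-class labels, for example `find_invalid_yolo_labels('processed/labels', num_classes=2)`, should return an empty list when the conversion is sound. Any entries it does return point to the file and line worth inspecting in the visualization output.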
