
YOLO Dataset Splitting Tutorial: How to Create Training, Validation, and Test Sets

Source: 萌宠菠菠乐园  Time: 2024-10-09 16:28


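About the Author

Author: 小白熊

Bio: Proficient in Python, MATLAB, and C#; specializes in machine learning, deep learning, machine vision, object detection, image classification, pose recognition, semantic segmentation, path planning, intelligent optimization algorithms, data analysis, and novel combinations of these techniques.

For research tutoring, paid Q&A, custom development, or other collaboration, please contact the author.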

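Introduction

YOLO is a very popular model for object detection. When training a YOLO model, the dataset is usually split into a training set, a validation set, and a test set so that the model's performance can be evaluated on data it was not trained on. This article shows how to perform the split in Python and copy the image and label files into separate folders according to the chosen ratios.

Steps

Suppose we have an annotated YOLO dataset consisting of image files (for example, jpg) and the corresponding label files (txt). We want to split these files into a training set (train), a validation set (val), and a test set (test) by fixed ratios. The steps are as follows:

Step 1: Import the Modules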

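    import os
    import shutil
    from sklearn.model_selection import train_test_split

os: works with file paths and directories.
shutil: copies files.
train_test_split: splits the data, randomly assigning samples to different subsets. It is part of scikit-learn, which must be installed.

Step 2: Set the Parameters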

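Before splitting the dataset, we need to define a few key parameters:

val_size: the ratio used for the validation split;
test_size: the ratio used for the test split;
postfix: the extension of the image files (for example, jpg);
imgpath: the directory containing the image files;
txtpath: the directory containing the label files;
new_imgpath: the directory where the split image files are saved;
new_txtpath: the directory where the split label files are saved.

    val_size = 0.1
    test_size = 0.2
    postfix = 'jpg'
    imgpath = './data1/images'
    txtpath = './data1/labels'
    new_imgpath = './data/images'
    new_txtpath = './data/labels'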

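In this example, the test set takes 20% of the full dataset. Because the validation split in Step 4 is applied to the data that remains after the test set has been removed, val_size = 0.1 yields a validation set of about 8% of the full dataset (0.1 × 0.8), leaving about 72% for training.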
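The arithmetic behind these percentages can be checked with a few lines of Python. The sketch below assumes the two-stage split used in this tutorial (test set first, then validation from the remainder); the helper names `effective_fractions` and `relative_val_size` are hypothetical and not part of the original script.

```python
def effective_fractions(val_size, test_size):
    """Return (train, val, test) as fractions of the full dataset.

    Assumes a two-stage split: the test set is taken first, then
    val_size is applied to whatever remains.
    """
    remaining = 1.0 - test_size
    val = remaining * val_size
    train = remaining - val
    return train, val, test_size


def relative_val_size(val_fraction, test_size):
    """Second-stage ratio that yields val_fraction of the full dataset."""
    return val_fraction / (1.0 - test_size)


if __name__ == "__main__":
    train, val, test = effective_fractions(val_size=0.1, test_size=0.2)
    print(f"train={train:.2f} val={val:.2f} test={test:.2f}")
    print(f"ratio for a 10% validation set: {relative_val_size(0.1, 0.2):.3f}")
```

If the validation set should be 10% of the full dataset, the second-stage ratio would be 0.1 / 0.8 = 0.125 rather than 0.1. The actual file counts are rounded to whole numbers, so small datasets can deviate slightly from these fractions.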

Step 3: Create the Target Directories

So that the split files can be stored correctly, we create the target directories in advance. Separate folders are created for the training, validation, and test sets.

    os.makedirs(os.path.join(new_imgpath, 'train'), exist_ok=True)
    os.makedirs(os.path.join(new_imgpath, 'val'), exist_ok=True)
    os.makedirs(os.path.join(new_imgpath, 'test'), exist_ok=True)
    os.makedirs(os.path.join(new_txtpath, 'train'), exist_ok=True)
    os.makedirs(os.path.join(new_txtpath, 'val'), exist_ok=True)
    os.makedirs(os.path.join(new_txtpath, 'test'), exist_ok=True)

exist_ok=True ensures that no error is raised if a folder already exists.
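The six calls follow a single pattern, so they could also be written as a loop. This is only a sketch: `make_split_dirs` is a hypothetical helper, and the demo writes into a temporary directory instead of the tutorial's ./data folder.

```python
import os
import tempfile


def make_split_dirs(root, kinds=("images", "labels"), splits=("train", "val", "test")):
    """Create root/<kind>/<split> for every combination and return the paths."""
    created = []
    for kind in kinds:
        for split in splits:
            path = os.path.join(root, kind, split)
            os.makedirs(path, exist_ok=True)  # no error if it already exists
            created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_split_dirs(tmp):
            print(os.path.relpath(path, tmp))
```

Calling the helper a second time on the same root is harmless, because exist_ok=True makes the directory creation idempotent.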

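Step 4: Split the Dataset

We list all txt files in the label directory and split them into training, validation, and test sets; each label file stands for one image/label pair. The split itself is done with the train_test_split function.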
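Because the split is driven by the label file names, a label without a matching image only shows up later as a copy error. Before splitting, it may help to check the pairs first. The following is a minimal sketch of such a check; `paired_labels` is a hypothetical helper, and it assumes every image uses a single extension.

```python
import os
import tempfile


def paired_labels(label_dir, image_dir, image_ext="jpg"):
    """Split label file names into (with_image, without_image) lists."""
    with_image, without_image = [], []
    for name in sorted(os.listdir(label_dir)):
        if not name.endswith(".txt"):
            continue  # ignore anything that is not a label file
        stem = os.path.splitext(name)[0]
        image = os.path.join(image_dir, f"{stem}.{image_ext}")
        if os.path.isfile(image):
            with_image.append(name)
        else:
            without_image.append(name)
    return with_image, without_image


if __name__ == "__main__":
    # Demo on throwaway files; the real call would use txtpath and imgpath.
    with tempfile.TemporaryDirectory() as tmp:
        labels = os.path.join(tmp, "labels")
        images = os.path.join(tmp, "images")
        os.makedirs(labels)
        os.makedirs(images)
        for name in ("a.txt", "b.txt"):
            open(os.path.join(labels, name), "w").close()
        open(os.path.join(images, "a.jpg"), "w").close()
        print(paired_labels(labels, images))
```

Passing only the first list to the split would keep the reported set sizes in line with the number of files that are actually copied.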

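    # Collect the label files; sorting makes the input order, and therefore the split, repeatable.
    listdir = sorted(i for i in os.listdir(txtpath) if i.endswith('.txt'))
    # Split off the test set first, then take the validation set from what remains.
    train, test = train_test_split(listdir, test_size=test_size, shuffle=True, random_state=0)
    train, val = train_test_split(train, test_size=val_size, shuffle=True, random_state=0)
    print(f'train set size:{len(train)} val set size:{len(val)} test set size:{len(test)}')

train_test_split: randomly splits the dataset by the given ratio.
shuffle=True: shuffles the data before splitting so that the split is random.
random_state=0: fixes the random seed, so the same input list produces the same split on every run, which helps with debugging and reproducibility.

Step 5: Copy the Files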

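Next, based on the split, we copy each image and its label file into the corresponding folder. The training set comes first:

    for i in train:
        try:
            # Copy the image that matches this label file, then the label file itself.
            shutil.copy('{}/{}.{}'.format(imgpath, i[:-4], postfix), os.path.join(new_imgpath, 'train/{}.{}'.format(i[:-4], postfix)))
            shutil.copy('{}/{}'.format(txtpath, i), os.path.join(new_txtpath, 'train/{}'.format(i)))
        except Exception as e:
            # A file that cannot be copied is reported and skipped.
            print(e)

shutil.copy: copies a file from one directory to another.
i[:-4]: strips the extension (.txt) from the label file name to obtain the name of the matching image.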
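Slicing with i[:-4] works as long as every label name ends in a four-character extension such as .txt. As an alternative, the standard library's os.path.splitext separates the extension explicitly. The sketch below uses a hypothetical helper, `image_name_for`, that is not part of the original script.

```python
import os


def image_name_for(label_name, image_ext="jpg"):
    """Map a label file name such as 'dog.01.txt' to 'dog.01.jpg'."""
    stem, ext = os.path.splitext(label_name)  # splits at the last dot only
    if ext.lower() != ".txt":
        raise ValueError(f"not a label file: {label_name!r}")
    return f"{stem}.{image_ext}"


if __name__ == "__main__":
    print(image_name_for("dog.01.txt"))
```

For names ending in .txt the result matches the slicing approach; the difference is that an unexpected file name raises an error instead of silently producing a wrong image name.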

The validation and test sets are copied in the same way as the training set:

    for i in val:
        try:
            shutil.copy('{}/{}.{}'.format(imgpath, i[:-4], postfix), os.path.join(new_imgpath, 'val/{}.{}'.format(i[:-4], postfix)))
            shutil.copy('{}/{}'.format(txtpath, i), os.path.join(new_txtpath, 'val/{}'.format(i)))
        except Exception as e:
            print(e)

    for i in test:
        try:
            shutil.copy('{}/{}.{}'.format(imgpath, i[:-4], postfix), os.path.join(new_imgpath, 'test/{}.{}'.format(i[:-4], postfix)))
            shutil.copy('{}/{}'.format(txtpath, i), os.path.join(new_txtpath, 'test/{}'.format(i)))
        except Exception as e:
            print(e)
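The three loops differ only in the split name, so they could be folded into one function. This is a sketch rather than the author's code: `copy_split` is a hypothetical helper, it catches OSError instead of every exception, and it returns the names that failed so they can be inspected afterwards.

```python
import os
import shutil
import tempfile


def copy_split(names, split, imgpath, txtpath, new_imgpath, new_txtpath, postfix="jpg"):
    """Copy each label in names, plus its image, into the given split folder.

    Returns the label names whose copy failed.
    """
    failed = []
    for name in names:
        image = f"{os.path.splitext(name)[0]}.{postfix}"
        try:
            shutil.copy(os.path.join(imgpath, image), os.path.join(new_imgpath, split, image))
            shutil.copy(os.path.join(txtpath, name), os.path.join(new_txtpath, split, name))
        except OSError as err:  # e.g. a missing image or target folder
            print(err)
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Demo on throwaway files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        src_img, src_txt = os.path.join(tmp, "si"), os.path.join(tmp, "st")
        dst_img, dst_txt = os.path.join(tmp, "di"), os.path.join(tmp, "dt")
        for d in (src_img, src_txt, os.path.join(dst_img, "train"), os.path.join(dst_txt, "train")):
            os.makedirs(d)
        open(os.path.join(src_img, "a.jpg"), "w").close()
        open(os.path.join(src_txt, "a.txt"), "w").close()
        print(copy_split(["a.txt"], "train", src_img, src_txt, dst_img, dst_txt))
```

With the tutorial's variables, the function would be called once per split, passing train, val, and test together with 'train', 'val', and 'test'.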

Complete Code

    import os, shutil
    from sklearn.model_selection import train_test_split

    # Split ratios and paths
    val_size = 0.1
    test_size = 0.2
    postfix = 'jpg'
    imgpath = './data1/images'
    txtpath = './data1/labels'
    new_imgpath = './data/images'
    new_txtpath = './data/labels'

    # Create the target directories
    os.makedirs(os.path.join(new_imgpath, 'train'), exist_ok=True)
    os.makedirs(os.path.join(new_imgpath, 'val'), exist_ok=True)
    os.makedirs(os.path.join(new_imgpath, 'test'), exist_ok=True)
    os.makedirs(os.path.join(new_txtpath, 'train'), exist_ok=True)
    os.makedirs(os.path.join(new_txtpath, 'val'), exist_ok=True)
    os.makedirs(os.path.join(new_txtpath, 'test'), exist_ok=True)

    # Split the label files: test set first, then validation from the remainder
    listdir = sorted(i for i in os.listdir(txtpath) if i.endswith('.txt'))
    train, test = train_test_split(listdir, test_size=test_size, shuffle=True, random_state=0)
    train, val = train_test_split(train, test_size=val_size, shuffle=True, random_state=0)
    print(f'train set size:{len(train)} val set size:{len(val)} test set size:{len(test)}')

    # Copy each image and its label file into the matching split folder
    for i in train:
        try:
            shutil.copy('{}/{}.{}'.format(imgpath, i[:-4], postfix), os.path.join(new_imgpath, 'train/{}.{}'.format(i[:-4], postfix)))
            shutil.copy('{}/{}'.format(txtpath, i), os.path.join(new_txtpath, 'train/{}'.format(i)))
        except Exception as e:
            print(e)

    for i in val:
        try:
            shutil.copy('{}/{}.{}'.format(imgpath, i[:-4], postfix), os.path.join(new_imgpath, 'val/{}.{}'.format(i[:-4], postfix)))
            shutil.copy('{}/{}'.format(txtpath, i), os.path.join(new_txtpath, 'val/{}'.format(i)))
        except Exception as e:
            print(e)

    for i in test:
        try:
            shutil.copy('{}/{}.{}'.format(imgpath, i[:-4], postfix), os.path.join(new_imgpath, 'test/{}.{}'.format(i[:-4], postfix)))
            shutil.copy('{}/{}'.format(txtpath, i), os.path.join(new_txtpath, 'test/{}'.format(i)))
        except Exception as e:
            print(e)
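One thing to watch for: the script copies into folders that may already hold files from an earlier run. If it is re-run with different ratios or a different random_state, the same sample can end up in more than one split. The sketch below checks for such overlaps; `find_overlaps` is a hypothetical helper, and the demo uses a temporary directory in place of ./data/labels.

```python
import os
import tempfile
from itertools import combinations


def find_overlaps(label_root, splits=("train", "val", "test")):
    """Return {(split_a, split_b): [file names present in both]}."""
    names = {s: set(os.listdir(os.path.join(label_root, s))) for s in splits}
    overlaps = {}
    for a, b in combinations(splits, 2):
        shared = sorted(names[a] & names[b])
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for split, files in {"train": ["a.txt"], "val": ["a.txt"], "test": ["b.txt"]}.items():
            os.makedirs(os.path.join(tmp, split))
            for name in files:
                open(os.path.join(tmp, split, name), "w").close()
        print(find_overlaps(tmp))
```

An empty result means that no label file appears in two splits. Clearing the output folders before each run is a simple way to avoid the problem.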


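Conclusion

In this article we split a YOLO dataset and stored the image and label files in separate training, validation, and test folders. This makes the data easier to manage during training and keeps evaluation separate from the training data. A sensible split ratio leaves enough data for training while keeping the validation and test sets large enough to give a reliable estimate of how well the model generalizes.

I hope this tutorial is a useful reference. If you have any questions or suggestions, feel free to leave a comment!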
