Let's get started with today's lesson :)
1. Import all required packages
import os, sys, glob, shutil, json
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
import cv2
from PIL import Image
import numpy as np
from tqdm import tqdm, tqdm_notebook
import torch
torch.manual_seed(0)
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.benchmark = True
import torchvision.models as models
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from torch.utils.data.dataset import Dataset
If you hit errors here, refer to the fixes below:
Q: No module named 'cv2'
A: pip install jupyter tqdm opencv-python matplotlib pandas
Q: libSM.so.6: cannot open shared object file: No such file or directory
A: apt update && apt install -y libsm6 libxext6
Q: libXrender.so.1: cannot open shared object file: No such file or directory
A: apt-get install libxrender1
2. Following the task setup, start with data reading
- For image processing: Pillow (easy/simple) and OpenCV (harder/more powerful)
2.1 Getting started with Pillow
2.1.1 First, read an image of a kitten (everyone loves cats). The code is as follows:
# Read the image
im = Image.open('./cat.jpg')
2.1.2 Next, apply a blur filter (ImageFilter.BLUR)
from PIL import Image, ImageFilter
im = Image.open('./cat.jpg')
# Apply a blur filter
im2 = im.filter(ImageFilter.BLUR)
im2.save('blur.jpg', 'jpeg')
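ImageFilter ships with a number of other ready-made filters besides BLUR. A minimal sketch (my own example, reusing the same cat.jpg) of a few of them:
from PIL import Image, ImageFilter
im = Image.open('./cat.jpg')
# Sharpen the image
im.filter(ImageFilter.SHARPEN).save('sharpen.jpg', 'jpeg')
# Extract contours, similar to a pencil sketch
im.filter(ImageFilter.CONTOUR).save('contour.jpg', 'jpeg')
# Gaussian blur with an adjustable radius (BLUR uses a fixed small kernel)
im.filter(ImageFilter.GaussianBlur(radius=3)).save('gaussian_blur.jpg', 'jpeg')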
2.1.3 And the even more common operation: shrinking an image (thumbnails)
# Note: define w and h first
w = 150
h = 200
# Open a jpg image file from the current directory
im = Image.open('./cat.jpg')
im.thumbnail((w//2, h//2))
im.save('thumbnail.jpg', 'jpeg')
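Note that thumbnail() resizes in place, keeps the aspect ratio, and only ever shrinks the image, while resize() returns a new image at exactly the size you ask for. A minimal sketch of the difference (reusing w and h from above):
from PIL import Image
im = Image.open('./cat.jpg')
im.thumbnail((w // 2, h // 2))                           # in place, aspect ratio preserved
print(im.size)
im2 = Image.open('./cat.jpg').resize((w // 2, h // 2))   # new image, exact (width, height)
print(im2.size)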
Summary (Pillow)
The above is just a small taste; for more, see the official documentation:
https://pillow.readthedocs.io/en/stable/
2.2 OpenCV
- Open-sourced by Intel
- A cross-platform computer vision library
- More powerful than Pillow
- But with a steeper learning curve
2.2.1 Using the same kitten as an example (it turns blue!):
img = cv2.imread('./cat.jpg')
# OpenCV reads images in BGR channel order by default; convert to RGB
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
cv2.imwrite('cv2.jpg', img)
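The saved file looks blue because cv2.imwrite also assumes BGR order, so writing an RGB array swaps the red and blue channels. A minimal sketch of the usual pattern (output filename is just an example): convert to RGB for processing or display, then convert back to BGR before saving:
img = cv2.imread('./cat.jpg')                       # BGR
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)      # work in RGB (e.g. for matplotlib/PIL)
img_bgr = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2BGR)  # back to BGR before writing
cv2.imwrite('cv2_rgb_ok.jpg', img_bgr)              # colors come out correctly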
2.2.2 Turn the kitten gray (grayscale)
img = cv2.imread('./cat.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imwrite('cv2.jpg', img)
2.2.3 Canny edge detection (the result looks like a line drawing)
edges = cv2.Canny(img, 30, 70)
cv2.imwrite('canny.jpg', edges)
2.2.4 Thresholding (binarization)
import matplotlib.pyplot as plt
img = cv2.imread('cat.jpg', 0)  # read directly as a grayscale image
ret,thresh1 = cv2.threshold(img,127,255,cv2.THRESH_BINARY)
ret,thresh2 = cv2.threshold(img,127,255,cv2.THRESH_BINARY_INV)
ret,thresh3 = cv2.threshold(img,127,255,cv2.THRESH_TRUNC)
ret,thresh4 = cv2.threshold(img,127,255,cv2.THRESH_TOZERO)
ret,thresh5 = cv2.threshold(img,127,255,cv2.THRESH_TOZERO_INV)
titles = ['img','BINARY','BINARY_INV','TRUNC','TOZERO','TOZERO_INV']
images = [img,thresh1,thresh2,thresh3,thresh4,thresh5]
for i in range(6):
    plt.subplot(2, 3, i + 1), plt.imshow(images[i], 'gray')
    plt.title(titles[i])
    plt.xticks([]), plt.yticks([])
plt.show()
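A fixed threshold of 127 does not suit every image. A minimal sketch of two common alternatives, Otsu's method (picks the threshold automatically) and adaptive thresholding (per-neighborhood thresholds), on the same grayscale image:
img = cv2.imread('cat.jpg', 0)
# Otsu: the threshold is chosen automatically (the 0 passed in is ignored)
ret_otsu, th_otsu = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print('Otsu threshold:', ret_otsu)
# Adaptive: a Gaussian-weighted threshold is computed per 11x11 neighborhood, minus a constant 2
th_adapt = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)
cv2.imwrite('otsu.jpg', th_otsu)
cv2.imwrite('adaptive.jpg', th_adapt)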
Summary
OpenCV provides a vast set of image-processing functions, covering essentially every image-related operation you can think of. It also ships with many built-in feature algorithms, such as keypoint detection, edge detection, and line detection.
OpenCV website: https://opencv.org/
OpenCV GitHub: https://github.com/opencv/opencv
OpenCV extra modules (contrib): https://github.com/opencv/opencv_contrib
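As a small taste of the feature algorithms mentioned above, a minimal sketch of ORB keypoint detection and Hough line detection on the same cat.jpg (parameter values are illustrative only):
import cv2
import numpy as np
img = cv2.imread('./cat.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Keypoint detection with ORB
orb = cv2.ORB_create(nfeatures=200)
kps = orb.detect(gray, None)
cv2.imwrite('orb.jpg', cv2.drawKeypoints(img, kps, None, color=(0, 255, 0)))
# Line detection: Canny edges first, then the probabilistic Hough transform
edges = cv2.Canny(gray, 30, 70)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 50, minLineLength=30, maxLineGap=5)
print(0 if lines is None else len(lines), 'line segments found')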
2.3 Data augmentation methods
With the Pillow and OpenCV basics covered, let's return to the street-view character recognition task.
Two steps are needed: data reading and data augmentation.
2.3.1 Introduction to data augmentation
- Benefits
  - Increases the number of training samples
  - Effectively mitigates model overfitting
  - Gives the model stronger generalization ability
- Augmentation dimensions
  - Color space
  - Scale (size) space
  - Sample space
For image classification, augmentation generally does not change the label; for object detection, augmentation changes the object coordinates; for image segmentation, augmentation changes the pixel-level labels.
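To make the detection case concrete, a minimal sketch (a hypothetical helper, not part of the competition code) of why a horizontal flip must also transform the bounding box while the class label stays the same:
# Horizontally flip one box (x1, y1, x2, y2) inside an image of width img_w.
# Only the x coordinates change; the label attached to the box does not.
def hflip_with_box(img_w, box):
    x1, y1, x2, y2 = box
    return (img_w - x2, y1, img_w - x1, y2)

print(hflip_with_box(128, (10, 20, 40, 60)))  # -> (88, 20, 118, 60)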
2.3.2 Common data augmentation methods
Images can be transformed in terms of color, size, shape, spatial layout, and pixel values.
Taking torchvision as an example, common augmentation methods include (demonstrated on the kitten):
from torchvision import transforms
from PIL import Image
from torchvision.transforms import functional as TF
import torch
path = "cat.jpg"
img = Image.open(path)
- transforms.CenterCrop: crop the center of the image
size = (300, 500)
transform = transforms.Compose([
    transforms.CenterCrop(size),
])
new_img = transform(img)
new_img
- transforms.ColorJitter: randomly change the image's brightness, contrast, saturation, and hue
transform = transforms.Compose([
    transforms.ColorJitter(brightness=(0, 16), contrast=(0, 10),
                           saturation=(0, 25), hue=(-0.5, 0.5))
])
new_img = transform(img)
new_img
- transforms.FiveCrop: crop the four corners and the center of the image, producing five crops
UNIT_SIZE = 200  # the width of each crop is fixed
size = (100, UNIT_SIZE)
transform = transforms.Compose([
    transforms.FiveCrop(size)
])
new_img = transform(img)
delta = 20  # offset so the gaps between the crops are clearly visible
new_img_2 = Image.new("RGB", (UNIT_SIZE * 5 + delta, 100))
top_right = 0
for im in new_img:
    new_img_2.paste(im, (top_right, 0))      # paste each crop into the target image at the given position
    top_right += UNIT_SIZE + int(delta / 5)  # x coordinate of the top-left corner; crops are laid out horizontally, so only x changes
new_img_2
- transforms.Grayscale: convert the image to grayscale
my_trans = transforms.Grayscale(num_output_channels=1)
new_img = my_trans(img)
new_img
- transforms.Pad: pad the image borders with a fixed value
from torchvision import transforms
from PIL import Image
padding_img = transforms.Pad(padding=50, fill=10)
img = Image.open('cat.jpg')
print(type(img))
print(img.size)
padded_img=padding_img(img)
print(type(padded_img))
print(padded_img.size)
plt.imshow(padded_img)
Output:
<class 'PIL.PngImagePlugin.PngImageFile'>
(500, 375)
<class 'PIL.Image.Image'>
(600, 475)
- transforms.RandomAffine: random affine transformation
my_trans = transforms.RandomAffine(degrees=30, translate=None, scale=None,
                                   shear=None, resample=False, fillcolor=0)
new_img = my_trans(img)
new_img
- transforms.RandomCrop: crop a random region
my_trans = transforms.RandomCrop(size, padding=None, pad_if_needed=False,
                                 fill=0, padding_mode='constant')
new_img = my_trans(img)
new_img
- transforms.RandomHorizontalFlip: random horizontal flip
my_trans = transforms.RandomHorizontalFlip(p=0.8)
new_img = my_trans(img)
new_img
- transforms.RandomRotation: random rotation
my_trans = transforms.RandomRotation(degrees=90, resample=False, expand=False, center=None)
new_img = my_trans(img)
new_img
- transforms.RandomVerticalFlip: random vertical flip
my_trans = transforms.RandomVerticalFlip(p=0.5)
new_img = my_trans(img)
new_img
2.3.3 Commonly used data augmentation libraries
- torchvision
https://github.com/pytorch/vision
The official PyTorch augmentation library. It provides the basic augmentation methods and integrates seamlessly with torch, but offers relatively few kinds of transforms and is only moderately fast.
- imgaug
https://github.com/aleju/imgaug
A popular third-party augmentation library. It offers a wide variety of augmentation methods that are easy to combine, and it is fast.
- albumentations
https://albumentations.readthedocs.io/
A popular third-party augmentation library that provides a wide variety of augmentation methods, supports image classification, semantic segmentation, object detection, and keypoint detection, and is fast. Minimal usage sketches of imgaug and albumentations follow below.
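The sketches assume a NumPy image array and the APIs of imgaug 0.4 / albumentations 1.x; all parameter values are illustrative only:
# imgaug: build a pipeline with Sequential and call it on the image
import cv2
import imgaug.augmenters as iaa
img = cv2.cvtColor(cv2.imread('./cat.jpg'), cv2.COLOR_BGR2RGB)
seq = iaa.Sequential([
    iaa.Fliplr(0.5),                   # horizontal flip with probability 0.5
    iaa.Affine(rotate=(-10, 10)),      # random rotation
    iaa.GaussianBlur(sigma=(0, 1.0)),  # random blur
])
img_aug = seq(image=img)

# albumentations: Compose a pipeline and call it with keyword arguments
import albumentations as A
transform = A.Compose([
    A.Resize(64, 128),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])
img_aug2 = transform(image=img)['image']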
2.4 Reading data with PyTorch
- Read the competition data with PyTorch
- Wrap it in a Dataset
- Read it in parallel with a DataLoader
import os, sys, glob, shutil, json
import cv2
from PIL import Image
import numpy as np
import torch
from torch.utils.data.dataset import Dataset
import torchvision.transforms as transforms
class SVHNDataset(Dataset):
    def __init__(self, img_path, img_label, transform=None):
        self.img_path = img_path
        self.img_label = img_label
        if transform is not None:
            self.transform = transform
        else:
            self.transform = None

    def __getitem__(self, index):
        img = Image.open(self.img_path[index]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        # In the original SVHN, class 10 stands for the digit 0
        lbl = np.array(self.img_label[index], dtype=int)
        # Pad every label to a fixed length of 5 with the "empty" class 10
        lbl = list(lbl) + (5 - len(lbl)) * [10]
        return img, torch.from_numpy(np.array(lbl[:5]))

    def __len__(self):
        return len(self.img_path)
train_path = glob.glob('../input/train/*.png')
train_path.sort()
train_json = json.load(open('../input/train.json'))
train_label = [train_json[x]['label'] for x in train_json]
data = SVHNDataset(train_path, train_label,
                   transforms.Compose([
                       # Resize to a fixed size
                       transforms.Resize((64, 128)),
                       # Random color jitter
                       transforms.ColorJitter(0.2, 0.2, 0.2),
                       # Random rotation
                       transforms.RandomRotation(5),
                       # Convert the image to a PyTorch tensor
                       # transforms.ToTensor(),
                       # Normalize the image pixels
                       # transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
                   ]))
With the code above, the competition images and their labels can be read, and augmentation is applied during reading. The effect looks like this:
(Figure: three augmented sample images, originally shown side by side.)
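Since the sample images themselves are not reproduced here, a minimal sketch to generate similar previews yourself from the data Dataset defined above (ToTensor/Normalize are still commented out, so each item is a PIL image that can be saved directly; output filenames are examples):
for i in range(3):
    img, lbl = data[i]                       # __getitem__ applies the augmentations
    img.save('aug_sample_{}.jpg'.format(i))
    print(i, lbl.tolist())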
Next, we build a DataLoader on top of the Dataset we just defined:
- Dataset: wraps the data set and provides index-based access to individual samples
- DataLoader: wraps a Dataset and provides batched, iterable reading
With the DataLoader added, the data reading code becomes:
import os, sys, glob, shutil, json
import cv2
from PIL import Image
import numpy as np
import torch
from torch.utils.data.dataset import Dataset
import torchvision.transforms as transforms
class SVHNDataset(Dataset):
    def __init__(self, img_path, img_label, transform=None):
        self.img_path = img_path
        self.img_label = img_label
        if transform is not None:
            self.transform = transform
        else:
            self.transform = None

    def __getitem__(self, index):
        img = Image.open(self.img_path[index]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        # In the original SVHN, class 10 stands for the digit 0
        lbl = np.array(self.img_label[index], dtype=int)
        # Pad every label to a fixed length of 5 with the "empty" class 10
        lbl = list(lbl) + (5 - len(lbl)) * [10]
        return img, torch.from_numpy(np.array(lbl[:5]))

    def __len__(self):
        return len(self.img_path)
train_path = glob.glob('../input/train/*.png')
train_path.sort()
train_json = json.load(open('../input/train.json'))
train_label = [train_json[x]['label'] for x in train_json]
train_loader = torch.utils.data.DataLoader(
    SVHNDataset(train_path, train_label,
                transforms.Compose([
                    transforms.Resize((64, 128)),
                    transforms.ColorJitter(0.3, 0.3, 0.2),
                    transforms.RandomRotation(5),
                    transforms.ToTensor(),
                    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
                ])),
    batch_size=10,   # number of samples per batch
    shuffle=False,   # whether to shuffle the order
    num_workers=10,  # number of worker processes for loading
)
for data in train_loader:
    break
With the DataLoader in place, data is fetched batch by batch; each batch calls the Dataset to read individual samples and stacks them together. At this point, data has the format:
torch.Size([10, 3, 64, 128]), torch.Size([10, 5])
The former is the image tensor, in batch_size x channel x height x width order; the latter is the character labels (padded to length 5).
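A minimal sketch to sanity-check one batch: undo the Normalize above and display the first image together with its padded label (assumes matplotlib.pyplot is already imported as plt):
imgs, lbls = data                            # the batch fetched from train_loader above
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
img0 = (imgs[0] * std + mean).clamp(0, 1)    # de-normalize back to [0, 1]
plt.imshow(img0.permute(1, 2, 0).numpy())
plt.title(str(lbls[0].tolist()))             # padded label, 10 = no character
plt.show()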
2.5 Summary
We first walked through data reading, then data augmentation and how to use it, and finally the PyTorch code for reading the data.
Breaking everything down step by step like this, piece by piece, makes for a very satisfying lesson. Much appreciated!