Deep Learning Notes: 3D Image Classification and 3D Convolutional Neural Networks

Introduction

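As the "Hello world" of machine learning, the MNIST handwritten digit dataset is one that many people meet when they first study the field. Recently, to explore how deep learning applies to spatiotemporal sequence data, I wanted to understand 3D convolutional neural networks. As a starting point, I worked with the 3D MNIST dataset and used example code published by researchers abroad to understand the basic structure of a 3D convolutional neural network.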

Dataset: 3D MNIST

2D vs 3D MNIST

The dataset is available on Kaggle under the name 3D MNIST.
The data is stored in .h5 format, and the dataset is split into the following arrays:

X_train (10000, 4096)
y_train (10000)
X_test (2000, 4096)
y_test (2000)

The training set has 10,000 images and the test set has 2,000. Each image is flattened into a 4096-dimensional vector (16 long × 16 wide × 16 high = 4096).
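As a quick check of the arithmetic above, here is a minimal sketch that uses a synthetic array (not the real dataset) to show how a flattened 4096-dimensional vector maps back to a 16 × 16 × 16 volume with NumPy. The variable names are illustrative, and the default row-major ordering is an assumption about how the vectors were flattened.

```python
import numpy as np

# Synthetic stand-in for X_train: 5 samples, each flattened to 4096 values.
flat = np.arange(5 * 4096, dtype=np.float32).reshape(5, 4096)

# Restore the 16 x 16 x 16 voxel grid; -1 lets NumPy infer the sample count.
volumes = flat.reshape(-1, 16, 16, 16)

# Flattening again recovers the original vectors exactly.
restored = volumes.reshape(volumes.shape[0], -1)

print(volumes.shape)   # (5, 16, 16, 16)
print(restored.shape)  # (5, 4096)
```

The same `reshape(-1, 16, 16, 16)` call is what the experiments below apply to the real arrays.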

Example code for reading the dataset:

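import h5py

# Load the flattened voxel vectors and their labels.
# The file name matches the one used in the experiments below.
with h5py.File("../input/full_dataset_vectors.h5", "r") as hf:
    X_train = hf["X_train"][:]
    y_train = hf["y_train"][:]
    X_test = hf["X_test"][:]
    y_test = hf["y_test"][:]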

Since the dataset is three-dimensional, does a 3D convolutional neural network recognize the digits better than a 2D one? Let's run an experiment.

2D Convolutional Neural Network

This experiment uses the Keras framework. First, load the required modules.

from __future__ import division, print_function, absolute_import

from keras.models import Sequential, model_from_json
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.utils.np_utils import to_categorical
from keras.callbacks import ReduceLROnPlateau, TensorBoard

import h5py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')

from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split

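Set the hyperparameters

# Set up hyperparameters
batch_size = 64
epochs = 20
# TensorBoard callback; train() below passes it to fit_generator
tensorboard = TensorBoard(batch_size=batch_size)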

Read the dataset from local disk

with h5py.File("/Users/apple/pydata/3d_mnist/full_dataset_vectors.h5","r") as h5:
    X_train, y_train = h5["X_train"][:], h5["y_train"][:]
    X_test, y_test = h5["X_test"][:], h5["y_test"][:]

Convert the image labels to one-hot arrays (the validation set is split off from the training set in the next step)

y_train = to_categorical(y_train, num_classes=10)
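To make the one-hot step concrete, here is a minimal NumPy sketch of the same idea. The `one_hot` helper is hypothetical and written only for illustration; it is not the Keras implementation of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Set column `label` to 1 in each row.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9], num_classes=10)
print(example.shape)           # (3, 10)
print(example.argmax(axis=1))  # [3 0 9]
```

Taking `argmax` along the class axis recovers the integer labels, which is also how `evaluate()` below turns softmax outputs back into digits.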

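This time we use a 2D convolutional neural network, which expects a 3D array per sample (height, width, channels). The 16 × 16 × 16 volume already has that form, with the last axis treated as 16 channels, so no RGB color channel is added.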

X_train = X_train.reshape(-1, 16, 16, 16)
X_test = X_test.reshape(-1, 16, 16, 16)
X_train,X_val,y_train,y_val = train_test_split(X_train, y_train,
                                              test_size=0.25,
                                              random_state=42)

Define the 2D convolution layer

# Conv2D layer
def Conv(filters=16, kernel_size=(3,3), activation='relu', input_shape=None):
    if input_shape:
        return Conv2D(filters=filters, kernel_size = kernel_size, padding='Same'
                      , activation=activation, input_shape=input_shape)
    else:
        return Conv2D(filters=filters, kernel_size = kernel_size, padding='Same'
                      , activation=activation)

Define the model architecture

# Define model
def CNN(input_dim, num_classes):
    model = Sequential()
    
    model.add((Conv(8, (3,3), input_shape=input_dim)))
    model.add((Conv(16,(3,3))))
    # model.add(BatchNormalization())
    model.add(MaxPool2D(pool_size=(2,2)))
    model.add(Dropout(0.25))
    
    model.add(Conv(32,(3,3)))
    model.add(Conv(64, (3,3)))
    model.add(BatchNormalization())
    model.add(MaxPool2D())
    model.add(Dropout(0.25))
    
    model.add(Flatten())
    
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.5))
    
    model.add(Dense(num_classes, activation='softmax'))
    
    return model

Define the training setup, the evaluation method, and the functions for saving and loading the model

# Train Model

def train(optimizer, scheduler, gen):
    global model
    
    print("Training...Please wait")
    # NOTE: the `optimizer` argument is ignored here; the model is compiled with Keras' default Adam.
    model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["accuracy"])
    
    model.fit_generator(gen.flow(X_train, y_train, batch_size=batch_size),
                    epochs=epochs, validation_data=(X_val, y_val),
                    verbose=2, steps_per_epoch=X_train.shape[0]//batch_size,
                    callbacks=[scheduler, tensorboard])

def evaluate():
    global model
    
    pred = model.predict(X_test)
    pred = np.argmax(pred, axis=1)
    
    print(accuracy_score(pred, y_test))
    
    # Heat map
    
    array = confusion_matrix(y_test, pred)
    cm = pd.DataFrame(array, index = range(10), columns = range(10))
    plt.figure(figsize=(20,20))
    sns.heatmap(cm, annot=True)
    plt.show()

def save_model():
    global model
    
    model_json = model.to_json()
    with open('/Users/apple/pydata/3d_mnist/model/model_2D.json','w') as f:
        f.write(model_json)
        
    model.save_weights('/Users/apple/pydata/3d_mnist/model/model_2D.h5')
    
    print("Model Saved")

def load_model():
    f = open("/Users/apple/pydata/3d_mnist/model/model_2D.json","r")
    model_json = f.read()
    f.close()
    
    loaded_model = model_from_json(model_json)
    loaded_model.load_weights('/Users/apple/pydata/3d_mnist/model/model_2D.h5')
    
    print("Model Loaded.")
    
    return loaded_model

if __name__ == '__main__':

    optimizer = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
    scheduler = ReduceLROnPlateau(monitor='val_acc', patience=3, verbose=1, factor=0.5, min_lr=1e-5)

    model = CNN((16,16,16), 10)

    gen = ImageDataGenerator(rotation_range=10, zoom_range = 0.1, width_shift_range=0.1, height_shift_range=0.1)
    gen.fit(X_train)

    train(optimizer, scheduler, gen)
    evaluate()
    save_model()

2D CNN result: best accuracy 68.5%

Training...Please wait
Epoch 1/20
 - 40s - loss: 2.2051 - acc: 0.2574 - val_loss: 1.4624 - val_acc: 0.4936
Epoch 2/20
 - 42s - loss: 1.4804 - acc: 0.4842 - val_loss: 1.2500 - val_acc: 0.5528
Epoch 3/20
 - 33s - loss: 1.3187 - acc: 0.5341 - val_loss: 1.2400 - val_acc: 0.5648
Epoch 4/20
 - 31s - loss: 1.2488 - acc: 0.5604 - val_loss: 1.0896 - val_acc: 0.6132
Epoch 5/20
 - 31s - loss: 1.2123 - acc: 0.5740 - val_loss: 1.1378 - val_acc: 0.5868
Epoch 6/20
 - 31s - loss: 1.1782 - acc: 0.5833 - val_loss: 1.0483 - val_acc: 0.6284
Epoch 7/20
 - 31s - loss: 1.1431 - acc: 0.5967 - val_loss: 1.0335 - val_acc: 0.6328
Epoch 8/20
 - 31s - loss: 1.1129 - acc: 0.6054 - val_loss: 1.0082 - val_acc: 0.6412
Epoch 9/20
 - 30s - loss: 1.1071 - acc: 0.6059 - val_loss: 1.0608 - val_acc: 0.6224
Epoch 10/20
 - 31s - loss: 1.0878 - acc: 0.6127 - val_loss: 0.9602 - val_acc: 0.6580
Epoch 11/20
 - 31s - loss: 1.0756 - acc: 0.6169 - val_loss: 1.0182 - val_acc: 0.6424
Epoch 12/20
 - 31s - loss: 1.0649 - acc: 0.6221 - val_loss: 0.9905 - val_acc: 0.6560
Epoch 13/20
 - 30s - loss: 1.0508 - acc: 0.6321 - val_loss: 0.9642 - val_acc: 0.6628
Epoch 14/20
 - 32s - loss: 1.0567 - acc: 0.6289 - val_loss: 0.9452 - val_acc: 0.6696
Epoch 15/20
 - 35s - loss: 1.0271 - acc: 0.6346 - val_loss: 0.9287 - val_acc: 0.6748
Epoch 16/20
 - 36s - loss: 1.0169 - acc: 0.6386 - val_loss: 0.9542 - val_acc: 0.6668
Epoch 17/20
 - 38s - loss: 0.9975 - acc: 0.6456 - val_loss: 0.9509 - val_acc: 0.6656
Epoch 18/20
 - 35s - loss: 1.0139 - acc: 0.6456 - val_loss: 0.9452 - val_acc: 0.6716

Epoch 00018: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 19/20
 - 36s - loss: 0.9616 - acc: 0.6586 - val_loss: 0.9114 - val_acc: 0.6856
Epoch 20/20
 - 31s - loss: 0.9359 - acc: 0.6652 - val_loss: 0.9137 - val_acc: 0.6832
0.6845
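The log above shows the learning rate being halved at epoch 18. The following pure-Python sketch imitates the plateau rule configured in the script (patience=3, factor=0.5, min_lr=1e-5) and replays the logged val_acc values. It is a simplified illustration, not the Keras source: it ignores the callback's cooldown and minimum-improvement threshold, and the function name is hypothetical.

```python
def simulate_reduce_on_plateau(val_acc, lr=0.001, factor=0.5,
                               patience=3, min_lr=1e-5):
    """Return [(epoch, new_lr)] for each simulated learning-rate reduction."""
    best = float("-inf")
    wait = 0
    reductions = []
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best:
            # New best validation accuracy: reset the patience counter.
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                reductions.append((epoch, lr))
                wait = 0
    return reductions

# val_acc values copied from the 2D CNN training log above.
val_acc_2d = [0.4936, 0.5528, 0.5648, 0.6132, 0.5868, 0.6284, 0.6328,
              0.6412, 0.6224, 0.6580, 0.6424, 0.6560, 0.6628, 0.6696,
              0.6748, 0.6668, 0.6656, 0.6716, 0.6856, 0.6832]

print(simulate_reduce_on_plateau(val_acc_2d))  # [(18, 0.0005)]
```

The best value before the reduction is 0.6748 at epoch 15. Epochs 16 to 18 do not beat it, so the third stale epoch triggers the drop from 0.001 to 0.0005, matching the log.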

Confusion Matrix

[Figure: confusion matrix of the 2D CNN on the test set]
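For readers unfamiliar with how a confusion matrix relates to the accuracy printed by `evaluate()`, here is a small sketch with made-up labels for three classes (not results from this experiment). The helper name is hypothetical. Rows index the true class and columns the predicted class, the same convention as `sklearn.metrics.confusion_matrix`.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """Count (true, predicted) pairs; rows are true classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Tiny made-up example with 3 classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_counts(y_true, y_pred, num_classes=3)

# Overall accuracy is the diagonal mass divided by the total count.
accuracy = np.trace(cm) / cm.sum()
# Per-class recall: correct predictions divided by the row total.
recall_per_class = np.diag(cm) / cm.sum(axis=1)

print(cm)
print(accuracy)
print(recall_per_class)
```

Per-class recall is often more informative than the single accuracy figure, because it shows which digits are confused with each other.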

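3D Convolutional Neural Network in Keras

Compared with the common 2D convolution, there is less material available on 3D convolution. Below is an illustration of a 3D convolution: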


3D CNN

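A 3D convolution applies a three-dimensional filter that moves in three directions (x, y, z) to compute low-level feature representations, and its output is a 3D volume. It is very useful for tasks such as event detection in video and 3D medical imaging. Its use is not limited to 3D space, though; it can also be applied to 2D inputs such as images.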
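To illustrate the sliding-filter idea, here is a deliberately naive NumPy sketch of a single-channel 3D convolution (strictly a cross-correlation, which is what deep learning layers compute) with stride 1 and no padding. The function name is hypothetical and the loops are written for clarity, not speed. A Keras `Conv3D` layer additionally handles multiple channels, multiple filters, a bias, and padding.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive single-channel 3D cross-correlation, stride 1, no padding."""
    d, h, w = volume.shape
    kd, kh, kw = kernel.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Multiply the kernel with the patch under it and sum.
                patch = volume[z:z + kd, y:y + kh, x:x + kw]
                out[z, y, x] = np.sum(patch * kernel)
    return out

volume = np.ones((16, 16, 16))   # one 16 x 16 x 16 voxel grid
kernel = np.ones((3, 3, 3))      # a 3 x 3 x 3 filter
features = conv3d_valid(volume, kernel)

print(features.shape)     # (14, 14, 14)
print(features[0, 0, 0])  # 27.0
```

Without padding, the output shrinks from 16 to 14 along each axis. With 'same' padding the spatial size would stay at 16 × 16 × 16.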

The implementation follows.

First, load the required modules


from __future__ import division, print_function, absolute_import

from keras.models import Sequential, model_from_json
from keras.layers import Dense, Dropout, Flatten, Conv3D, MaxPool3D, BatchNormalization, Input
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.utils.np_utils import to_categorical
from keras.callbacks import ReduceLROnPlateau, TensorBoard

import h5py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')

from sklearn.metrics import confusion_matrix, accuracy_score

# Hyperparameters
batch_size = 86
epochs = 20

# Set up TensorBoard
tensorboard = TensorBoard(batch_size=batch_size)

Read the data

with h5py.File("/Users/apple/pydata/3d_mnist/full_dataset_vectors.h5", 'r') as h5:
    X_train, y_train = h5["X_train"][:], h5["y_train"][:]
    X_test, y_test = h5["X_test"][:], h5["y_test"][:]

Add an RGB channel dimension to the images (based on the first function in the plot3D.py file provided on the Kaggle data page)

# Translate data to color
def array_to_color(array, cmap="Oranges"):
    s_m = plt.cm.ScalarMappable(cmap=cmap)
    return s_m.to_rgba(array)[:,:-1]

def translate(x):
    xx = np.ndarray((x.shape[0], 4096, 3))
    for i in range(x.shape[0]):
        xx[i] = array_to_color(x[i])
        if i % 1000 == 0:
            print(i)
    # Free Memory
    del x

    return xx

Convert the labels to one-hot vectors and reshape the inputs into 5D tensors

y_train = to_categorical(y_train, num_classes=10)
# y_test = to_categorical(y_test, num_classes=10)

X_train = translate(X_train).reshape(-1, 16, 16, 16, 3)
X_test  = translate(X_test).reshape(-1, 16, 16, 16, 3)

Define the model structure

# Conv3D layer
def Conv(filters=16, kernel_size=(3,3,3), activation='relu', input_shape=None):
    if input_shape:
        return Conv3D(filters=filters, kernel_size=kernel_size, padding='Same', activation=activation, input_shape=input_shape)
    else:
        return Conv3D(filters=filters, kernel_size=kernel_size, padding='Same', activation=activation)

# Define Model
def CNN(input_dim, num_classes):
    model = Sequential()

    model.add(Conv(8, (3,3,3), input_shape=input_dim))
    model.add(Conv(16, (3,3,3)))
    # model.add(BatchNormalization())
    model.add(MaxPool3D())
    # model.add(Dropout(0.25))

    model.add(Conv(32, (3,3,3)))
    model.add(Conv(64, (3,3,3)))
    model.add(BatchNormalization())
    model.add(MaxPool3D())
    model.add(Dropout(0.25))

    model.add(Flatten())

    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))

    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.5))

    model.add(Dense(num_classes, activation='softmax'))

    return model
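To get a feel for how much larger the 3D model is than the 2D one, the sketch below counts parameters by hand for the conv/pool stack used in this article. It assumes 3-wide kernels, 'same' padding, a 2× max pooling after each pair of convolutions, and a 16-wide input, and it leaves out the BatchNormalization parameters. The helper names are hypothetical.

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights plus biases of one convolution layer."""
    kernel_volume = 1
    for k in kernel_size:
        kernel_volume *= k
    return (kernel_volume * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus biases of one fully connected layer."""
    return (inputs + 1) * units

def summarize(spatial_rank, in_channels):
    """Return (conv params, flattened size, first Dense layer params)."""
    side, channels, conv_total = 16, in_channels, 0
    kernel = (3,) * spatial_rank
    for block in [(8, 16), (32, 64)]:
        for filters in block:
            conv_total += conv_params(kernel, channels, filters)
            channels = filters
        side //= 2  # max pooling halves every spatial axis
    flat = side ** spatial_rank * channels
    return conv_total, flat, dense_params(flat, 4096)

print(summarize(spatial_rank=2, in_channels=16))  # (25464, 1024, 4198400)
print(summarize(spatial_rank=3, in_channels=3))   # (73344, 4096, 16781312)
```

Under these assumptions the flattened feature vector grows from 1024 to 4096 values, so the first Dense(4096) layer alone is about four times larger in the 3D model. That growth, together with the extra convolution work, is consistent with the much longer epoch times in the 3D training log.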

Define the training setup, the evaluation method, and the functions for saving and loading the model

# Train Model
def train(optimizer, scheduler):
    global model

    print("Training...")
    # NOTE: the `optimizer` argument is ignored here; the model is compiled with Keras' default Adam.
    model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["accuracy"])

    model.fit(x=X_train, y=y_train, batch_size=batch_size, epochs=epochs, validation_split=0.15,
                    verbose=2, callbacks=[scheduler, tensorboard])

def evaluate():
    global model

    pred = model.predict(X_test)
    pred = np.argmax(pred, axis=1)

    print(accuracy_score(pred,y_test))
    # Heat Map
    array = confusion_matrix(y_test, pred)
    cm = pd.DataFrame(array, index = range(10), columns = range(10))
    plt.figure(figsize=(20,20))
    sns.heatmap(cm, annot=True)
    plt.show()

def save_model():
    global model

    model_json = model.to_json()
    with open('/Users/apple/pydata/3d_mnist/model/model_3D.json', 'w') as f:
        f.write(model_json)

    model.save_weights('/Users/apple/pydata/3d_mnist/model/model_3D.h5')

    print('Model Saved.')

def load_model():
    # Use the same absolute path that save_model() writes to
    with open('/Users/apple/pydata/3d_mnist/model/model_3D.json', 'r') as f:
        model_json = f.read()

    loaded_model = model_from_json(model_json)
    loaded_model.load_weights('/Users/apple/pydata/3d_mnist/model/model_3D.h5')

    print("Model Loaded.")
    return loaded_model

if __name__ == '__main__':

    optimizer = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
    scheduler = ReduceLROnPlateau(monitor='val_acc', patience=3, verbose=1, factor=0.5, min_lr=1e-5)

    model = CNN((16,16,16,3), 10)

    train(optimizer, scheduler)
    evaluate()
    save_model()

3D CNN result: best accuracy 75%

Training...
Train on 8500 samples, validate on 1500 samples
Epoch 1/20
 - 696s - loss: 3.1408 - acc: 0.1760 - val_loss: 7.5856 - val_acc: 0.1973
Epoch 2/20
 - 703s - loss: 1.6178 - acc: 0.4213 - val_loss: 7.9127 - val_acc: 0.2127
Epoch 3/20
 - 798s - loss: 1.2917 - acc: 0.5452 - val_loss: 6.1975 - val_acc: 0.2987
Epoch 4/20
 - 757s - loss: 1.1254 - acc: 0.6035 - val_loss: 1.0294 - val_acc: 0.6527
Epoch 5/20
 - 691s - loss: 1.0346 - acc: 0.6421 - val_loss: 1.0982 - val_acc: 0.6247
Epoch 6/20
 - 707s - loss: 0.9758 - acc: 0.6581 - val_loss: 0.9593 - val_acc: 0.6673
Epoch 7/20
 - 791s - loss: 0.9062 - acc: 0.6854 - val_loss: 0.9851 - val_acc: 0.6520
Epoch 8/20
 - 776s - loss: 0.8520 - acc: 0.7064 - val_loss: 1.1886 - val_acc: 0.6320
Epoch 9/20
 - 771s - loss: 0.7860 - acc: 0.7273 - val_loss: 3.0187 - val_acc: 0.5213

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 10/20
 - 767s - loss: 0.6525 - acc: 0.7728 - val_loss: 1.0288 - val_acc: 0.6793
Epoch 11/20
 - 728s - loss: 0.5816 - acc: 0.7995 - val_loss: 1.0606 - val_acc: 0.6760
Epoch 12/20
 - 688s - loss: 0.5443 - acc: 0.8114 - val_loss: 0.8698 - val_acc: 0.7247
Epoch 13/20
 - 696s - loss: 0.4823 - acc: 0.8326 - val_loss: 0.9301 - val_acc: 0.7007
Epoch 14/20
 - 740s - loss: 0.4209 - acc: 0.8561 - val_loss: 0.9847 - val_acc: 0.7100
Epoch 15/20
 - 730s - loss: 0.3656 - acc: 0.8746 - val_loss: 0.9250 - val_acc: 0.7260
Epoch 16/20
 - 804s - loss: 0.3150 - acc: 0.8928 - val_loss: 0.9000 - val_acc: 0.7387
Epoch 17/20
 - 759s - loss: 0.2949 - acc: 0.8999 - val_loss: 0.8230 - val_acc: 0.7387
Epoch 18/20
 - 778s - loss: 0.2401 - acc: 0.9180 - val_loss: 0.9853 - val_acc: 0.7460
Epoch 19/20
 - 759s - loss: 0.1829 - acc: 0.9365 - val_loss: 1.0410 - val_acc: 0.7493
Epoch 20/20
 - 695s - loss: 0.1827 - acc: 0.9392 - val_loss: 0.9528 - val_acc: 0.7507
0.753

Confusion Matrix

[Figure: confusion matrix of the 3D CNN on the test set]

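Discussion

  • Conclusion: judging from the results reproduced on my machine, on the 3D MNIST dataset the 3D convolutional neural network predicts noticeably more accurately than the 2D one, with a gain of roughly 6 to 7 percentage points (test accuracy 68.45% vs. 75.3%).
  • Limitations: I only reused open-source code and changed batch_size and epochs, and the recognition accuracy is still not high enough.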

To-do

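  • Tune the hyperparameters and modify the model structure to try to improve accuracy
    • More layers and a deeper architecture
    • Learning rate, other gradient descent methods, different batch sizes (batch_size), and so on
  • Try 3D convolutional neural networks on other 3D datasets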

References

3D-MNIST Image Classification
3D Convolutions : Understanding and Implementation
