This series of articles is a set of study notes based on the book Modern Computer Vision with PyTorch.
Starting from the most basic neural networks, the book covers 50 computer vision applications, leading the reader from fundamentals to a working grasp of computer vision concepts and applications.
Section 1 Fundamentals of Deep Learning for Computer Vision
- The basic building blocks of a neural network
- The role of each block
- Chapter 1 Artificial Neural Network Fundamentals
- Chapter 2 Pytorch Fundamentals
- Chapter 3 Building a Deep Neural Network with Pytorch
Chapter 1 Artificial Neural Network Fundamentals
The book opens with a brief history of artificial neural networks, which is omitted here; if you are interested, look up the history of artificial intelligence yourself.
An Artificial Neural Network (ANN) is a supervised learning algorithm that is loosely inspired by the way the human brain functions.
The figure below shows the ImageNet classification error rates from 2010 to 2016. It is clear that in recent years, with the adoption of deep learning and the use of various network architectures and tricks, image classification performance has improved remarkably.
- After AlexNet won the ImageNet competition in 2012 (bringing a large jump in accuracy), deep learning started to be widely applied in computer vision.
Over time since then, with more deep and complex neural networks, the classification error kept reducing and has beaten human-level performance.
The first chapter uses a simple network architecture to walk through the basic components of a neural network (feedforward propagation, backpropagation, the learning rate, and other fundamental building blocks).
In this chapter, we will create a very simple architecture on a simple dataset and mainly focus on how the various building blocks (feedforward, backpropagation, learning rate) of an ANN help in adjusting the weights so that the network learns to predict the expected outputs from given inputs.
This chapter is organized around the following topics:
- Comparing AI and traditional machine learning
- Learning about the artificial neural network building blocks
- Implementing feedforward propagation
- Implementing backpropagation
- Putting feedforward propagation and backpropagation together
- Understanding the impact of the learning rate
- Summarizing the training process of a neural network
Comparing AI and traditional machine learning
In the traditional approach, a so-called intelligent system is generally built from complex, hand-written algorithms (manual feature extraction and the like).
Example 1
As shown in the figure below, suppose we can extract the following feature from an image: if the image contains three dark circles arranged in a triangle, we judge it to be a dog.
However, this feature (rule) can quickly be "broken" by other images, such as a close-up photo of a muffin; with only a single rule like this, other similar-looking images are easily misclassified as well.
To improve classification accuracy, people therefore have to keep extracting every possible rule, and for particularly complex images the number of hand-crafted rules required can grow exponentially.
We can extend the same line of thought to any domain, such as text or structured data. In the past, if someone was interested in programming to solve a real-world
task, it became necessary for them to understand everything about the input data and write as many rules as possible to cover every scenario.
Unlike the traditional machine learning approach, with an artificial neural network we can do this work in a single step. A major advantage of neural networks is that they automatically extract features with very little feature engineering and then use those features for classification or regression; all we need to supply is labelled data and a network architecture. No manual rule extraction is required anywhere in the process, which greatly frees up the programmer.
Note that when a neural network is used to solve a problem, enough data is needed to support training the whole model.
Notice that the main requirement is that we provide a considerable amount of examples for the task that needs a solution.
For example, in the preceding case, we need to provide lots and lots of dog and not-dog pictures to the model so it learns the features.
Learning about the artificial neural network building blocks
An ANN is a collection of tensors (weights) and mathematical operations, arranged in such a way to loosely replicate the functioning of a human brain.
An ANN generally consists of the following parts:
- Input layer: where the data is fed in.
- Hidden layer(s): connect the input layer to the output layer; each hidden layer contains multiple nodes (neurons), and changing the number of neurons changes the dimensionality (complexity) of the representation of the input data.
- Output layer: produces the expected output.
[Figure: ANN architecture with input layer, hidden layer, and output layer]
The output of a single neuron in the figure above is output = f(Σ_i w_i·x_i + b), where the x_i are the inputs, the w_i the connection weights, and b the bias.
Here f is the activation function, which adds non-linearity to the neuron so that the model better matches real-world relationships.
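A minimal NumPy sketch of this single-neuron computation (the inputs, weights, and bias below are made-up values, and sigmoid stands in for f):
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 1.0])    # example inputs (assumed values)
w = np.array([0.3, -0.2])   # example weights (assumed values)
b = 0.1                     # example bias (assumed value)

# single-neuron output: f(sum_i w_i * x_i + b), with f = sigmoid
output = sigmoid(np.dot(w, x) + b)
print(output)               # ~0.512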
Further, higher nonlinearity can be achieved by having more than one hidden layer, stacking multitudes of neurons.
As described above, a complete ANN is essentially made up of three kinds of modules: the input layer, the hidden layers, and the output layer. There can be several hidden layers stacked one after another, and the "deep" in deep learning refers to this larger number of hidden layers.
Note that you can have a higher number (n) of hidden layers, with the term deep learning referring to the greater number of hidden layers.
Implementing feedforward propagation
This section explains the feedforward pass of an ANN through an example: Input (data) → Hidden layer → Output (prediction).
As shown in the figure above, each arrowed connection carries a weight value that we need to obtain through training (the bias terms are omitted here).
Every arrow in the preceding diagram contains exactly one float value (weight) that is adjustable. There are 9 (6 in the first hidden layer and 3 in the second) floats that we need to find, so that when the input is (1,1), the output is as close to (0) as possible.
The following screenshots from the book walk through the forward computation of the whole network:
The figures above show the computation from the input layer to the hidden layer; no non-linear activation function has been applied yet, so it is still just a linear mapping.
Note that, if we do not apply a non-linear activation function in the hidden layer, the neural network becomes a giant linear connection from input to output, no matter how many hidden layers exist.
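A small NumPy check of this point (the input and weight matrices below are made-up values): two stacked linear layers with no activation in between collapse into a single linear layer whose weight matrix is the product of the two.
import numpy as np

x = np.array([1.0, 1.0])                  # example input (assumed values)
W1 = np.array([[0.2, -0.5, 0.1],
               [0.4,  0.3, -0.2]])        # 2 -> 3 weights (assumed values)
W2 = np.array([[0.7], [-0.1], [0.5]])     # 3 -> 1 weights (assumed values)

# two linear layers applied in sequence ...
two_layers = (x @ W1) @ W2
# ... give the same result as one layer with the combined weight matrix
one_layer = x @ (W1 @ W2)
print(np.allclose(two_layers, one_layer))  # True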
Applying the activation function
Activation functions increase the non-linear expressive power of the model, allowing it to learn richer feature information.
Activation functions help in modeling complex relations between the input and the output.
Some common activation functions:
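For reference, the usual definitions of these functions (they match the NumPy implementations later in this note):
- sigmoid(x) = 1 / (1 + e^(-x))
- tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x))
- ReLU(x) = max(0, x)
- linear(x) = x
- softmax(x)_i = e^(x_i) / Σ_j e^(x_j)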
Now we introduce an activation function (sigmoid) into the example above and recompute the output value of each hidden-layer node:
Then we compute the model's output value:
From the initial weight values above we obtain an output of 1.235, while the target value is 0. The output therefore differs from the target, and this difference is what we need to work to reduce; this brings in the loss function, which measures the discrepancy to be optimized.
Calculating loss value
We optimize a neural network mainly by means of the loss value (also called the cost function).
Loss values (alternatively called cost functions) are the values that we optimize for in a neural network.
In today's practical applications, prediction problems mostly fall into the following two scenarios:
- Categorical variable prediction (classification problems)
-- For classification problems that predict discrete categories, cross-entropy is generally used as the loss function, in two forms:
-- Binary cross-entropy
-- Categorical cross-entropy
Cross entropy: for a binary problem the loss is -(y·log(p) + (1 - y)·log(1 - p)), and for a multi-class problem it is -Σ_i y_i·log(p_i), where y is the true label and p the predicted probability.
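As a quick worked illustration (the numbers here are made up): suppose the true one-hot label is y = (1, 0, 0) and the model predicts p = (0.7, 0.2, 0.1). The categorical cross-entropy is -log(0.7) ≈ 0.357; if the prediction were instead p = (0.1, 0.8, 0.1), the loss would rise to -log(0.1) ≈ 2.303, reflecting the much worse prediction.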
- Continuous variable prediction (regression problems)
-- For continuous targets, the mean squared error is generally used: the loss is the mean of the squared differences between the predictions and the actual values (written out after this list).
Typically, when the variable is continuous, the loss value is calculated as the mean of the square of the difference in actual values and predictions, that is, we try to minimize the mean squared error by varying the weight values associated with the neural network.
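Written out, for n samples with predictions p_i and targets y_i:
MSE = (1/n) · Σ_i (p_i − y_i)²
This is exactly what the mse function implemented below computes.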
Model training, then, consists of repeatedly updating the weights on every connection so that the loss value approaches 0 (its minimum).
For the example just above, the mean squared error loss is (1.235 - 0)² ≈ 1.525.
Feedforward propagation, implemented with numpy:
import numpy as np
def feed_forward(inputs, outputs, weights):
    # weights is a list: [hidden weights, hidden bias, output weights, output bias]
    # hidden layer: linear transform followed by sigmoid activation
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1 / (1 + np.exp(-pre_hidden))
    # output layer: linear transform (no activation here)
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    # loss: mean squared error between prediction and target
    mean_squared_error = np.mean(np.square(pred_out - outputs))
    return mean_squared_error
- Common activation functions, implemented with numpy
import numpy as np
# sigmoid
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# tanh
def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
# ReLU
def relu(x):
    return np.where(x > 0, x, 0)
# Linear
def linear(x):
    return x
# softmax
def softmax(x):
    return np.exp(x) / np.sum(np.exp(x))
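For instance, applying softmax to a small score vector (made-up values) turns it into a probability distribution that sums to 1:
scores = np.array([2.0, 1.0, 0.1])   # example scores (assumed values)
probs = softmax(scores)
print(probs)        # approximately [0.659, 0.242, 0.099]
print(probs.sum())  # ~1.0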
- Loss functions, implemented with numpy
import numpy as np
# MSE (mean squared error)
def mse(p, y):
    return np.mean(np.square(p - y))
# MAE (mean absolute error)
def mae(p, y):
    return np.mean(np.abs(p - y))
# Binary cross-entropy
"""
Note that binary cross-entropy loss has a high value when the predicted
value is far away from the actual value and a low value when the predicted
and actual values are close
"""
def binary_cross_entropy(p, y):
    # mean over samples of -(y*log(p) + (1-y)*log(1-p))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
# Categorical cross-entropy
def categorical_cross_entropy(p, y):
    # sum over classes for each sample, then mean over samples
    return -np.mean(np.sum(y * np.log(p), axis=-1))
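A quick usage sketch (the predictions and labels below are made-up values, just for illustration):
p = np.array([0.9, 0.1, 0.8])      # predicted probabilities (assumed values)
y = np.array([1, 0, 1])            # true binary labels (assumed values)
print(mse(p, y))                   # 0.02
print(binary_cross_entropy(p, y))  # ~0.145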
Implementing backpropagation
So far we have randomly initialized the weights between the input layer and the hidden layer, applied the activation function, and computed a predicted output, which turned out to be far from the actual value. To make the model produce more accurate outputs, we use the backpropagation algorithm to iteratively optimize the weights, gradually adjusting them so that the model's output becomes more accurate, i.e., so that the loss value is minimized.
The key to backpropagation is understanding gradient descent and the role of the learning rate: the network uses gradient descent, scaled by the learning rate, to update every weight step by step, until the weights converge to a solution with (close to) minimal loss.
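Concretely, each weight w is nudged against the gradient of the loss L with respect to that weight, scaled by the learning rate lr:
w ← w − lr · (∂L/∂w)
The update_weights function below does not derive this gradient analytically; it estimates it numerically by increasing each weight by a small amount (0.0001), re-running feed_forward, and dividing the change in loss by the change in the weight.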
from copy import deepcopy
import numpy as np
def update_weights(inputs, outputs, weights, lr):
    original_weights = deepcopy(weights)
    temp_weights = deepcopy(weights)
    updated_weights = deepcopy(weights)
    # loss with the current (unchanged) weights
    original_loss = feed_forward(inputs, outputs, original_weights)
    for i, layer in enumerate(original_weights):
        for index, weight in np.ndenumerate(layer):
            # perturb one weight by a small amount and recompute the loss
            temp_weights = deepcopy(weights)
            temp_weights[i][index] += 0.0001
            _loss_plus = feed_forward(inputs, outputs, temp_weights)
            # numerical (finite-difference) estimate of the gradient
            grad = (_loss_plus - original_loss) / 0.0001
            # gradient descent step, scaled by the learning rate
            updated_weights[i][index] -= grad * lr
    return updated_weights, original_loss
The chain rule: instead of estimating each gradient numerically as above, backpropagation normally computes the gradients analytically by applying the chain rule layer by layer.
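A minimal sketch for the example network (notation assumed here: hidden activation h, prediction ŷ = h·w + b at the linear output layer, target y, and squared-error loss L = (ŷ − y)²): the gradient of the loss with respect to an output-layer weight w chains the local derivatives together,
∂L/∂w = (∂L/∂ŷ) · (∂ŷ/∂w) = 2(ŷ − y) · h
For weights deeper in the network, the chain simply extends through the sigmoid (whose derivative is h·(1 − h)) and the preceding linear step.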
Complete example
%matplotlib inline
import numpy as np
from copy import deepcopy
import matplotlib.pyplot as plt
# initial data
x = np.array([[1,1]])
y = np.array([[0]])
W = [
np.array([[-0.0053, 0.3793],
[-0.5820, -0.5204],
[-0.2723, 0.1896]], dtype=np.float32).T,
np.array([-0.0140, 0.5607, -0.0628], dtype=np.float32),
np.array([[ 0.1528, -0.1745, -0.1135]], dtype=np.float32).T,
np.array([-0.5516], dtype=np.float32)
]
# feedforward propagation
def feed_forward(inputs, outputs, weights):
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1 / (1 + np.exp(-pre_hidden))
    out = np.dot(hidden, weights[2]) + weights[3]
    mean_squared_error = np.mean(np.square(out - outputs))
    return mean_squared_error
# update the weights using numerically estimated gradients
def update_weights(inputs, outputs, weights, lr):
    original_weights = deepcopy(weights)
    temp_weights = deepcopy(weights)
    updated_weights = deepcopy(weights)
    original_loss = feed_forward(inputs, outputs, original_weights)
    for i, layer in enumerate(original_weights):
        for index, weight in np.ndenumerate(layer):
            temp_weights = deepcopy(weights)
            temp_weights[i][index] += 0.0001
            _loss_plus = feed_forward(inputs, outputs, temp_weights)
            grad = (_loss_plus - original_loss) / 0.0001
            updated_weights[i][index] -= grad * lr
    return updated_weights, original_loss
# iteratively update the weights and record the loss at every epoch
losses = []
for epoch in range(100):
    W, loss = update_weights(x, y, W, 0.01)
    losses.append(loss)
plt.plot(losses)
plt.title('Loss over increasing number of epochs')
The impact of the learning rate
This will be covered in detail in a later note; it is not elaborated here.