This series of articles is a set of study notes based on the book Modern Computer Vision with PyTorch.
Starting from the most basic neural networks, the book covers 50 computer vision applications, leading the reader from fundamentals to a working grasp of computer vision concepts and applications.
Section 1 Fundamentals of Deep Learning for Computer Vision
- The basic building blocks of a neural network
- The role of each block
- Chapter 1 Artificial Neural Network Fundamentals
- Chapter 2 Pytorch Fundamentals
- Chapter 3 Building a Deep Neural Network with Pytorch
Chapter 1 Artificial Neural Network Fundamentals
The book opens with a brief history of artificial neural networks, which is omitted here; if you are interested, look up the history of artificial intelligence yourself.
An Artificial Neural Network (ANN) is a supervised learning algorithm that is loosely inspired by the way the human brain functions.
The figure below shows the ImageNet classification error rates from 2010 to 2016. It is clear that in recent years, with the adoption of deep learning and the use of various network architectures and tricks, image classification performance has improved remarkably.
- After AlexNet won the ImageNet competition in 2012 (bringing a large jump in accuracy), deep learning started to be widely applied in computer vision.
Over time since then, with more deep and complex neural networks, the classification error kept reducing and has beaten human-level performance.
The first chapter uses a simple network architecture to walk through the basic components of a neural network (feedforward propagation, backpropagation, the learning rate, and other fundamental building blocks).
In this chapter, we will create a very simple architecture on a simple dataset and mainly focus on how the various building blocks (feedforward, backpropagation, learning rate) of an ANN help in adjusting the weights so that the network learns to predict the expected outputs from given inputs.
This chapter is organized around the following topics:
- Comparing AI and traditional machine learning
- Learning about the artificial neural network building blocks
- Implementing feedforward propagation
- Implementing backpropagation
- Putting feedforward propagation and backpropagation together
- Understanding the impact of the learning rate
- Summarizing the training process of a neural network
Comparing AI and traditional machine learning
In the traditional approach, a so-called intelligent system is generally built from complex, hand-written algorithms (manual feature extraction and the like).
Example 1
As shown in the figure below, suppose we can extract the following feature from an image: if the image contains three dark circles arranged in a triangle, we judge it to be a dog.
However, this feature (rule) can quickly be "broken" by other images, such as a close-up photo of a muffin; with only a single rule like this, other similar-looking images are easily misclassified as well.
To improve classification accuracy, people therefore have to keep extracting every possible rule, and for particularly complex images the number of hand-crafted rules required can grow exponentially.
We can extend the same line of thought to any domain, such as text or structured data. In the past, if someone was interested in programming to solve a real-world
task, it became necessary for them to understand everything about the input data and write as many rules as possible to cover every scenario.
Unlike the traditional machine learning approach, with an artificial neural network we can do this work in a single step. A major advantage of neural networks is that they automatically extract features with very little feature engineering and then use those features for classification or regression; all we need to supply is labelled data and a network architecture. No manual rule extraction is required anywhere in the process, which greatly frees up the programmer.
Note that when a neural network is used to solve a problem, enough data is needed to support training the whole model.
Notice that the main requirement is that we provide a considerable amount of examples for the task that needs a solution.
For example, in the preceding case, we need to provide lots and lots of dog and not-dog pictures to the model so it learns the features.
Learning about the artificial neural network building blocks
An ANN is a collection of tensors (weights) and mathematical operations, arranged in such a way to loosely replicate the functioning of a human brain.
An ANN generally consists of the following parts:
- Input layer: where the data is fed in.
- Hidden layer(s): connect the input layer to the output layer; each hidden layer contains multiple nodes (neurons), and changing the number of neurons changes the dimensionality (complexity) of the representation of the input data.
- Output layer: produces the expected output.
[Figure: ANN architecture with input layer, hidden layer, and output layer]
The output of a single neuron in the figure above is output = f(Σ_i w_i·x_i + b), where the x_i are the inputs, the w_i the connection weights, and b the bias.
Here f is the activation function, which adds non-linearity to the neuron so that the model better matches real-world relationships.
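A minimal NumPy sketch of this single-neuron computation (the inputs, weights, and bias below are made-up values, and sigmoid stands in for f):
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 1.0])    # example inputs (assumed values)
w = np.array([0.3, -0.2])   # example weights (assumed values)
b = 0.1                     # example bias (assumed value)

# single-neuron output: f(sum_i w_i * x_i + b), with f = sigmoid
output = sigmoid(np.dot(w, x) + b)
print(output)               # ~0.512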
Further, higher nonlinearity can be achieved by having more than one hidden layer, stacking multitudes of neurons.
As described above, a complete ANN is essentially made up of three kinds of modules: the input layer, the hidden layers, and the output layer. There can be several hidden layers stacked one after another, and the "deep" in deep learning refers to this larger number of hidden layers.
Note that you can have a higher number (n) of hidden layers, with the term deep learning referring to the greater number of hidden layers.
Implementing feedforward propagation
This section explains the feedforward pass of an ANN through an example: Input (data) → Hidden layer → Output (prediction).
As shown in the figure above, each arrowed connection carries a weight value that we need to obtain through training (the bias terms are omitted here).
Every arrow in the preceding diagram contains exactly one float value (weight) that is adjustable. There are 9 (6 in the first hidden layer and 3 in the second) floats that we need to find, so that when the input is (1,1), the output is as close to (0) as possible.
The following screenshots from the book walk through the forward computation of the whole network:
The figures above show the computation from the input layer to the hidden layer; no non-linear activation function has been applied yet, so it is still just a linear mapping.
Note that, if we do not apply a non-linear activation function in the hidden layer, the neural network becomes a giant linear connection from input to output, no matter how many hidden layers exist.
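A small NumPy check of this point (the input and weight matrices below are made-up values): two stacked linear layers with no activation in between collapse into a single linear layer whose weight matrix is the product of the two.
import numpy as np

x = np.array([1.0, 1.0])                  # example input (assumed values)
W1 = np.array([[0.2, -0.5, 0.1],
               [0.4,  0.3, -0.2]])        # 2 -> 3 weights (assumed values)
W2 = np.array([[0.7], [-0.1], [0.5]])     # 3 -> 1 weights (assumed values)

# two linear layers applied in sequence ...
two_layers = (x @ W1) @ W2
# ... give the same result as one layer with the combined weight matrix
one_layer = x @ (W1 @ W2)
print(np.allclose(two_layers, one_layer))  # True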
Applying the activation function
Activation functions increase the non-linear expressive power of the model, allowing it to learn richer feature information.
Activation functions help in modeling complex relations between the input and the output.
Some common activation functions:
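For reference, the usual definitions of these functions (they match the NumPy implementations later in this note):
- sigmoid(x) = 1 / (1 + e^(-x))
- tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x))
- ReLU(x) = max(0, x)
- linear(x) = x
- softmax(x)_i = e^(x_i) / Σ_j e^(x_j)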
Now we introduce an activation function (sigmoid) into the example above and recompute the output value of each hidden-layer node:
Then we compute the model's output value:
From the initial weight values above we obtain an output of 1.235, while the target value is 0. The output therefore differs from the target, and this difference is what we need to work to reduce; this brings in the loss function, which measures the discrepancy to be optimized.
Calculating loss value
We optimize a neural network mainly by means of the loss value (also called the cost function).
Loss values (alternatively called cost functions) are the values that we optimize for in a neural network.
In today's practical applications, prediction problems mostly fall into the following two scenarios:
- Categorical variable prediction (classification problems)
-- For classification problems that predict discrete categories, cross-entropy is generally used as the loss function, in two forms:
-- Binary cross-entropy
-- Categorical cross-entropy
Cross entropy: for a binary problem the loss is -(y·log(p) + (1 - y)·log(1 - p)), and for a multi-class problem it is -Σ_i y_i·log(p_i), where y is the true label and p the predicted probability.
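As a quick worked illustration (the numbers here are made up): suppose the true one-hot label is y = (1, 0, 0) and the model predicts p = (0.7, 0.2, 0.1). The categorical cross-entropy is -log(0.7) ≈ 0.357; if the prediction were instead p = (0.1, 0.8, 0.1), the loss would rise to -log(0.1) ≈ 2.303, reflecting the much worse prediction.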
- Continuous variable prediction (regression problems)
-- For continuous targets, the mean squared error is generally used: the loss is the mean of the squared differences between the predictions and the actual values (written out after this list).
Typically, when the variable is continuous, the loss value is calculated as the mean of the square of the difference in actual values and predictions, that is, we try to minimize the mean squared error by varying the weight values associated with the neural network.
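Written out, for n samples with predictions p_i and targets y_i:
MSE = (1/n) · Σ_i (p_i − y_i)²
This is exactly what the mse function implemented below computes.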
Model training, then, consists of repeatedly updating the weights on every connection so that the loss value approaches 0 (its minimum).
For the example just above, the mean squared error loss is (1.235 - 0)² ≈ 1.525.
Feedforward propagation, implemented with numpy:
import numpy as np
def feed_forward(inputs, outputs, weights):
    # weights is a list: [hidden weights, hidden bias, output weights, output bias]
    # hidden layer: linear transform followed by sigmoid activation
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1 / (1 + np.exp(-pre_hidden))
    # output layer: linear transform (no activation here)
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    # loss: mean squared error between prediction and target
    mean_squared_error = np.mean(np.square(pred_out - outputs))
    return mean_squared_error
- Common activation functions, implemented with numpy
import numpy as np
# sigmoid
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# tanh
def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
# ReLU
def relu(x):
    return np.where(x > 0, x, 0)
# Linear
def linear(x):
    return x
# softmax
def softmax(x):
    return np.exp(x) / np.sum(np.exp(x))
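For instance, applying softmax to a small score vector (made-up values) turns it into a probability distribution that sums to 1:
scores = np.array([2.0, 1.0, 0.1])   # example scores (assumed values)
probs = softmax(scores)
print(probs)        # approximately [0.659, 0.242, 0.099]
print(probs.sum())  # ~1.0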
- Loss functions, implemented with numpy
import numpy as np
# MSE (mean squared error)
def mse(p, y):
    return np.mean(np.square(p - y))
# MAE (mean absolute error)
def mae(p, y):
    return np.mean(np.abs(p - y))
# Binary cross-entropy
"""
Note that binary cross-entropy loss has a high value when the predicted
value is far away from the actual value and a low value when the predicted
and actual values are close
"""
def binary_cross_entropy(p, y):
    # mean over samples of -(y*log(p) + (1-y)*log(1-p))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
# Categorical cross-entropy
def categorical_cross_entropy(p, y):
    # sum over classes for each sample, then mean over samples
    return -np.mean(np.sum(y * np.log(p), axis=-1))
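A quick usage sketch (the predictions and labels below are made-up values, just for illustration):
p = np.array([0.9, 0.1, 0.8])      # predicted probabilities (assumed values)
y = np.array([1, 0, 1])            # true binary labels (assumed values)
print(mse(p, y))                   # 0.02
print(binary_cross_entropy(p, y))  # ~0.145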
Implementing backpropagation
So far we have randomly initialized the weights between the input layer and the hidden layer, applied the activation function, and computed a predicted output, which turned out to be far from the actual value. To make the model produce more accurate outputs, we use the backpropagation algorithm to iteratively optimize the weights, gradually adjusting them so that the model's output becomes more accurate, i.e., so that the loss value is minimized.
The key to backpropagation is understanding gradient descent and the role of the learning rate: the network uses gradient descent, scaled by the learning rate, to update every weight step by step, until the weights converge to a solution with (close to) minimal loss.
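Concretely, each weight w is nudged against the gradient of the loss L with respect to that weight, scaled by the learning rate lr:
w ← w − lr · (∂L/∂w)
The update_weights function below does not derive this gradient analytically; it estimates it numerically by increasing each weight by a small amount (0.0001), re-running feed_forward, and dividing the change in loss by the change in the weight.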
from copy import deepcopy
import numpy as np
def update_weights(inputs, outputs, weights, lr):
    original_weights = deepcopy(weights)
    temp_weights = deepcopy(weights)
    updated_weights = deepcopy(weights)
    # loss with the current (unchanged) weights
    original_loss = feed_forward(inputs, outputs, original_weights)
    for i, layer in enumerate(original_weights):
        for index, weight in np.ndenumerate(layer):
            # perturb one weight by a small amount and recompute the loss
            temp_weights = deepcopy(weights)
            temp_weights[i][index] += 0.0001
            _loss_plus = feed_forward(inputs, outputs, temp_weights)
            # numerical (finite-difference) estimate of the gradient
            grad = (_loss_plus - original_loss) / 0.0001
            # gradient descent step, scaled by the learning rate
            updated_weights[i][index] -= grad * lr
    return updated_weights, original_loss
The chain rule: instead of estimating each gradient numerically as above, backpropagation normally computes the gradients analytically by applying the chain rule layer by layer.
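A minimal sketch for the example network (notation assumed here: hidden activation h, prediction ŷ = h·w + b at the linear output layer, target y, and squared-error loss L = (ŷ − y)²): the gradient of the loss with respect to an output-layer weight w chains the local derivatives together,
∂L/∂w = (∂L/∂ŷ) · (∂ŷ/∂w) = 2(ŷ − y) · h
For weights deeper in the network, the chain simply extends through the sigmoid (whose derivative is h·(1 − h)) and the preceding linear step.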
Complete example
%matplotlib inline
import numpy as np
from copy import deepcopy
import matplotlib.pyplot as plt
# initial data
x = np.array([[1,1]])
y = np.array([[0]])
W = [
np.array([[-0.0053, 0.3793],
[-0.5820, -0.5204],
[-0.2723, 0.1896]], dtype=np.float32).T,
np.array([-0.0140, 0.5607, -0.0628], dtype=np.float32),
np.array([[ 0.1528, -0.1745, -0.1135]], dtype=np.float32).T,
np.array([-0.5516], dtype=np.float32)
]
# feedforward propagation
def feed_forward(inputs, outputs, weights):
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1 / (1 + np.exp(-pre_hidden))
    out = np.dot(hidden, weights[2]) + weights[3]
    mean_squared_error = np.mean(np.square(out - outputs))
    return mean_squared_error
# update the weights using numerically estimated gradients
def update_weights(inputs, outputs, weights, lr):
    original_weights = deepcopy(weights)
    temp_weights = deepcopy(weights)
    updated_weights = deepcopy(weights)
    original_loss = feed_forward(inputs, outputs, original_weights)
    for i, layer in enumerate(original_weights):
        for index, weight in np.ndenumerate(layer):
            temp_weights = deepcopy(weights)
            temp_weights[i][index] += 0.0001
            _loss_plus = feed_forward(inputs, outputs, temp_weights)
            grad = (_loss_plus - original_loss) / 0.0001
            updated_weights[i][index] -= grad * lr
    return updated_weights, original_loss
# iteratively update the weights and record the loss at every epoch
losses = []
for epoch in range(100):
    W, loss = update_weights(x, y, W, 0.01)
    losses.append(loss)
plt.plot(losses)
plt.title('Loss over increasing number of epochs')
The impact of the learning rate
This will be covered in detail in a later note; it is not elaborated here.