This series of articles consists mainly of study notes on the book "Modern Computer Vision with PyTorch".
Section 1 Fundamentals of Deep Learning for Computer Vision
- The basic building blocks of a neural network
- The role of each block
- Chapter 1 Artificial Neural Network Fundamentals
- Chapter 2 PyTorch Fundamentals
- Chapter 3 Building a Deep Neural Network with PyTorch
Chapter 1 Artificial Neural Network Fundamentals
An Artificial Neural Network (ANN) is a supervised learning algorithm that is loosely inspired by the way the human brain functions.
- After AlexNet won the ImageNet competition in 2012 (with a large improvement in accuracy), deep learning came into wide use in computer vision.
Since then, as neural networks have become deeper and more complex, the classification error has kept falling, to the point of beating human-level performance.
In this chapter, we will create a very simple architecture on a simple dataset and mainly focus on how the various building blocks (feedforward, backpropagation, learning rate) of an ANN help in adjusting the weights so that the network learns to predict the expected outputs from given inputs.
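- Comparing AI and traditional machine learning
- Learning about the artificial neural network building blocks
- Implementing feedforward propagation
- Implementing backpropagation
- Putting feedforward propagation and backpropagation together
- Understanding the impact of the learning rate
- Summarizing the training process of a neural network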
Comparing AI and traditional machine learning
We can extend the same line of thought to any domain, such as text or structured data. In the past, anyone who wanted to write a program to solve a real-world task had to understand everything about the input data and write as many rules as possible to cover every scenario.
Notice that the main requirement is that we provide a considerable amount of examples for the task that needs a solution.
For example, in the preceding case, we need to provide lots and lots of dog and not-dog pictures to the model so it learns the features.
Learning about the artificial neural network building blocks
An ANN is a collection of tensors (weights) and mathematical operations, arranged in such a way as to loosely replicate the functioning of a human brain.
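- Input layer: takes in the data.
- Hidden layer: connects the input layer to the output layer. It has multiple nodes (neurons), and changing the number of neurons changes the complexity (dimensionality) of the transformation applied to the input data.
- Output layer: produces the expected result.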
Further, higher nonlinearity can be achieved by having more than one hidden layer, stacking multitudes of neurons.
Note that you can have a higher number (n) of hidden layers, with the term deep learning referring to the greater number of hidden layers.
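To make the layer structure concrete, here is a minimal sketch that builds the weight tensors for a network with 2 input nodes, one hidden layer of 3 neurons, and 1 output node. The layer sizes, the random initialization, and the helper name `init_layers` are assumptions chosen for illustration; they are not taken from the book.

```python
import numpy as np

def init_layers(layer_sizes, seed=0):
    """Return one (weight, bias) pair per connection between adjacent layers.

    `init_layers` is a hypothetical helper used only for this illustration.
    """
    rng = np.random.default_rng(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weight = rng.normal(size=(n_in, n_out))  # one value per connection
        bias = np.zeros(n_out)                   # one value per neuron
        layers.append((weight, bias))
    return layers

# 2 input nodes -> 3 hidden neurons -> 1 output node
layers = init_layers([2, 3, 1])
shapes = [(w.shape, b.shape) for w, b in layers]
print(shapes)  # [((2, 3), (3,)), ((3, 1), (1,))]
```

Adding more entries to the list, for example `[2, 3, 3, 1]`, adds more hidden layers, which is what "deeper" means in this context.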
Implementing feedforward propagation
This subsection uses an example to explain the flow of a feedforward pass through an ANN: input (data), hidden layer, and output (prediction).
Every arrow in the preceding diagram contains exactly one float value (weight) that is adjustable. There are 9 floats that we need to find (6 connecting the input layer to the hidden layer and 3 connecting the hidden layer to the output layer), so that when the input is (1,1), the output is as close to (0) as possible.
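The forward pass for the input (1,1) can be traced step by step. The sketch below is one way to do so, assuming the initial weight values listed in the training script later in these notes; that script also includes bias terms, which come on top of the 9 connection weights. The variable names are chosen here for illustration.

```python
import numpy as np

x = np.array([[1.0, 1.0]])   # input (1, 1)
y = np.array([[0.0]])        # expected output 0

# Initial values as listed in the training script later in these notes.
w_hidden = np.array([[-0.0053, 0.3793],
                     [-0.5820, -0.5204],
                     [-0.2723, 0.1896]]).T          # shape (2, 3): 6 weights
b_hidden = np.array([-0.0140, 0.5607, -0.0628])     # hidden-layer biases
w_out = np.array([[0.1528, -0.1745, -0.1135]]).T    # shape (3, 1): 3 weights
b_out = np.array([-0.5516])                         # output bias

pre_hidden = x @ w_hidden + b_hidden       # weighted sum for each hidden neuron
hidden = 1 / (1 + np.exp(-pre_hidden))     # sigmoid activation
pred = hidden @ w_out + b_out              # linear output layer
loss = np.mean((pred - y) ** 2)            # squared error against the target

print(pre_hidden, hidden, pred, loss)      # the loss comes out at roughly 0.33
```

With these starting weights the prediction is far from the target of 0, which is what training is meant to correct.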
Note that, if we do not apply a non-linear activation function in the hidden layer, the neural network becomes a giant linear connection from input to output, no matter how many hidden layers exist.
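The claim about linearity can be checked numerically. In this minimal sketch (random weights and shapes are assumptions for illustration), two stacked layers without an activation function produce exactly the same outputs as a single linear layer whose weight is the product of the two weight matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))   # 5 samples, 2 features

# Two stacked layers WITHOUT any activation function.
w1, b1 = rng.normal(size=(2, 3)), rng.normal(size=3)
w2, b2 = rng.normal(size=(3, 1)), rng.normal(size=1)
two_layers = (x @ w1 + b1) @ w2 + b2

# The same mapping written as a single linear layer.
w_combined = w1 @ w2
b_combined = b1 @ w2 + b2
one_layer = x @ w_combined + b_combined

print(np.allclose(two_layers, one_layer))  # True
```

Inserting a non-linear function such as sigmoid between the two layers breaks this equivalence, which is why the hidden layer can then model non-linear relations.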
Applying the activation function
Activation functions help in modeling complex relations between the input and the output.
Calculating loss value
Loss values (computed by a loss function, also called a cost function) are the values that we optimize for in a neural network.
- Categorical variable prediction (classification)
-- For classification problems that predict discrete values, cross-entropy is generally used as the loss function.
-- Binary cross-entropy (two classes)
-- Categorical cross-entropy (more than two classes)
Cross Entropy
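- Continuous variable prediction (regression)
-- For continuous values, the mean squared error is generally used: the loss is the mean of the squared differences between the predicted and actual values.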
Typically, when the variable is continuous, the loss value is calculated as the mean of the square of the difference in actual values and predictions, that is, we try to minimize the mean squared error by varying the weight values associated with the neural network.
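For reference, the standard definitions of the losses named above can be written as follows. The symbols are chosen here for illustration and are not taken from the book: $y$ is the actual value, $p$ the prediction, $m$ the number of samples, and $C$ the number of classes.

```latex
\begin{align}
\text{MSE} &= \frac{1}{m}\sum_{i=1}^{m} \left(y_i - p_i\right)^2 \\
\text{BCE} &= -\frac{1}{m}\sum_{i=1}^{m} \left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\right] \\
\text{CCE} &= -\frac{1}{m}\sum_{i=1}^{m}\sum_{c=1}^{C} y_{i,c} \log p_{i,c}
\end{align}
```

In all three cases the loss is 0 for a perfect prediction and grows as predictions move away from the actual values.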
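import numpy as np
def feed_forward(inputs, outputs, weights):
    # Weighted sum of the inputs plus bias for each hidden neuron
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    # Sigmoid activation on the hidden layer
    hidden = 1 / (1 + np.exp(-pre_hidden))
    # Linear output layer
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    # Mean squared error between predictions and expected outputs
    mean_squared_error = np.mean(np.square(pred_out - outputs))
    return mean_squared_error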
- Common activation functions, implemented with NumPy
import numpy as np

# sigmoid
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# tanh
def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

# ReLU
def relu(x):
    return np.where(x > 0, x, 0)

# Linear
def linear(x):
    return x

# softmax
def softmax(x):
    return np.exp(x) / np.sum(np.exp(x))
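One practical caveat with the softmax above is that `np.exp` overflows for large inputs. A common remedy, sketched here as an assumption rather than as the book's code, is to subtract the maximum before exponentiating; this leaves the result unchanged because softmax is invariant to adding a constant to every input. The name `stable_softmax` is hypothetical.

```python
import numpy as np

def stable_softmax(x):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([1000.0, 1001.0, 1002.0])  # np.exp(1000.0) alone would overflow
probs = stable_softmax(logits)
print(probs, probs.sum())
```

The probabilities are identical to those for the inputs `[1.0, 2.0, 3.0]`, since only the differences between inputs matter.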
- Loss functions, implemented with NumPy
import numpy as np

# MSE: mean squared error
def mse(p, y):
    return np.mean(np.square(p - y))

# MAE: mean absolute error
def mae(p, y):
    return np.mean(np.abs(p - y))

# Binary cross-entropy
# Note that binary cross-entropy loss has a high value when the predicted
# value is far away from the actual value and a low value when the predicted
# and actual values are close.
def binary_cross_entropy(p, y):
    # Average the per-element loss over all samples
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Categorical cross-entropy
def categorical_cross_entropy(p, y):
    # Sum over classes (last axis), then average over samples
    return -np.mean(np.sum(y * np.log(p), axis=-1))
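The behavior of binary cross-entropy described above can be seen with a few numbers. The sketch below also clips predictions away from 0 and 1 so that `np.log` never receives 0; the clipping, the `eps` value, and the name `safe_binary_cross_entropy` are assumptions added for illustration.

```python
import numpy as np

def safe_binary_cross_entropy(p, y, eps=1e-7):
    """Binary cross-entropy with predictions clipped away from 0 and 1."""
    p = np.clip(p, eps, 1 - eps)  # avoids log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 0.0, 1.0])
close = np.array([0.9, 0.1, 0.8])  # predictions near the labels
far = np.array([0.1, 0.9, 0.2])    # predictions far from the labels

loss_close = safe_binary_cross_entropy(close, y)
loss_far = safe_binary_cross_entropy(far, y)
print(loss_close, loss_far)        # the second value is much larger
```

Confidently wrong predictions are penalized far more heavily than slightly wrong ones, because the logarithm grows without bound as the predicted probability of the true class approaches 0.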
Implementing backpropagation
from copy import deepcopy
import numpy as np
def update_weights(inputs, outputs, weights, lr):
    original_weights = deepcopy(weights)
    temp_weights = deepcopy(weights)
    updated_weights = deepcopy(weights)
    # Loss with the current, unmodified weights
    original_loss = feed_forward(inputs, outputs, original_weights)
    for i, layer in enumerate(original_weights):
        for index, weight in np.ndenumerate(layer):
            # Nudge a single weight by a small amount and recompute the loss
            temp_weights = deepcopy(weights)
            temp_weights[i][index] += 0.0001
            _loss_plus = feed_forward(inputs, outputs, temp_weights)
            # Finite-difference estimate of the gradient for this weight
            grad = (_loss_plus - original_loss)/(0.0001)
            # Move the weight against the gradient, scaled by the learning rate
            updated_weights[i][index] -= grad*lr
    return updated_weights, original_loss
Chain rule
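The weight update above estimates each gradient by perturbing one weight at a time. The chain rule gives the same gradients analytically in a single backward pass. The following is a minimal sketch, assuming the same architecture as in these notes (sigmoid hidden layer, linear output, mean squared error); the function names and the random weights are chosen here for illustration.

```python
import numpy as np

def forward(x, weights):
    w0, b0, w2, b2 = weights
    hidden = 1 / (1 + np.exp(-(x @ w0 + b0)))   # sigmoid hidden layer
    return hidden, hidden @ w2 + b2             # linear output layer

def loss_fn(x, y, weights):
    _, out = forward(x, weights)
    return np.mean((out - y) ** 2)

def analytic_gradients(x, y, weights):
    """Gradients of the MSE loss via the chain rule, one per weight array."""
    w0, b0, w2, b2 = weights
    hidden, out = forward(x, weights)
    d_out = 2 * (out - y) / out.size            # dLoss / d(out)
    d_w2 = hidden.T @ d_out
    d_b2 = d_out.sum(axis=0)
    d_hidden = d_out @ w2.T                     # back through the output layer
    d_pre = d_hidden * hidden * (1 - hidden)    # sigmoid'(z) = s * (1 - s)
    d_w0 = x.T @ d_pre
    d_b0 = d_pre.sum(axis=0)
    return [d_w0, d_b0, d_w2, d_b2]

def numerical_gradients(x, y, weights, delta=1e-6):
    """Finite-difference estimate: perturb one weight at a time."""
    base = loss_fn(x, y, weights)
    grads = [np.zeros_like(w) for w in weights]
    for i, layer in enumerate(weights):
        for index in np.ndindex(layer.shape):
            bumped = [w.copy() for w in weights]
            bumped[i][index] += delta
            grads[i][index] = (loss_fn(x, y, bumped) - base) / delta
    return grads

rng = np.random.default_rng(0)
x, y = np.array([[1.0, 1.0]]), np.array([[0.0]])
weights = [rng.normal(size=(2, 3)), rng.normal(size=3),
           rng.normal(size=(3, 1)), rng.normal(size=1)]

exact = analytic_gradients(x, y, weights)
approx = numerical_gradients(x, y, weights)
match = all(np.allclose(a, n, atol=1e-4) for a, n in zip(exact, approx))
print(match)
```

The analytic version needs one forward and one backward pass, whereas the numerical version needs one forward pass per weight, which is why the chain rule is used in practice.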
# Jupyter notebook magic; remove this line when running as a plain Python script
%matplotlib inline
import numpy as np
from copy import deepcopy
import matplotlib.pyplot as plt
# initial data
x = np.array([[1,1]])
y = np.array([[0]])
# Initial weights: [hidden weights, hidden biases, output weights, output bias]
W = [
    np.array([[-0.0053, 0.3793],
              [-0.5820, -0.5204],
              [-0.2723, 0.1896]], dtype=np.float32).T,
    np.array([-0.0140, 0.5607, -0.0628], dtype=np.float32),
    np.array([[ 0.1528, -0.1745, -0.1135]], dtype=np.float32).T,
    np.array([-0.5516], dtype=np.float32)
]
# Feedforward propagation
def feed_forward(inputs, outputs, weights):
    pre_hidden = np.dot(inputs,weights[0])+ weights[1]
    hidden = 1/(1+np.exp(-pre_hidden))
    out = np.dot(hidden, weights[2]) + weights[3]
    mean_squared_error = np.mean(np.square(out - outputs))
    return mean_squared_error
# Update the weights
def update_weights(inputs, outputs, weights, lr):
    original_weights = deepcopy(weights)
    temp_weights = deepcopy(weights)
    updated_weights = deepcopy(weights)
    original_loss = feed_forward(inputs, outputs, original_weights)
    for i, layer in enumerate(original_weights):
        for index, weight in np.ndenumerate(layer):
            temp_weights = deepcopy(weights)
            temp_weights[i][index] += 0.0001
            _loss_plus = feed_forward(inputs, outputs, temp_weights)
            grad = (_loss_plus - original_loss)/(0.0001)
            updated_weights[i][index] -= grad*lr
    return updated_weights, original_loss
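# Iteratively optimize the weights
losses = []
for epoch in range(100):
    W, loss = update_weights(x,y,W,0.01)
    losses.append(loss)  # record the loss of each epoch for plotting
plt.plot(losses)
plt.title('Loss over increasing number of epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss value')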