https://tensorflow.google.cn/tutorials/recurrent?hl=zh-CN

Introduction

For a great introduction to recurrent neural networks and LSTMs, see this article:
Understanding LSTM Networks
https://blog.csdn.net/xingzhedai/article/details/53144126

Language Modeling

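In this tutorial we show how to train a recurrent neural network on a challenging language modeling task.
The goal of the problem is to fit a probabilistic model that assigns probabilities to sentences.
It does so by predicting the next word in the text.
For this purpose we use the Penn Tree Bank (PTB) dataset, a popular benchmark for measuring the quality of these models that is also small and relatively fast to train.
Language modeling is key to many interesting problems such as speech recognition, machine translation, and image captioning.
It is also fun: take a look here.
For the purpose of this tutorial, we reproduce the results from Zaremba et al., 2014 (pdf), which achieves very good quality on the PTB dataset.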

Tutorial Files

This tutorial references the following files from models/tutorials/rnn/ptb in the TensorFlow models repo:

ptb_word_lm.py
The code to train a language model on the PTB dataset.
reader.py
The code to read the dataset.

Download and Prepare the Data

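The data required for this tutorial is in the data/ directory of the PTB dataset from Tomas Mikolov's webpage.
The dataset is already preprocessed and contains 10,000 different words overall, including the end-of-sentence marker and a special symbol (<unk>) for rare words.
In reader.py, we convert each word to a unique integer identifier to make it easy for the neural network to process the data.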
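The word-to-integer conversion can be sketched in a few lines of plain Python. This is only an illustration and not the code in reader.py: the function names build_vocab and words_to_ids and the sample sentence are assumptions, as is the choice to give smaller ids to more frequent words.

```python
from collections import Counter


def build_vocab(words):
    """Map each distinct word to an integer id, most frequent word first."""
    counts = Counter(words)
    # Sort by descending frequency; break ties alphabetically so ids are deterministic.
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return {word: index for index, (word, _) in enumerate(ordered)}


def words_to_ids(words, word_to_id):
    """Replace each known word with its id; words missing from the vocabulary are skipped."""
    return [word_to_id[w] for w in words if w in word_to_id]


# Hypothetical toy corpus; "<eos>" stands for the end-of-sentence marker.
tokens = "the fox saw the dog <eos> the dog ran <eos>".split()
word_to_id = build_vocab(tokens)
ids = words_to_ids(tokens, word_to_id)
```

Here the most frequent word, "the", receives id 0, and every later occurrence of a word maps to the same id.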

Model

LSTM (Long Short-Term Memory)

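The core of the model is an LSTM cell that processes one word at a time and computes the probabilities of the possible values for the next word in the sentence.
The memory state of the network is initialized with a vector of zeros and is updated after each word is read.
For computational reasons, we process data in mini-batches of size batch_size.
Note that in this example current_batch_of_words does not correspond to a "sentence" of words.
Every word in a batch corresponds to a time t.
TensorFlow automatically sums the gradients of each batch for you.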
For example:

 t=0  t=1    t=2  t=3     t=4
[The, brown, fox, is,     quick]
[The, red,   fox, jumped, high]

words_in_dataset[0] = [The, The]
words_in_dataset[1] = [brown, red]
words_in_dataset[2] = [fox, fox]
words_in_dataset[3] = [is, jumped]
words_in_dataset[4] = [quick, high]
batch_size = 2, time_steps = 5
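The layout above is simply the transpose of the batch: index t of words_in_dataset collects the word at time t from every sequence. A minimal plain-Python sketch of that rearrangement, using the two example sentences (the variable name batch is an assumption):

```python
# Each row is one sequence in the batch; each column is one time step.
batch = [
    ["The", "brown", "fox", "is", "quick"],
    ["The", "red", "fox", "jumped", "high"],
]
batch_size = len(batch)
time_steps = len(batch[0])

# Transpose so that words_in_dataset[t] holds the word at time t from every sequence.
words_in_dataset = [list(step) for step in zip(*batch)]
```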

The basic pseudocode is as follows:

import tensorflow as tf  # TensorFlow 1.x API

words_in_dataset = tf.placeholder(tf.float32, [time_steps, batch_size, num_features])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory: a zero-filled (cell state, hidden state) pair.
state = lstm.zero_state(batch_size, tf.float32)
probabilities = []
loss = 0.0
# tf.unstack splits the time dimension into a Python list of per-step batches.
for current_batch_of_words in tf.unstack(words_in_dataset, axis=0):
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions.
    logits = tf.matmul(output, softmax_w) + softmax_b
    step_probabilities = tf.nn.softmax(logits)
    probabilities.append(step_probabilities)
    # Score only the current step's predictions against its target words.
    loss += loss_function(step_probabilities, target_words)
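To show what a single cell update does to its state, here is a sketch of one step of the standard LSTM equations in plain Python, for a scalar input and scalar states. It is an illustration under simplifying assumptions, not BasicLSTMCell itself: the function lstm_step and the params layout are hypothetical, and the vectorized weights and forget-gate bias offset of the real cell are left out.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, params):
    """One LSTM update for scalar input x, hidden state h and cell state c.

    params maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre_activation(name):
        w, u, b = params[name]
        return w * x + u * h + b

    i = sigmoid(pre_activation("input"))        # how much new content to write
    f = sigmoid(pre_activation("forget"))       # how much old cell state to keep
    o = sigmoid(pre_activation("output"))       # how much of the cell to expose
    g = math.tanh(pre_activation("candidate"))  # proposed new content
    new_c = f * c + i * g
    new_h = o * math.tanh(new_c)
    return new_h, new_c


# With all-zero parameters every gate is 0.5 and the candidate is 0,
# so the cell state is simply halved.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("input", "forget", "output", "candidate")}
h, c = lstm_step(x=1.0, h=0.0, c=1.0, params=zero_params)
```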

Truncated Backpropagation

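By design, the output of a recurrent neural network (RNN) depends on arbitrarily distant inputs.
Unfortunately, this makes the backpropagation computation difficult.
To make the learning process tractable, it is common practice to create an "unrolled" version of the network that contains a fixed number (num_steps) of LSTM inputs and outputs.
The model is then trained on this finite approximation of the RNN.
This can be implemented by feeding inputs of length num_steps at a time and performing a backward pass after each such input block.
Here is a simplified block of code for creating a graph that performs truncated backpropagation: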

import tensorflow as tf  # TensorFlow 1.x API

# Placeholder for the inputs in a given iteration.
words = tf.placeholder(tf.int32, [batch_size, num_steps])

lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory: a zero-filled (cell state, hidden state) pair.
initial_state = state = lstm.zero_state(batch_size, tf.float32)

for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    # In the full model, words[:, i] is first embedded into a dense vector (see Inputs).
    output, state = lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state

And this is how to implement an iteration over the whole dataset:

# Numpy arrays holding the state of the LSTM after each batch of words.
numpy_state = session.run(initial_state)
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run([final_state, loss],
        # Initialize the LSTM state from the previous iteration.
        feed_dict={initial_state: numpy_state, words: current_batch_of_words})
    total_loss += current_loss
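The input blocks fed in each iteration can be pictured as consecutive windows of num_steps word ids, with the targets being the same window shifted by one word. The following plain-Python sketch is an assumption about how such windows might be cut from a flat id sequence. The helper name truncated_chunks is hypothetical, and it ignores batching for brevity.

```python
def truncated_chunks(word_ids, num_steps):
    """Yield (inputs, targets) windows of length num_steps.

    Targets are the inputs shifted one position to the right, so each
    input word is paired with the word that follows it.
    """
    # The last word has no successor, so it can only ever be a target.
    num_chunks = (len(word_ids) - 1) // num_steps
    for k in range(num_chunks):
        start = k * num_steps
        inputs = word_ids[start:start + num_steps]
        targets = word_ids[start + 1:start + num_steps + 1]
        yield inputs, targets


# Ten toy word ids split into windows of three steps each.
chunks = list(truncated_chunks(list(range(10)), num_steps=3))
```

Gradients are propagated only within one window, while the LSTM state is carried from one window to the next, as in the loop above.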

Inputs

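Before being fed to the LSTM, the word IDs are embedded into a dense representation (see the Vector Representations tutorial).
This allows the model to efficiently represent knowledge about particular words.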
It is also easy to write:

# embedding_matrix is a tensor of shape [vocabulary_size, embedding size]
word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)

The embedding matrix will be initialized randomly and the model will learn to differentiate the meaning of words just by looking at the data.
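Conceptually, an embedding lookup is row selection: each word id picks one row of the matrix. The plain-Python sketch below illustrates this with a small randomly initialized matrix. The helper names, the sizes, and the initialization range of -0.1 to 0.1 are assumptions for illustration, not values taken from the tutorial code.

```python
import random


def make_embedding_matrix(vocabulary_size, embedding_size, seed=0):
    """Create a [vocabulary_size, embedding_size] matrix of small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(embedding_size)]
            for _ in range(vocabulary_size)]


def embedding_lookup(embedding_matrix, word_ids):
    """Return the embedding row for each word id, in order."""
    return [embedding_matrix[i] for i in word_ids]


embedding_matrix = make_embedding_matrix(vocabulary_size=5, embedding_size=3)
word_embeddings = embedding_lookup(embedding_matrix, [4, 0, 4])
```

The same word id always yields the same vector, so the model can attach what it learns about a word to that row.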

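Loss Function

We want to minimize the average negative log probability of the target words:

\text{loss} = -\frac{1}{N} \sum_{i=1}^{N} \ln p_{\text{target}_i}

It is not very difficult to implement, but the function sequence_loss_by_example is already available, so we can just use it here.
The typical measure reported in papers is average per-word perplexity (often just called perplexity), which is equal to:

e^{-\frac{1}{N} \sum_{i=1}^{N} \ln p_{\text{target}_i}} = e^{\text{loss}}

We will monitor its value throughout the training process.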
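As a small worked example of the two quantities, the sketch below computes the average negative log probability and the perplexity from a list of probabilities that a model assigned to the correct target words. The function names and the sample probabilities are assumptions for illustration. In the tutorial code the loss comes from sequence_loss_by_example instead.

```python
import math


def average_negative_log_prob(target_probs):
    """Average of -ln(p) over the probabilities assigned to the target words."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def perplexity(target_probs):
    """Per-word perplexity: the exponential of the average negative log probability."""
    return math.exp(average_negative_log_prob(target_probs))


# A model that always gives the correct word probability 1/4 behaves like a
# uniform guess among four words, so its perplexity is about 4.
uniform_probs = [0.25, 0.25, 0.25, 0.25]
loss = average_negative_log_prob(uniform_probs)
ppl = perplexity(uniform_probs)
```

Lower perplexity means the model is, on average, less surprised by the next word.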

Stacking Multiple LSTMs

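To give the model more expressive power, we can add multiple layers of LSTMs to process the data.
The output of the first layer becomes the input of the second, and so on.
A class called MultiRNNCell makes the implementation seamless: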

def lstm_cell():
  return tf.contrib.rnn.BasicLSTMCell(lstm_size)
stacked_lstm = tf.contrib.rnn.MultiRNNCell(
    [lstm_cell() for _ in range(number_of_layers)])

initial_state = state = stacked_lstm.zero_state(batch_size, tf.float32)
for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = stacked_lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state
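The wiring that stacking implies can be sketched without TensorFlow: each layer keeps its own state, and the output of one layer is the input of the next. In the sketch below, make_stacked_cell and accumulator_cell are hypothetical names, and the accumulator is a toy stand-in for an LSTM cell, chosen only so that the data flow is easy to follow by hand.

```python
def make_stacked_cell(cells):
    """Compose cells so that layer k's output becomes layer k+1's input."""
    def stacked(inputs, states):
        new_states = []
        current = inputs
        for cell, state in zip(cells, states):
            current, new_state = cell(current, state)
            new_states.append(new_state)
        # The top layer's output is the output of the whole stack.
        return current, tuple(new_states)
    return stacked


def accumulator_cell(x, state):
    """Toy stand-in for an LSTM cell: the state is a running sum of the inputs."""
    new_state = state + x
    return new_state, new_state


stacked = make_stacked_cell([accumulator_cell, accumulator_cell])
state = (0, 0)  # one state per layer, all starting at zero
outputs = []
for x in [1, 2, 3]:
    out, state = stacked(x, state)
    outputs.append(out)
```

After the three steps, the first layer has summed the inputs and the second layer has summed the first layer's outputs.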

Run the Code

Before running the code, download the PTB dataset as described at the beginning of this tutorial.
Then, extract the PTB dataset underneath your home directory as follows:

tar xvfz simple-examples.tgz -C $HOME

(Note: On Windows, you may need to use other tools.)

Now, clone the TensorFlow models repo from GitHub. Run the following commands:

cd models/tutorials/rnn/ptb
python ptb_word_lm.py --data_path=$HOME/simple-examples/data/ --model=small

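The tutorial code supports three model configurations: "small", "medium", and "large".
They differ in the size of the LSTMs and in the set of hyperparameters used for training.
The larger the model, the better the results it should get.
The "small" model should be able to reach a perplexity below 120 on the test set, and the "large" one below 80, though it might take several hours to train.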

What Next?

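There are several tricks for making the model better that we have not mentioned, including:
a decreasing learning rate schedule,
dropout between the LSTM layers.
Study the code and modify it to improve the model even further.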
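As a starting point for the first trick, a decreasing learning rate schedule might look like the sketch below: hold the learning rate constant for a few epochs, then shrink it by a fixed factor every epoch. The function name and all default values (base_lr, decay, hold_epochs) are illustrative assumptions, not the settings of any of the tutorial's configurations.

```python
def decayed_learning_rate(epoch, base_lr=1.0, decay=0.5, hold_epochs=4):
    """Keep base_lr for the first hold_epochs epochs, then multiply by decay each epoch."""
    exponent = max(epoch + 1 - hold_epochs, 0)
    return base_lr * decay ** exponent


# Learning rate for epochs 0 through 5 under the assumed defaults.
schedule = [decayed_learning_rate(epoch) for epoch in range(6)]
```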
