A recurrent neural network is a type of artificial neural network in which connections between nodes form a directed graph along a temporal sequence. This allows the network to exhibit temporal dynamic behavior on time sequences.

There are three common variants of the vanilla recurrent neural network: the simple RNN, the gated recurrent unit (GRU), and the long short-term memory unit (LSTM).

#### RNN – Recurrent Neural Network

##### Notation

$x_t$: input vector ($m \times 1$).

$h_t$: hidden layer vector ($n \times 1$).

$o_t$: output vector ($n \times 1$).

$b_h$: bias vector ($n \times 1$).

$U$, $W$: parameter matrices ($n \times m$, $n \times n$).

$V$: parameter matrix ($n \times n$).

$\sigma_h$, $\sigma_y$: activation functions.

##### Feed-Forward
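The simple RNN updates its hidden state from the current input and the previous hidden state, then reads the output off the new hidden state. With the notation above (using a single hidden bias, as the notation list suggests):

$$h_t = \sigma_h(U x_t + W h_{t-1} + b_h)$$

$$o_t = \sigma_y(V h_t)$$

A minimal NumPy sketch of one step, assuming $\sigma_h = \tanh$ and $\sigma_y = \operatorname{softmax}$ (these choices, and all names, are illustrative assumptions):

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b_h):
    """One forward step of the simple RNN above.
    Assumed shapes: x_t (m,), h_prev (n,), U (n, m), W (n, n), V (n, n), b_h (n,)."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b_h)   # hidden state update
    scores = V @ h_t
    o_t = np.exp(scores - scores.max())         # numerically stable softmax output
    return h_t, o_t / o_t.sum()
```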

##### Backpropagation
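Training unrolls the recurrence and applies the chain rule through time (backpropagation through time, BPTT). A sketch under the feed-forward equations above, writing $a_t = U x_t + W h_{t-1} + b_h$ and assuming a per-step loss $L_t$ on $o_t$: each hidden state receives one gradient contribution from its own output and one from the following step,

$$\frac{\partial L}{\partial h_t} = V^{\top} \frac{\partial L_t}{\partial (V h_t)} + W^{\top}\left(\frac{\partial L}{\partial h_{t+1}} \circ \sigma_h'(a_{t+1})\right),$$

and the parameter gradients sum over time, e.g. $\frac{\partial L}{\partial W} = \sum_t \left(\frac{\partial L}{\partial h_t} \circ \sigma_h'(a_t)\right) h_{t-1}^{\top}$. The repeated multiplication by $W^{\top}$ is what makes gradients vanish or explode over long sequences, which motivates the gated units below.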

##### Examples

An RNN can map inputs to outputs in several modes:

- Vanilla processing without an RNN, from fixed-size input to fixed-size output (e.g. image classification).
- Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
- Sequence input (e.g. sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment).
- Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French).
- Synced sequence input and output (e.g. video classification, where we wish to label each frame of the video); a toy sketch of the last two modes follows this list.
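The sketch below reuses `rnn_step` (and NumPy) from the Feed-Forward section; all sizes and data are illustrative:

```python
m, n, T = 4, 8, 5                     # illustrative input size, hidden size, sequence length
rng = np.random.default_rng(0)
U = rng.normal(0, 0.1, (n, m))
W = rng.normal(0, 0.1, (n, n))
V = rng.normal(0, 0.1, (n, n))
b_h = np.zeros(n)

h = np.zeros(n)
outputs = []
for x_t in rng.normal(size=(T, m)):   # a length-T input sequence
    h, o_t = rnn_step(x_t, h, U, W, V, b_h)
    outputs.append(o_t)

per_step_labels = outputs             # synced many-to-many (e.g. a label per video frame)
final_prediction = outputs[-1]        # many-to-one (e.g. sentiment of the whole sentence)
```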

#### LSTM – Long Short-Term Memory

##### Notation

$c_t$, $h_t$: hidden layer vectors (cell state and hidden state).

$x_t$: input vector.

$b_f$, $b_i$, $b_c$, $b_o$: bias vectors.

$W_f$, $W_i$, $W_c$, $W_o$: parameter matrices.

$\sigma$, $\tanh$: activation functions.

##### Feed-Forward
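Each step computes a forget gate $f_t$, an input gate $i_t$, a candidate cell state $\tilde{c}_t$ and an output gate $o_t$ from the concatenation $[h_{t-1}, x_t]$, then blends the old and candidate cell states ($\circ$ denotes element-wise multiplication):

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$

$$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t \circ \tanh(c_t)$$

A minimal NumPy sketch of one step (the concatenated weight layout follows the notation above; names and shapes are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One forward step of the LSTM above. Each W_* has assumed shape (n, n + m)
    and acts on the concatenation [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)        # forget gate
    i_t = sigmoid(W_i @ z + b_i)        # input gate
    c_hat = np.tanh(W_c @ z + b_c)      # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat    # blend old and candidate cell states
    o_t = sigmoid(W_o @ z + b_o)        # output gate
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t, c_t
```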

##### Backpropagation
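BPTT through the LSTM follows the same chain-rule unrolling as for the simple RNN; the important difference is the additive path through the cell state. A sketch of the key recursion:

$$\frac{\partial L}{\partial c_t} = \frac{\partial L}{\partial h_t} \circ o_t \circ \left(1 - \tanh^2(c_t)\right) + \frac{\partial L}{\partial c_{t+1}} \circ f_{t+1}$$

Because the second term is only scaled element-wise by the forget gate, rather than repeatedly multiplied by a weight matrix, gradients can persist over far longer spans than in the simple RNN.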

#### GRU – Gated Recurrent Unit

##### Notation

$h_t$, $\tilde{h}_t$: hidden layer vectors (hidden state and candidate state).

$x_t$: input vector.

$b_z$, $b_r$, $b_h$: bias vectors.

$W_z$, $W_r$, $W_h$: parameter matrices.

$\sigma$, $\tanh$: activation functions.

##### Feed-Forward
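The GRU merges the LSTM's cell and hidden states and uses two gates, an update gate $z_t$ and a reset gate $r_t$:

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$$

$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$$

$$\tilde{h}_t = \tanh(W_h \cdot [r_t \circ h_{t-1}, x_t] + b_h)$$

$$h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t$$

A minimal NumPy sketch of one step, reusing `sigmoid` (and NumPy) from the LSTM sketch; layout and names are assumptions:

```python
def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One forward step of the GRU above. W_z and W_r have assumed shape
    (n, n + m); W_h acts on [r_t * h_prev, x_t] with the same shape."""
    z = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ z + b_z)                                       # update gate
    r_t = sigmoid(W_r @ z + b_r)                                       # reset gate
    h_hat = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)   # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_hat                          # interpolate old and new
```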

##### Backpropagation
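As in the LSTM, the gating creates an additive shortcut for gradients: from $h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t$, the error flowing back to $h_{t-1}$ contains the direct element-wise term $\frac{\partial L}{\partial h_t} \circ (1 - z_t)$ in addition to the gated terms through $z_t$, $r_t$ and $\tilde{h}_t$. This lets the GRU retain long-range information while using fewer parameters than the LSTM.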
