A recurrent neural network is a type of artificial neural network in which connections between nodes form a directed graph along a temporal sequence. This allows the network to exhibit temporal dynamic behavior on time sequences.

There are three common variants of the vanilla recurrent neural network: the simple RNN, the gated recurrent unit (GRU), and the long short-term memory unit (LSTM).

#### RNN – Recurrent Neural Network

##### Notation

$x_t$: input vector ($m \times 1$).

$h_t$: hidden layer vector ($n \times 1$).

$o_t$: output vector ($n \times 1$).

$b_h$: bias vector ($n \times 1$).

$U$, $W$: parameter matrices ($n \times m$, $n \times n$).

$V$: parameter matrix ($n \times n$).

$\sigma_h$, $\sigma_y$: activation functions.

##### Feed-Forward
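The simple RNN updates its hidden state from the current input and the previous hidden state, then reads the output off the new hidden state. With the notation above (using a single hidden bias, as the notation list suggests):

$$h_t = \sigma_h(U x_t + W h_{t-1} + b_h)$$

$$o_t = \sigma_y(V h_t)$$

A minimal NumPy sketch of one step, assuming $\sigma_h = \tanh$ and $\sigma_y = \operatorname{softmax}$ (these choices, and all names, are illustrative assumptions):

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b_h):
    """One forward step of the simple RNN above.
    Assumed shapes: x_t (m,), h_prev (n,), U (n, m), W (n, n), V (n, n), b_h (n,)."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b_h)   # hidden state update
    scores = V @ h_t
    o_t = np.exp(scores - scores.max())         # numerically stable softmax output
    return h_t, o_t / o_t.sum()
```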

##### Backpropagation
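Training unrolls the recurrence and applies the chain rule through time (backpropagation through time, BPTT). A sketch under the feed-forward equations above, writing $a_t = U x_t + W h_{t-1} + b_h$ and assuming a per-step loss $L_t$ on $o_t$: each hidden state receives one gradient contribution from its own output and one from the following step,

$$\frac{\partial L}{\partial h_t} = V^{\top} \frac{\partial L_t}{\partial (V h_t)} + W^{\top}\left(\frac{\partial L}{\partial h_{t+1}} \circ \sigma_h'(a_{t+1})\right),$$

and the parameter gradients sum over time, e.g. $\frac{\partial L}{\partial W} = \sum_t \left(\frac{\partial L}{\partial h_t} \circ \sigma_h'(a_t)\right) h_{t-1}^{\top}$. The repeated multiplication by $W^{\top}$ is what makes gradients vanish or explode over long sequences, which motivates the gated units below.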

##### Examples

An RNN can map inputs to outputs in several modes:

- Vanilla processing without an RNN, from fixed-size input to fixed-size output (e.g. image classification).
- Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
- Sequence input (e.g. sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment).
- Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French).
- Synced sequence input and output (e.g. video classification, where we wish to label each frame of the video); a toy sketch of the last two modes follows this list.
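The sketch below reuses `rnn_step` (and NumPy) from the Feed-Forward section; all sizes and data are illustrative:

```python
m, n, T = 4, 8, 5                     # illustrative input size, hidden size, sequence length
rng = np.random.default_rng(0)
U = rng.normal(0, 0.1, (n, m))
W = rng.normal(0, 0.1, (n, n))
V = rng.normal(0, 0.1, (n, n))
b_h = np.zeros(n)

h = np.zeros(n)
outputs = []
for x_t in rng.normal(size=(T, m)):   # a length-T input sequence
    h, o_t = rnn_step(x_t, h, U, W, V, b_h)
    outputs.append(o_t)

per_step_labels = outputs             # synced many-to-many (e.g. a label per video frame)
final_prediction = outputs[-1]        # many-to-one (e.g. sentiment of the whole sentence)
```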

#### LSTM – Long Short-Term Memory

##### Notation

$c_t$, $h_t$: hidden layer vectors (cell state and hidden state).

$x_t$: input vector.

$b_f$, $b_i$, $b_c$, $b_o$: bias vectors.

$W_f$, $W_i$, $W_c$, $W_o$: parameter matrices.

$\sigma$, $\tanh$: activation functions.

##### Feed-Forward
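Each step computes a forget gate $f_t$, an input gate $i_t$, a candidate cell state $\tilde{c}_t$ and an output gate $o_t$ from the concatenation $[h_{t-1}, x_t]$, then blends the old and candidate cell states ($\circ$ denotes element-wise multiplication):

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$

$$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t \circ \tanh(c_t)$$

A minimal NumPy sketch of one step (the concatenated weight layout follows the notation above; names and shapes are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One forward step of the LSTM above. Each W_* has assumed shape (n, n + m)
    and acts on the concatenation [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)        # forget gate
    i_t = sigmoid(W_i @ z + b_i)        # input gate
    c_hat = np.tanh(W_c @ z + b_c)      # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat    # blend old and candidate cell states
    o_t = sigmoid(W_o @ z + b_o)        # output gate
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t, c_t
```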

##### Backpropagation
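BPTT through the LSTM follows the same chain-rule unrolling as for the simple RNN; the important difference is the additive path through the cell state. A sketch of the key recursion:

$$\frac{\partial L}{\partial c_t} = \frac{\partial L}{\partial h_t} \circ o_t \circ \left(1 - \tanh^2(c_t)\right) + \frac{\partial L}{\partial c_{t+1}} \circ f_{t+1}$$

Because the second term is only scaled element-wise by the forget gate, rather than repeatedly multiplied by a weight matrix, gradients can persist over far longer spans than in the simple RNN.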

#### GRU – Gated Recurrent Unit

##### Notation

$h_t$, $\tilde{h}_t$: hidden layer vectors (hidden state and candidate state).

$x_t$: input vector.

$b_z$, $b_r$, $b_h$: bias vectors.

$W_z$, $W_r$, $W_h$: parameter matrices.

$\sigma$, $\tanh$: activation functions.

##### Feed-Forward
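The GRU merges the LSTM's cell and hidden states and uses two gates, an update gate $z_t$ and a reset gate $r_t$:

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$$

$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$$

$$\tilde{h}_t = \tanh(W_h \cdot [r_t \circ h_{t-1}, x_t] + b_h)$$

$$h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t$$

A minimal NumPy sketch of one step, reusing `sigmoid` (and NumPy) from the LSTM sketch; layout and names are assumptions:

```python
def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One forward step of the GRU above. W_z and W_r have assumed shape
    (n, n + m); W_h acts on [r_t * h_prev, x_t] with the same shape."""
    z = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ z + b_z)                                       # update gate
    r_t = sigmoid(W_r @ z + b_r)                                       # reset gate
    h_hat = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)   # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_hat                          # interpolate old and new
```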

##### Backpropagation
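As in the LSTM, the gating creates an additive shortcut for gradients: from $h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t$, the error flowing back to $h_{t-1}$ contains the direct element-wise term $\frac{\partial L}{\partial h_t} \circ (1 - z_t)$ in addition to the gated terms through $z_t$, $r_t$ and $\tilde{h}_t$. This lets the GRU retain long-range information while using fewer parameters than the LSTM.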
