Recurrent Neural Network
- A network that processes sequences
- An RNN can process a sequence of vectors $x$ by applying a recurrence formula at every time step
$$
h_t=f_W(h_{t-1},x_t)
$$
- $h_t$ is the hidden state at time step $t$, computed from the previous state $h_{t-1}$ and the current input $x_t$
- $f_W$ is a function with parameters $W$; the same function and parameters are used at every time step
(Vanilla) Recurrent Neural Network
$$
h_t=f_W(h_{t-1},x_t)\\h_t=\tanh(W_{hh}h_{t-1}+W_{xh}x_t)\\y_t=W_{hy}h_t
$$

- The RNN above is an example of many-to-many: it produces an output $y_t$ at every time step
Sequence to Sequence (seq2seq)
- A many-to-one encoder summarizes the input sequence into a single vector, then a one-to-many decoder generates the output sequence from that vector
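The vanilla RNN update above can be sketched in a few lines of NumPy. The sizes, random weights, and 5-step input sequence here are illustrative assumptions, not values from the notes:

```python
import numpy as np

# Hypothetical sizes for illustration
hidden_size, input_size, output_size = 4, 3, 2
rng = np.random.default_rng(0)
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.01
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.01
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.01

def rnn_step(h_prev, x_t):
    # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    # y_t = W_hy h_t
    y_t = W_hy @ h_t
    return h_t, y_t

# Many-to-many: one output per input step, same weights at every step
h = np.zeros(hidden_size)
xs = rng.standard_normal((5, input_size))  # sequence of 5 input vectors
ys = []
for x_t in xs:
    h, y = rnn_step(h, x_t)
    ys.append(y)
```

Note that the same `W_hh`, `W_xh`, and `W_hy` are reused at every step; only the hidden state `h` changes as the sequence is consumed.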
Back-Propagation Through Time
- Forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient
- Takes a lot of memory for long sequences, since every hidden state must be cached for the backward pass
Truncated Back-Propagation Through Time
- Run forward and backward through chunks of sequence instead of whole sequence
- Carry hidden states forward in time forever, but only back-propagate for some smaller number of steps
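The chunking structure can be sketched as below. The sequence length, chunk length, and weights are illustrative assumptions; the point is that the hidden state is carried across chunk boundaries while only one chunk's states are cached at a time:

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_size, input_size = 4, 3
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.01
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.01

seq_len, chunk_len = 100, 25  # hypothetical lengths
xs = rng.standard_normal((seq_len, input_size))

h = np.zeros(hidden_size)  # carried forward across ALL chunks, never reset
chunks_processed = 0
for start in range(0, seq_len, chunk_len):
    # Cache only this chunk's states: memory is O(chunk_len), not O(seq_len)
    hs = [h]
    for x_t in xs[start:start + chunk_len]:
        hs.append(np.tanh(W_hh @ hs[-1] + W_xh @ x_t))
    # A real implementation would back-propagate here, through `hs` only;
    # gradients stop at hs[0], which enters this chunk as a constant.
    h = hs[-1]  # carry the final state into the next chunk
    chunks_processed += 1
```

In frameworks with autograd this "gradients stop at the chunk boundary" step is typically done by detaching the carried hidden state from the graph before starting the next chunk.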
Implementing an RNN
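A minimal NumPy sketch of a vanilla RNN with full back-propagation through time is below. The class name, sizes, and the squared-error loss are illustrative assumptions chosen to keep the backward pass short:

```python
import numpy as np

class VanillaRNN:
    """Minimal vanilla RNN with full back-propagation through time (BPTT)."""

    def __init__(self, input_size, hidden_size, output_size, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
        self.W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
        self.W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1

    def forward(self, xs, h0):
        """Forward through the entire sequence, caching states for BPTT."""
        hs, ys = [h0], []
        for x_t in xs:
            hs.append(np.tanh(self.W_hh @ hs[-1] + self.W_xh @ x_t))
            ys.append(self.W_hy @ hs[-1])
        return hs, ys

    def loss_and_grads(self, xs, targets, h0):
        """Loss 0.5 * sum_t ||y_t - target_t||^2 and its gradients via BPTT."""
        hs, ys = self.forward(xs, h0)
        loss = 0.5 * sum(np.sum((y - t) ** 2) for y, t in zip(ys, targets))
        dW_xh = np.zeros_like(self.W_xh)
        dW_hh = np.zeros_like(self.W_hh)
        dW_hy = np.zeros_like(self.W_hy)
        dh_next = np.zeros_like(h0)  # gradient flowing in from later steps
        for t in reversed(range(len(xs))):
            dy = ys[t] - targets[t]
            dW_hy += np.outer(dy, hs[t + 1])
            dh = self.W_hy.T @ dy + dh_next       # from output and from t+1
            draw = (1.0 - hs[t + 1] ** 2) * dh    # back through tanh
            dW_xh += np.outer(draw, xs[t])
            dW_hh += np.outer(draw, hs[t])        # hs[t] is h_{t-1}
            dh_next = self.W_hh.T @ draw          # flows into step t-1
        return loss, (dW_xh, dW_hh, dW_hy)

# Usage on random data (illustrative)
rng = np.random.default_rng(1)
rnn = VanillaRNN(input_size=3, hidden_size=5, output_size=2)
xs = rng.standard_normal((6, 3))
targets = rng.standard_normal((6, 2))
h0 = np.zeros(5)
loss, grads = rnn.loss_and_grads(xs, targets, h0)
```

Note how the backward loop walks the cached states in reverse and accumulates into the same three weight matrices at every step, mirroring the fact that one set of parameters $W$ is shared across all time steps.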