Recurrent Neural Network
- A network that processes sequences
- An RNN processes a sequence of vectors x by applying the same recurrence formula at every time step:
$$
h_t=f_W(h_{t-1},x_t)
$$
- $h_t$ is the state at time $t$, computed from the previous state $h_{t-1}$ and the current input $x_t$
- $f_W$ is a function with parameters $W$ — the same function and the same parameters are reused at every time step
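The recurrence above can be sketched as a plain loop — a hypothetical `run_rnn` helper (not from the notes) that threads the hidden state through the same function at every step:

```python
# Sketch of the generic RNN recurrence: the same f_W (fixed parameters W)
# is applied at every time step, carrying the hidden state forward.
def run_rnn(f_W, x_seq, h0):
    h = h0
    states = []
    for x_t in x_seq:       # one step per sequence element
        h = f_W(h, x_t)     # h_t = f_W(h_{t-1}, x_t)
        states.append(h)
    return states

# Toy example: the "state" is a running sum, i.e. f_W(h, x) = h + x.
print(run_rnn(lambda h, x: h + x, [1, 2, 3], 0))  # [1, 3, 6]
```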
(Vanilla) Recurrent Neural Network
$$
h_t=f_W(h_{t-1},x_t)\\h_t=\tanh(W_{hh}h_{t-1}+W_{xh}x_t)\\y_t=W_{hy}h_t
$$
![Screen Shot 2022-07-19 at 8.25.18 PM.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/15a6cb42-583e-4dd4-8cf2-6175b9be4fcf/Screen_Shot_2022-07-19_at_8.25.18_PM.png)
- The figure above shows a many-to-many setup: one output $y_t$ is produced for every input $x_t$
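A single vanilla RNN step maps directly onto the formulas above. A minimal NumPy sketch, where the weight names mirror the equations and the dimensions (hidden `H`, input `D`, output `O`) are illustrative:

```python
import numpy as np

# Illustrative dimensions: hidden H, input D, output O.
H, D, O = 4, 3, 2
rng = np.random.default_rng(0)
W_hh = rng.standard_normal((H, H)) * 0.1  # hidden-to-hidden
W_xh = rng.standard_normal((H, D)) * 0.1  # input-to-hidden
W_hy = rng.standard_normal((O, H)) * 0.1  # hidden-to-output

def step(h_prev, x_t):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)  # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = W_hy @ h_t                           # y_t = W_hy h_t
    return h_t, y_t

h, y = step(np.zeros(H), rng.standard_normal(D))
print(h.shape, y.shape)  # (4,) (2,)
```

Note that `tanh` keeps every component of the hidden state in $(-1, 1)$, which is why it appears as the nonlinearity here.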
Sequence to Sequence (seq2seq)
- A many-to-one encoder compresses the input sequence into a single hidden vector, then a one-to-many decoder generates the output sequence from that vector
Back-Propagation Through Time
- Forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient
- Requires storing every intermediate hidden state, so memory grows with sequence length — expensive for long sequences
Truncated Back-Propagation Through Time
- Run forward and backward through fixed-size chunks of the sequence instead of the whole sequence
- Carry hidden states forward in time forever, but only back-propagate through some smaller number of steps
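The control flow of truncated BPTT can be sketched as follows. This is only a skeleton: `forward_chunk` and `backward_chunk` are assumed helper callbacks (not from the notes), and the key point is that the hidden state crosses chunk boundaries while gradients do not:

```python
# Sketch of truncated BPTT control flow (forward_chunk / backward_chunk are
# assumed helpers, not a full implementation). The hidden state is carried
# forward across chunks forever, but gradients flow only within each chunk.
def truncated_bptt(x_seq, h0, chunk_len, forward_chunk, backward_chunk):
    h = h0
    for start in range(0, len(x_seq), chunk_len):
        chunk = x_seq[start:start + chunk_len]
        cache, h = forward_chunk(h, chunk)  # forward through one chunk
        backward_chunk(cache)               # backprop stops at the chunk start;
        # h enters the next chunk as a constant (no gradient flows through it)
    return h
```

In frameworks with autograd this boundary is typically enforced by detaching the hidden state from the graph between chunks.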
Implementing an RNN
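Putting the pieces together, a minimal NumPy sketch of a vanilla RNN forward pass over a whole sequence (many-to-many). Class name, weight initialization scale, and shapes are illustrative choices, not a definitive implementation:

```python
import numpy as np

class VanillaRNN:
    """Minimal vanilla RNN, forward pass only; names and shapes are illustrative."""
    def __init__(self, input_dim, hidden_dim, output_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.01
        self.W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.01
        self.W_hy = rng.standard_normal((output_dim, hidden_dim)) * 0.01
        self.hidden_dim = hidden_dim

    def forward(self, x_seq, h0=None):
        """x_seq: list of input vectors; returns per-step outputs and final state."""
        h = np.zeros(self.hidden_dim) if h0 is None else h0
        ys = []
        for x_t in x_seq:
            h = np.tanh(self.W_hh @ h + self.W_xh @ x_t)  # recurrence
            ys.append(self.W_hy @ h)                      # per-step output y_t
        return ys, h

rnn = VanillaRNN(input_dim=3, hidden_dim=5, output_dim=2)
ys, h_T = rnn.forward([np.ones(3) for _ in range(4)])
print(len(ys), ys[0].shape, h_T.shape)  # 4 (2,) (5,)
```

Returning the final state `h_T` makes it easy to feed it back in as `h0` for the next chunk, which is exactly the state-carrying behavior truncated BPTT relies on.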