Recurrent Neural Network
- A network that processes sequences
- An RNN processes a sequence of vectors x by applying the same recurrence formula at every time step:
$$
h_t=f_W(h_{t-1},x_t)
$$
- $h_t$ is the state at time $t$, computed from the previous state $h_{t-1}$ and the current input $x_t$
- $f_W$ is a function with parameters $W$ — the same function and the same parameters are reused at every time step
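The recurrence above can be sketched as a plain loop — a hypothetical `run_rnn` helper (not from the notes) that threads the hidden state through the same function at every step:

```python
# Sketch of the generic RNN recurrence: the same f_W (fixed parameters W)
# is applied at every time step, carrying the hidden state forward.
def run_rnn(f_W, x_seq, h0):
    h = h0
    states = []
    for x_t in x_seq:       # one step per sequence element
        h = f_W(h, x_t)     # h_t = f_W(h_{t-1}, x_t)
        states.append(h)
    return states

# Toy example: the "state" is a running sum, i.e. f_W(h, x) = h + x.
print(run_rnn(lambda h, x: h + x, [1, 2, 3], 0))  # [1, 3, 6]
```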
(Vanilla) Recurrent Neural Network
$$
h_t=f_W(h_{t-1},x_t)\\h_t=\tanh(W_{hh}h_{t-1}+W_{xh}x_t)\\y_t=W_{hy}h_t
$$
![Screen Shot 2022-07-19 at 8.25.18 PM.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/15a6cb42-583e-4dd4-8cf2-6175b9be4fcf/Screen_Shot_2022-07-19_at_8.25.18_PM.png)
- The figure above shows a many-to-many setup: one output $y_t$ is produced for every input $x_t$
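A single vanilla RNN step maps directly onto the formulas above. A minimal NumPy sketch, where the weight names mirror the equations and the dimensions (hidden `H`, input `D`, output `O`) are illustrative:

```python
import numpy as np

# Illustrative dimensions: hidden H, input D, output O.
H, D, O = 4, 3, 2
rng = np.random.default_rng(0)
W_hh = rng.standard_normal((H, H)) * 0.1  # hidden-to-hidden
W_xh = rng.standard_normal((H, D)) * 0.1  # input-to-hidden
W_hy = rng.standard_normal((O, H)) * 0.1  # hidden-to-output

def step(h_prev, x_t):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)  # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = W_hy @ h_t                           # y_t = W_hy h_t
    return h_t, y_t

h, y = step(np.zeros(H), rng.standard_normal(D))
print(h.shape, y.shape)  # (4,) (2,)
```

Note that `tanh` keeps every component of the hidden state in $(-1, 1)$, which is why it appears as the nonlinearity here.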
Sequence to Sequence (seq2seq)
- A many-to-one encoder compresses the input sequence into a single hidden vector, then a one-to-many decoder generates the output sequence from that vector
Back-Propagation Through Time
- Forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient
- Requires storing every intermediate hidden state, so memory grows with sequence length — expensive for long sequences
Truncated Back-Propagation Through Time
- Run forward and backward through fixed-size chunks of the sequence instead of the whole sequence
- Carry hidden states forward in time forever, but only back-propagate through some smaller number of steps
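The control flow of truncated BPTT can be sketched as follows. This is only a skeleton: `forward_chunk` and `backward_chunk` are assumed helper callbacks (not from the notes), and the key point is that the hidden state crosses chunk boundaries while gradients do not:

```python
# Sketch of truncated BPTT control flow (forward_chunk / backward_chunk are
# assumed helpers, not a full implementation). The hidden state is carried
# forward across chunks forever, but gradients flow only within each chunk.
def truncated_bptt(x_seq, h0, chunk_len, forward_chunk, backward_chunk):
    h = h0
    for start in range(0, len(x_seq), chunk_len):
        chunk = x_seq[start:start + chunk_len]
        cache, h = forward_chunk(h, chunk)  # forward through one chunk
        backward_chunk(cache)               # backprop stops at the chunk start;
        # h enters the next chunk as a constant (no gradient flows through it)
    return h
```

In frameworks with autograd this boundary is typically enforced by detaching the hidden state from the graph between chunks.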
Implementing an RNN
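Putting the pieces together, a minimal NumPy sketch of a vanilla RNN forward pass over a whole sequence (many-to-many). Class name, weight initialization scale, and shapes are illustrative choices, not a definitive implementation:

```python
import numpy as np

class VanillaRNN:
    """Minimal vanilla RNN, forward pass only; names and shapes are illustrative."""
    def __init__(self, input_dim, hidden_dim, output_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.01
        self.W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.01
        self.W_hy = rng.standard_normal((output_dim, hidden_dim)) * 0.01
        self.hidden_dim = hidden_dim

    def forward(self, x_seq, h0=None):
        """x_seq: list of input vectors; returns per-step outputs and final state."""
        h = np.zeros(self.hidden_dim) if h0 is None else h0
        ys = []
        for x_t in x_seq:
            h = np.tanh(self.W_hh @ h + self.W_xh @ x_t)  # recurrence
            ys.append(self.W_hy @ h)                      # per-step output y_t
        return ys, h

rnn = VanillaRNN(input_dim=3, hidden_dim=5, output_dim=2)
ys, h_T = rnn.forward([np.ones(3) for _ in range(4)])
print(len(ys), ys[0].shape, h_T.shape)  # 4 (2,) (5,)
```

Returning the final state `h_T` makes it easy to feed it back in as `h0` for the next chunk, which is exactly the state-carrying behavior truncated BPTT relies on.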