Components of Convolutional Network
Fully connected Layers
- 32 x 32 x 3 image → stretch to 3072 x 1
- Multiply by a weight matrix of dimension 10 x 3072 to output 10 values
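A minimal NumPy sketch of this fully connected layer (the random weights and bias here are just placeholders, not trained values):

```python
import numpy as np

x = np.random.randn(32, 32, 3)   # input image
x_flat = x.reshape(3072)         # stretch 32 x 32 x 3 into a 3072-vector
W = np.random.randn(10, 3072)    # weight matrix: 10 x 3072
b = np.random.randn(10)          # one bias per output
scores = W @ x_flat + b          # 10 output values
print(scores.shape)              # (10,)
```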
Convolution Layers
- Instead of stretching the data, the input image preserves its spatial structure: 32 x 32 x 3
- The weights are small filters with the same kind of dimensions, e.g. a 5 x 5 x 3 filter
- The filter depth must match the input depth (3 channels here)
- 1 output number = result of dot product between filter and small chunk in input data
- Output would be a 1 x 28 x 28 matrix (called an Activation Map), since 32 − 5 + 1 = 28
- Usually there are multiple filters of size 5 x 5 x 3, so the convolution layer's weights would be something like 6 x 5 x 5 x 3, and the output would be 6 activation maps, each of size 1 x 28 x 28
- Each layer's filters also have a bias vector (one bias per filter)
- Also, the input will usually be a batch of images
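The sliding-dot-product idea above can be sketched for a single 5 x 5 x 3 filter on one image (a naive loop, not an efficient implementation; the bias value is illustrative):

```python
import numpy as np

image = np.random.randn(32, 32, 3)  # input keeps its spatial structure
filt = np.random.randn(5, 5, 3)     # filter depth matches input depth
bias = 0.1                          # illustrative bias for this filter

H_out = 32 - 5 + 1                  # 28
activation = np.zeros((H_out, H_out))
for i in range(H_out):
    for j in range(H_out):
        chunk = image[i:i+5, j:j+5, :]                  # small chunk of input
        activation[i, j] = np.sum(chunk * filt) + bias  # dot product -> 1 number
print(activation.shape)             # (28, 28)
```

With 6 such filters, stacking the 6 activation maps gives the 6 x 28 x 28 output described above.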
Convolutional Layer Dimensions
Input: $N\times C_{in}\times H\times W$
- N, H, W: Number of images in the mini-batch, Height, Width
- $C_{in}$ : number of channels for each image (3 for RGB)
Filters: $C_{out}\times C_{in}\times K_h \times K_w$
Output: $N\times C_{out}\times H'\times W'$
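These dimensions can be checked with a naive batched convolution (stride 1, no padding assumed; a shape sketch, not production code):

```python
import numpy as np

N, C_in, H, W = 2, 3, 32, 32        # mini-batch of 2 RGB images
C_out, K_h, K_w = 6, 5, 5           # 6 filters of size 3 x 5 x 5
x = np.random.randn(N, C_in, H, W)
w = np.random.randn(C_out, C_in, K_h, K_w)
b = np.random.randn(C_out)          # one bias per filter

H_out, W_out = H - K_h + 1, W - K_w + 1   # 28, 28 without padding
out = np.zeros((N, C_out, H_out, W_out))
for n in range(N):                  # each image in the batch
    for c in range(C_out):          # each filter
        for i in range(H_out):
            for j in range(W_out):
                out[n, c, i, j] = np.sum(x[n, :, i:i+K_h, j:j+K_w] * w[c]) + b[c]
print(out.shape)                    # (2, 6, 28, 28) = N x C_out x H' x W'
```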
When doing convolution, the output shrinks based on the size of the filter, so we usually add zero padding around the border
- Input: W
- Filter: K
- Padding: P (zeros added on each side)
- Output: $W - K + 1 + 2P$; setting $P = (K-1)/2$ keeps the output the same size as the input
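The output-size arithmetic can be wrapped in a small helper (stride 1 assumed; the function name is illustrative):

```python
def conv_output_size(W, K, P=0):
    """Output width for input width W, filter size K, zero padding P, stride 1."""
    return W - K + 1 + 2 * P

print(conv_output_size(32, 5))        # 28: the output shrinks without padding
print(conv_output_size(32, 5, P=2))   # 32: P = (K - 1) / 2 preserves the size
```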