Ladder Network
An upgrade over a denoising autoencoder that adds a supervised classification objective, enabling semi-supervised learning.
- The encoder and decoder have the same number of layers.
- The softmaxed bottleneck is used for the predictions.
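A minimal NumPy sketch of such a symmetric forward pass, assuming affine + ReLU layers and a mirrored decoder (these architectural details are illustrative assumptions, not specified in the notes):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, enc_weights, dec_weights):
    """Symmetric encoder/decoder sketch: the decoder mirrors the encoder
    layer for layer, and the softmaxed bottleneck (top encoder activation)
    gives the class prediction."""
    assert len(enc_weights) == len(dec_weights)  # same depth on both sides
    acts = [x]
    for W in enc_weights:
        acts.append(np.maximum(W @ acts[-1], 0.0))  # encoder: affine + ReLU
    probs = softmax(acts[-1])                       # prediction from the bottleneck
    h = acts[-1]
    recons = [h]
    for W in reversed(dec_weights):                 # decoder walks back down
        h = np.maximum(W @ h, 0.0)
        recons.append(h)
    recons.reverse()                                # recons[i] pairs with acts[i]
    return probs, acts, recons
```

Pairing the lists so `recons[i]` reconstructs `acts[i]` is what makes the layer-wise cost terms below easy to compute.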
The overall loss function is built up of several individual components:
- for each corresponding pair of encoder layer (counted from the input) and decoder layer (counted from the output), one cost term penalizes the squared difference between their activations (squared Euclidean distance).
- one component is the negative log-likelihood of the actual label under the prediction from the softmaxed bottleneck.
- the final loss is a linear combination of the above, with the coefficients of the layer-wise reconstruction terms as hyperparameters.
- these hyperparameters control the tradeoff between classification accuracy and encoding-decoding quality.
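The combination above can be sketched as follows (function and argument names are illustrative assumptions; `lambdas` holds the per-layer hyperparameters):

```python
import numpy as np

def ladder_loss(encoder_acts, decoder_acts, probs, label, lambdas):
    """Combined loss sketch: supervised NLL plus weighted layer-wise
    reconstruction costs.

    encoder_acts: clean encoder activations, input layer first.
    decoder_acts: decoder reconstructions, matched so decoder_acts[i]
                  reconstructs encoder_acts[i].
    probs: softmax output at the bottleneck.
    label: integer class index of the training example.
    lambdas: per-layer weights (hyperparameters) on the reconstruction terms.
    """
    # Supervised term: negative log-likelihood of the true class.
    nll = -np.log(probs[label])
    # Unsupervised terms: squared Euclidean distance per layer pair.
    recon = sum(lam * np.sum((e - d) ** 2)
                for lam, e, d in zip(lambdas, encoder_acts, decoder_acts))
    return nll + recon
```

Setting a layer's entry in `lambdas` to zero turns off its reconstruction penalty, which is one way to explore the classification/reconstruction tradeoff.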
- In addition to jittering the input with Gaussian noise, each intermediate encoder layer is corrupted with Gaussian noise as well.
- this is done only during training, not when inferring a prediction for a feature vector.
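A sketch of a single encoder layer with train-time-only corruption (the affine + ReLU layer form and `noise_std` default are illustrative assumptions):

```python
import numpy as np

def encoder_layer(x, W, training, noise_std=0.3, rng=None):
    """Affine map + ReLU, with Gaussian noise added to the
    pre-activation only while training; inference is deterministic."""
    rng = rng or np.random.default_rng()
    z = W @ x
    if training:
        z = z + rng.normal(0.0, noise_std, size=z.shape)  # corrupt only in training
    return np.maximum(z, 0.0)  # ReLU
```

Passing `training=False` at inference time reproduces the clean, noise-free path described above.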