Contractive Autoencoders
Basic Idea of Contractive Autoencoder
The idea is to add an explicit penalty term to the loss so that the learned representation becomes robust to small changes around the training examples. The contractive autoencoder achieves this by imposing a penalty on the representation itself, rather than only on the reconstruction.
Contractive Loss Function
The reconstruction term of the loss is the usual ℓ2 loss, just as in the previous autoencoders. The penalty term is based on the Jacobian matrix of the representation with respect to the training data. Computing the Jacobian of the hidden layer (h) with respect to the input (x) is similar to computing a gradient. Recall that the Jacobian (J) is the generalization of the gradient: when a function is vector-valued, its partial derivatives form a matrix called the Jacobian.
Hence, the loss function is as follows:

$$\mathcal{L}(x, \hat{x}) = \lVert x - \hat{x} \rVert^2 + \lambda \, \lVert J_h(x) \rVert_F^2$$

where

$$\lVert J_h(x) \rVert_F^2 = \sum_{ij} \left( \frac{\partial h_j(x)}{\partial x_i} \right)^2$$

That is, the penalty term is the squared Frobenius norm of the Jacobian matrix, which is the sum of the squares of all elements of the matrix. We can think of the Frobenius norm as the generalization of the Euclidean norm to matrices.
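For a quick numerical illustration (the matrix values below are arbitrary, chosen only for this sketch), the squared Frobenius norm is just the sum of the squared entries:

import torch

# An arbitrary 3x2 matrix, just for illustration.
J = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
fro_squared = torch.linalg.norm(J, ord='fro') ** 2
sum_of_squares = (J ** 2).sum()
print(fro_squared.item(), sum_of_squares.item())  # both are 91.0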
Let's compute the Jacobian of the hidden layer of our autoencoder. Let's say:

$$h_j = \sigma(W_j x) = \sigma\left(\sum_i W_{ji} x_i\right)$$

where $\sigma$ is the sigmoid nonlinearity. That is, to get a hidden unit, we take the dot product of the input features with the corresponding weight row. Then, using the chain rule:

$$\frac{\partial h_j}{\partial x_i} = \sigma'(W_j x)\, W_{ji} = h_j (1 - h_j)\, W_{ji}$$
Note: if we look at the above equation, the first factor $h_j (1 - h_j)$ does not depend on $i$. Hence, for every weight $W_{ji}$ in row $j$, we multiply it by the same factor computed from $h_j$.
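As a quick sanity check, here is a small sketch that compares the analytic entries $h_j (1 - h_j) W_{ji}$ with the Jacobian computed by PyTorch's autograd; the sizes and random values are arbitrary, and it assumes the sigmoid encoder $h = \sigma(Wx)$ defined above:

import torch

torch.manual_seed(0)

# Arbitrary sizes for illustration: 4 input features, 3 hidden units.
N, N_hidden = 4, 3
W = torch.randn(N_hidden, N)
x = torch.randn(N)

def encode(inp):
    return torch.sigmoid(W @ inp)

h = encode(x)  # hidden units, shape (N_hidden,)

# Analytic Jacobian: J[j, i] = h_j * (1 - h_j) * W[j, i]
J_analytic = (h * (1 - h)).unsqueeze(1) * W

# Jacobian computed by autograd, for comparison.
J_autograd = torch.autograd.functional.jacobian(encode, x)

print(torch.allclose(J_analytic, J_autograd, atol=1e-6))  # expected: True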
To write this in matrix form, we can build a diagonal matrix from the elementwise gradient of h.
Let $\mathrm{diag}\big(h \odot (1 - h)\big)$ be that diagonal matrix; the matrix form of the above derivative is then:

$$J = \mathrm{diag}\big(h \odot (1 - h)\big)\, W$$

As our main objective is to compute the norm, we can simplify the implementation so that we don't need to construct the diagonal matrix at all:

$$\lVert J \rVert_F^2 = \sum_j \big(h_j (1 - h_j)\big)^2 \sum_i W_{ji}^2$$
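The following sketch (again with arbitrary sizes and random values, only for illustration) confirms that the simplified computation matches the explicit diagonal-matrix form:

import torch

torch.manual_seed(0)

N, N_hidden = 4, 3
W = torch.randn(N_hidden, N)
x = torch.randn(N)

h = torch.sigmoid(W @ x)
dh = h * (1 - h)  # elementwise gradient of the sigmoid

# Explicit form: build diag(h * (1 - h)) and the full Jacobian J = diag(.) W.
J = torch.diag(dh) @ W
norm_explicit = (J ** 2).sum()

# Simplified form: no diagonal matrix needed.
norm_simplified = ((dh ** 2) * (W ** 2).sum(dim=1)).sum()

print(torch.allclose(norm_explicit, norm_simplified))  # expected: True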
Let us look at a code snippet that evaluates the CAE loss, which is the sum of a mean squared error term and the weighted squared Frobenius norm of the Jacobian of the hidden units with respect to the inputs.
import torch
import torch.nn.functional as F


def loss_function(W, x, recons_x, h, lam):
    """Compute the contractive autoencoder (CAE) loss.

    Args:
        W (torch.Tensor): encoder weight matrix of shape (N_hidden, N), where
            N_hidden and N are the dimensions of the hidden units and the
            input, respectively.
        x (torch.Tensor): the input to the network, with shape (N_batch, N).
        recons_x (torch.Tensor): the reconstruction of the input, with shape
            (N_batch, N).
        h (torch.Tensor): the hidden units of the network, with shape
            (N_batch, N_hidden).
        lam (float): the weight given to the Jacobian regularizer term.

    Returns:
        torch.Tensor: the (scalar) CAE loss.
    """
    mse = F.mse_loss(recons_x, x)
    # Derivative of the sigmoid: h * (1 - h), shape (N_batch, N_hidden).
    dh = h * (1 - h)
    # Sum the squared weights over the input dimension; W already has shape
    # (N_hidden, N), so no transpose is needed. Result shape: (N_hidden, 1).
    w_sum = torch.sum(W ** 2, dim=1).unsqueeze(1)
    # Squared Frobenius norm of the Jacobian, summed over the batch.
    contractive_loss = torch.sum(torch.mm(dh ** 2, w_sum))
    return mse + lam * contractive_loss
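For completeness, here is one hypothetical way to wire the loss_function above into a training step; the CAE module, layer sizes, learning rate, and λ value below are placeholders, not part of the original code:

import torch
import torch.nn as nn

# A toy contractive autoencoder; the layer sizes are placeholders.
class CAE(nn.Module):
    def __init__(self, n_in=784, n_hidden=128):
        super().__init__()
        self.encoder = nn.Linear(n_in, n_hidden)
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = torch.sigmoid(self.encoder(x))
        return torch.sigmoid(self.decoder(h)), h

model = CAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(32, 784)  # a dummy batch of 32 inputs
recons_x, h = model(x)
# The encoder weight has shape (N_hidden, N), as loss_function expects.
loss = loss_function(model.encoder.weight, x, recons_x, h, lam=1e-4)

optimizer.zero_grad()
loss.backward()
optimizer.step()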
Please run and play with the code to understand the details better.
Reference:
- Rifai, Salah, et al. “Contractive auto-encoders: Explicit invariance during feature extraction.” Proceedings of the 28th international conference on machine learning (ICML-11). 2011.