Contractive Autoencoders
Basic Idea of Contractive Autoencoder
The idea is to add an explicit penalty term to the loss so that the learned representation becomes robust to small changes around the training examples. The contractive autoencoder achieves this by imposing a penalty on the representation itself, rather than only on the reconstruction.
Contractive Loss Function
The reconstruction term of the loss is the usual ℓ2 loss, just as in the previous autoencoders. The penalty term is based on the Jacobian matrix of the representation with respect to the training data. Computing the Jacobian of the hidden layer (h) with respect to the input (x) is similar to computing a gradient. Recall that the Jacobian (J) is the generalization of the gradient: when a function is vector-valued, its partial derivatives form a matrix called the Jacobian.
Hence, the loss function is as follows:

$$\mathcal{L}(x, \hat{x}) = \lVert x - \hat{x} \rVert^2 + \lambda \, \lVert J_h(x) \rVert_F^2$$

where

$$\lVert J_h(x) \rVert_F^2 = \sum_{ij} \left( \frac{\partial h_j(x)}{\partial x_i} \right)^2$$

That is, the penalty term is the squared Frobenius norm of the Jacobian matrix, which is the sum of the squares of all elements of the matrix. We can think of the Frobenius norm as the generalization of the Euclidean norm to matrices.
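For a quick numerical illustration (the matrix values below are arbitrary, chosen only for this sketch), the squared Frobenius norm is just the sum of the squared entries:

import torch

# An arbitrary 3x2 matrix, just for illustration.
J = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
fro_squared = torch.linalg.norm(J, ord='fro') ** 2
sum_of_squares = (J ** 2).sum()
print(fro_squared.item(), sum_of_squares.item())  # both are 91.0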
Let's compute the Jacobian of the hidden layer of our autoencoder. Let's say:

$$h_j = \sigma(W_j x) = \sigma\left(\sum_i W_{ji} x_i\right)$$

where $\sigma$ is the sigmoid nonlinearity. That is, to get a hidden unit, we take the dot product of the input features with the corresponding weight row. Then, using the chain rule:

$$\frac{\partial h_j}{\partial x_i} = \sigma'(W_j x)\, W_{ji} = h_j (1 - h_j)\, W_{ji}$$
Note: if we look at the above equation, the first factor $h_j (1 - h_j)$ does not depend on $i$. Hence, for every weight $W_{ji}$ in row $j$, we multiply it by the same factor computed from $h_j$.
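As a quick sanity check, here is a small sketch that compares the analytic entries $h_j (1 - h_j) W_{ji}$ with the Jacobian computed by PyTorch's autograd; the sizes and random values are arbitrary, and it assumes the sigmoid encoder $h = \sigma(Wx)$ defined above:

import torch

torch.manual_seed(0)

# Arbitrary sizes for illustration: 4 input features, 3 hidden units.
N, N_hidden = 4, 3
W = torch.randn(N_hidden, N)
x = torch.randn(N)

def encode(inp):
    return torch.sigmoid(W @ inp)

h = encode(x)  # hidden units, shape (N_hidden,)

# Analytic Jacobian: J[j, i] = h_j * (1 - h_j) * W[j, i]
J_analytic = (h * (1 - h)).unsqueeze(1) * W

# Jacobian computed by autograd, for comparison.
J_autograd = torch.autograd.functional.jacobian(encode, x)

print(torch.allclose(J_analytic, J_autograd, atol=1e-6))  # expected: True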
To write this in matrix form, we can build a diagonal matrix from the elementwise gradient of h.
Let $\mathrm{diag}\big(h \odot (1 - h)\big)$ be that diagonal matrix; the matrix form of the above derivative is then:

$$J = \mathrm{diag}\big(h \odot (1 - h)\big)\, W$$

As our main objective is to compute the norm, we can simplify the implementation so that we don't need to construct the diagonal matrix at all:

$$\lVert J \rVert_F^2 = \sum_j \big(h_j (1 - h_j)\big)^2 \sum_i W_{ji}^2$$
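The following sketch (again with arbitrary sizes and random values, only for illustration) confirms that the simplified computation matches the explicit diagonal-matrix form:

import torch

torch.manual_seed(0)

N, N_hidden = 4, 3
W = torch.randn(N_hidden, N)
x = torch.randn(N)

h = torch.sigmoid(W @ x)
dh = h * (1 - h)  # elementwise gradient of the sigmoid

# Explicit form: build diag(h * (1 - h)) and the full Jacobian J = diag(.) W.
J = torch.diag(dh) @ W
norm_explicit = (J ** 2).sum()

# Simplified form: no diagonal matrix needed.
norm_simplified = ((dh ** 2) * (W ** 2).sum(dim=1)).sum()

print(torch.allclose(norm_explicit, norm_simplified))  # expected: True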
Let us look at a code snippet that evaluates the CAE loss, which is the sum of a mean squared error term and the weighted squared Frobenius norm of the Jacobian of the hidden units with respect to the inputs.
import torch
import torch.nn.functional as F


def loss_function(W, x, recons_x, h, lam):
    """Compute the contractive autoencoder (CAE) loss.

    Args:
        W (torch.Tensor): encoder weight matrix of shape (N_hidden, N), where
            N_hidden and N are the dimensions of the hidden units and the
            input, respectively.
        x (torch.Tensor): the input to the network, with shape (N_batch, N).
        recons_x (torch.Tensor): the reconstruction of the input, with shape
            (N_batch, N).
        h (torch.Tensor): the hidden units of the network, with shape
            (N_batch, N_hidden).
        lam (float): the weight given to the Jacobian regularizer term.

    Returns:
        torch.Tensor: the (scalar) CAE loss.
    """
    mse = F.mse_loss(recons_x, x)
    # Derivative of the sigmoid: h * (1 - h), shape (N_batch, N_hidden).
    dh = h * (1 - h)
    # Sum the squared weights over the input dimension; W already has shape
    # (N_hidden, N), so no transpose is needed. Result shape: (N_hidden, 1).
    w_sum = torch.sum(W ** 2, dim=1).unsqueeze(1)
    # Squared Frobenius norm of the Jacobian, summed over the batch.
    contractive_loss = torch.sum(torch.mm(dh ** 2, w_sum))
    return mse + lam * contractive_loss
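For completeness, here is one hypothetical way to wire the loss_function above into a training step; the CAE module, layer sizes, learning rate, and λ value below are placeholders, not part of the original code:

import torch
import torch.nn as nn

# A toy contractive autoencoder; the layer sizes are placeholders.
class CAE(nn.Module):
    def __init__(self, n_in=784, n_hidden=128):
        super().__init__()
        self.encoder = nn.Linear(n_in, n_hidden)
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = torch.sigmoid(self.encoder(x))
        return torch.sigmoid(self.decoder(h)), h

model = CAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(32, 784)  # a dummy batch of 32 inputs
recons_x, h = model(x)
# The encoder weight has shape (N_hidden, N), as loss_function expects.
loss = loss_function(model.encoder.weight, x, recons_x, h, lam=1e-4)

optimizer.zero_grad()
loss.backward()
optimizer.step()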
Please run and play with the code to understand the details better.
Reference:
- Rifai, Salah, et al. “Contractive auto-encoders: Explicit invariance during feature extraction.” Proceedings of the 28th international conference on machine learning (ICML-11). 2011.