Electronic Thesis and Dissertation Repository

Inverse Mapping of Generative Adversarial Networks

Nicky Bayat, The University of Western Ontario

Abstract

Generative adversarial networks (GANs) synthesize realistic samples (images, audio, video, etc.) from a random latent vector. While many studies have explored various training configurations and architectures for GANs, the problem of inverting a generative model to extract the latent vectors of given input images or audio has been inadequately investigated. Although there is exactly one generated output per random vector, the mapping from an image or audio sample back to a latent vector can have more than one solution. We train a deep residual neural network (ResNet18) to recover, for a given target, a latent vector that can be used to generate a face image or spoken-digit audio nearly identical to that target. Here we focus on precise latent vector recovery for human faces and voices. We use a perceptual loss to embed texture details in the recovered latent vector while maintaining quality with a reconstruction loss. Whereas the vast majority of studies on latent vector recovery perform well only on synthesized examples, we argue that our method can determine a mapping between real human faces and latent-space vectors that retains most of the important facial style details. In addition, our proposed method projects generated faces to their latent space with high fidelity and speed. Applying a few further gradient descent steps to the predicted face latent vectors can improve performance further; however, this hybrid technique does not help audio inverse mapping. Our audio inverse mapper reconstructs both synthesized and real spoken digits with high quantitative and qualitative accuracy. Finally, we demonstrate the performance of our approach on both real and generated examples.
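The overall scheme described above (an encoder trained against a frozen generator with a combined reconstruction and perceptual loss, followed by optional gradient descent refinement of the predicted latent vector) can be sketched as follows. This is a minimal PyTorch illustration, not the thesis implementation: the tiny MLP generator, MLP encoder, and stand-in `perceptual` feature function are assumptions standing in for the pretrained GAN, the ResNet18 encoder, and a deep-feature perceptual loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (assumptions): the thesis uses a pretrained GAN generator
# and a ResNet18 encoder; small MLPs keep this sketch self-contained.
latent_dim, img_dim = 8, 64
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, img_dim))
for p in G.parameters():
    p.requires_grad_(False)  # the generator stays frozen during inversion

E = nn.Sequential(nn.Linear(img_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim))
opt = torch.optim.Adam(E.parameters(), lr=1e-3)

def perceptual(x):
    # Stand-in feature map; the thesis uses deep-network features instead.
    return torch.tanh(x[:, ::2])

# One encoder training step: sample z, synthesize a target with G, predict
# z_hat with E, and penalize both pixel-level and feature-level mismatch.
z = torch.randn(16, latent_dim)
target = G(z)
z_hat = E(target)
recon = G(z_hat)
loss = nn.functional.mse_loss(recon, target) \
     + nn.functional.mse_loss(perceptual(recon), perceptual(target))
opt.zero_grad()
loss.backward()
opt.step()

# Hybrid refinement (faces only in the thesis): a few gradient descent
# steps directly on the predicted latent vector, with E and G held fixed.
z_ref = E(target).detach().requires_grad_(True)
z_opt = torch.optim.Adam([z_ref], lr=0.05)
l_before = nn.functional.mse_loss(G(z_ref), target).item()
for _ in range(10):
    l = nn.functional.mse_loss(G(z_ref), target)
    z_opt.zero_grad()
    l.backward()
    z_opt.step()
l_after = nn.functional.mse_loss(G(z_ref), target).item()
```

The design point the sketch highlights is that the encoder gives a fast one-shot projection, while the refinement loop trades extra computation for a closer reconstruction of each individual target.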