How do you create convincing output, something that isn't real but seems to be? Generative Adversarial Networks (GANs) are a promising approach. They provide a form of unsupervised learning, in which a system improves its performance over time without human feedback.
Improvement by competition
Imagine that you're an artist hoping to get rich by creating fake "previously undiscovered" Rembrandts. Your works have to be good enough to fool the experts. To get to that point, you could hire a top expert. You give him your forgeries along with real works and challenge him to tell which is which. You use his feedback to improve your deception. But at the same time, he's learning how to spot your fakes, so the better you get, the harder it is to fool him. This feedback loop will eventually hone your abilities until you're a master forger. He'll come out of it a better expert as well.
That's how a GAN works. The idea was proposed and defined in a 2014 paper by Ian Goodfellow and his colleagues at the University of Montreal. A GAN runs two models in competition with each other. It starts with a class of data, which it aims to imitate as closely as possible. The generative model tries to produce data that is indistinguishable from real data belonging to the class. It passes its output to a discriminative model, which tries to determine whether its input belongs to the class. The discriminative model draws on a set of representative data from the class, called training data. This is the "adversarial" part: the generative model is trying to fool the discriminative model, and the discriminative model is trying not to be fooled.
In effect, the generative and discriminative parts are competitors in a game. The generative model is trying to get its opponent to misidentify constructed data as authentic, and the discriminative model is trying to catch all the imitations.
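To make the two roles concrete, here is a minimal sketch in PyTorch (one of the frameworks discussed below) of the two competing models. The layer sizes and the flattened 28x28-image setting are arbitrary choices for illustration, not anything prescribed by the original paper.

```python
import torch.nn as nn

# Generative model: maps a vector of random noise to a fake data item
# (here a flattened 28x28 image; the sizes are illustrative).
generator = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Linear(256, 784),
    nn.Tanh(),            # output scaled to [-1, 1], like normalized real data
)

# Discriminative model: maps a data item to a single probability
# that the item belongs to the real class.
discriminator = nn.Sequential(
    nn.Linear(784, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),         # probability between 0 and 1
)
```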
It's almost as if the system is running a Turing test on itself. In Turing's classic paper, a computer program tries to get a human judge to think its conversation comes from another human. With a GAN, the discriminative model is the judge, and the attempt at imitation could center on any kind of data.
Not all GAN applications have imitation as their main objective. The goal could be to train a discriminative model which then serves on its own to distinguish between members and non-members of a class. This can be useful in many situations, such as identifying features in real-world environments.
GAN basics
The Montreal paper doesn't specify the design of the models, but convolutional neural networks (CNNs) have been used in much of the work. A CNN uses a series of data layers, with transformation functions applied to the data from one layer to the next.
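As a sketch of that layered structure, here is a small convolutional discriminator for 28x28 single-channel images; each layer transforms the output handed up by the previous one. The architecture is an illustrative assumption, not one taken from the Montreal paper.

```python
import torch.nn as nn

# Each convolutional layer transforms the data from the previous layer;
# the final layers map the extracted features to one probability.
cnn_discriminator = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=4, stride=2, padding=1),   # 28x28 -> 14x14
    nn.LeakyReLU(0.2),
    nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1),  # 14x14 -> 7x7
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 1),
    nn.Sigmoid(),
)
```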
A video by Goodfellow offers a good introduction to the concepts without going into the mathematical details.
It isn't enough to produce one sample or a narrow range of samples that pass for members of the class. The goal is to produce an open-ended set of data, with significant variation, that can fool the discriminative model. The generative model does this by starting with random noise and modifying its transformation parameters in response to feedback. Its earliest attempts bear no resemblance to the target, but it improves as it adjusts its parameters.
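In code, that starting point is just a sampled noise vector. A sketch, reusing the generator defined earlier (the batch and latent sizes are arbitrary):

```python
import torch

batch_size, latent_dim = 64, 100
z = torch.randn(batch_size, latent_dim)   # pure random noise as the start
fake = generator(z)   # early in training this output looks like noise too;
                      # it gains structure as the parameters are adjusted
```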
The discriminator generates a probability from 0 to 1.0 that the data item is real. The generator uses a loss function (also called a cost function) which is based on the probability. A low probability from the discriminator maps to a high loss value. A high loss value tells the generator its attempt was far off the mark, while a low one says it's close.
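A common concrete choice for this mapping (an assumption here, since the article doesn't fix one) is binary cross-entropy, which behaves exactly as described: a probability near 0 becomes a large loss, a probability near 1 a small one. A sketch, reusing the models above:

```python
import torch
import torch.nn.functional as F

p = discriminator(fake)   # probability that the fake is real

# Binary cross-entropy against a target of 1 ("real") is -log(p):
# p = 0.01 gives a loss of ~4.6 (far off the mark),
# p = 0.90 gives a loss of ~0.1 (close).
g_loss = F.binary_cross_entropy(p, torch.ones_like(p))
g_loss.backward()   # the gradient tells the generator how to adjust
```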
The generator needs a nonzero gradient in its loss function in order to improve. It will keep its output close to data that returns a low loss value and change the output that returns a high one.
As the generator improves, the system approaches an ideal state in which the discriminator outputs a probability of 0.5 (not 1.0!) for every input. A perfect generative model produces fakes that are indistinguishable from the training data, so the discriminator can only guess, by the equivalent of flipping a coin. The generator would like the discriminator to assign higher probabilities to the fakes than to the training data, but that can happen only if the discriminator is defective. Both models have achieved the best result possible, so they're stalemated.
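The 2014 paper formalizes the game as a minimax objective, and the 0.5 equilibrium falls out of the optimal discriminator:

```latex
% The two-player minimax game from the 2014 paper:
\min_G \max_D \; V(D, G) =
    \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
  + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

% For a fixed generator, the optimal discriminator is
D^*(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}

% so when the generator's distribution matches the data
% (p_g = p_data), D^*(x) = 1/2 for every input.
```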
Uses for GANs
So far, GANs have been used mostly for images. An interesting example is the reconstruction of images from poor-quality or damaged source material. The result won't necessarily be an exact match for the original, but it will be a plausible-looking high-quality image which is consistent with the original version.
Here are some other use cases for GANs which researchers have explored:
- Reconstructing 3-D objects from a single 2-D view.
- Generating aged and rejuvenated versions of faces.
- Transforming a sketch to a photorealistic image and vice versa.
- Cracking simple forms of encryption.
- Modification of facial expressions.
- Generating images from textual descriptions.
- Detection of deceptive product and service reviews.
So far, GANs haven't seen much use in natural language processing. Ian Goodfellow has said there are difficulties to overcome in NLP. Generative models in a GAN make slight changes to improve the data they produce, but this approach doesn't lend itself to natural language. It isn't possible to make a slight, incremental change to a sentence and produce another coherent sentence. A modified approach might get around the difficulty, but so far this is an unresolved issue.
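A toy illustration of why (the numbers are made up): image data is continuous, so a small gradient step still yields a valid image, while text is a sequence of discrete tokens, where "slightly more" has no meaning.

```python
import torch

# A pixel intensity can be nudged slightly and remain a valid pixel.
pixel = torch.tensor(0.37)
nudged = pixel - 0.01 * 1.0      # 0.36: still a legitimate image value

# A word is an index into a vocabulary. Nudging it gives a fractional
# index that corresponds to no word at all.
word_id = 1542                   # hypothetical index of some word
broken = word_id - 0.01 * 1.0    # 1541.99: not a valid token
```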
Difficulties in practice
Getting a GAN to produce useful results is difficult. In some cases, it won't converge even after a long time. This can happen through mode collapse. Realistic classes of data are usually multimodal; they consist of multiple clusters (modes), whose members are more similar to each other than to the rest of the class. In a mode collapse, the generative model produces data that all fall into the same mode. Even if the imitations are very good, the discriminative model will respond by assigning low probabilities to all data similar to that mode and high ones to everything else.
The GAN may oscillate rather than degenerating to a single mode. The generator may respond to the situation just described by imitating a different mode. The discriminator will catch up with it after a while and assign low probabilities to that mode.
Ironically, a discriminator which is too successful can bring a GAN to a standstill. If it detects all fakes with such confidence that it gives them a probability of zero, then the generator has no way to adapt. The gradient of its loss function is zero, so it lacks direction. Tuning the discriminator so that it assigns a small nonzero probability to the better fakes, or reformulating the generator's loss, can avoid this problem.
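One common mitigation, suggested in the 2014 paper itself, changes the generator's loss: instead of minimizing log(1 - D(G(z))), which flattens out when the discriminator is confident, the generator maximizes log D(G(z)), whose gradient is largest exactly when the fakes are being rejected. A sketch, reusing the models above (the epsilon guards against taking the log of zero):

```python
import torch

z = torch.randn(64, 100)          # fresh batch of noise
p = discriminator(generator(z))   # discriminator's verdict on the fakes
eps = 1e-8                        # avoids log(0) when p hits 0 or 1

# Original minimax form: near-zero gradient once p is close to 0.
saturating = torch.log(1 - p + eps).mean()

# "Non-saturating" form: steep gradient when p is close to 0, so the
# generator still gets direction against a confident discriminator.
non_saturating = -torch.log(p + eps).mean()
```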
These difficulties, especially mode collapse, are a major issue with GANs. Generator models, like people, tend to "stick with what works," so getting them to cover the whole range of modes in the target class requires careful tuning.
Conditional GANs
An ordinary GAN treats all members of a class as equivalent. Adding some information can improve the likelihood and speed of convergence. In a conditional GAN, a class label is supplied to both the generator and the discriminator. The generator's task is not just to produce a data item that fits into the class, but one that fits the subset of the class carrying that label.
This approach is considered semi-supervised rather than unsupervised learning.
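A sketch of the conditioning mechanism, under the common (assumed here) design of embedding the label and concatenating it to the generator's noise and to the discriminator's input:

```python
import torch
import torch.nn as nn

num_classes, latent_dim, data_dim = 10, 100, 784   # illustrative sizes
embed = nn.Embedding(num_classes, num_classes)     # label -> dense vector

# Both models receive the label alongside their usual input.
cond_generator = nn.Sequential(
    nn.Linear(latent_dim + num_classes, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)
cond_discriminator = nn.Sequential(
    nn.Linear(data_dim + num_classes, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

labels = torch.randint(0, num_classes, (64,))
z = torch.randn(64, latent_dim)
fake = cond_generator(torch.cat([z, embed(labels)], dim=1))
p = cond_discriminator(torch.cat([fake, embed(labels)], dim=1))
```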
Supporting software
Currently there aren't any out-of-the-box software frameworks for developing a GAN. Anyone working in this area needs to start with a solid grasp of convolutional neural networks and the software for building them. These packages can be useful in GAN development.
- TensorFlow, a framework for machine learning software, may be the most popular starting point for GANs. APIs are available in Python, JavaScript, C++, Java, Go, and Swift. The module tf.contrib.gan provides an infrastructure for training and evaluating a GAN.
- PyTorch, a Python framework for machine learning software, includes a package for building neural networks. It's based on Torch, which is no longer in active development.
Tutorials
The available tutorials on the Web tend to use Python and TensorFlow. Most of the following rely on that combination.
- Towards Data Science offers a tutorial on using a GAN to draw human faces. The sample code is in Python and uses the TensorFlow library.
- John Glover presents an introduction to generative adversarial networks, also using Python and TensorFlow.
- Aayal Hayat and Dillon give a simple example of a GAN with just a few lines of code, along with graphs illustrating the GAN's performance.
- Diego Gomez Mosquera has created a more detailed example, using PyTorch as well as TensorFlow.
Further study
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks extends the original concept by proposing deep convolutional generative adversarial networks (DCGANs).
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets adds the idea of "disentangling" features of the model, so that the generator and discriminator develop knowledge about the structure of the data.
Conditional Generative Adversarial Nets introduces the conditional version of GANs, where labels are supplied to the generator and discriminator.