Deepfakes use a unique variation of a generative auto-encoder to produce the face swap. This requires a special structure, which we will explain in this section.
The particular type of neural network that regular deepfakes use is called a generative auto-encoder. Unlike a Generative Adversarial Network (GAN), an auto-encoder does not use a discriminator or any “adversarial” techniques.
All auto-encoders work by training a collection of neural network models to solve a problem. In the case of generative auto-encoders, the AI is used to generate a new image with new details that weren't in the original image. With a normal auto-encoder, by contrast, the problem is usually something such as classification (deciding what an image is), object identification (finding something inside an image), or segmentation (identifying the different parts of an image). To do this, the auto-encoder uses two types of models: the encoder and the decoder. Let's see how this works.
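To make this concrete, here is a minimal sketch of the two models, assuming PyTorch; the class names, layer sizes, and image dimensions are illustrative choices, not code from this book:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compresses an input image down to a small latent vector."""
    def __init__(self, image_size=64 * 64 * 3, latent_size=128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(image_size, 1024), nn.ReLU(),
            nn.Linear(1024, latent_size),
        )

    def forward(self, x):
        return self.layers(x)

class Decoder(nn.Module):
    """Expands a latent vector back into a full-sized image."""
    def __init__(self, image_size=64 * 64 * 3, latent_size=128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(latent_size, 1024), nn.ReLU(),
            nn.Linear(1024, image_size), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.layers(z).view(-1, 3, 64, 64)

encoder, decoder = Encoder(), Decoder()
image = torch.rand(1, 3, 64, 64)           # a dummy 64x64 RGB image
reconstruction = decoder(encoder(image))   # encode, then decode
```

The encoder squeezes the image into a compact representation, and the decoder rebuilds a full image from it; the training cycle described next teaches both models to do this well.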
Training is a cyclical process in which the model is repeatedly shown images until training is stopped. The process can be broken down into four steps:
Figure 1.2 – Diagram of the training cycle
In more detail, the process unfolds as follows:
Figure 1.3 – Encoder and decoder
Note
The loss is where an auto-encoder differs from a GAN. In a GAN, the comparison loss is either replaced or supplemented with an additional network (usually an auto-encoder itself), which then produces a loss score of its own. The theory behind this structure is that the loss model (called a discriminator) can learn to get better at detecting the output of the generating model (called a generator) while the generator can learn to get better at fooling the discriminator.
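For contrast with the comparison loss used by an auto-encoder, here is a hedged sketch of the adversarial arrangement described in this note, again assuming PyTorch; the generator and discriminator below are simplified toy stand-ins, not code from this book:

```python
import torch
import torch.nn as nn

# Toy generator and discriminator; real ones are much deeper networks.
generator = nn.Sequential(nn.Linear(128, 64 * 64 * 3), nn.Sigmoid())
discriminator = nn.Sequential(nn.Linear(64 * 64 * 3, 1), nn.Sigmoid())
adversarial_loss = nn.BCELoss()

noise = torch.randn(8, 128)
fake_images = generator(noise)

# The discriminator, not a pixel-by-pixel comparison, scores the output.
# The generator is trained to make the discriminator label its fakes as real (1).
scores = discriminator(fake_images)
generator_loss = adversarial_loss(scores, torch.ones_like(scores))

# The discriminator is trained in the opposite direction: fakes should score 0.
discriminator_loss = adversarial_loss(scores.detach(), torch.zeros_like(scores))
```

The key difference from the auto-encoder's loss is that the score comes from a second, learned network rather than from directly comparing the output image to the input image.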
Figure 1.4 – Loss and backpropagation
Once complete, the whole process starts over again at the encoder and repeats until the neural network has finished training. Training can be ended in several ways: after a certain number of repetitions have occurred (called iterations), after all of the data has been gone through (called an epoch), or when the results meet a certain loss score.
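Putting the pieces together, here is a hedged sketch of that training cycle and the three stopping rules, assuming PyTorch; the toy encoder, decoder, thresholds, and placeholder dataset are illustrative only:

```python
import torch
import torch.nn as nn

# Toy encoder and decoder stand-ins so the sketch runs on its own.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 128))
decoder = nn.Sequential(nn.Linear(128, 64 * 64 * 3), nn.Sigmoid(),
                        nn.Unflatten(1, (3, 64, 64)))

optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
comparison_loss = nn.MSELoss()                            # pixel-by-pixel comparison
dataset = [torch.rand(1, 3, 64, 64) for _ in range(100)]  # placeholder face images

max_iterations, max_epochs, target_loss = 10_000, 50, 0.001
iteration = 0

for epoch in range(max_epochs):                    # one epoch = one full pass over the data
    for image in dataset:
        reconstruction = decoder(encoder(image))       # encode, then decode
        loss = comparison_loss(reconstruction, image)  # compare the output to the input

        optimizer.zero_grad()
        loss.backward()                            # backpropagate the loss
        optimizer.step()                           # update the weights

        iteration += 1
        if iteration >= max_iterations or loss.item() <= target_loss:
            break
    else:
        continue
    break                                          # iteration cap or loss target reached
```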
GANs are one of the current darlings of generative networks. They are extremely popular and used extensively, particularly for super-resolution (intelligent upscaling), music generation, and sometimes even deepfakes. However, there are reasons why they're not used in all deepfake solutions.
GANs are popular due to their “imaginative” nature. They learn through the interaction of their generator and discriminator to fill in gaps in the data. Because they can fill in missing pieces, they are great at reconstruction tasks or at tasks where new data is required.
The ability of a GAN to create new data where it is missing is great for numerous tasks, but it becomes a critical flaw when used for deepfakes. In deepfakes, the goal is to replace one face with another. An imaginative GAN would likely learn to fill the gaps in the data from one face with data from the other. This leads to a problem that we call “identity bleed”, where the two faces aren't swapped properly; instead, they're blended into a face that doesn't look like either person but rather a mix of the two.
This flaw in a GAN-created deepfake can be corrected or prevented, but doing so requires much more careful data collection and processing. In general, it's easier to get a full swap, rather than a blend, by using a generative auto-encoder instead of a GAN.
Another name for an auto-encoder is an “hourglass” model. The reason for this is that each layer of an encoder is smaller than the layer before it, while each layer of a decoder is larger than the one before it. Because of this, the auto-encoder's shape starts out large, shrinks toward the middle, and then widens back out again as it reaches the end:
Figure 1.5 – Hourglass structure of an autoencoder
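As a small illustration of that shape, the sketch below builds an auto-encoder whose layer widths shrink and then grow again, assuming PyTorch; the specific widths are arbitrary examples, not values from this book:

```python
import torch.nn as nn

encoder_widths = [784, 256, 64, 16]   # each encoder layer is smaller than the last
decoder_widths = [16, 64, 256, 784]   # each decoder layer is larger than the last

def stack(widths):
    # Build a chain of fully connected layers from one width to the next.
    layers = []
    for w_in, w_out in zip(widths, widths[1:]):
        layers += [nn.Linear(w_in, w_out), nn.ReLU()]
    return nn.Sequential(*layers)

hourglass = nn.Sequential(stack(encoder_widths), stack(decoder_widths))
print(hourglass)  # wide -> narrow -> wide: the hourglass shape
```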
While these methods are flexible and have many potential uses, they do have limitations. Let's examine those limitations now.