The Latest Progress in Fake Image Detection using GANs: A Comprehensive Overview
Generative adversarial networks (GANs) are a powerful technique for generating realistic images from random noise or other sources. GANs consist of two neural networks: a generator that tries to create fake images that look real, and a discriminator that tries to distinguish between real and fake images. The generator and the discriminator compete with each other in a game-like scenario, where the generator tries to fool the discriminator, and the discriminator tries to catch the generator.
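The adversarial game can be sketched with a deliberately tiny toy example: a one-dimensional "generator" that shifts Gaussian noise and a logistic-regression "discriminator", trained with alternating gradient steps. This is a minimal numpy sketch of the training dynamics, not a practical image GAN; all parameter names and learning rates here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Real data: samples from N(4, 1). Generator g(z) = a*z + b starts at N(0, 1).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters (logistic regression)
lr, batch = 0.05, 64

for _ in range(2000):
    real = rng.normal(4.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b

    # Discriminator step: ascend log D(real) + log(1 - D(fake))
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * np.mean((1 - d_real) * real - d_fake * fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # Generator step: ascend log D(fake) (the non-saturating loss)
    d_fake = sigmoid(w * fake + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

# After training, the generator's offset b has been pushed from 0 toward
# the real-data mean of 4: the generator learned to "fool" the discriminator.
```

The same push-and-pull happens in image GANs, just with convolutional networks in place of these scalar parameters.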
GANs have many applications in computer vision, such as image synthesis, image editing, image enhancement, style transfer, face swapping, etc. However, GANs also pose a serious threat to the authenticity and credibility of digital images, especially on social media platforms. GAN-generated fake images can be used for malicious purposes, such as spreading misinformation, propaganda, defamation, identity theft, cybercrime, etc.
Therefore, it is important to develop effective techniques to detect GAN-generated fake images and prevent their misuse. In this blog post, we will review some of the recent research in this area and highlight their main contributions and challenges.
Detecting artifacts in GAN fake images
One way to detect GAN-generated fake images is to look for artifacts or anomalies that are introduced by the generation process. These artifacts can be caused by various factors, such as imperfect training data, limited model capacity, optimization difficulties, etc.
Xu Zhang proposed a method called Detecting And Simulating Artifacts in GAN Fake Images (DASAGF) that exploits two types of artifacts: color bleeding and boundary blurring. Color bleeding occurs when pixels near an object boundary take on colors from both adjacent objects; boundary blurring occurs when object boundaries are not sharp or well defined. The authors showed that these artifacts are more prevalent in GAN-generated images than in real ones. They also developed a simulation technique that adds these artifacts to real images for training purposes.
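As a rough illustration of the simulation idea (not the authors' actual pipeline), one can inject a boundary-blurring artifact into a real grayscale image by blurring only the pixels that sit on strong gradients:

```python
import numpy as np

def box_blur(img, k=3):
    # simple k-by-k mean filter with edge padding
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    return sum(p[dy:dy + h, dx:dx + w]
               for dy in range(k) for dx in range(k)) / (k * k)

def add_boundary_blur(img, thresh=0.2):
    # blur only near object boundaries, i.e. where the gradient is strong
    gy, gx = np.gradient(img)
    edges = np.hypot(gx, gy) > thresh
    out = img.copy()
    out[edges] = box_blur(img)[edges]
    return out

# A hard vertical edge gets softened into intermediate gray values,
# while flat regions are left untouched.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
blurred = add_boundary_blur(step)
```

Training a detector on real images with such synthetic artifacts added lets it learn the artifact signature without needing every GAN's output.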
Another work proposed a method called Fake Images Discriminator (FID) that relies on the discrete wavelet transform (DWT) and the standard correlation coefficient (SCC) to extract spectral-correlation features from natural color images. Spectral correlation describes how different frequency components of an image relate to one another. The authors showed that natural color images exhibit strong spectral correlation across color channels and DWT sub-bands, while GAN-generated fake images exhibit weak spectral correlation because of the noise injected during generation.
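The core feature can be sketched with a one-level Haar DWT and plain correlation coefficients. This is a simplified stand-in for the paper's exact transform and SCC definition, meant only to show why shared structure across channels yields high cross-channel correlation in every sub-band:

```python
import numpy as np

def haar_dwt2(x):
    # one-level 2-D Haar transform: returns LL, LH, HL, HH sub-bands
    a = (x[0::2, :] + x[1::2, :]) / 2
    d = (x[0::2, :] - x[1::2, :]) / 2
    ll = (a[:, 0::2] + a[:, 1::2]) / 2
    lh = (a[:, 0::2] - a[:, 1::2]) / 2
    hl = (d[:, 0::2] + d[:, 1::2]) / 2
    hh = (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, lh, hl, hh

def spectral_correlation(rgb):
    # correlation of corresponding sub-band coefficients across color channels
    bands = [haar_dwt2(rgb[..., c]) for c in range(3)]
    feats = []
    for b in range(4):                      # LL, LH, HL, HH
        for c1 in range(3):
            for c2 in range(c1 + 1, 3):     # R-G, R-B, G-B channel pairs
                x, y = bands[c1][b].ravel(), bands[c2][b].ravel()
                feats.append(np.corrcoef(x, y)[0, 1])
    return np.array(feats)

rng = np.random.default_rng(0)
base = rng.random((32, 32))
# channels sharing structure (stand-in for a natural photo) vs. independent noise
natural = np.stack([base + 0.01 * rng.normal(size=base.shape)
                    for _ in range(3)], axis=-1)
independent = rng.random((32, 32, 3))
```

The 12-dimensional feature vector (4 sub-bands times 3 channel pairs) can then feed a simple classifier.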
Several open-source projects on GitHub also address fake image detection, using techniques such as metadata analysis, error level analysis, and convolutional neural networks.
Detecting semantic inconsistency in GAN fake images
Another way to detect GAN-generated fake images is to look for semantic inconsistencies or implausibilities caused by unrealistic or unnatural image content or context. These inconsistencies can stem from the generator's lack of domain knowledge, common-sense reasoning, or logical coherence.
One survey reviewed techniques for detecting manipulated facial content produced by methods such as Face2Face, DeepFakes, and FaceSwap. The authors categorized these techniques into three groups: image-based features (facial landmarks, skin texture, eye blinking, etc.), video-based features (temporal consistency, head pose, lip synchronization, etc.), and hybrid features (facial action units, emotion recognition, etc.). They also compared the techniques by performance metrics, datasets used, and limitations.
Another work proposed a method called Detection Of Semantic Inconsistency over Social Networks (DOSISN), which combines local features (object detection, segmentation, classification, etc.) with global features (scene understanding, contextual reasoning, etc.) to detect semantic inconsistency in GAN-generated fake images shared on social networks. The authors showed that DOSISN can detect various types of inconsistency, including object mismatching, duplication, removal, addition, modification, and occlusion; scene mismatching and incompatibility; and traces of human intervention.
Challenges and future directions
Detecting GAN-generated fake images is a challenging task due to several reasons:
The quality of GAN-generated fake images has improved significantly over time. Recent approaches such as StyleGAN produce images that are nearly indistinguishable from real ones to the naked eye.
The diversity of GAN-generated fake images is very high. Different GAN models can generate different types of fake images with different styles, domains, and attributes.
The availability of GAN-generated fake images is very large. There are many online platforms and services that allow anyone to generate or access fake images easily and quickly.
Fake images are also resilient to post-processing. After generation, they can undergo various transformations or manipulations, such as downsampling, JPEG compression, Gaussian noise, Gaussian blur, cropping, resizing, and rotation, which can erase the traces a detector relies on.
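Detectors are therefore usually evaluated against a battery of such post-processing operations. A minimal numpy version of that battery might look like the following sketch, where JPEG compression is approximated by coarse quantization rather than a real codec:

```python
import numpy as np

def perturbations(img, rng):
    """Return post-processed variants of a [0, 1] grayscale image."""
    # 2x downsample, then nearest-neighbour upsample back to the original size
    small = img[::2, ::2]
    down = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
    # additive Gaussian noise, clipped back to the valid range
    noisy = np.clip(img + rng.normal(0.0, 0.05, img.shape), 0.0, 1.0)
    # coarse 4-bit quantization as a crude stand-in for JPEG compression
    quant = np.round(img * 15) / 15
    # center crop
    crop = img[4:-4, 4:-4]
    return {"downsample": down, "noise": noisy, "quantize": quant, "crop": crop}

rng = np.random.default_rng(0)
variants = perturbations(rng.random((32, 32)), rng)
```

Reporting detection accuracy on each variant separately shows which transformations break a given feature.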
To overcome these challenges, researchers have proposed various methods that can detect fake images generated by known or unknown GANs based on different features or clues:
Pixel-level features: These features capture the statistical properties or artifacts of pixels in an image that may reveal its authenticity. For example,
Co-occurrence matrices: These matrices measure the frequency of pairs of pixel values at a given distance and direction in an image. They can capture some patterns or irregularities that are specific to certain types of GAN models.
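A gray-level co-occurrence matrix is straightforward to compute directly. Here is a minimal sketch for a single non-negative pixel offset; real pipelines typically stack matrices for several offsets and directions:

```python
import numpy as np

def cooccurrence(img, dx=1, dy=0, levels=256):
    """Normalized co-occurrence matrix of pixel pairs at offset (dx, dy), dx, dy >= 0."""
    src = img[:img.shape[0] - dy, :img.shape[1] - dx]   # each pixel...
    dst = img[dy:, dx:]                                  # ...and its offset neighbour
    mat = np.zeros((levels, levels))
    np.add.at(mat, (src.ravel(), dst.ravel()), 1)        # count each (value, neighbour) pair
    return mat / mat.sum()

# Tiny 2-level example: every horizontal pair is (0, 1)
tiny = np.array([[0, 1], [0, 1]], dtype=np.uint8)
m = cooccurrence(tiny, levels=2)
```

The normalized matrix (or several of them, one per color channel pair) is then fed to a classifier as a texture fingerprint.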
Photo Response Non-Uniformity (PRNU): This feature represents the unique noise pattern of a camera sensor that is embedded in every image taken by that camera. It can be used to identify the source device or detect inconsistencies between different regions of an image.
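The matching step can be sketched as follows. Real PRNU pipelines use wavelet-based denoising and normalized cross-correlation or peak-to-correlation energy; this toy uses a box filter and a plain correlation coefficient, with a synthetic "sensor pattern" standing in for a camera fingerprint:

```python
import numpy as np

def residual(img, k=3):
    # high-frequency noise residual: image minus a box-filtered version
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    den = sum(p[dy:dy + h, dx:dx + w]
              for dy in range(k) for dx in range(k)) / (k * k)
    return img - den

def correlate(a, b):
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
fingerprint = rng.normal(0.0, 1.0, (64, 64))             # hypothetical sensor pattern
scene = np.linspace(0, 1, 64)[None, :] * np.ones((64, 1))  # smooth scene content
from_camera = scene + 0.1 * fingerprint                  # photo from that "camera"
other = scene + 0.1 * rng.normal(0.0, 1.0, (64, 64))     # photo from another sensor

match = correlate(residual(from_camera), fingerprint)
nonmatch = correlate(residual(other), fingerprint)
```

A high correlation links the image to the camera; a GAN-generated image carries no consistent sensor pattern at all.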
Frequency domain analysis: This method analyzes the spectrum or distribution of pixel values in different frequency bands of an image using techniques such as discrete cosine transform (DCT) or wavelet transform (WT). It can reveal some anomalies or distortions caused by GAN models in certain frequency ranges.
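A simple version of this analysis computes the radially averaged power spectrum with numpy's FFT. Natural images typically show power decaying smoothly toward high frequencies, while some GAN upsampling pipelines leave anomalous peaks there; this sketch only demonstrates the profile computation:

```python
import numpy as np

def radial_spectrum(img, nbins=20):
    """Radially averaged power spectrum of a 2-D image."""
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)          # distance from the spectrum center
    bins = np.linspace(0, r.max() + 1e-9, nbins + 1)
    idx = np.digitize(r.ravel(), bins) - 1
    prof = np.bincount(idx, weights=power.ravel(), minlength=nbins)
    counts = np.bincount(idx, minlength=nbins)
    return prof / np.maximum(counts, 1)         # mean power per radial bin

# Smooth, low-frequency content concentrates power in the low-frequency bins.
img = np.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
profile = radial_spectrum(img)
```

Comparing the high-frequency tail of this profile between real and generated images is the basis of several spectrum-based detectors.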
Patch-level features: These features capture the local characteristics or textures of patches or regions in an image that may indicate its realism. For example,
Local binary patterns (LBP): This feature describes the local structure or contrast of pixels in a patch by comparing each pixel with its neighbors and encoding their differences into binary codes. It can capture some fine-grained details or variations that are difficult for GAN models to reproduce accurately.
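The basic 8-neighbour encoding fits in a few lines (this sketch skips the rotation-invariant and "uniform" variants used in practice):

```python
import numpy as np

def lbp(img):
    """8-neighbour LBP codes for the interior pixels of a 2-D array."""
    c = img[1:-1, 1:-1]                                  # center pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]         # clockwise neighbours
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        # set bit if the neighbour is at least as bright as the center
        code |= (nb >= c).astype(np.uint8) << bit
    return code

# On a constant image every neighbour ties with the center: all bits set (255).
codes = lbp(np.ones((5, 5)))
```

A histogram of the resulting codes over an image or patch is the actual LBP feature vector.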
Histograms of oriented gradients (HOG): This feature describes the distribution or orientation of gradients or edges in a patch by dividing it into cells and computing histograms for each cell based on gradient directions. It can capture some shapes or contours that are distinctive for real or fake images.
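The cell-histogram step can be sketched as follows. This simplified version normalizes each cell independently and omits the overlapping block normalization of the full HOG descriptor:

```python
import numpy as np

def hog(img, cell=8, nbins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180   # unsigned orientation in [0, 180)
    h, w = img.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            # magnitude-weighted orientation histogram for this cell
            hist, _ = np.histogram(a, bins=nbins, range=(0, 180), weights=m)
            feats.append(hist / (np.linalg.norm(hist) + 1e-9))
    return np.concatenate(feats)

# A vertical edge produces horizontal gradients, i.e. orientation near 0 degrees.
edge = np.zeros((16, 16))
edge[:, 8:] = 1.0
feat = hog(edge)
```

For a 16x16 input with 8x8 cells this yields four 9-bin histograms, and the dominant bin reflects the edge orientation.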
Convolutional neural network (CNN) features: These are high-level semantic representations learned by CNN models trained on large-scale image datasets for tasks such as classification or detection. They can capture global patterns or concepts that are common in real images but rare in fake ones.
Image-level features: These features capture the holistic attributes or properties of an entire image that may reflect its genuineness. For example,
Face attributes: These attributes describe the facial characteristics or expressions of a person in an image, such as age, gender, emotion, and pose. They can be used to detect inconsistencies or implausible combinations in a manipulated face image.