Abstract:
The creation of an image from another image or from different types of data, including text, scene graphs, and object layouts, is one of the most challenging tasks in computer vision. In addition, manually capturing images of an object or product from different views can be exhausting and expensive. Deep learning and artificial intelligence techniques now make it possible to generate new images from different types of data, and significant effort has recently been devoted to developing image generation strategies, with great success. To that end, we present in this paper, to the best of the authors’ knowledge, the first comprehensive overview of existing image generation methods. Each image generation technique is described based on the nature of the adopted algorithms, the type of data used, and the main objective. Moreover, each image generation category is discussed by presenting its proposed approaches. A presentation of existing image generation datasets is also given. The evaluation metrics suitable for each image generation category are discussed, and the performance of existing solutions is compared to better characterize the state of the art and identify their limitations and strengths. Lastly, the current challenges facing this subject are presented.
Abstract:
Image conversion has attracted mounting attention due to its practical applications. This paper proposes a lightweight network structure that can be trained on unpaired training sets to complete one-way image mapping, based on the generative adversarial network (GAN) and a fixed-parameter edge-detection convolution kernel. Compared with the cycle-consistent adversarial network (CycleGAN), the proposed network features a simpler structure, fewer parameters (only 37.48% of the parameters in CycleGAN), and lower training cost (only 35.47% of the GPU memory usage and 17.67% of the single-iteration time of CycleGAN). Remarkably, cycle consistency is no longer mandatory for ensuring the consistency of content before and after image mapping. The network achieves significant processing effects in several image translation tasks, and its effectiveness has been well demonstrated through typical experiments. In a quantitative classification evaluation based on VGG-16, the proposed algorithm achieves superior performance.
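A fixed-parameter edge-detection convolution kernel of the kind mentioned above can be illustrated with a plain Sobel filter. The abstract does not specify the paper's actual kernel, so the Sobel weights below are only an assumed stand-in; the point is that the weights are constants, never trained:

```python
import numpy as np

# Assumed stand-in for the paper's fixed-parameter kernel: Sobel filters.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation (the deep-learning 'convolution')
    with a fixed, non-trainable kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def edge_map(image):
    """Gradient magnitude from the two fixed Sobel responses."""
    gx = conv2d(image, SOBEL_X)
    gy = conv2d(image, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)
```

Because the kernel is fixed, it contributes no trainable parameters, which is consistent with the lightweight design the abstract describes.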
Abstract:
Generative Adversarial Networks have recently demonstrated the capability to synthesize photo-realistic real-world images. However, they still struggle to offer high controllability of the output image, even if several constraints are provided as input. In this work, we present a Recursive Text-Image-Conditioned GAN (aRTIC GAN), a novel approach for multi-conditional image generation under concurrent spatial and text constraints. It employs a few line drawings and short descriptions to provide informative yet human-friendly conditioning. The proposed scenario is based on accessible constraints with high degrees of freedom: sketches are easy to draw and place strong restrictions on the generated objects, such as their orientation or main physical characteristics, while text is so common and expressive that it easily conveys information otherwise impossible to provide with minimal illustrations, such as the colors of object components, color shades, etc. Our aRTIC GAN is suitable for the sequential generation of multiple objects due to its compact design. In fact, the algorithm exploits the previously generated image in conjunction with the sketch and the text caption, resulting in a recurrent approach. We developed three network blocks to tackle the fundamental problems of capturing captions’ semantic meaning and of handling the trade-off between smoothing grid-pattern artifacts and preserving visual detail. Furthermore, a compact three-task discriminator (covering global, local, and textual aspects) was developed to preserve a lightweight and robust architecture. Extensive experiments prove the validity of aRTIC GAN and show that the combined use of sketch and description allows us to avoid explicit object labeling.
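The recurrent use of the previously generated image can be sketched as a simple loop. The `generator` argument below is a hypothetical stand-in, not the actual aRTIC GAN interface; the sketch only illustrates how each step conditions on the previous canvas plus the next (sketch, caption) pair:

```python
def generate_scene(generator, canvas, conditions):
    """Sequential multi-object generation: each step feeds the previous
    canvas back into the generator together with the next sketch and
    caption, so later objects are placed on top of earlier results."""
    for sketch, caption in conditions:
        canvas = generator(canvas, sketch, caption)
    return canvas
```

For example, with a toy generator that simply adds each sketch onto the canvas, three conditioning pairs produce three accumulated objects.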
Abstract:
Generative Adversarial Networks (GANs) have been extremely successful in various application domains such as computer vision, medicine, and natural language processing. Moreover, transforming an object or person into a desired shape has become a well-studied research problem in the GAN literature. GANs are powerful models for learning complex distributions and synthesizing semantically meaningful samples. However, the field lacks a comprehensive review, especially a collection of GAN loss variants, evaluation metrics, remedies for diverse image generation, and techniques for stable training. Given the current rapid development of GANs, in this survey we provide a comprehensive review of adversarial models for image synthesis. We summarize synthetic image generation methods and discuss the categories of image-to-image translation, fusion image generation, label-to-image mapping, and text-to-image translation. We organize the literature by base model and by the developed ideas related to architectures, constraints, loss functions, evaluation metrics, and training datasets. We present milestones of adversarial models, review an extensive selection of previous works in various categories, and offer insights on the development route from model-based to data-driven methods. Further, we highlight a range of potential future research directions. A unique feature of this review is that all software implementations of these GAN methods and the datasets have been collected and made available in one place at https://github.com/pshams55/GAN-Case-Study.
Abstract:
Recent deep generative models allow real-time generation of hair images from sketch inputs. Existing solutions often require a user-provided binary mask to specify a target hair shape. This not only costs users extra labor but also fails to capture complicated hair boundaries. Those solutions usually encode hair structures via orientation maps, which, however, are not very effective at encoding complex structures. We observe that colored hair sketches already implicitly define target hair shapes as well as hair appearance, and are more flexible for depicting hair structures than orientation maps. Based on these observations, we present SketchHairSalon, a two-stage framework for generating realistic hair images directly from freehand sketches depicting the desired hair structure and appearance. At the first stage, we train a network to predict a hair matte from an input hair sketch, with an optional set of non-hair strokes. At the second stage, another network is trained to synthesize the structure and appearance of hair images from the input sketch and the generated matte. To make the networks in the two stages aware of the long-term dependency of strokes, we apply self-attention modules to them. To train these networks, we present a new dataset containing thousands of annotated hair sketch-image pairs and corresponding hair mattes. Two efficient methods for sketch completion are proposed to automatically complete repetitive braided parts and hair strokes, respectively, thus reducing the workload of users. Based on the trained networks and the two sketch completion strategies, we build an intuitive interface that allows even novice users to design visually pleasing hair images exhibiting various hair structures and appearances via freehand sketches. Qualitative and quantitative evaluations show the advantages of the proposed system over existing and alternative solutions.
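The self-attention modules used to capture long-term stroke dependency follow the standard scaled dot-product formulation. A minimal NumPy sketch, assuming single-head attention over a sequence of stroke feature vectors (the actual module placement and dimensions in SketchHairSalon are not given in the abstract):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape
    (n, d). Each output row mixes information from all n positions,
    which is what lets the network relate distant strokes."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])   # (n, n) pairwise affinities
    A = softmax(scores, axis=-1)             # rows are attention weights
    return A @ V, A
```

Each row of the attention matrix sums to one, so the output at every position is a convex combination of the value vectors from all positions.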
Abstract:
Researchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the Contrastive Language-Image Pre-training (CLIP) embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.
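The abstract does not describe the internals of the text-conditioned adapter layers, so the following is only one plausible minimal form: a FiLM-style layer that predicts a per-channel scale and shift from the CLIP text embedding and applies them to the inversion features. All names and shapes here are illustrative assumptions, not the CLIPInverter API:

```python
import numpy as np

def film_adapter(features, text_emb, W_gamma, W_beta):
    """Hypothetical text-conditioned adapter (FiLM-style modulation).
    features: (n, d) intermediate inversion features.
    text_emb: (t,) CLIP embedding of the target description.
    The text embedding predicts a per-channel scale and shift; with a
    zero embedding the layer reduces to the identity, leaving the
    pretrained inversion network's behavior unchanged."""
    gamma = text_emb @ W_gamma    # (d,) per-channel scale offset
    beta = text_emb @ W_beta      # (d,) per-channel shift
    return features * (1.0 + gamma) + beta
```

The identity-at-zero property is a common design choice for adapters added to pretrained networks, since it lets training start from the unmodified pretrained mapping.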
Abstract:
Impressive progress has been made recently in image-to-image translation using generative adversarial networks (GANs). However, existing methods often fail to translate source images with noise to the target domain. To address this problem, we combine image-to-image translation with image denoising and propose an enhanced generative adversarial network (EGAN). In particular, building upon pix2pix, we introduce residual blocks into the generator network to capture deeper multi-level information between the source and target image distributions. Moreover, a perceptual loss is proposed to enhance the performance of image-to-image translation. As demonstrated through extensive experiments, our proposed EGAN alleviates the effects of noise in source images and outperforms other state-of-the-art methods significantly. Furthermore, we experimentally show that the proposed EGAN is also effective when applied to image denoising.
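The two ingredients named above, residual blocks and a perceptual loss, are standard constructions and can be sketched minimally. A residual block computes `x + F(x)`, so low-level detail passes straight through; a perceptual loss compares feature maps of a fixed extractor rather than raw pixels (the paper uses VGG features; `feat_fn` below is any stand-in feature function, not the paper's exact setup):

```python
import numpy as np

def residual_block(x, W1, W2):
    """Residual block: output = x + F(x), where F is a small two-layer
    transform. The skip connection lets gradients and low-level image
    detail bypass F entirely."""
    relu = lambda z: np.maximum(z, 0.0)
    return x + relu(x @ W1) @ W2

def perceptual_loss(feat_fn, generated, target):
    """Perceptual loss: mean squared error between feature maps of a
    fixed feature extractor, so the penalty reflects perceptual rather
    than pixel-wise differences."""
    fg, ft = feat_fn(generated), feat_fn(target)
    return float(np.mean((fg - ft) ** 2))
```

With zero weights the residual block reduces to the identity, which is why stacking such blocks in a generator is easy to optimize.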