dc.contributor.author: Tao, Ming
dc.contributor.author: Tang, Hao
dc.contributor.author: Wu, Fei
dc.contributor.author: Jing, Xiaoyuan
dc.contributor.author: Bao, Bing-Kun
dc.contributor.author: Xu, Changsheng
dc.date.accessioned: 2023-01-10T09:43:27Z
dc.date.available: 2023-01-06T06:14:17Z
dc.date.available: 2023-01-10T09:43:27Z
dc.date.issued: 2022
dc.identifier.isbn: 978-1-6654-6946-3
dc.identifier.isbn: 978-1-6654-6947-0
dc.identifier.other: 10.1109/CVPR52688.2022.01602
dc.identifier.uri: http://hdl.handle.net/20.500.11850/590582
dc.description.abstract: Synthesizing high-quality realistic images from text descriptions is a challenging task. Existing text-to-image Generative Adversarial Networks generally employ a stacked architecture as the backbone, yet three flaws remain. First, the stacked architecture introduces entanglements between generators at different image scales. Second, existing studies tend to apply fixed extra networks during adversarial learning to enforce text-image semantic consistency, which limits the supervision capability of these networks. Third, the cross-modal attention-based text-image fusion widely adopted by previous works can only be applied at a few image scales because of its computational cost. To address these issues, we propose a simpler but more effective Deep Fusion Generative Adversarial Network (DF-GAN). Specifically, we propose: (i) a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators; (ii) a novel Target-Aware Discriminator, composed of a Matching-Aware Gradient Penalty and a One-Way Output, which enhances text-image semantic consistency without introducing extra networks; (iii) a novel deep text-image fusion block, which deepens the fusion process to fully fuse text and visual features. Compared with current state-of-the-art methods, the proposed DF-GAN is simpler yet more efficient at synthesizing realistic, text-matching images, and it achieves better performance on widely used datasets. Code is available at https://github.com/tobran/DF-GAN.
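
The Target-Aware Discriminator described in the abstract is concrete enough to sketch. Below is a minimal PyTorch illustration of a Matching-Aware Gradient Penalty, which penalizes the discriminator's gradient at real images paired with their matching text. It assumes a discriminator D(images, sent_emb) that returns one scalar per sample (the One-Way Output); the coefficients k = 2 and p = 6 are the values reported for DF-GAN, while the function and variable names are otherwise illustrative, not the authors' code.

import torch

def matching_aware_gradient_penalty(D, real_images, sent_emb, k=2.0, p=6):
    # Penalize the discriminator's gradient at the target data point:
    # real images paired with their matching sentence embeddings.
    real_images = real_images.detach().requires_grad_(True)
    sent_emb = sent_emb.detach().requires_grad_(True)
    scores = D(real_images, sent_emb)  # One-Way Output: one scalar per sample
    grads = torch.autograd.grad(
        outputs=scores.sum(),          # sum() yields per-sample input gradients
        inputs=(real_images, sent_emb),
        create_graph=True,             # keep the graph so the penalty is trainable
    )
    g_img = grads[0].flatten(start_dim=1)
    g_sent = grads[1].flatten(start_dim=1)
    norm = g_img.norm(dim=1) + g_sent.norm(dim=1)
    return k * norm.pow(p).mean()

In a full training step this term would be added to the usual adversarial loss, which matching-aware discriminators compute over (real image, matching text), (real image, mismatched text), and (synthetic image, matching text) pairs; penalizing the gradient only at the matching real pair is what makes the discriminator target-aware.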
dc.language.iso: en
dc.publisher: IEEE
dc.subject: Vision + language
dc.subject: Image and video synthesis and generation
dc.subject: Visualization
dc.subject: Computer vision
dc.subject: Codes
dc.subject: Semantics
dc.subject: Computer architecture
dc.subject: Generative adversarial networks
dc.subject: Generators
dc.title: DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
dc.type: Conference Paper
dc.date.published: 2022-09-27
ethz.book.title: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
ethz.pages.start: 16494
ethz.pages.end: 16504
ethz.event: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022)
ethz.event.location: New Orleans, LA, USA
ethz.event.date: June 18-24, 2022
ethz.identifier.wos:
ethz.publication.place: Piscataway, NJ
ethz.publication.status: published
ethz.date.deposited: 2023-01-06T06:14:19Z
ethz.source: WOS
ethz.eth: yes
ethz.availability: Metadata only
ethz.rosetta.installDate: 2023-01-10T09:43:28Z
ethz.rosetta.lastUpdated: 2023-01-10T09:43:28Z
ethz.rosetta.versionExported: true

Files in this item

There are no files associated with this item.
