dc.contributor.author: Tao, Ming
dc.contributor.author: Tang, Hao
dc.contributor.author: Wu, Fei
dc.contributor.author: Jing, Xiaoyuan
dc.contributor.author: Bao, Bing-Kun
dc.contributor.author: Xu, Changsheng
dc.date.accessioned: 2023-01-10T09:43:27Z
dc.date.available: 2023-01-06T06:14:17Z
dc.date.available: 2023-01-10T09:43:27Z
dc.date.issued: 2022
dc.identifier.isbn: 978-1-6654-6946-3
dc.identifier.isbn: 978-1-6654-6947-0
dc.identifier.other: 10.1109/CVPR52688.2022.01602
dc.identifier.uri: http://hdl.handle.net/20.500.11850/590582
dc.description.abstract: Synthesizing high-quality realistic images from text descriptions is a challenging task. Existing text-to-image Generative Adversarial Networks generally employ a stacked architecture as the backbone, yet three flaws remain. First, the stacked architecture introduces entanglements between generators at different image scales. Second, existing studies tend to apply fixed extra networks during adversarial learning to enforce text-image semantic consistency, which limits the supervision capability of these networks. Third, the cross-modal attention-based text-image fusion widely adopted by previous works can only be applied at a few image scales because of its computational cost. To address these issues, we propose a simpler but more effective Deep Fusion Generative Adversarial Network (DF-GAN). Specifically, we propose: (i) a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators; (ii) a novel Target-Aware Discriminator, composed of a Matching-Aware Gradient Penalty and a One-Way Output, which enhances text-image semantic consistency without introducing extra networks; (iii) a novel deep text-image fusion block, which deepens the fusion process to fully fuse text and visual features. Compared with current state-of-the-art methods, the proposed DF-GAN is simpler yet more efficient at synthesizing realistic, text-matching images, and it achieves better performance on widely used datasets. Code is available at https://github.com/tobran/DF-GAN.
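
The Target-Aware Discriminator described in the abstract is concrete enough to sketch. Below is a minimal PyTorch illustration of a Matching-Aware Gradient Penalty, which penalizes the discriminator's gradient at real images paired with their matching text. It assumes a discriminator D(images, sent_emb) that returns one scalar per sample (the One-Way Output); the coefficients k = 2 and p = 6 are the values reported for DF-GAN, while the function and variable names are otherwise illustrative, not the authors' code.

import torch

def matching_aware_gradient_penalty(D, real_images, sent_emb, k=2.0, p=6):
    # Penalize the discriminator's gradient at the target data point:
    # real images paired with their matching sentence embeddings.
    real_images = real_images.detach().requires_grad_(True)
    sent_emb = sent_emb.detach().requires_grad_(True)
    scores = D(real_images, sent_emb)  # One-Way Output: one scalar per sample
    grads = torch.autograd.grad(
        outputs=scores.sum(),          # sum() yields per-sample input gradients
        inputs=(real_images, sent_emb),
        create_graph=True,             # keep the graph so the penalty is trainable
    )
    g_img = grads[0].flatten(start_dim=1)
    g_sent = grads[1].flatten(start_dim=1)
    norm = g_img.norm(dim=1) + g_sent.norm(dim=1)
    return k * norm.pow(p).mean()

In a full training step this term would be added to the usual adversarial loss, which matching-aware discriminators compute over (real image, matching text), (real image, mismatched text), and (synthetic image, matching text) pairs; penalizing the gradient only at the matching real pair is what makes the discriminator target-aware.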
dc.language.iso: en
dc.publisher: IEEE
dc.subject: Vision + language
dc.subject: Image and video synthesis and generation
dc.subject: Visualization
dc.subject: Computer vision
dc.subject: Codes
dc.subject: Semantics
dc.subject: Computer architecture
dc.subject: Generative adversarial networks
dc.subject: Generators
dc.title: DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
dc.type: Conference Paper
dc.date.published: 2022-09-27
ethz.book.title: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
ethz.pages.start: 16494
ethz.pages.end: 16504
ethz.event: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022)
ethz.event.location: New Orleans, LA, USA
ethz.event.date: June 18-24, 2022
ethz.identifier.wos:
ethz.publication.place: Piscataway, NJ
ethz.publication.status: published
ethz.date.deposited: 2023-01-06T06:14:19Z
ethz.source: WOS
ethz.eth: yes
ethz.availability: Metadata only
ethz.rosetta.installDate: 2023-01-10T09:43:28Z
ethz.rosetta.lastUpdated: 2023-01-10T09:43:28Z
ethz.rosetta.versionExported: true

Files in this item

There are no files associated with this item.
