Learning Deep Models with Primitive-Based Representations

Paschalidou, Despoina

doi:10.3929/ethz-b-000521013

dc.contributor.author

Paschalidou, Despoina

dc.contributor.supervisor

Van Gool, Luc

dc.contributor.supervisor

Geiger, Andreas

dc.contributor.supervisor

Ferrari, Vittorio

dc.contributor.supervisor

Tombari, Federico

dc.contributor.supervisor

Savva, Manolis

dc.date.accessioned

2021-12-16T12:07:52Z

dc.date.available

2021-12-16T11:47:53Z

dc.date.available

2021-12-16T12:07:52Z

dc.date.issued

2021

dc.identifier.uri

http://hdl.handle.net/20.500.11850/521013

dc.identifier.doi

10.3929/ethz-b-000521013

dc.description.abstract

Humans develop a common-sense understanding of the physical behaviour of the world within the first year of their life. We are able to identify 3D objects in a scene, infer their geometric and physical properties, predict physical events in dynamic environments and act based on our interaction with the world. Our understanding of our surroundings relies heavily on our ability to properly reason about the arrangement of elements in a scene. Inspired by early works in cognitive science that stipulate that the human visual system perceives objects as a collection of semantically coherent parts and in turn uses them to easily associate unknown objects with object parts whose functionality is already known, researchers developed compositional representations capable of capturing the functional composition and spatial arrangement of objects and object parts in a scene. In the first two parts of this dissertation, we propose learning-based solutions for recovering the 3D object geometry using semantically consistent part arrangements. Finally, we introduce a network architecture that synthesizes indoor environments as object arrangements, whose functional composition and spatial configuration follow clear patterns that are directly inferred from data.
First, we present an unsupervised learning-based approach for recovering shape abstractions using superquadric surfaces as atomic elements. We demonstrate that superquadrics lead to more expressive part decompositions while being easier to learn than cuboidal primitives. Moreover, we provide an analytical solution to the Chamfer loss which avoids the need for computationally expensive reinforcement learning or iterative prediction.
Next, we introduce a novel 3D primitive representation that defines primitives using an Invertible Neural Network (INN) that implements homeomorphic mappings between a sphere and the target object. Since this representation does not impose any constraint on the shape of the predicted primitives, they can capture complex geometries using an order of magnitude fewer parts than existing primitive-based representations. We consider this representation a first step towards bridging the gap between interpretable and high-fidelity primitive-based reconstructions.
Subsequently, we introduce a structure-aware representation that jointly recovers the geometry of a 3D object as a set of primitives as well as its latent hierarchical structure without any part-level supervision. Our model recovers the higher-level structural decomposition of various objects in the form of a binary tree of primitives, where simple parts are represented with fewer primitives and more complex parts are modeled with more components. We demonstrate that considering the latent hierarchical decomposition of an object into parts facilitates reasoning about the 3D object geometry.
Finally, we propose a neural network architecture for synthesizing indoor scenes by plausibly arranging objects within the scene boundaries. In particular, given a room type (e.g., bedroom, living room) and its shape, our model generates meaningful object arrangements by sequentially placing objects in a permutation-invariant fashion. In contrast to prior work, which poses scene synthesis as a sequence generation problem, our model generates rooms as unordered sets of objects. This enables various interactive scenarios such as room completion, failure case correction, and object suggestions with user-provided constraints.
To summarize, we propose novel primitive-based representations that do not limit the available shape vocabulary to a specific set of shapes such as cuboids, spheres, or planes. Next, we introduce a structure-aware representation that considers part relationships and represents object parts with multiple levels of granularity, where geometrically complex parts are modeled with more components and simpler parts with fewer components. Finally, we propose a network architecture that generates indoor scenes by properly arranging objects within a room's boundaries. Our model enables new interactive applications for semi-automated scene authoring that were not possible before.

en_US

dc.format

application/pdf

en_US

dc.language.iso

en

en_US

dc.publisher

ETH Zurich

en_US

dc.rights.uri

http://rightsstatements.org/page/InC-NC/1.0/

dc.subject

Primitive-based representations

en_US

dc.subject

3D reconstruction

en_US

dc.subject

Structure-aware representations

en_US

dc.subject

Scene understanding

en_US

dc.subject

Scene synthesis

en_US

dc.subject

Interpretable representations

en_US

dc.subject

Unsupervised learning

en_US

dc.subject

Generative modelling

en_US

dc.title

Learning Deep Models with Primitive-Based Representations

en_US

dc.type

Doctoral Thesis

dc.rights.license

In Copyright - Non-Commercial Use Permitted

dc.date.published

2021-12-16

ethz.size

218 p.

en_US

ethz.code.ddc

DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science

en_US

ethz.identifier.diss

28066

en_US

ethz.publication.place

Zurich

en_US

ethz.publication.status

published

en_US

ethz.leitzahl

ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02652 - Institut für Bildverarbeitung / Computer Vision Laboratory::03514 - Van Gool, Luc (emeritus) / Van Gool, Luc (emeritus)

en_US

ethz.leitzahl.certified

ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02652 - Institut für Bildverarbeitung / Computer Vision Laboratory::03514 - Van Gool, Luc (emeritus) / Van Gool, Luc (emeritus)

en_US

ethz.date.deposited

2021-12-16T11:47:59Z

ethz.source

FORM

ethz.eth

yes

en_US

ethz.availability

Open access

en_US

ethz.rosetta.installDate

2021-12-16T12:08:00Z

ethz.rosetta.lastUpdated

2022-03-29T16:37:44Z

ethz.rosetta.exportRequired

true

ethz.rosetta.versionExported

true

ethz.COinS

ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Learning%20Deep%20Models%20with%20Primitive-Based%20Representations&rft.date=2021&rft.au=Paschalidou,%20Despoina&rft.genre=unknown&rft.btitle=Learning%20Deep%20Models%20with%20Primitive-Based%20Representations

Files in this item

Name: paschalidoud_thesis.pdf
Size: 77.17 MB
Format: Adobe PDF
Label: Full text

Publication type

Doctoral Thesis [30264]