Learning Deep Models with Primitive-Based Representations
dc.contributor.author
Paschalidou, Despoina
dc.contributor.supervisor
Van Gool, Luc
dc.contributor.supervisor
Geiger, Andreas
dc.contributor.supervisor
Ferrari, Vittorio
dc.contributor.supervisor
Tombari, Federico
dc.contributor.supervisor
Savva, Manolis
dc.date.accessioned
2021-12-16T12:07:52Z
dc.date.available
2021-12-16T11:47:53Z
dc.date.available
2021-12-16T12:07:52Z
dc.date.issued
2021
dc.identifier.uri
http://hdl.handle.net/20.500.11850/521013
dc.identifier.doi
10.3929/ethz-b-000521013
dc.description.abstract
Humans develop a common-sense understanding of the physical behaviour of the
world within the first year of life. We are able to identify 3D objects
in a scene, infer their geometric and physical properties, predict physical
events in dynamic environments and act based on our interaction with the world.
Our understanding of our surroundings relies heavily on our ability to properly
reason about the arrangement of elements in a scene. Inspired by early work in
cognitive science suggesting that the human visual system perceives objects
as collections of semantically coherent parts, and uses these parts to easily
associate unfamiliar objects with parts whose functionality is already
known, researchers have developed compositional representations that capture
the functional composition and spatial arrangement of objects and object parts
in a scene. In the first two parts of this dissertation, we propose learning-based
methods for recovering 3D object geometry using semantically consistent
part arrangements. Finally, we introduce a network architecture that
synthesizes indoor environments as object arrangements whose
functional composition and spatial configuration follow clear patterns
inferred directly from data.
First, we present an unsupervised learning-based approach for recovering shape
abstractions using superquadric surfaces as atomic elements. We demonstrate
that superquadrics lead to more expressive part decompositions while being
easier to learn than cuboidal primitives. Moreover, we provide an analytical
solution to the Chamfer loss that avoids the need for computationally expensive
reinforcement learning or iterative prediction.
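For readers unfamiliar with superquadrics, the sketch below evaluates the standard superquadric inside-outside function (as introduced by Barr), with half-extents a = (a1, a2, a3) and shape exponents e1, e2, together with a brute-force symmetric Chamfer distance between two point sets. It is a minimal illustration under stated assumptions, not the analytical Chamfer solution or the training code of the thesis; the function names and the choice of the squared-distance Chamfer variant are our own.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

def superquadric_inside_outside(p: Point, a: Sequence[float], e1: float, e2: float) -> float:
    """Standard superquadric inside-outside function in the primitive's local frame.

    Returns < 1 for points inside, 1 on the surface and > 1 outside.
    a = (a1, a2, a3) are the half-extents; e1, e2 are the shape exponents.
    """
    x, y, z = p
    a1, a2, a3 = a
    xy = (abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2.0 / e1)

def chamfer_distance(P: Sequence[Point], Q: Sequence[Point]) -> float:
    """Symmetric Chamfer distance using squared Euclidean distances (one common variant).

    Brute force, O(|P| * |Q|): averages nearest-neighbour distances in both directions.
    """
    def sq(p: Point, q: Point) -> float:
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    p_to_q = sum(min(sq(p, q) for q in Q) for p in P) / len(P)
    q_to_p = sum(min(sq(q, p) for p in P) for q in Q) / len(Q)
    return p_to_q + q_to_p
```

With e1 = e2 = 1 and unit half-extents the function reduces to x² + y² + z², i.e. a unit sphere. Shrinking both exponents towards 0 makes the shape increasingly box-like: the point (0.9, 0.9, 0.9) lies outside the sphere but inside the superquadric with e1 = e2 = 0.1. This continuous family of shapes illustrates why superquadrics can express more than cuboids alone.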
Next, we introduce a novel 3D primitive representation that defines
primitives using an Invertible Neural Network (INN) that implements homeomorphic
mappings between a sphere and the target object. Since this representation does
not impose any constraint on the shape of the predicted primitives, they can
capture complex geometries using an order of magnitude fewer parts than
existing primitive-based representations. We consider this representation a
first step towards bridging the gap between interpretable and high-fidelity
primitive-based reconstructions.
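The property this representation relies on is that an invertible network built from continuous layers with continuous inverses is a homeomorphism onto its image, so it deforms a sphere without tearing it or gluing parts of it together. The sketch below illustrates that property with a single RealNVP-style affine coupling layer acting on 3D points. The coupling design, the hand-written `scale_net` and `shift_net` functions (stand-ins for learned networks) and all other names are our own assumptions, not the architecture used in the thesis.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]

# Hypothetical stand-ins for small learned networks. Any continuous functions work here:
# the coupling layer stays invertible even though these functions are not invertible.
def scale_net(x: float, y: float) -> float:
    return 0.5 * math.tanh(x + y)

def shift_net(x: float, y: float) -> float:
    return math.sin(x) - 0.3 * y

def coupling_forward(p: Point) -> Point:
    """Affine coupling: keep (x, y) and transform z conditioned on (x, y)."""
    x, y, z = p
    return (x, y, z * math.exp(scale_net(x, y)) + shift_net(x, y))

def coupling_inverse(q: Point) -> Point:
    """Exact inverse: (x, y) pass through unchanged, so scale and shift can be recomputed."""
    x, y, w = q
    return (x, y, (w - shift_net(x, y)) * math.exp(-scale_net(x, y)))

def sphere_point(theta: float, phi: float) -> Point:
    """Point on the unit sphere from spherical angles."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))
```

Mapping sphere points through `coupling_forward` and back through `coupling_inverse` recovers them up to floating-point error. A single layer only modifies z; practical invertible networks stack several coupling layers that alternate which coordinates are updated, so that every coordinate can be deformed.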
Subsequently, we introduce a structure-aware representation that jointly recovers
the geometry of a 3D object as a set of primitives as well as its latent
hierarchical structure without any part-level supervision. Our model recovers
the higher-level structural decomposition of various objects in the form of a
binary tree of primitives, where simple parts are represented with fewer
primitives and more complex parts are modeled with more components. We
demonstrate that accounting for the latent hierarchical decomposition of an
object into parts facilitates reasoning about its 3D geometry.
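To make the notion of a binary tree of primitives concrete, the sketch below builds such a tree recursively and stops splitting a node once a per-node fit error falls below a threshold, so a simple part ends up as a single leaf primitive while a complex part is covered by several. The `PrimitiveNode` class, the `fit_error` callback, `example_fit_error` and the threshold-based stopping rule are hypothetical illustrations; they are not the thesis's model, which learns the hierarchy without part-level supervision.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class PrimitiveNode:
    """One node of a binary tree; each node stands for one primitive covering a part."""
    part: str   # hypothetical label for the region this node covers
    depth: int
    children: List["PrimitiveNode"] = field(default_factory=list)

    def leaves(self) -> List["PrimitiveNode"]:
        """Leaf nodes, left to right: the primitives that make up the final reconstruction."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

def build_tree(part: str, depth: int, max_depth: int,
               fit_error: Callable[[str, int], float], threshold: float) -> PrimitiveNode:
    """Split a node into two children while its (hypothetical) fit error exceeds the threshold."""
    node = PrimitiveNode(part, depth)
    if depth < max_depth and fit_error(part, depth) > threshold:
        node.children = [
            build_tree(part + "L", depth + 1, max_depth, fit_error, threshold),
            build_tree(part + "R", depth + 1, max_depth, fit_error, threshold),
        ]
    return node

def example_fit_error(part: str, depth: int) -> float:
    """Toy error for illustration only: the 'rootL' branch is simple, elsewhere error shrinks with depth."""
    return 0.0 if part.startswith("rootL") else 1.0 / (depth + 1)
```

With `build_tree("root", 0, 4, example_fit_error, 0.3)`, the simple left branch remains a single primitive, while the right branch is split into four leaf primitives.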
Finally, we propose a neural network architecture for synthesizing indoor scenes
by plausibly arranging objects within the scene boundaries. In particular,
given a room type (e.g. bedroom, living room) and its shape, our model
generates meaningful object arrangements by sequentially placing objects in a
permutation-invariant fashion. In contrast to prior work, which poses scene
synthesis as a sequence generation problem, our model generates rooms as unordered sets
of objects. This enables various interactive applications, such as
room completion, failure-case correction, and object suggestion under
user-provided constraints.
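Treating a room as an unordered set means that the representation of the objects placed so far must not depend on the order in which they were added. A common way to achieve this, shown here as a minimal sketch, is to embed each object independently and pool the embeddings with a symmetric operation such as a sum (Deep Sets-style pooling). The object tuple layout, the hand-written `embed_object` features, the `propose_next` callback and the `initial` argument are hypothetical assumptions, not the architecture used in the thesis.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical object layout: (category_id, x, y, angle)
SceneObject = Tuple[int, float, float, float]
EMBED_DIM = 5

def embed_object(obj: SceneObject) -> List[float]:
    """Hypothetical per-object features standing in for a learned encoder."""
    category, x, y, angle = obj
    return [float(category), x, y, x * y, angle]

def encode_scene(objects: Sequence[SceneObject]) -> List[float]:
    """Order-independent scene code: sum of per-object embeddings."""
    code = [0.0] * EMBED_DIM
    for obj in objects:
        code = [c + e for c, e in zip(code, embed_object(obj))]
    return code

def place_objects(room: List[float],
                  propose_next: Callable[[List[float]], Optional[SceneObject]],
                  max_objects: int,
                  initial: Sequence[SceneObject] = ()) -> List[SceneObject]:
    """Add at most `max_objects` objects one at a time. Each step sees only the room
    features and the unordered set placed so far, so a partial room can be passed as `initial`."""
    scene = list(initial)
    for _ in range(max_objects):
        context = room + encode_scene(scene)  # room features followed by the set code
        obj = propose_next(context)
        if obj is None:  # hypothetical end-of-scene signal
            break
        scene.append(obj)
    return scene
```

Because `encode_scene` gives the same code for any ordering of the same objects, the loop can start from any user-provided partial set of objects, which is what makes room completion straightforward in this sketch.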
To summarize, we propose novel primitive-based representations that do not
restrict the available shape vocabulary to a specific set of shapes, such as
cuboids, spheres, or planes. We further introduce a structure-aware
representation that considers part relationships and represents object parts with
multiple levels of granularity, where geometrically complex parts are modeled
with more components and simpler parts with fewer components. Finally, we
propose a network architecture that generates indoor scenes by properly
arranging objects within a room's boundaries. Our model enables new interactive
applications for semi-automated scene authoring that were not possible before.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
Primitive-based representations
en_US
dc.subject
3D reconstruction
en_US
dc.subject
Structure-aware representations
en_US
dc.subject
Scene understanding
en_US
dc.subject
Scene synthesis
en_US
dc.subject
Interpretable representations
en_US
dc.subject
Unsupervised learning
en_US
dc.subject
Generative modelling
en_US
dc.title
Learning Deep Models with Primitive-Based Representations
en_US
dc.type
Doctoral Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2021-12-16
ethz.size
218 p.
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science
en_US
ethz.identifier.diss
28066
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02652 - Institut für Bildverarbeitung / Computer Vision Laboratory::03514 - Van Gool, Luc (emeritus) / Van Gool, Luc (emeritus)
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02652 - Institut für Bildverarbeitung / Computer Vision Laboratory::03514 - Van Gool, Luc (emeritus) / Van Gool, Luc (emeritus)
en_US
ethz.date.deposited
2021-12-16T11:47:59Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2021-12-16T12:08:00Z
ethz.rosetta.lastUpdated
2022-03-29T16:37:44Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Learning%20Deep%20Models%20with%20Primitive-Based%20Representations&rft.date=2021&rft.au=Paschalidou,%20Despoina&rft.genre=unknown&rft.btitle=Learning%20Deep%20Models%20with%20Primitive-Based%20Representations