Analyzing Vectorized Hash Tables Across CPU Architectures
dc.contributor.author
Böther, Maximilian
dc.contributor.author
Benson, Lawrence
dc.contributor.author
Klimovic, Ana
dc.contributor.author
Rabl, Tilmann
dc.date.accessioned
2023-09-07T07:59:04Z
dc.date.available
2023-08-01T01:16:47Z
dc.date.available
2023-08-02T07:00:03Z
dc.date.available
2023-09-07T07:59:04Z
dc.date.issued
2023-07
dc.identifier.issn
2150-8097
dc.identifier.other
10.14778/3611479.3611485
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/624893
dc.identifier.doi
10.3929/ethz-b-000624893
dc.description.abstract
Data processing systems often leverage vector instructions to achieve higher performance. When applying vector instructions, an often-overlooked data structure is the hash table, even though it is fundamental in data processing systems for operations such as indexing, aggregating, and joining. In this paper, we characterize and evaluate three fundamental vectorized hashing schemes: vectorized linear probing (VLP), vectorized fingerprinting (VFP), and bucket-based comparison (BBC). We implement these hashing schemes on the x86, ARM, and Power CPU architectures, as modern database systems must provide efficient implementations for multiple platforms due to the continuously increasing hardware heterogeneity. We present various implementation variants and platform-specific optimizations, which we evaluate for integer keys, string keys, large payloads, skewed distributions, and multiple threads. Our extensive evaluation and comparison to three scalar hashing schemes on four servers show that BBC outperforms scalar linear probing by a factor of more than 2x, while also scaling well to high load factors. We find that vectorized hashing schemes come with caveats that need to be considered, such as the increased engineering overhead, differences between CPUs, and differences between vector ISAs, such as AVX and AVX-512, which impact performance. We conclude with key findings for vectorized hashing scheme implementations.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Association for Computing Machinery
en_US
dc.rights.uri
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title
Analyzing Vectorized Hash Tables Across CPU Architectures
en_US
dc.type
Conference Paper
dc.rights.license
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
dc.date.published
2023-08-24
ethz.journal.title
Proceedings of the VLDB Endowment
ethz.journal.volume
16
en_US
ethz.journal.issue
11
en_US
ethz.journal.abbreviated
Proc. VLDB Endow.
ethz.pages.start
2755
en_US
ethz.pages.end
2768
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.event
49th International Conference on Very Large Data Bases (VLDB 2023)
en_US
ethz.event.location
Vancouver, Canada
en_US
ethz.event.date
August 28 - September 1, 2023
en_US
ethz.grant
MLin: Machine Learning Input Data Processing as a Service
en_US
ethz.grant
Integrated Data Analysis Pipelines for Large-Scale Data Management, HPC, and Machine Learning
en_US
ethz.identifier.wos
ethz.publication.place
New York, NY
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02663 - Institut für Computing Platforms / Institute for Computing Platforms::09683 - Klimovic, Ana / Klimovic, Ana
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02663 - Institut für Computing Platforms / Institute for Computing Platforms::09683 - Klimovic, Ana / Klimovic, Ana
en_US
ethz.grant.agreementno
204620
ethz.grant.agreementno
957407
ethz.grant.fundername
SNF
ethz.grant.fundername
EC
ethz.grant.funderDoi
10.13039/501100001711
ethz.grant.funderDoi
10.13039/501100000780
ethz.grant.program
Projekte MINT
ethz.grant.program
H2020
ethz.date.deposited
2023-08-01T01:16:47Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2023-09-07T07:59:06Z
ethz.rosetta.lastUpdated
2024-02-03T03:21:52Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Analyzing%20Vectorized%20Hash%20Tables%20Across%20CPU%20Architectures&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.date=2023-07&rft.volume=16&rft.issue=11&rft.spage=2755&rft.epage=2768&rft.issn=2150-8097&rft.au=B%C3%B6ther,%20Maximilian&Benson,%20Lawrence&Klimovic,%20Ana&Rabl,%20Tilmann&rft.genre=proceeding&rft_id=info:doi/10.14778/3611479.3611485&