Open access
Date
2023-07
Type
- Conference Paper
ETH Bibliography
yes
Abstract
Data processing systems often leverage vector instructions to achieve higher performance. When applying vector instructions, an often overlooked data structure is the hash table, even though it is fundamental in data processing systems for operations such as indexing, aggregating, and joining. In this paper, we characterize and evaluate three fundamental vectorized hashing schemes: vectorized linear probing (VLP), vectorized fingerprinting (VFP), and bucket-based comparison (BBC). We implement these hashing schemes on the x86, ARM, and Power CPU architectures, as modern database systems must provide efficient implementations for multiple platforms due to the continuously increasing hardware heterogeneity. We present various implementation variants and platform-specific optimizations, which we evaluate for integer keys, string keys, large payloads, skewed distributions, and multiple threads. Our extensive evaluation and comparison to three scalar hashing schemes on four servers show that BBC outperforms scalar linear probing by more than 2x, while also scaling well to high load factors. We find that vectorized hashing schemes come with caveats that need to be considered, such as the increased engineering overhead, differences between CPUs, and differences between vector ISAs, such as AVX and AVX-512, which impact performance. We conclude with key findings for vectorized hashing scheme implementations.
Permanent link
https://doi.org/10.3929/ethz-b-000624893
Publication status
published
Journal / series
Proceedings of the VLDB Endowment
Publisher
Association for Computing Machinery
Organisational unit
09683 - Klimovic, Ana / Klimovic, Ana
Funding
204620 - MLin: Machine Learning Input Data Processing as a Service (SNF)
957407 - Integrated Data Analysis Pipelines for Large-Scale Data Management, HPC, and Machine Learning (EC)