Open access
Date
2023-07
Type
- Conference Paper
ETH Bibliography
yes
Abstract
Data processing systems often leverage vector instructions to achieve higher performance. When applying vector instructions, an often overlooked data structure is the hash table, even though it is fundamental in data processing systems for operations such as indexing, aggregating, and joining. In this paper, we characterize and evaluate three fundamental vectorized hashing schemes: vectorized linear probing (VLP), vectorized fingerprinting (VFP), and bucket-based comparison (BBC). We implement these hashing schemes on the x86, ARM, and Power CPU architectures, as modern database systems must provide efficient implementations for multiple platforms due to the continuously increasing hardware heterogeneity. We present various implementation variants and platform-specific optimizations, which we evaluate for integer keys, string keys, large payloads, skewed distributions, and multiple threads. Our extensive evaluation and comparison to three scalar hashing schemes on four servers show that BBC outperforms scalar linear probing by a factor of more than 2x, while also scaling well to high load factors. We find that vectorized hashing schemes come with caveats that need to be considered, such as the increased engineering overhead, differences between CPUs, and differences between vector ISAs, such as AVX and AVX-512, which impact performance. We conclude with key findings for vectorized hashing scheme implementations.
Persistent link
https://doi.org/10.3929/ethz-b-000624893
Publication status
published
External links
Journal / series
Proceedings of the VLDB Endowment
Volume
Pages / article number
Publisher
Association for Computing Machinery
Conference
Organisational unit
09683 - Klimovic, Ana / Klimovic, Ana
Funding
204620 - MLin: Machine Learning Input Data Processing as a Service (SNF)
957407 - Integrated Data Analysis Pipelines for Large-Scale Data Management, HPC, and Machine Learning (EC)