Optimizing Data-Centric Applications - A Machine Learning Approach to Optimizations in DaCe
Open access
Author
Dobrosavljevic, Filip
Date
2024-06
Type
- Master Thesis
ETH Bibliography
yes
Abstract
Program optimization is an increasingly important task due to the diverse range of hardware platforms available today. Manually finding code transformations that optimize program performance is time-consuming and requires domain-specific expertise. Some compiler frameworks therefore resort to heuristics in combination with automatic optimization techniques that use machine learning-based cost models to predict how well code transformations perform for a specific program. In recent years, the DaCe parallel programming framework has been developed with the aim of separating program optimization from the underlying code by means of a data-centric intermediate representation. However, DaCe does not implement a cost model and relies solely on heuristics for program optimization, which can lead to sub-optimal results. Moreover, existing solutions such as Daisytuner only partially solve the problem, since they apply loop optimizations in isolation; code transformations that span loop nest boundaries, such as loop fusion, are disregarded. In this thesis, we present a cost model implementation for DaCe that achieves a MAPE of 21.7% when predicting the speedup of transformed programs. We generate the training data set through heuristic and search-based approaches. Compared to previous cost models, we train on a much smaller data set, which significantly simplifies data generation for new hardware architectures. We show that, in one instance, our search space allows us to find optimizations yielding a program speedup of 1.51× over the transformations suggested by Daisytuner. By leveraging the data-centric paradigm of DaCe, we enable developers to automatically optimize programs within a framework designed to reduce data movement, thereby addressing one of the primary bottlenecks in contemporary hardware architectures.
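The abstract reports prediction quality as MAPE (mean absolute percentage error). As a minimal sketch of how that metric is computed — the speedup values below are hypothetical illustrations, not results from the thesis:

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Averages the absolute relative error of each prediction
    against the corresponding measured value.
    """
    assert len(actual) == len(predicted) and len(actual) > 0
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical measured speedups of transformed programs
actual_speedups = [1.51, 1.10, 2.00, 0.95]
# Hypothetical cost-model predictions for the same programs
predicted_speedups = [1.30, 1.25, 1.80, 1.00]

print(f"MAPE: {mape(actual_speedups, predicted_speedups):.1f}%")  # → MAPE: 10.7%
```

A lower MAPE means the cost model's speedup predictions track the measured speedups more closely; the thesis reports 21.7% on its evaluation set.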
Permanent link
https://doi.org/10.3929/ethz-b-000679907
Publication status
published
Contributors
Examiner: Hoefler, Torsten
Examiner: Ivanov, Andrei
Examiner: Gianinazzi, Lukas
Examiner: Boudaoud, Afif
Publisher
ETH Zurich
Subject
DaCe; Compiler Optimizations; Machine Learning
Organisational unit
03950 - Hoefler, Torsten / Hoefler, Torsten