Deep Learning-Driven Compiler Enhancements for Efficient Matrix Multiplication
DOI:
https://doi.org/10.57159/gadl.jcmm.3.2.240122
Keywords:
Deep Learning, Matrix Multiplication, Compiler Optimization, Loop Tiling, High-Performance Computing
Abstract
Matrix multiplication is a fundamental operation in many computational fields, and its optimization is essential for handling ever-increasing data sizes efficiently. This paper reviews the application of deep learning to matrix multiplication, a problem of growing importance as games and other complex programs demand ever larger matrix computations. The standard matrix multiplication algorithm is described, along with the time it requires for different matrix sizes. Tiled matrix multiplication, which partitions each matrix into blocks, computes the product for each block, and then combines the partial results, is also described, and the running times of the two methods are compared across matrix sizes. The central idea is to use deep neural networks (DNNs) to compare the generated code variants and determine their relative performance, with a tournament-based ranking system assigning a rank to each code version. The effectiveness of these techniques was evaluated on matrix multiplication operations commonly found in deep learning workloads, where the approach achieves a speedup of up to 8.844x over the naive implementation for a matrix size of 1024. The results demonstrate the effectiveness of combining compiler optimization techniques with deep learning models to optimize matrix multiplication.
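As background for the tiled approach the abstract describes, the sketch below contrasts a naive triple-loop multiplication with a loop-tiled variant. This is a minimal illustration, not the paper's implementation: the tile size TILE = 32, the square row-major double-precision layout, and the function names matmul_naive and matmul_tiled are assumptions made here for clarity.

```c
/* A minimal sketch contrasting naive and tiled matrix multiplication.
 * TILE = 32, square row-major double matrices, and these function names
 * are assumptions for illustration, not the paper's implementation. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TILE 32 /* assumed tile size; in practice tuned to the cache */

/* Naive O(n^3) multiplication: C = A * B, all n x n, row-major. */
void matmul_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Tiled variant: process TILE x TILE blocks so each block's working set
 * stays cache-resident, accumulating partial products into C.
 * Assumes C is zero-initialized. */
void matmul_tiled(size_t n, const double *A, const double *B, double *C) {
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                for (size_t i = ii; i < n && i < ii + TILE; i++)
                    for (size_t k = kk; k < n && k < kk + TILE; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < n && j < jj + TILE; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

int main(void) {
    size_t n = 256; /* small test size; the paper measures up to 1024 */
    double *A = malloc(n * n * sizeof *A);
    double *B = malloc(n * n * sizeof *B);
    double *C1 = calloc(n * n, sizeof *C1);
    double *C2 = calloc(n * n, sizeof *C2);
    for (size_t i = 0; i < n * n; i++) {
        A[i] = (double)(i % 7);
        B[i] = (double)(i % 5);
    }
    matmul_naive(n, A, B, C1);
    matmul_tiled(n, A, B, C2);
    printf("results match: %s\n",
           memcmp(C1, C2, n * n * sizeof *C1) == 0 ? "yes" : "no");
    free(A); free(B); free(C1); free(C2);
    return 0;
}
```

With the i-k-j inner loop order, the innermost loop streams contiguously over rows of B and C, and each TILE x TILE block stays resident in cache between reuses; the tile size that actually yields the reported 8.844x speedup would depend on the target's cache hierarchy.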
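The tournament-based ranking the abstract mentions can likewise be sketched. In the hypothetical example below, predict_winner stands in for the paper's DNN comparator, whose actual inputs and architecture are not specified here, and the per-variant scores are made-up numbers used only to make the round-robin ranking logic concrete.

```c
/* Hypothetical tournament-ranking sketch. predict_winner is a stand-in
 * for the paper's DNN comparator; the scores are illustrative only. */
#include <stdio.h>

#define N_VARIANTS 4

/* Returns 1 if variant a is predicted to outperform variant b.
 * A placeholder comparison; the paper uses a trained DNN instead. */
static int predict_winner(int a, int b, const double *score) {
    return score[a] > score[b];
}

int main(void) {
    /* Assumed predicted-performance scores for four code variants. */
    double score[N_VARIANTS] = {0.31, 0.87, 0.55, 0.62};
    int wins[N_VARIANTS] = {0};

    /* Round-robin tournament: compare every pair of variants once;
     * the number of predicted wins determines each variant's rank. */
    for (int a = 0; a < N_VARIANTS; a++)
        for (int b = a + 1; b < N_VARIANTS; b++) {
            if (predict_winner(a, b, score)) wins[a]++;
            else wins[b]++;
        }

    for (int v = 0; v < N_VARIANTS; v++)
        printf("variant %d: %d wins\n", v, wins[v]);
    return 0;
}
```

Each variant is compared against every other, and the win count induces the ranking; in the paper's setting the pairwise decision would come from the trained DNN rather than a stored score.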
License
Copyright (c) 2024 Journal of Computers, Mechanical and Management
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The Journal of Computers, Mechanical and Management applies the Creative Commons Attribution-NonCommercial 4.0 International License to its published articles. While retaining copyright ownership of the content, the journal permits downloading, reusing, reprinting, modifying, distributing, and copying of the articles, provided the original authors and source are appropriately cited.
Accepted 2024-06-25
Published 2024-07-01