MatrixMap: Programming abstraction and implementation of matrix computation for big data applications

Yaguang Huangfu, Jiannong Cao, Hongliang Lu, Guanqing Liang

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

The computational core of many big data applications can be expressed as general matrix computations, including linear algebra operations and irregular matrix operations. However, existing parallel programming systems such as Spark lack a programming abstraction and an efficient implementation for general matrix computations. In this paper, we present MatrixMap, a unified and efficient data-parallel system for general matrix computations. MatrixMap provides a powerful yet simple abstraction, consisting of a distributed data structure called the bulk key matrix and a computation interface defined by matrix patterns. Users can easily load data into bulk key matrices and express algorithms as parallel matrix patterns. MatrixMap outperforms current state-of-the-art systems by employing three key techniques: matrix patterns with lambda functions for irregular and linear algebra matrix operations, an asynchronous computation pipeline with optimized data shuffling strategies for specific matrix patterns, and an in-memory data structure that reuses data across iterations. Moreover, it automatically handles the parallelization and distributed execution of programs on a large cluster. The experimental results show that MatrixMap is 12 times faster than Spark.
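To illustrate how a single matrix pattern parameterized by lambda functions can cover both linear algebra and irregular operations, here is a minimal single-machine sketch. It is not MatrixMap's API: the function name `generalized_product` and its parameters `combine`, `reduce_fn`, and `init` are hypothetical, chosen only for this example, and the sketch does not model distribution, the bulk key matrix, or the asynchronous pipeline.

```python
from typing import Callable, List

Matrix = List[List[float]]


def generalized_product(
    a: Matrix,
    b: Matrix,
    combine: Callable[[float, float], float],
    reduce_fn: Callable[[float, float], float],
    init: float,
) -> Matrix:
    """Hypothetical pattern: out[i][j] = reduce_fn over k of combine(a[i][k], b[k][j])."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = init  # identity element for reduce_fn
            for k in range(inner):
                acc = reduce_fn(acc, combine(a[i][k], b[k][j]))
            row.append(acc)
        out.append(row)
    return out


# Linear algebra: ordinary matrix multiplication (combine = *, reduce = +).
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
product = generalized_product(a, b, lambda x, y: x * y, lambda s, t: s + t, 0.0)

# Irregular operation: one relaxation step of all-pairs shortest paths
# (combine = +, reduce = min) on a weighted adjacency matrix.
INF = float("inf")
dist = [
    [0.0, 1.0, INF],
    [INF, 0.0, 2.0],
    [INF, INF, 0.0],
]
two_hop = generalized_product(dist, dist, lambda x, y: x + y, min, INF)
```

Swapping the two lambdas changes the algorithm while the traversal stays the same, which is the general idea behind expressing both kinds of computation through one pattern.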
Original language: English
Title of host publication: Proceedings - 2015 IEEE 21st International Conference on Parallel and Distributed Systems, ICPADS 2015
Publisher: IEEE Computer Society
Pages: 19-28
Number of pages: 10
Volume: 2016-January
ISBN (Electronic): 9780769557854
Publication status: Published - 15 Jan 2016
Event: 21st IEEE International Conference on Parallel and Distributed Systems, ICPADS 2015 - Melbourne, Australia
Duration: 14 Dec 2015 to 17 Dec 2015

Conference

Conference: 21st IEEE International Conference on Parallel and Distributed Systems, ICPADS 2015
Country: Australia
City: Melbourne
Period: 14/12/15 to 17/12/15

Keywords

  • Big Data
  • Graph Processing
  • Machine Learning
  • Matrix Computation
  • Parallel Programming

ASJC Scopus subject areas

  • Hardware and Architecture
