Extracting top-k insights from multi-dimensional data

Bo Tang, Shi Han, Man Lung Yiu, Rui Ding, Dongmei Zhang

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

98 Citations (Scopus)

Abstract

OLAP tools have been extensively used by enterprises to make better and faster decisions. Nevertheless, they require users to specify group-by attributes and to know precisely what they are looking for. This paper makes a first attempt at automatically extracting top-k insights from multi-dimensional data. Doing so not only helps non-expert users but also reduces the manual effort of data analysts. In particular, we propose the concept of an insight, which captures an interesting observation derived from aggregation results in multiple steps (e.g., rank by a dimension, compute the percentage of a measure by a dimension). An example insight is: "Brand B's rank (across brands) falls along the year, in terms of the increase in sales". Our problem is to compute the top-k insights according to a score function. This poses challenges for (i) the effectiveness of the result and (ii) the efficiency of computation. We propose a meaningful scoring function for insights to address (i). We then contribute a computation framework for top-k insights, together with a suite of optimization techniques (i.e., pruning, ordering, specialized cube, and computation sharing), to address (ii). Our experimental study on both real and synthetic data verifies the effectiveness and efficiency of our proposed solution.
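
As a rough illustration of the example insight in the abstract, the following Python sketch aggregates sales by brand and year, derives the year-over-year increase, ranks brands within each year, and keeps the k brands whose rank shifts most. The toy data, the function names, and the scoring rule (absolute change in rank between the first and last year) are all assumptions made for illustration; the abstract does not define the paper's scoring function, and none of the optimizations it lists (pruning, ordering, specialized cube, computation sharing) are attempted here.

```python
import heapq
from collections import defaultdict

# Hypothetical toy data: (brand, year, sales). Not taken from the paper.
ROWS = [
    ("A", 2012, 10), ("A", 2013, 12), ("A", 2014, 20), ("A", 2015, 35),
    ("B", 2012, 30), ("B", 2013, 40), ("B", 2014, 45), ("B", 2015, 46),
    ("C", 2012, 20), ("C", 2013, 25), ("C", 2014, 31), ("C", 2015, 38),
]


def sales_by_brand_year(rows):
    """Step 1: aggregate the measure (sales) by brand and year."""
    totals = defaultdict(float)
    for brand, year, sales in rows:
        totals[(brand, year)] += sales
    return totals


def yearly_increase(totals):
    """Step 2: increase in sales relative to the previous year."""
    increase = {}
    for (brand, year), value in totals.items():
        previous = totals.get((brand, year - 1))
        if previous is not None:
            increase[(brand, year)] = value - previous
    return increase


def rank_across_brands(increase):
    """Step 3: rank brands within each year (1 = largest increase)."""
    by_year = defaultdict(list)
    for (brand, year), value in increase.items():
        by_year[year].append((value, brand))
    ranks = {}
    for year, pairs in by_year.items():
        ordered = sorted(pairs, reverse=True)
        for position, (_, brand) in enumerate(ordered, start=1):
            ranks[(brand, year)] = position
    return ranks


def top_k_rank_shifts(rows, k):
    """Return the k brands with the largest rank shift over the years.

    The score (absolute rank change, first year to last) is an assumed
    stand-in, not the scoring function proposed in the paper.
    """
    ranks = rank_across_brands(yearly_increase(sales_by_brand_year(rows)))
    series = defaultdict(list)
    for (brand, year), position in ranks.items():
        series[brand].append((year, position))
    candidates = []
    for brand, points in series.items():
        points.sort()
        shift = points[-1][1] - points[0][1]  # positive: rank got worse
        candidates.append((abs(shift), brand, shift))
    return heapq.nlargest(k, candidates)


if __name__ == "__main__":
    for score, brand, shift in top_k_rank_shifts(ROWS, k=2):
        direction = "falls" if shift > 0 else "rises"
        print(f"Brand {brand}'s rank {direction} by {score} place(s)")
```

On this toy data, brand B's rank by sales increase drops from first to third, which is the kind of observation the abstract's example describes. This sketch enumerates every candidate before selecting the top k; it makes no attempt at the efficiency techniques the paper contributes.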
Original language: English
Title of host publication: SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 1509-1524
Number of pages: 16
Volume: Part F127746
ISBN (Electronic): 9781450341974
Publication status: Published - 9 May 2017
Event: 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017 - Hilton Chicago, Chicago, United States
Duration: 14 May 2017 to 19 May 2017

Conference

Conference: 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017
Country/Territory: United States
City: Chicago
Period: 14/05/17 to 19/05/17

ASJC Scopus subject areas

  • Software
  • Information Systems
