TY - JOUR
T1 - Distance measures in building informatics
T2 - An in-depth assessment through typical tasks in building energy management
AU - Li, Ao
AU - Fan, Cheng
AU - Xiao, Fu
AU - Chen, Zhijie
N1 - Funding Information:
The authors gratefully acknowledge the support of this research by the National Key Research and Development Program of China (2021YFE0107400) and the Research Grants Council of the Hong Kong SAR (152133/19E).
Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2022/3/1
Y1 - 2022/3/1
AB - Distance measurement (also known as similarity measurement) is used to evaluate pairwise similarities between data samples. It has been widely used in building informatics research and applications to classify or cluster massive building data with the aim of improving prediction accuracy, identifying operation patterns, and benchmarking and diagnosing building performance. Various distance measures have been adopted to measure the distance/similarity of building data. However, the intrinsic complexity and diversity of building operational data make it difficult to select a suitable distance measure for a specific task. There is a strong and urgent need for a comprehensive review and systematic comparison of existing distance measures in building informatics. This study provides a comprehensive review of various distance measures and their applications in building operational data analysis. A systematic comparison is undertaken based on two typical tasks relying on building informatics, i.e., building energy usage pattern recognition and clustering-based weather data segmentation for the customized development of building energy prediction models. Nine widely adopted distance measures are reviewed and compared: Euclidean distance, Chebyshev distance, Manhattan distance, Mahalanobis distance, Hausdorff distance, Pearson correlation distance, Dynamic Time Warping, Edit distance on Real Sequence, and Cosine distance. Novel internal and external clustering validation approaches based on the cross-test and prediction accuracy are proposed and adopted to compare the clustering performance. The case study results showed that weather data clustering using the Cosine distance and Pearson correlation distance yields better energy prediction results in terms of MAPE (13.22% and 12.91%, respectively) than the commonly used Euclidean distance (13.99%). The results also revealed that better clustering performance does not necessarily lead to higher prediction accuracy. The research results and insights obtained are valuable for guiding distance-based research in building informatics.
KW - Clustering
KW - Distance measure
KW - Pattern recognition
KW - Time-series analysis
UR - http://www.scopus.com/inward/record.url?scp=85122971632&partnerID=8YFLogxK
U2 - 10.1016/j.enbuild.2021.111817
DO - 10.1016/j.enbuild.2021.111817
M3 - Journal article
AN - SCOPUS:85122971632
SN - 0378-7788
VL - 258
JO - Energy and Buildings
JF - Energy and Buildings
M1 - 111817
ER -