Predicting monthly streamflow using data-driven models coupled with data-preprocessing techniques

C. L. Wu, Kwok Wing Chau, Yok Sheung Li

Research output: Journal article publication › Journal article › Academic research › peer-review

327 Citations (Scopus)


In this paper, the accuracy of monthly streamflow forecasts obtained by applying data-driven modeling techniques to streamflow series is discussed. A crisp distributed support vector regression (CDSVR) model was proposed for monthly streamflow prediction and compared with four other models: autoregressive moving average (ARMA), K-nearest neighbors (KNN), artificial neural networks (ANNs), and crisp distributed artificial neural networks (CDANN). In the distributed models (CDSVR and CDANN), the fuzzy C-means (FCM) clustering technique first split the flow data into three subsets (low, medium, and high levels) according to the magnitudes of the data, and three single SVRs (or ANNs) were then fitted to the three subsets. This paper gives a detailed analysis of the reconstruction of dynamics, which was used to identify the configuration of all models except ARMA. To improve model performance, the data-preprocessing techniques of singular spectrum analysis (SSA) and/or moving average (MA) were coupled with all five models. Discussions are presented on (1) the number of neighbors in KNN, (2) the configuration of the ANN, and (3) the effects of MA and SSA. Two streamflow series from different locations in China (Xiangjiaba and Danjiangkou) were used for the forecasting analysis. Forecasts were conducted at four horizons (1-, 3-, 6-, and 12-month-ahead). The results showed that models fed by preprocessed data performed better than models fed by original data, and that CDSVR outperformed the other models except at the 6-month-ahead horizon for Danjiangkou. From the perspective of the streamflow series, SSA was more effective on the Danjiangkou data because its raw discharge series was more complex than that of Xiangjiaba. MA considerably improved the performance of ANN, CDANN, and CDSVR by adjusting the correlation between the input components and the model output. It was also found that the performance of CDSVR deteriorated as the forecast horizon increased.
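The preprocessing and splitting steps named in the abstract can be sketched in plain Python as below. This is a minimal, hypothetical illustration, not the authors' code: the trailing moving-average window, the delay-embedding parameters (dim, lag, horizon) used here as a stand-in for "reconstruction of dynamics", the FCM fuzzifier m = 2, the quantile initialization, and every function name are assumptions. SSA and the SVR/ANN fitting are not shown.

```python
# Hypothetical sketch: MA smoothing, delay embedding, and an FCM-based crisp
# split into low/medium/high subsets. All parameters and names are assumptions.
from typing import List, Tuple


def moving_average(series: List[float], window: int) -> List[float]:
    """Trailing moving average; output has len(series) - window + 1 points."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def delay_embed(series: List[float], dim: int, lag: int,
                horizon: int) -> Tuple[List[List[float]], List[float]]:
    """Inputs [x_t, x_{t-lag}, ..., x_{t-(dim-1)lag}] paired with target x_{t+horizon}."""
    start = (dim - 1) * lag
    inputs: List[List[float]] = []
    targets: List[float] = []
    for t in range(start, len(series) - horizon):
        inputs.append([series[t - j * lag] for j in range(dim)])
        targets.append(series[t + horizon])
    return inputs, targets


def fcm_1d(data: List[float], c: int = 3, m: float = 2.0, iters: int = 100,
           eps: float = 1e-9) -> Tuple[List[float], List[List[float]]]:
    """Standard fuzzy C-means on scalar data; returns (centroids, memberships)."""
    ranked = sorted(data)
    # Initialise centroids at evenly spaced quantiles of the data.
    centers = [ranked[int((k + 0.5) * len(data) / c)] for k in range(c)]
    u = [[0.0] * c for _ in data]
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        for i, x in enumerate(data):
            d = [abs(x - v) + eps for v in centers]  # eps avoids division by zero
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / dj) ** (2.0 / (m - 1.0)) for dj in d)
        # Centroid update: mean of the data weighted by u_ik^m.
        for k in range(c):
            w = [row[k] ** m for row in u]
            centers[k] = sum(wi * x for wi, x in zip(w, data)) / sum(w)
    return centers, u


def crisp_split(data: List[float], c: int = 3) -> List[List[int]]:
    """Harden FCM memberships; return index lists ordered low -> high level."""
    centers, u = fcm_1d(data, c)
    rank = sorted(range(c), key=lambda k: centers[k])  # cluster ids by centroid
    groups: List[List[int]] = [[] for _ in range(c)]
    for i, row in enumerate(u):
        best = max(range(c), key=lambda k: row[k])  # highest membership wins
        groups[rank.index(best)].append(i)
    return groups
```

Returning index lists rather than values lets the matching input vectors and targets be routed to the same subset. One possible (assumed) workflow: `inputs, targets = delay_embed(moving_average(flows, 3), dim=3, lag=1, horizon=1)`, then `crisp_split([row[0] for row in inputs])`, clustering on the current flow x_t because it is known at forecast time, and fit one SVR (or ANN) per index group. Which variable the paper actually clusters is not stated in the abstract.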
Original language: English
Article number: W08432
Journal: Water Resources Research
Issue number: 8
Publication status: Published - 1 Aug 2009

ASJC Scopus subject areas

  • Water Science and Technology
