Online ride-sharing platforms have become a critical part of the urban transportation system. Accurately recommending hotspots to drivers in such platforms is essential to help drivers find passengers and improve users' experience, which calls for efficient passenger demand prediction strategy. However, predicting multi-step passenger demand is challenging due to its high dynamicity, complex dependencies along spatial and temporal dimensions, and sensitivity to external factors (meteorological data and time meta). We propose an end-to-end deep learning framework to address the above problems. Our model comprises three components in pipeline: 1) a cascade graph convolutional recurrent neural network to accurately extract the spatial-temporal correlations within citywide historical passenger demand data; 2) two multi-layer LSTM networks to represent the external meteorological data and time meta, respectively; 3) an encoder-decoder module to fuse the above two parts and decode the representation to predict over multi-steps into the future. The experimental results on three real-world datasets demonstrate that our model can achieve accurate prediction and outperform the most discriminative state-of-the-art methods.