Accurate demand forecasting for different public transport modes (e.g., buses and light rail) is essential for public service operation. However, the level of development often differs significantly across modes, which makes it hard to predict demand for modes with insufficient knowledge and sparse station distribution (i.e., station-sparse modes). Intuitively, different public transit modes in a city may exhibit shared temporal and spatial demand patterns. As such, we propose to enhance the demand prediction of station-sparse modes with data from the station-intensive mode, and we design a Memory-Augmented Multi-task Recurrent Network (MATURE) that derives the transferable demand patterns from each mode and boosts the prediction of station-sparse modes by adapting the relevant patterns from the station-intensive mode. Specifically, MATURE comprises three components: 1) a memory-augmented recurrent network that strengthens the ability to capture long- and short-term information and stores the temporal knowledge of each transit mode; 2) a knowledge adaption module that adapts the relevant knowledge from a station-intensive source to station-sparse sources; 3) a multi-task learning framework that incorporates all the information and forecasts the demand of multiple modes jointly. Experimental results on a real-world dataset covering four public transport modes demonstrate that our model improves demand forecasting performance for the station-sparse modes.