Real-world surveillance face images are usually of low resolution (LR) because the faces are captured at a distance. Matching LR query faces against high-resolution (HR) gallery faces remains a challenging open problem. Existing face recognition networks fail to extract discriminative features from LR face images because they never encounter LR faces during training. One intuitive remedy is to randomly downsample the training face images to different resolutions, which implicitly makes the face recognition network invariant to resolution changes. To address the problem more directly, we propose to train a face recognition network with a deep Siamese architecture, which is simple yet effective. First, a shared classifier classifies the deep features extracted from HR and LR facial image pairs, explicitly narrowing the domain gap between the HR and LR deep features. Second, on top of the deep Siamese network, a new loss function, the cross-resolution triplet loss, pulls matching pairs closer while pushing non-matching pairs apart in the learned feature space. The trained network can therefore extract discriminative features across different resolutions. Experiments demonstrate the superiority of our proposed method on a synthetic LR face dataset (LFW) and two real-world LR face datasets, SCface and QMUL-SurvFace.
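The abstract does not give the exact form of the cross-resolution triplet loss, but a minimal sketch can illustrate the idea, assuming a standard hinge-style triplet loss in which the anchor embedding comes from an HR face while the positive and negative embeddings come from LR faces (the function name and Euclidean-distance choice below are illustrative assumptions, not the paper's definition):

```python
import numpy as np

def cross_resolution_triplet_loss(anchor_hr, pos_lr, neg_lr, margin=0.3):
    """Hinge triplet loss across resolutions (illustrative sketch).

    anchor_hr: embedding of an HR face image
    pos_lr:    embedding of an LR face of the SAME identity
    neg_lr:    embedding of an LR face of a DIFFERENT identity
    The loss pulls the matching HR-LR pair closer and pushes the
    non-matching pair at least `margin` farther apart.
    """
    d_pos = np.linalg.norm(anchor_hr - pos_lr)  # distance to matching LR face
    d_neg = np.linalg.norm(anchor_hr - neg_lr)  # distance to non-matching LR face
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: when the matching pair already coincides and the
# non-matching pair is far away, the hinge saturates at zero.
a = np.array([1.0, 0.0])
loss_easy = cross_resolution_triplet_loss(a, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
# A violated triplet (negative closer than positive) yields a positive loss.
loss_hard = cross_resolution_triplet_loss(a, np.array([0.0, 1.0]), np.array([1.0, 0.0]))
```

In a real training loop the embeddings would be L2-normalized outputs of the shared Siamese backbone, and the loss would be averaged over mined triplets in each mini-batch.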