Face Super-Resolution (FSR) aims to infer High-Resolution (HR) facial images from given Low-Resolution (LR) ones with the assistance of LR and HR training pairs. Among existing methods, local patch-based methods are superior to global methods in both visual and objective quality. These local patch-based methods rely on the consistency assumption that the neighbors in the LR and HR spaces form similar local geometry. However, when the LR images are of low quality, the LR space is so seriously contaminated that even two distinct patches look similar, which means that the consistency assumption no longer holds well. To address this, in this paper we introduce the contextual topological structure of the target patch to improve the consistency. The contextual topological structure consists of the target patch and its adjacent patches; we explore the relationship between them based on statistical probability and apply this relationship to the joint learning process of the mapping from LR to HR. By incorporating the contextual topological structure, both the robustness of the approach to noise and the LR/HR consistency are increased. The effectiveness of the proposed method is verified both quantitatively and qualitatively.
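As an illustration only, the consistency assumption and the role of context can be sketched in Python with NumPy. The sketch assumes a generic neighbor-embedding scheme (sum-to-one least-squares weights computed in the LR space and reused in the HR space) and models context simply by concatenating the target patch with one adjacent patch. This concatenation is a stand-in and not the probabilistic model proposed in this paper; all function names, parameters, and toy data are hypothetical.

```python
import numpy as np


def embedding_weights(query, neighbors, reg=1e-3):
    """Sum-to-one least-squares weights that reconstruct `query` (d,)
    from the rows of `neighbors` (K, d). `reg` is an assumed ridge term."""
    diff = neighbors - query                      # (K, d) offsets to the query
    gram = diff @ diff.T                          # (K, K) local Gram matrix
    # Regularize so the system stays solvable even when the Gram matrix is singular.
    gram = gram + (reg * np.trace(gram) + 1e-8) * np.eye(len(neighbors))
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()                            # enforce the sum-to-one constraint


def reconstruct_hr_patch(lr_feat, lr_train, hr_train, k=5):
    """Estimate an HR patch under the consistency assumption:
    weights found among the K nearest LR neighbors are reused in HR space."""
    dists = np.linalg.norm(lr_train - lr_feat, axis=1)
    idx = np.argsort(dists, kind="stable")[:k]    # K nearest LR training patches
    w = embedding_weights(lr_feat, lr_train[idx])
    return w @ hr_train[idx]


# Toy training set: the first two LR target patches are identical
# (as can happen under heavy degradation), but their HR patches differ.
lr_target = np.array([[1.0, 1.0], [1.0, 1.0], [5.0, 5.0]])
lr_adjacent = np.array([[0.0, 0.0], [9.0, 9.0], [4.0, 4.0]])  # one adjacent patch each
hr_train = np.array([[10.0] * 4, [20.0] * 4, [30.0] * 4])

# Context feature (assumption): target patch concatenated with its adjacent patch.
lr_context = np.hstack([lr_target, lr_adjacent])

# Query whose true HR counterpart is the second training patch.
query_target = np.array([1.0, 1.0])
query_context = np.array([1.0, 1.0, 9.0, 9.0])

hr_without_context = reconstruct_hr_patch(query_target, lr_target, hr_train, k=1)
hr_with_context = reconstruct_hr_patch(query_context, lr_context, hr_train, k=1)
print(hr_without_context, hr_with_context)
```

In this toy setting the target patch alone cannot distinguish the first two training samples, whereas the concatenated context feature selects the sample whose surroundings also match. This is the intuition behind using adjacent patches to restore LR/HR consistency.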