Robustness and accuracy for monocular visual odometry (VO) under challenging environments are widely concerned. In this paper, we present a monocular VO system leveraging learned repeatability and description. In a hybrid scheme, the camera pose is initially tracked on the predicted repeatability maps in a direct manner and then refined with the patch-wise 3D-2D association. The local feature parameterization and the adapted mapping module further boost different functionalities in the system. Extensive evaluations on challenging public datasets are performed. The competitive performance on camera pose estimation demonstrates the effectiveness of our method. Additional studies on the local reconstruction accuracy and running time exhibit that our system is capable of maintaining a robust and lightweight backend.