Multispectral pedestrian detection has attracted extensive attention, as paired RGB-thermal images provide complementary patterns for handling illumination changes in realistic scenarios. However, most existing deep-learning-based multispectral detectors extract features from the RGB and thermal inputs separately and fuse them by simple concatenation. This fusion strategy is suboptimal, as undifferentiated concatenation across regions and feature channels may hamper the selection of complementary features from the two modalities. To address this limitation, in this paper we propose an attention-based cross-modality interaction (ACI) module, which adaptively highlights and aggregates the discriminative regions and channels of the feature maps from the RGB and thermal images. The proposed ACI module is deployed at multiple layers of a two-branch deep architecture to capture cross-modal interactions at diverse semantic levels, yielding illumination-invariant pedestrian detection. Experimental results on the public KAIST multispectral pedestrian benchmark show that the proposed method achieves state-of-the-art detection performance.
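The channel- and region-wise gating idea described above can be sketched as follows. This is a minimal NumPy illustration of attention-weighted cross-modality fusion, not the paper's actual ACI implementation; the weight matrices, shapes, and function names are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aci_fuse(f_rgb, f_th, w_ch, w_sp):
    """Illustrative cross-modality fusion with channel and spatial attention.

    f_rgb, f_th : (C, H, W) feature maps from the RGB and thermal branches.
    w_ch : (2C, 2C) weights of a hypothetical channel-attention layer.
    w_sp : (1, 2C) weights of a hypothetical 1x1 spatial-attention projection.
    """
    f = np.concatenate([f_rgb, f_th], axis=0)            # (2C, H, W)
    # Channel attention: global average pool -> linear -> sigmoid gate,
    # so informative channels from either modality are emphasized.
    pooled = f.mean(axis=(1, 2))                         # (2C,)
    ch_gate = sigmoid(w_ch @ pooled)                     # (2C,)
    f = f * ch_gate[:, None, None]
    # Spatial attention: project across channels -> sigmoid map,
    # so discriminative regions are emphasized per location.
    sp_gate = sigmoid(np.tensordot(w_sp[0], f, axes=1))  # (H, W)
    return f * sp_gate[None, :, :]                       # (2C, H, W)
```

Because both gates lie in (0, 1), the fused response at every channel and location is a softly re-weighted version of the concatenated features, rather than the undifferentiated sum that plain concatenation followed by a shared convolution would induce.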