Nowadays, a precise video quality assessment (VQA) model is essential to maintain the quality of service (QoS). However, most existing VQA metrics are designed for specific purposes and ignore the spatiotemporal features of nature video. This paper proposes a novel general-purpose no-reference (NR) VQA metric adopting Long Short-Term Memory (LSTM) modules with the masking layer and pre-padding strategy, namely VQA-LSTM, to solve the above issues. First, we divide the distorted video into frames and extract some significant but also universal spatial and temporal features that could effectively reflect the quality of frames. Second, the data preprocessing stage and pre-padding strategy are used to process data to ease the training for our VQA-LSTM. Finally, a three-layer LSTM model incorporated with masking layer is designed to learn the sequence of spatial features as spatiotemporal features and learn the sequence of temporal features as the gradient of temporal features to evaluate the quality of videos. Two widely used VQA database, MCL-V and LIVE, are tested to prove the robustness of our VQA-LSTM, and the experimental results show that our VQA-LSTM has a better correlation with human perception than some state-of-the-art approaches.