Even when acoustic tokens vary substantially, they can usually be recognized accurately. Two opposing models have been proposed to account for how the speech recognition mechanism achieves this perceptual consistency. The abstract model holds that each phonological category has a unitary cognitive representation: after a computational process filters out variation, the speech signal is matched against a particular representation. By contrast, the exemplar-based model holds that the previously encountered exemplars of a given speech category together form its mental representation; recognition then involves searching for a similarity-based match between the incoming signal and the stored exemplars. The present study tested which of these two models better fits data from second language acquisition. Mandarin speakers were trained on Cantonese tones that differed in acoustic variability. Training materials with a large degree of within-category variability did not produce better learning outcomes than those with a small degree of variability, suggesting that the abstract model may provide the better fit for these data. The characteristics of Mandarin speakers' acquisition of Cantonese tones are also discussed.