Evaluation of the performance of traditional machine learning algorithms, convolutional neural network and AutoML Vision in ultrasound breast lesions classification: A comparative study

Ka Wing Wan, Chun Hoi Wong, Ho Fung Ip, Dejian Fan, Pak Leung Yuen, Hoi Ying Fong, Michael Ying

Research output: Journal article › Academic research › peer-review

6 Citations (Scopus)


Background: In recent years, artificial intelligence has become increasingly popular in the medical field, from computer-aided diagnosis (CAD) to patient prognosis prediction. Given that not all healthcare professionals have the expertise required to develop a CAD system, the aim of this study was to investigate the feasibility of using AutoML Vision, a highly automated machine learning model, for future clinical applications by comparing AutoML Vision with commonly used CAD algorithms in the differentiation of benign and malignant breast lesions on ultrasound.

Methods: A total of 895 breast ultrasound images were obtained from two online open-access breast ultrasound image datasets. Traditional machine learning models (comprising seven commonly used CAD algorithms) with three extracted content-based radiomic features (Hu Moments, Color Histogram, Haralick Texture), and a convolutional neural network (CNN) model, were built in Python. AutoML Vision was trained in Google Cloud Platform. Sensitivity, specificity, F1 score and average precision (AUCPR) were used to evaluate the diagnostic performance of the models. Cochran's Q test was used to evaluate statistical significance across all studied models, and the McNemar test was used as the post-hoc test for pairwise comparisons. The proposed AutoML model was also compared with related studies that used similar medical imaging modalities to characterize benign or malignant breast lesions.

Results: There was a significant difference in diagnostic performance among the studied traditional machine learning classifiers (P<0.05). Random Forest achieved the best performance in the differentiation of benign and malignant breast lesions (accuracy: 90%; sensitivity: 71%; specificity: 100%; F1 score: 0.83; AUCPR: 0.90), which was statistically comparable to the performance of CNN (accuracy: 91%; sensitivity: 82%; specificity: 96%; F1 score: 0.87; AUCPR: 0.88) and AutoML Vision (accuracy: 86%; sensitivity: 84%; specificity: 88%; F1 score: 0.83; AUCPR: 0.95) based on Cochran's Q test (P>0.05).

Conclusions: In this study, the performance of AutoML Vision was not significantly different from that of Random Forest (the best classifier among the traditional machine learning models) or CNN. AutoML Vision showed relatively high accuracy, comparable to currently used classifiers, which may support its future application in clinical practice.
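The "handcrafted feature → traditional classifier" pipeline described in the Methods can be sketched as follows. This is not the authors' code: it uses synthetic grayscale patches in place of the ultrasound datasets, and only a simplified intensity-histogram feature (one of the three feature types named in the abstract), with scikit-learn's Random Forest and the sensitivity/specificity/F1 metrics the study reports. The darker-vs-brighter class separation is an illustrative assumption, not a clinical claim.

```python
# Hedged sketch of a feature-extraction + Random Forest CAD classifier,
# assuming synthetic 32x32 grayscale patches stand in for ultrasound images.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score

rng = np.random.default_rng(0)

def histogram_feature(img, bins=16):
    # Normalised intensity histogram; a simplified stand-in for the
    # Color Histogram feature named in the study (ultrasound is grayscale).
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

# Synthetic stand-in data: "malignant" patches are darker on average
# (purely an assumption so the toy problem is learnable).
benign = [rng.integers(100, 256, size=(32, 32)) for _ in range(60)]
malignant = [rng.integers(0, 156, size=(32, 32)) for _ in range(60)]
X = np.array([histogram_feature(im) for im in benign + malignant])
y = np.array([0] * 60 + [1] * 60)  # 0 = benign, 1 = malignant

# Simple even/odd split into train and test halves.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[::2], y[::2])
pred = clf.predict(X[1::2])

tn, fp, fn, tp = confusion_matrix(y[1::2], pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = f1_score(y[1::2], pred)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} F1={f1:.2f}")
```

In the actual study, Hu Moments and Haralick Texture features would be concatenated with the histogram before training, and a proper cross-validated split would replace the even/odd split used here.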
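The post-hoc pairwise comparison named in the Methods, McNemar's test, compares two classifiers on the same cases using only the discordant pairs. A minimal stdlib-only sketch (not the authors' code; the discordant counts are hypothetical):

```python
# Hedged sketch: continuity-corrected McNemar's test for paired classifiers.
import math

def mcnemar(b, c):
    """McNemar chi-square statistic and p-value (continuity-corrected).

    b = cases classifier A got right and classifier B got wrong;
    c = the reverse. Under H0 the statistic is chi-square with 1 df,
    whose survival function is erfc(sqrt(x / 2)).
    """
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical discordant counts for illustration only.
stat, p = mcnemar(b=5, c=15)
print(f"chi2={stat:.2f}, p={p:.4f}")
```

With b=5 and c=15 the imbalance in discordant cases is large enough to reject H0 at the 0.05 level; in the study, P>0.05 in such comparisons is what supported the conclusion that AutoML Vision, Random Forest and CNN performed comparably.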

Original language: English
Pages (from-to): 1381-1393
Number of pages: 13
Journal: Quantitative Imaging in Medicine and Surgery
Issue number: 4
Publication status: Published - Apr 2021


Keywords

  • AutoML Vision
  • Breast cancer
  • Computer-aided diagnosis (CAD)
  • Machine learning
  • Ultrasonography

ASJC Scopus subject areas

  • Radiology, Nuclear Medicine and Imaging
