Peer-Reviewed

Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction

Received: 17 October 2018     Published: 18 October 2018
Abstract

Breast cancer is the most common invasive cancer in women and the second leading cause of cancer death in females; tumors can be classified as benign or malignant. Research on breast cancer and its prevention has attracted growing attention in recent years. Meanwhile, the development of data mining methods provides an effective way to extract useful information from complex databases, and prediction, classification and clustering can be carried out on the extracted information. In this study, to explore the relationship between breast cancer and certain attributes, with the aim of reducing the probability of death from breast cancer, five classification models, Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Neural Network (NN) and Logistic Regression (LR), are applied to two breast cancer datasets: the Breast Cancer Coimbra Dataset (BCCD) and the Wisconsin Breast Cancer Database (WBCD). Three indicators, prediction accuracy, F-measure and AUC, are used to compare the performance of the five models. Comparative experimental analysis shows that the random forest model achieves better performance and adaptability than the other four methods. The model in this study is therefore shown to have clinical and reference value in practical applications.
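The three indicators named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is not shown on this page. The function names, the label coding (1 = malignant, 0 = benign), the 0.5 decision threshold and the toy scores are all assumptions made for illustration only.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for the positive class (label 1, assumed here to mean malignant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the share of (positive, negative)
    pairs in which the positive example has the higher score; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels and classifier scores, not data from BCCD or WBCD.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    # Hypothetical decision threshold of 0.5 turns scores into labels.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print("accuracy :", round(accuracy(y_true, y_pred), 3))
    print("F-measure:", round(f_measure(y_true, y_pred), 3))
    print("AUC      :", round(auc(y_true, scores), 3))
```

Accuracy and F-measure depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason the three indicators can rank models differently.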

Published in Applied and Computational Mathematics (Volume 7, Issue 4)
DOI 10.11648/j.acm.20180704.15
Page(s) 212-216
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2018. Published by Science Publishing Group

Keywords

Data Mining, Breast Cancer, Classification Models, Prediction

Cite This Article
    APA Style

    Yixuan Li, Zixuan Chen. (2018). Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction. Applied and Computational Mathematics, 7(4), 212-216. https://doi.org/10.11648/j.acm.20180704.15

    ACS Style

    Yixuan Li; Zixuan Chen. Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction. Appl. Comput. Math. 2018, 7(4), 212-216. doi: 10.11648/j.acm.20180704.15

    AMA Style

    Yixuan Li, Zixuan Chen. Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction. Appl Comput Math. 2018;7(4):212-216. doi: 10.11648/j.acm.20180704.15

    BibTeX

    @article{10.11648/j.acm.20180704.15,
      author = {Yixuan Li and Zixuan Chen},
      title = {Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction},
      journal = {Applied and Computational Mathematics},
      volume = {7},
      number = {4},
      pages = {212-216},
      doi = {10.11648/j.acm.20180704.15},
      url = {https://doi.org/10.11648/j.acm.20180704.15},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.acm.20180704.15},
     year = {2018}
    }
    

    RIS

    TY  - JOUR
    T1  - Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction
    AU  - Yixuan Li
    AU  - Zixuan Chen
    Y1  - 2018/10/18
    PY  - 2018
    N1  - https://doi.org/10.11648/j.acm.20180704.15
    DO  - 10.11648/j.acm.20180704.15
    T2  - Applied and Computational Mathematics
    JF  - Applied and Computational Mathematics
    JO  - Applied and Computational Mathematics
    SP  - 212
    EP  - 216
    PB  - Science Publishing Group
    SN  - 2328-5613
    UR  - https://doi.org/10.11648/j.acm.20180704.15
    VL  - 7
    IS  - 4
    ER  - 

Author Information
  • School of Mathematics and Statistics, University of Sheffield, Sheffield, UK

  • School of Information, Zhejiang University of Finance and Economics, Hangzhou, China
