

University of Exeter Medical School (UEMS)

Evaluating the Potential for Machine Learning to Change Cancer Diagnosis in the Next 10 Years

Word Count: 2702

Student ID: 640038310 1-15-2018



Cancer is one of the most widespread diseases that health professionals face and one of the leading causes of death worldwide. It is difficult to diagnose, yet machine learning and artificial intelligence have helped in many ways to overcome this barrier. Machine learning is a branch of artificial intelligence that employs algorithms to analyze statistical and probabilistic outcomes based on selected conditions. In other words, it allows computers to predict or calculate outcomes from inputted data in relation to past examples. Machine learning has recently been applied to a new domain of cancer diagnosis: cancer prediction. Researchers are currently determining whether its use can be broadened to predict the disease at different stages. This covers three aspects of cancer prediction: cancer susceptibility, the chance of survival, and cancer recurrence in patients. Despite the promising outcomes of most publications, several limiting factors still restrict machine learning from becoming a reliable technique. Nevertheless, cancer prediction using machine learning does appear to be a possible tool of the future.













Cancer is one of the most problematic diseases that medical practitioners and researchers face day to day. It can develop without warning at any time, in both sexes and at any age. It is estimated that roughly 38.5 percent of people of both sexes will be diagnosed with cancer of any site at some point during their lifetime. 2 It is one of the leading causes of death worldwide and one of the most prevalent diseases in the world. It can affect one or several organ systems at the same time and can spread to damage different parts of the body. It is also one of the most complex diseases for health professionals to deal with, as several factors, both environmental and genetic, can cause it. 10

Cancer is caused by cells losing the ability to control their replication cycle. This is due to damage to the genetic material (DNA) of the cell, which leads to the production of faulty versions of the proteins that help to control and maintain the cell cycle. 4 These faulty proteins cannot perform their function correctly, so the cells divide and grow uncontrollably. This leads to the formation of large masses called tumors (malignant tumors) in the body. As a tumor grows, it limits the nutrients and blood supply available to the surrounding cells, causing them to die. This can cause organ failure and can lead to death if untreated. Cancer is a difficult disease to treat, as its treatment methods are very invasive and harmful to the patient: cancerous cells must be killed through chemotherapy or removed through surgery to eliminate the disease from the body. In addition, diagnosing cancer is a complicated process that involves several blood tests, invasive procedures such as biopsies, and physical tests such as rectal examinations. 17

Healthcare professionals are aiming to improve the early diagnosis and prediction of cancer, to help individuals avoid developing the disease or to eliminate it before it becomes too serious. One method under investigation uses a branch of artificial intelligence known as machine learning, in which computer software applies specific algorithms to analyse inputted data from current and past examples. Regarding cancer prediction, the aim is to develop reliable predictive algorithms that can analyze patient data to determine the likelihood of a patient developing cancer. 13 Artificial intelligence has been applied successfully to several fields, including cancer diagnosis, yet the development of systems that predict cancer with a high degree of accuracy is still under analysis because of the sensitivity of the application. 1 Some research has therefore been undertaken to help decide whether such an application would be reliable enough for diagnostic medicine, especially cancer diagnosis. The aim of this literature review is to investigate the potential for machine learning to change cancer diagnosis in the next ten years.




A Clear Understanding of Machine Learning:

To understand how machine learning could change cancer diagnosis in the future, the way machine learning works must first be discussed. Machine learning is a branch of artificial intelligence that uses mathematical tools and algorithms, together with past examples of what is being investigated, so that the machine can identify new, previously unrecognized patterns in the data. Machine learning is somewhat like statistics and relies heavily on statistics and probability. Yet conventional statistical analysis is limited to the variables presented and gives an answer based only on the dataset at hand, whereas machine learning can explore further options and adapt to each situation. In other words, it can learn from each situation. Machine learning methods can employ conditional probabilities (the probability of X given Y), absolute conditionality (if, then, else), Boolean logic (not, and, or), and unconventional optimization strategies to model data or classify patterns.

There are two main conditions under which data is presented to a machine: supervised learning and unsupervised learning. In supervised learning, a set of input data is provided together with the corresponding outputs, and the machine learns to map one to the other. In other words, the machine is shown how to analyze the data. In unsupervised learning, no outputs are provided, and the pattern or correlation must be discovered without any indicators. Under these conditions, there are specific methods by which machines learn and process this data:

1. Logistic Regression (LR): predicts the probability of an event occurring.

2. Decision Tree (DT): a breakdown of the data created by applying a series of simple rules.

3. Random Forest (RF): uses multiple Decision Trees built from random samples and random attributes.

4. Artificial Neural Networks (ANNs): detect complex nonlinear relationships in data.

5. Support Vector Machines (SVMs): construct a set of hyperplanes that maximize the margin between two classes for classification.

These are some of the methods that machines use to process data with the aim of making a prediction or finding a correlation. Each method suits specific data sets and rests on different assumptions. 1, 4, 8, 11, 13, 18
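As a rough illustration of supervised learning with the first method listed above, the sketch below fits a logistic regression by gradient descent using only the Python standard library. The data values, the two features and all function names are invented for illustration and do not come from any study cited in this review; a real analysis would use an established library and far more patients.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, x):
    """Predicted probability of the positive class for one example."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_examples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Difference between the predicted probability and the known output.
            error = predict_probability(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n_examples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_examples
    return weights, bias


# Invented toy data (assumption): [tumour size in cm, positive lymph nodes]
# paired with a known output: 1 = relapse, 0 = no relapse.
toy_features = [[1.0, 0], [1.5, 0], [2.0, 1], [2.5, 0],
                [3.0, 2], [3.5, 3], [4.0, 2], [4.5, 4]]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic_regression(toy_features, toy_labels)
low_risk = predict_probability(weights, bias, [1.2, 0])
high_risk = predict_probability(weights, bias, [4.2, 3])
print(f"small tumour, no nodes: {low_risk:.2f}")
print(f"large tumour, three nodes: {high_risk:.2f}")
```

Because the inputs are paired with known outputs during training, this is supervised learning; an unsupervised method would receive only `toy_features` and would have to find structure in them without `toy_labels`.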

How Can Machine Learning Change Cancer Diagnosis?

Machine learning has been used in cancer diagnosis for several years, yet it has been limited to detecting and diagnosing cancers using biomarkers that are present once the disease has developed. 1 It has been used to detect cancers, distinguish their severity, and help treat them. This has been done by deploying algorithms that draw on the patient’s symptoms, clinical data, and sometimes imaging scans. All these factors can be fed into a machine, which analyzes them to help determine whether an individual may have a certain type of cancer and, if so, how severe it is.

Yet health professionals are attempting to push the use of machine learning to a new level: predicting cancer in individuals before it is present in their body. This is being done by using several genetic, clinical and histological parameters as data in which machines can identify patterns that could indicate cancer development. This work has been divided into three separate prediction tasks. The first is predicting cancer before it has ever occurred in an individual. The second is predicting how long an individual can survive with cancer and how treatment may or may not change that individual’s chance of survival. The third is predicting the probability of cancer recurrence in an individual who has already been diagnosed with the disease. 1

1. Prediction of Cancer Susceptibility:

Systems are currently being introduced to help identify individuals who are more susceptible to developing cancer based on several factors, which include, but are not limited to, family history, lifestyle, and genetic background. Yet this aspect of cancer prediction has received the fewest studies, as it is the most difficult. The table below (Table 1a) displays some studies that have attempted to predict susceptibility to different cancers in patients. The studies reviewed all used different features, machine learning methods, and data. Yet none of the studies analyzed has fully succeeded in predicting susceptibility to any type of cancer. 11

One study implemented different machine learning methods to predict susceptibility to breast cancer. It used single nucleotide polymorphisms (SNPs) as input data, which were compared with control data. The machine learning methods were able to identify three SNPs as key discriminators between patients susceptible to breast cancer and control individuals. The results of this study were quite successful, yet not fully accurate. 20

2. Prediction of Cancer Recurrence:

Cancer, like some other diseases, can return despite having been treated. This is either because the cancer was not fully eliminated from the body or because another type of cancer has developed. 19 Therefore, another aspect of cancer prediction to which machine learning has been applied is predicting the recurrence of cancer in patients. This uses a combination of predictive variables, including clinical data such as patient age, tumour size and patient history, as well as genomic data. Protein biomarker information can also be included, as these are important markers of the disease. Table 1B shows some published studies that use machine learning methods to predict the recurrence of cancer in individuals who have been treated for the disease before. Of the three aspects, recurrence prediction has proven one of the most promising, with the highest accuracy, yet it has not been applied to all types of cancer. 11

Another publication used data from 2441 of 2990 consecutive breast cancer patients to develop an artificial neural network (ANN) for predicting the probability of relapse over 5 years. Several prognostic factors were analysed and used as inputs, and the network was compared with the conventional classification of breast cancer. After analysis of all the patients, the study showed that the artificial neural network is able to identify patients with strikingly different risks of relapse within the different classes of breast cancer. 6

3. Prediction of Cancer Survivability:

Another aspect that machine learning is hoped to determine is the survival chances and life expectancy of an individual who has been correctly diagnosed with the disease. Most of the literature on cancer prediction using machine learning focuses on predicting the longevity of individuals who have been diagnosed with the disease. These studies focus mainly on data describing the severity of the disease and the current treatment in place, in addition to physical and wellbeing factors. The table below (Table 1C) shows some publications in which different machine learning methods were used to help predict the length of life of patients diagnosed with different types of cancer. 11

Another study used an artificial neural network to examine the accuracy of breast cancer survival prediction. A series of 951 breast cancer patients was divided into a training set of 651 and a validation set of 300 patients. Eight variables were entered as input to the network: tumour size, axillary nodal status, histological type, mitotic count, nuclear pleomorphism, tubule formation, tumour necrosis, and age. The area under the ROC curve (AUC) was used as a measure of the accuracy of the prediction models in generating survival estimates for the patients in the independent validation set. The artificial neural network was found to be very accurate in predicting 5-, 10- and 15-year breast-cancer-specific survival. The consistently high accuracy over time and the good predictive performance of a network trained without information on nodal status demonstrate that neural networks can be important tools for cancer survival prediction. 9
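To make the evaluation described above more concrete, the sketch below shows one way a patient series of this size could be split into training and validation sets and how an AUC can be computed for the validation predictions. It is only an illustration: the random split, the labels, the scores and the function name `roc_auc` are assumptions made here, and the study may have divided its patients and computed the AUC differently.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    The AUC equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case, with
    ties counted as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Split 951 hypothetical patient IDs into 651 for training and 300 for
# validation. A seeded random shuffle is an assumption made for this sketch.
patient_ids = list(range(951))
random.Random(0).shuffle(patient_ids)
training_ids = patient_ids[:651]
validation_ids = patient_ids[651:]

# Invented example: 1 = patient survived the follow-up period, 0 = did not.
# The scores stand in for survival probabilities predicted by a model.
validation_labels = [1, 1, 1, 0, 0, 0, 0, 1]
validation_scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]

auc = roc_auc(validation_labels, validation_scores)
print(f"training: {len(training_ids)}, validation: {len(validation_ids)}")
print(f"AUC on the toy validation set: {auc:.4f}")  # 15 of 16 pairs ranked correctly: 0.9375
```

An AUC of 0.5 corresponds to ranking patients no better than chance, and 1.0 to ranking every survivor above every non-survivor. Keeping the validation patients out of training is what makes the estimate independent of the data the model learned from.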


Despite the promising results that most publications have demonstrated, there remain limitations that have so far prevented machine learning from becoming a fully reliable tool in the domain of cancer prediction.

One of the biggest limitations is providing enough data and examples to input into the systems. It is hard to gather enough data to present to machines to run trials. This is because machine learning remains an untrusted method, so the willingness to use it for such a sensitive matter decreases severely. With only a limited amount of data to analyze, it becomes more difficult for the algorithms to generalize beyond the data presented to them. It also restricts the machine’s ability to make accurate and specific analyses, because the data lack the variability the system requires to indicate susceptibility to cancer.

In addition, it is not only the quantity of data that raises an issue but also the quality of the data presented. For machines to learn the indicators of cancer correctly and to establish a link that can help them detect cancer, they must be presented with high-quality data. This includes the relevant variables and the relevant examples for the machines to analyze. This limitation is difficult to manage, as the information presented must be carefully assessed to decide whether it is valuable. Presenting machines with poor-quality or irrelevant data can cause them to learn incorrectly and, as a result, to give incorrect predictions based on the data they initially learned from.

Another limitation, which is uncommon yet still exists, is error in the programming of the machines and their algorithms. This is a vital limitation, as a faulty program or method can produce incorrect analysis. It is a major setback because it can lead to incorrect results even when all other factors and conditions are controlled. 11, 12, 21, 22


Based on the analysis of the results of the publications, it is evident that the different methods of machine learning, alongside the heterogeneous data that can be analyzed in predicting cancer, will provide a promising tool for the future of diagnostic medicine. This will allow medical professionals to overcome several barriers in cancer diagnosis and treatment. Yet the limitations that currently prevent machine learning from being used as a reliable tool, such as data quantity and quality, must be overcome, and the accuracy of the predictions must be fully proven. While most studies are well built and reasonably well validated, greater attention to experimental design and implementation appears to be warranted, especially with respect to the quantity and quality of biological data. Looking at the current path of research and publications, there are clear trends in the types of training data being integrated, the kinds of endpoint predictions being made, the types of cancers being studied and the overall performance of these methods in predicting cancer susceptibility or outcomes. We believe that if the quality of studies continues to improve, it is likely that machine learning will become much more commonplace in many clinical and hospital settings and will become a frequently used and reliable tool in the field of diagnostic medicine.








1. Cruz J, Wishart D. Applications of Machine Learning in Cancer Prediction and Prognosis [Internet]. PubMed Central (PMC). 2006 [cited 18 December 2017]. Available from:

2. Cancer Stat Facts [Internet]. National Cancer Institute. 2018 [cited 19 December 2017]. Available from:

3. Ho N. Introduction to Machine Learning: Predict Cancer Diagnosis [Internet]. IBM. 2017 [cited 19 December 2017]. Available from:

4. Karplus A. Machine Learning Algorithms for Cancer Diagnosis [Internet]. UC Santa Cruz. 2012 [cited 19 December 2017]. Available from:

5. Marr B. How AI And Deep Learning Are Now Used To Diagnose Cancer [Internet]. Forbes. 2017 [cited 19 December 2017]. Available from:

6. De Laurentiis M, De Placido S, Bianco A, Clark G, Ravdin P. A prognostic model that makes quantitative estimates of probability of relapse for breast cancer patients [Internet]. NCBI. 1999 [cited 19 December 2017]. Available from:

7. Hamamoto I, Okada S, Hashimoto T, Wakabayashi H, Maeba T, Maeta H. Prediction of the early prognosis of the hepatectomized patient with hepatocellular carcinoma with a neural network [Internet]. NCBI. 1995 [cited 19 December 2017]. Available from:

8. Rodvold D, McLeod D, Barndt J, Snow P, Murphy G. Introduction to artificial neural networks for physicians: taking the lid off the black box [Internet]. NCBI. 2001 [cited 19 December 2017]. Available from:

9. Lundin M, Lundin J, Burke H, Toikkanen S, Pylkkänen L, Joensuu H. Artificial neural networks applied to survival prediction in breast cancer [Internet]. NCBI. 1999 [cited 19 December 2017]. Available from:

10. Trichopoulos D, Li F, Hunter D. What Causes Cancer? [Internet]. 1996 [cited 18 December 2017]. Available from:

11. Kourou K, Exarchos T, Exarchos K, Karamouzis M, Fotiadis D. Machine learning applications in cancer prognosis and prediction. Science Direct. 2015.

12. Asri H, Mousannif H, Moatassime H, Noel T. Using Machine Learning Algorithms for Breast Cancer Risk Prediction and Diagnosis. Science Direct. 2016.

13. Nilsson N. Introduction to Machine Learning [Internet]. AI Stanford. 2015 [cited 18 December 2017]. Available from:

14. Bishop C. Pattern Recognition and Machine Learning [Internet]. Institute for Systems and Robotics. 2006 [cited 19 December 2017]. Available from:

15. Ahmad L, Eshlaghy A, Poorebrahimi A, Ebrahimi M, Razavi A. Using Three Machine Learning Techniques for Predicting Breast Cancer Recurrence [Internet]. Department of Management Information Systems, Science and Research Branch, Islamic Azad University of Tehran-Iran, Iran. 2013 [cited 19 December 2017]. Available from:

16. Gupta S, Tran T, Luo W, Phung D, Kennedy R, Broad A et al. Machine-learning prediction of cancer survival: a retrospective study using electronic administrative records and a cancer registry [Internet]. BMJ. 2014 [cited 19 December 2017]. Available from:

17. What Is Cancer? [Internet]. National Cancer Institute. 2015 [cited 19 December 2017]. Available from:

18. Yoon Lee T, Seo Y. Machine Learning for Breast Cancer Prediction [Internet]. 2015 [cited 19 December 2017]. Available from:

19. When Cancer Comes Back [Internet]. National Cancer Institute. 2016 [cited 19 December 2017]. Available from:

20. Listgarten J, Damaraju S, Poulin B, Cook L, Dufour J, Driga A et al. Predictive Models for Breast Cancer Susceptibility from Multiple Single Nucleotide Polymorphisms. American Association for Cancer Research. 2004.

21. Lee C, Yoon H. Medical big data: promise and challenges. NCBI. 2017.

22. Cabitza F, Rasoini R, Gensini G. Unintended Consequences of Machine Learning in Medicine. 2017.