The Barriers to Using Machine Learning for Personalized Rare Disease Diagnosis
The modern era of medicine is characterized by a paradigm shift towards personalized diagnostics and treatment recommendations that are fine-tuned to each individual’s unique genome. This emerging field of research can be used for detecting rare diseases. With over 6 000 known rare diseases and an estimated 3.5-5.9% of the global population battling such diseases,1 effective diagnostics are pivotal for favourable health outcomes.2 By performing genome-wide analyses, physicians can theoretically diagnose rare genetic diseases and initiate appropriate medical treatment more efficiently.
That said, precision medicine is not a magic bullet. At its foundation, it faces two major barriers: the sheer volume of genetic information and the difficulty of interpreting that information appropriately for diagnosis. However, both of these issues may be overcome with technological innovation.
Firstly, the genome contains an enormous amount of information with many genetic changes that are near-impossible to manually sift through. However, this barrier can be overcome by modern computational analysis, in which algorithms systematically evaluate segments of the genome to identify mutations.
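To make the idea of systematically evaluating segments concrete, here is a minimal sketch in Python. It assumes two short, already-aligned DNA strings and only looks for single-letter substitutions; the function name `find_substitutions`, the window size, and the toy sequences are all made up for illustration, and real variant-calling software is far more involved.

```python
def find_substitutions(reference, sample, window=4):
    """Compare two aligned sequences one window at a time.

    Returns a list of (position, reference_base, sample_base) tuples.
    Assumption: the sequences are already aligned and of equal length,
    so insertions and deletions are out of scope for this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    mutations = []
    for start in range(0, len(reference), window):
        ref_segment = reference[start:start + window]
        sample_segment = sample[start:start + window]
        # Skip segments that match exactly; only inspect the ones that differ.
        if ref_segment == sample_segment:
            continue
        for offset, (ref_base, sample_base) in enumerate(zip(ref_segment, sample_segment)):
            if ref_base != sample_base:
                mutations.append((start + offset, ref_base, sample_base))
    return mutations


# Toy sequences invented for this example (not real genomic data).
reference = "ATGGCCATTGTA"
sample = "ATGGCTATTGAA"
found = find_substitutions(reference, sample)
print(found)
```

The point of the sketch is only that a program can walk through a sequence piece by piece and flag every difference, a task that would be near-impossible to do by hand across an entire genome.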
Secondly, it is important to note that not every genetic mutation will give rise to an adverse health outcome. Genetic data is extremely complex and requires careful interpretation. Alerting patients about mutations that are not clinically significant could do more harm than good, inducing unnecessary worry. Since it isn’t feasible for medical researchers to identify potentially harmful genetic sequences in the genome of every single patient, this necessitates a ‘smart’ entity. With extensive training, artificial intelligence (AI) and machine learning (ML) systems can fill this role and independently generate diagnoses.3
However, integrating AI and ML into rare genetic disease diagnosis creates additional complications. For one, it is important to decide whether one should analyze the genome or the exome.3 The genome refers to the entire DNA sequence in our cells, whereas the exome refers to only the parts that code for proteins. In simpler words, the genetic information in our body (genome) contains specific segments (exome) that code for biologically important proteins and enzymes, which in turn shape observable traits (phenotypes). The choice of which trove of information to analyze can affect the types of results that are obtained. It is also important to keep in mind that whole-genome sequencing isn’t readily available yet, although this may change in the near future.4 After making that choice, the next consideration is which genome or exome variants are clinically significant. For example, when sequencing the exome, 60,000-100,000 variants are observed on average, but many of these variants are benign and unrelated to disease.3 In order to determine the significance of a variant, the software must either be trained to distinguish between important and unimportant information by making genotype and phenotype associations, or leverage pre-existing information about such associations. If AI and ML programs are trained in this way, in the near future they may be able to correlate phenotypic expression data with genetic variant data and make more accurate diagnoses.3
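The narrowing-down step described above can be sketched in a few lines of Python. This is a hedged illustration rather than the method used by any real diagnostic tool: the gene names, variant labels, frequencies, the gene-to-phenotype table, and the 1% rarity cutoff are all hypothetical, and the function `prioritize` simply keeps rare variants in genes already linked to the patient's symptoms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str                    # hypothetical gene name
    change: str                  # made-up label for the DNA change
    population_frequency: float  # assumed share of people carrying it


# Hypothetical pre-existing knowledge linking genes to observable traits.
GENE_PHENOTYPES = {
    "GENE_A": {"muscle weakness", "seizures"},
    "GENE_B": {"hearing loss"},
    "GENE_C": {"seizures"},
}


def prioritize(variants, patient_phenotypes, max_frequency=0.01):
    """Keep rare variants in genes tied to the patient's phenotypes.

    Variants are ranked by how many of the patient's phenotypes the
    gene is associated with (most overlap first).
    """
    scored = []
    for variant in variants:
        # Common variants are unlikely to explain a rare disease.
        if variant.population_frequency > max_frequency:
            continue
        overlap = GENE_PHENOTYPES.get(variant.gene, set()) & patient_phenotypes
        if overlap:
            scored.append((len(overlap), variant))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [variant for _, variant in scored]


variants = [
    Variant("GENE_A", "c.100A>G", 0.0001),
    Variant("GENE_B", "c.200C>T", 0.0002),
    Variant("GENE_C", "c.300G>A", 0.25),
    Variant("GENE_C", "c.310delT", 0.0005),
]
patient = {"muscle weakness", "seizures"}
candidates = prioritize(variants, patient)
for variant in candidates:
    print(variant.gene, variant.change)
```

In this toy case, four variants shrink to two candidates. A trained ML system would learn such genotype and phenotype associations from data rather than reading them from a hand-written table, but the goal is the same: reduce tens of thousands of variants to the few worth a clinician's attention.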
Despite considerable progress in this field, it is important to note two critical pitfalls of this research: a disproportionate focus on certain diseases over others, and limited types of input data for AI and ML. In a scoping review examining international studies about AI and ML usage in rare disease, the authors found that rare diseases with relatively higher prevalence are still generally more explored, whereas rarer diseases receive less attention.5 For example, amyotrophic lateral sclerosis, systemic lupus, moderate and severe traumatic brain injury, and cystic fibrosis had more published studies than rare skin and endocrine diseases.5 This means that many ultra-rare diseases are being excluded from diagnostic advancements. The authors also note that image data (from MRIs and other medical-imaging instruments) are presently the most common input data for AI and ML applications.5 This is not immediately distressing, since image data are standardized and can be obtained in large volumes, which makes them accessible to AI and ML. However, excluding other information sources, such as unstructured doctor notes in medical files and patient testimony, discards supporting information that can be significant to diagnosis.5
Overall, the application of AI and ML to rare disease is a significant development that can improve the efficiency and accuracy of diagnostics internationally. However, in order to optimize the usage of these tools, future research must commit to an innovation-oriented approach that breaks down these barriers and goes where previous technology has never gone before.
Vaishnavi Bhamidi
Works Cited:
1. About Rare Diseases. Eurordis, Rare Diseases Europe. https://www.eurordis.org/about-rare-diseases. Accessed December 1, 2020.
2. Nguengang Wakap S, Lambert DM, Olry A, et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet. 2020;28(2):165-173. doi:10.1038/s41431-019-0508-0
3. Anderson D, Baynam G, Blackwell JM, Lassmann T. Personalised analytics for rare disease diagnostics. Nat Commun. 2019;10(1):5274. doi:10.1038/s41467-019-13345-5
4. Personal Genome Project shows whole genome sequencing may transform how Canadians manage their own health care. U of T News. February 5, 2018. https://www.utoronto.ca/news/personal-genome-project-shows-whole-genome-sequencing-may-transform-how-canadians-manage-their
5. Schaefer J, Lehne M, Schepers J, Prasser F, Thun S. The use of machine learning in rare diseases: a scoping review. Orphanet J Rare Dis. 2020;15(1):145. doi:10.1186/s13023-020-01424-6
Cite This Article:
Bhamidi V., Vytlingam K., Chowdhury F., Nakhoul R., Lombo L., Chharawala V. The Barriers to Using Machine Learning for Personalized Rare Disease Diagnosis. Illustrated by D. Amin. Rare Disease Review. October 2021. DOI:10.13140/RG.2.2.21400.72965