Predictive Deep Learning Applications in Ophthalmology

Neslihan Dilruba Koseoglu, TY Alvin Liu

The term ‘artificial intelligence’ was coined by a group of scientists during a workshop known as the “Dartmouth Summer Research Project on Artificial Intelligence” in 1956.¹ The concept was based on the idea that “intelligent human behavior consisted in processes that could be formalized and reproduced in machine”.¹ The subfield of machine learning (ML) trains algorithms to recognize patterns in large amounts of data using extracted features. However, in classical ML these features must be extracted manually, which is labour intensive. Examples of classical ML methods include random forests, support vector machines and decision trees.² Currently, deep learning (DL) is considered the state-of-the-art ML technique, and DL algorithms are now widely preferred in medical image analysis. In contrast to classical ML, DL algorithms do not require manual feature extraction and typically involve multi-layered artificial neural networks (NNs).²⁻⁴ In ophthalmology, DL models have been successfully applied to various modalities, including colour fundus photography (CFP), optical coherence tomography (OCT) and visual field (VF) testing.

Broadly speaking, DL models have been deployed in ophthalmology for three purposes: classification, segmentation and prediction. Most published studies to date have focused on classification tasks, such as determining whether a colour fundus photograph shows referable diabetic retinopathy (DR). In recent years, predictive DL models have become an area of particular interest for researchers, as they could be used as clinical decision-support tools. Within certain contexts, they could also make predictions that are beyond the capabilities of human clinicians.

In this review, we aim to highlight predictive DL models by organizing published manuscripts in this area into the following themes: structure-structure prediction, structure-function prediction, disease onset/progression prediction and treatment response prediction. In addition, we focus on three major diseases that can lead to blindness, namely age-related macular degeneration (AMD), DR and glaucoma.

Methods

The PubMed database was searched for original investigations published between January 2017 and March 2023 using the following keywords: “deep learning”, “artificial intelligence”, “prediction”, “age-related macular degeneration”, “diabetic retinopathy” and “glaucoma”. Initially, 77 original research articles were identified. Studies that included only classical ML were excluded from our final review.

Structure-structure prediction

CFP, an easily accessible imaging tool, is widely used as a screening modality for retinal and optic nerve head pathologies.⁵⁻⁷ The advent of OCT has further revolutionized ophthalmology, as it can image ocular tissue noninvasively at micrometre-scale resolution. Compared with CFP, OCT provides considerably richer structural information owing to its higher resolution and the three-dimensional nature of OCT volumes. However, OCT imaging is limited by its relatively narrow field of view and the costly, non-portable nature of OCT machines. Therefore, DL models that can predict OCT metrics or characteristics directly from CFP could be invaluable.

Diabetic retinopathy

In the management of DR, increased central foveal thickness (CFT) on OCT due to diabetic macular oedema (DMO) is an important indication for anti-vascular endothelial growth factor (anti-VEGF) therapy. Studies have shown that eyes with DMO on OCT may lack obvious corresponding features, such as lipid exudates, on CFP.⁸ To address the limited sensitivity of CFP for detecting DMO, two studies trained DL models to predict CFT and quantitative retinal fluid metrics on OCT directly from CFP.⁹,¹⁰ Both studies showed promising results, with areas under the receiver operating characteristic curve (AUCs) ranging from 0.89 (95% confidence interval [CI] 0.87–0.91) for predicting centre-involving DMO to 0.97 (95% CI 0.89–1.00) for predicting CFT above 250 µm on spectral-domain OCT (SD-OCT).⁹,¹⁰
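To make the reported AUCs concrete, the following Python sketch (an illustration under stated assumptions, not the method of either cited study) binarizes hypothetical OCT-measured CFT values at 250 µm, computes the AUC of hypothetical CFP-model scores as the Mann–Whitney probability that a positive eye outranks a negative one (ties count as half), and estimates a 95% CI with a percentile bootstrap over eyes. The function names and data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample eyes with replacement and recompute the AUC."""
    if not (0 in labels and 1 in labels):
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    stats: List[float] = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # discard resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical eyes: OCT-measured CFT (µm) defines the label, scores come from a CFP model.
cft_um = [310, 265, 240, 220, 290, 255, 230, 205]
scores = [0.91, 0.62, 0.40, 0.15, 0.85, 0.35, 0.55, 0.10]
labels = [1 if c > 250 else 0 for c in cft_um]
print(round(auc(labels, scores), 3))  # 0.875
print(bootstrap_auc_ci(labels, scores, n_boot=500))
```

The percentile bootstrap is only one option; DeLong's method is another widely used approach for AUC confidence intervals, and small test sets yield wide intervals with either.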

Glaucoma

CFPs have also been used to predict retinal thickness on OCT in glaucoma. Glaucomatous damage is characterized by thinning of the retinal nerve fibre layer (RNFL) and corresponding VF defects. Medeiros et al. trained a DL model to predict progressive RNFL thinning on OCT directly from longitudinal CFPs.¹¹ The model predicted progressive damage with an AUC of 0.86 (95% CI 0.83–0.88), and the predicted RNFL thickness values were significantly correlated with the observed values (r=0.76; 95% CI 0.70–0.80).¹¹ Similarly, Lee et al. developed a hybrid DL + classical ML model (a pre-trained DL model + a support vector machine) to predict macular ganglion cell layer-inner plexiform layer (mGCL-IPL) thickness on OCT from red-free RNFL photography.¹² The model's predictions were strongly correlated with the measurements taken by human experts (correlation coefficient r=0.739, mean absolute error [MAE] 4.76 µm; p<0.001).¹²
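The two agreement statistics reported above answer different questions: r measures whether predictions rise and fall with the measurements, whereas MAE measures the average size of the error in µm. The Python sketch below uses hypothetical thickness values (not data from the cited studies) to show that a model whose predictions are systematically 10 µm too high still has r=1.0 but an MAE of 10 µm. The function names are our own.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error, in the units of the inputs (here µm)."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

# Hypothetical mGCL-IPL thicknesses (µm); the "predictions" are the measurements plus a fixed bias.
measured_um = [60.0, 70.0, 80.0, 75.0, 65.0]
biased_um = [m + 10.0 for m in measured_um]
print(f"r = {pearson_r(biased_um, measured_um):.3f}, MAE = {mae(biased_um, measured_um):.1f} µm")
# r = 1.000, MAE = 10.0 µm
```

Reporting both statistics, as Lee et al. did, therefore guards against a model that tracks thickness well but is consistently offset.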

Clinical implications

The vast majority of studies published to date on this topic have used DL to predict OCT characteristics and metrics from CFPs. This has important implications for decentralized monitoring outside ophthalmology settings. In general, OCT machines are much more expensive than devices that capture CFPs, and they are typically available only in ophthalmology clinics. In contrast, many options exist for capturing CFPs, including low-cost, portable colour fundus cameras, and these devices could be paired with structure-structure predictive DL models for at-home monitoring. For example, a patient with known DR could undergo regular fundus imaging at home, and the captured CFPs could be analysed by a DL model to monitor for central macular thickening due to DMO. Similarly, a patient with known glaucoma could be monitored at home in the same way for progressive RNFL thinning. As a next step, DL models could also be trained to predict OCT angiography metrics, such as vascular density, from CFPs.

Structure-function prediction

DL models have been developed to predict various visual functions directly from images. Herein, we present four examples of models used in retinal diseases.

Age-related macular degeneration

Using OCT images from the phase III HARBOR clinical trial (ClinicalTrials.gov identifier: NCT00891735),¹³,¹⁴ which involved monthly visits for patients with neovascular AMD (nAMD) receiving anti-VEGF injections, Kawczynski et al. developed a DL model to predict visual acuity (VA) at each concurrent visit and at 12 months from baseline.¹⁵ The model performed better for fellow eyes (AUC 0.98 at concurrent visits and 0.96 at 12 months) than for study eyes (AUC 0.92 at concurrent visits and 0.84 at 12 months).¹⁵

Balaskas et al. aimed to predict VA under standard and low-luminance conditions in patients with geographic atrophy (GA).¹⁶ First, the OCT images were segmented using DL techniques. Then, a random forest regression model was trained on the segmented images to predict VA in Early Treatment Diabetic Retinopathy Study (ETDRS) letters.¹⁶ The model achieved an r² of 0.40 (MAE 11.7 ETDRS letters) for standard-luminance VA and an r² of 0.25 (MAE 12.1 ETDRS letters) for low-luminance VA.
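The r² reported here and the R² reported by other studies in this review are not always the same quantity. The coefficient of determination, R² = 1 − SS_res/SS_tot, penalizes systematic bias and can be negative on held-out data, whereas the squared Pearson correlation ignores bias entirely; which one a given study computed is not stated in this summary. A minimal Python sketch with hypothetical VA values in ETDRS letters illustrates the difference (function names are our own):

```python
from typing import Sequence

def r_squared(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot (can be negative)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot

def squared_pearson(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Square of the Pearson correlation; insensitive to a constant bias."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    sxy = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sxx = sum((p - mp) ** 2 for p in pred)
    syy = sum((o - mo) ** 2 for o in obs)
    return sxy * sxy / (sxx * syy)

# Hypothetical VA in ETDRS letters; predictions are perfectly correlated but 15 letters too high.
obs_letters = [70, 55, 40, 62, 35]
biased_pred = [v + 15 for v in obs_letters]
print(round(squared_pearson(biased_pred, obs_letters), 6))  # 1.0
print(round(r_squared(biased_pred, obs_letters), 3))        # -0.3
```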

Microperimetry (MP) is another important visual function assay; it produces retinal sensitivity results comparable to those of standard automated perimetry but offers better anatomical-functional correspondence.¹⁷,¹⁸ Additionally, MP can effectively detect residual visual function in various ophthalmic conditions, such as glaucoma, DR and AMD.¹⁹⁻²¹ In their review, Midena et al. also concluded that MP was superior to VA in detecting functional changes in patients with AMD.¹⁹ Using images from healthy individuals and patients with nAMD and GA, Seeböck et al. trained a DL model (ReSensNet) to predict retinal sensitivity on MP directly from OCT images.²² The model was then tested on an external dataset of eyes with DMO, retinal vein occlusion and epiretinal membrane. The MAE was 2.73 decibels (dB) for point-wise sensitivity and 1.66 dB for mean sensitivity.²²
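The lower MAE for mean sensitivity than for point-wise sensitivity is expected in general: over- and underestimates at individual test points partly cancel when averaged, and by the triangle inequality the error in an exam's mean sensitivity can never exceed that exam's point-wise MAE. The Python sketch below shows this on two hypothetical six-point exams; the data, type alias and function names are illustrative assumptions, not ReSensNet's evaluation code.

```python
from typing import List, Sequence, Tuple

Exam = Tuple[Sequence[float], Sequence[float]]  # (predicted dB, measured dB) per test point

def pointwise_mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error over every test point of one exam (dB)."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def mean_sensitivity_error(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Absolute error of the exam-level mean sensitivity (dB)."""
    return abs(sum(pred) / len(pred) - sum(obs) / len(obs))

def summarize(exams: List[Exam]) -> Tuple[float, float]:
    """Average both metrics over exams."""
    pw = sum(pointwise_mae(p, o) for p, o in exams) / len(exams)
    ms = sum(mean_sensitivity_error(p, o) for p, o in exams) / len(exams)
    return pw, ms

exams: List[Exam] = [
    ([24.0, 22.5, 18.0, 27.0, 20.0, 25.5], [26.0, 20.0, 19.0, 24.0, 23.0, 25.0]),
    ([15.0, 10.0, 12.5, 20.0, 8.0, 14.0], [13.0, 12.0, 11.0, 22.0, 10.0, 13.5]),
]
pw, ms = summarize(exams)
print(f"point-wise MAE = {pw:.2f} dB, mean-sensitivity MAE = {ms:.2f} dB")
# point-wise MAE = 1.83 dB, mean-sensitivity MAE = 0.17 dB
```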

Diabetic retinopathy

Lin et al. trained a DL model to predict visual impairment from OCT images of eyes with DMO.²³ Adequate VA was defined as a decimal VA of ≥0.05, and impaired VA as a decimal VA of <0.05. The model achieved an AUC of 0.80 in distinguishing adequate from impaired VA.²³
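Because decimal, Snellen and logMAR notations all appear in this literature, it may help to express the cutoff used by Lin et al. in other units: logMAR = log10(1/decimal VA), so a decimal VA of 0.05 corresponds to logMAR ≈1.30, or 20/400 Snellen. The Python sketch below performs these conversions and applies the ≥0.05 rule; the helper names are hypothetical.

```python
import math

def decimal_to_logmar(decimal_va: float) -> float:
    """logMAR = log10(1 / decimal VA); e.g. 1.0 (20/20) -> 0.0."""
    if decimal_va <= 0:
        raise ValueError("decimal VA must be positive")
    return math.log10(1.0 / decimal_va)

def decimal_to_snellen_20(decimal_va: float) -> str:
    """Approximate Snellen fraction with a 20-foot numerator."""
    return f"20/{round(20 / decimal_va)}"

def va_label(decimal_va: float, cutoff: float = 0.05) -> str:
    """Binary label as defined by Lin et al.: >= cutoff is adequate, < cutoff is impaired."""
    return "adequate" if decimal_va >= cutoff else "impaired"

for va in (1.0, 0.5, 0.05, 0.02):
    print(va, round(decimal_to_logmar(va), 2), decimal_to_snellen_20(va), va_label(va))
```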

Glaucoma

Beyond retinal diseases, most structure-function prediction studies have concerned glaucoma, specifically the prediction of VF results from images. VF testing is central to the management of glaucoma and is the gold standard for quantifying functional deficits in this disease.

Various studies have predicted threshold sensitivity values on 24–2 standard automated perimetry from segmented OCT images.²⁴⁻²⁸ While most of these studies used SD-OCT imaging, Park et al. used swept-source OCT images, and the root mean squared error (RMSE) of their model's global prediction was 4.44 dB.²⁵ Christopher et al. used RNFL en face images, laser scanning ophthalmoscopy images and RNFL thickness measurements to predict 24–2 VF results; their model achieved an R² of 0.70 and an MAE of 2.5 dB, outperforming a model trained only with RNFL thickness measurements.²⁸ Two other studies developed DL models to predict 24–2 VF results from unsegmented SD-OCT images and achieved similar results. The model developed by Hemelings et al. had an MAE of 4.82 (4.45–5.22) dB.²⁹ Kihara et al. trained an EfficientNet B2 model with both unsegmented OCT images and infrared reflectance images to predict each of the 52 sensitivity points on the 24–2 VF, and their model had an MAE of 0.485 (0.438–0.533).³⁰
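Park et al. report an RMSE while most other studies report an MAE, and the two are not interchangeable: for the same set of point-wise errors, RMSE is always at least as large as MAE and grows disproportionately when a few locations are badly mispredicted. The Python sketch below illustrates this on a hypothetical 52-point field; the data and function names are invented for illustration.

```python
import math
from typing import Sequence

N_POINTS_24_2 = 52  # sensitivity points per 24-2 field, as stated above

def mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error (dB)."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean squared error (dB); always >= MAE for the same errors."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

# Hypothetical field with a measured sensitivity of 28 dB everywhere.
obs = [28.0] * N_POINTS_24_2
uniform = [30.0] * N_POINTS_24_2   # 2 dB error at every point
spiky = [28.0] * 48 + [2.0] * 4    # exact at 48 points, 26 dB misses at 4 points
for name, pred in (("uniform", uniform), ("spiky", spiky)):
    print(f"{name}: MAE = {mae(pred, obs):.2f} dB, RMSE = {rmse(pred, obs):.2f} dB")
```

Both hypothetical predictions have an MAE of 2.00 dB, but the RMSE rises from 2.00 dB to about 7.21 dB when the same total error is concentrated at four points.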

Other glaucoma studies focused on the central 10° region and on predicting 10–2 VF sensitivity values.³¹⁻³⁵ Xu et al. trained DL models with segmented SD-OCT images (mGCL-IPL, RNFL and outer segment + retinal pigment epithelium) to predict VF sensitivity at each test point.³⁴ For one of these models, a convolutional neural network with tensor regression, the MAE for the whole VF was 2.72 ± 2.60 dB.³⁴ Hashimoto et al. developed a convolutional neural network with pattern-based regularization, trained with segmented SD-OCT images (mGCL-IPL, RNFL and outer segment + retinal pigment epithelium); their model outperformed classical ML models, achieving an MAE of 2.84 ± 2.98 dB.³¹ Moon et al. used swept-source OCT images to develop two DL models to predict 10–2 VF results.³⁵ The MAE for global prediction was 3.10 dB for the model trained with mGCL-IPL thickness maps and wide-field en face images and 3.17 dB for the model trained with mGCL-IPL thickness maps and RNFL thickness.³⁵

Christopher et al. used macular SD-OCT images to estimate 10–2 and 24–2 VF results. The model that took all six segmented retinal layers into account simultaneously performed best in predicting mean deviation values, achieving an MAE of 1.9 dB (95% CI 1.6–2.4 dB) for 10–2 VF and 2.1 dB (95% CI 1.8–2.5 dB) for 24–2 VF testing.³⁶

Lastly, Shamsi et al.³⁷ aimed to predict retinal contrast sensitivity in patients with AMD and glaucoma from segmented OCT images. The model was trained on data from healthy individuals, patients with AMD and patients with glaucoma. The authors reported that mGCL and IPL thicknesses and the reflectivity of retinal ganglion cells were significantly correlated with contrast sensitivity, and this correlation was corroborated by class activation maps of the images in the test set. The model achieved an MAE of 0.13 ± 0.011 in predicting Pelli–Robson contrast sensitivity values for all subjects.³⁷

Clinical implications

In ophthalmology, the most widely used functional metric is VA. However, VA has limitations: in many common ophthalmic conditions, such as glaucoma and retinitis pigmentosa, VA is not affected until end-stage disease. As alternatives to VA, many functional assays, such as VF and contrast sensitivity testing, are used in clinical trials and routine clinical practice, but these assays are typically time consuming and labour intensive. They are also limited by patient and operator variability. In contrast, imaging tests are generally more readily available, reliable and repeatable than functional tests. Therefore, DL models that can predict functional status from objective imaging hold great promise for changing the way we assess and monitor functional endpoints in various ophthalmic diseases. A major limitation of this approach is the sheer number of possible combinations of imaging and functional tests. Training such DL models requires datasets with paired data points: for example, paired optic nerve OCT and VF for glaucoma, and paired fundus autofluorescence (FAF) and contrast sensitivity for GA. Collecting and curating such datasets requires significant resources and effort. For a given condition, it will therefore be more practical and scalable for experts first to agree on, and perhaps standardize, the optimal combination of imaging and functional tests before proceeding with large-scale data collection.

Disease onset/progression prediction

Age-related macular degeneration

Several studies used OCT images to predict progression from early/intermediate to advanced AMD in the fellow eye of patients with nAMD in one eye.³⁸⁻⁴⁰ Russakoff et al. pre-processed the OCT images in the training set using segmentation, which improved the performance of their model (AMDnet) to an AUC of 0.89 at the scan level and 0.91 at the volume level.³⁸ In contrast, Yim et al. used both segmented and raw OCT volumes for training, and their model achieved an AUC of 0.745 in predicting imminent conversion to nAMD (within 6 months); in a head-to-head comparison, this system outperformed three retinal specialists and two optometrists and performed comparably to the remaining human expert on the panel.³⁹ Finally, Banerjee et al. developed a hybrid model combining patient demographic information, VA and OCT image features.⁴⁰ The model achieved an AUC of 0.82 in predicting conversion to nAMD within 3 months and an AUC of 0.68 within 21 months.
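Russakoff et al. report both scan-level and volume-level AUCs. The aggregation rule AMDnet used is not described in this summary; the Python sketch below simply assumes one of three common pooling choices (mean, maximum or top-k mean) for turning hypothetical per-B-scan probabilities into one score per OCT volume. The `volume_score` helper and the data are hypothetical.

```python
from typing import Dict, List

def volume_score(scan_probs: List[float], method: str = "mean", k: int = 3) -> float:
    """Collapse per-B-scan probabilities into one score for the OCT volume (hypothetical helper)."""
    if not scan_probs:
        raise ValueError("empty volume")
    if method == "mean":
        return sum(scan_probs) / len(scan_probs)
    if method == "max":
        return max(scan_probs)
    if method == "topk":
        top = sorted(scan_probs, reverse=True)[:k]
        return sum(top) / len(top)
    raise ValueError(f"unknown method: {method}")

# Hypothetical volumes: per-B-scan conversion probabilities from a scan-level classifier.
volumes: Dict[str, List[float]] = {
    "eye_A": [0.10, 0.15, 0.80, 0.85, 0.20],  # focal high-risk finding on two B-scans
    "eye_B": [0.30, 0.35, 0.30, 0.40, 0.25],  # diffuse, moderate scores
}
for eye, probs in volumes.items():
    print(eye, {m: round(volume_score(probs, m), 3) for m in ("mean", "max", "topk")})
```

Mean pooling dilutes an abnormality confined to a few B-scans, whereas max pooling is sensitive to it but also to a single noisy scan; top-k averaging is a compromise between the two.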

A number of studies used CFPs from the Age-Related Eye Disease Study (AREDS) clinical trial to train their neural networks.⁴¹⁻⁴⁵ Bhuiyan et al. applied a two-step approach: (1) classification of images by disease severity using DL and (2) prediction of progression to advanced AMD using classical ML.⁴³ This hybrid model achieved 84% accuracy in predicting the development of advanced AMD (GA and nAMD) within 2 years.⁴³ Ganjdanesh et al. developed a generative adversarial network (GAN) that learned from temporal, longitudinal changes in CFPs; this model achieved an accuracy of 0.762 (95% CI 0.733–0.792) in simultaneously grading disease severity at the current time point and predicting progression to late AMD at a future time point.⁴⁵ Lastly, Liefers et al. trained their DL model with automatically segmented CFPs to predict GA growth rate.⁴⁶ The dataset included patients from the Rotterdam Study and the Blue Mountains Eye Study for training and patients from the AREDS trial for validation.⁴⁷,⁴⁸ The model reached an intraclass correlation of 0.83 between predicted and ground-truth GA areas.⁴⁶
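The agreement between predicted and ground-truth GA areas reported by Liefers et al. is, in standard terminology, an intraclass correlation coefficient (ICC). Unlike Pearson's r, an absolute-agreement ICC is lowered by systematic over- or underestimation. Several ICC forms exist and the study's choice is not stated here; as an assumption, the Python sketch below implements ICC(2,1) in the Shrout and Fleiss scheme (two-way random effects, absolute agreement, single measurement), treating the model and the ground truth as two raters. The GA areas are hypothetical.

```python
from typing import Sequence

def icc_2_1(rater_a: Sequence[float], rater_b: Sequence[float]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement."""
    rows = list(zip(rater_a, rater_b))
    n, k = len(rows), 2
    if n < 2 or len(rater_a) != len(rater_b):
        raise ValueError("need at least two paired subjects")
    grand = sum(a + b for a, b in rows) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(rater_a) / n, sum(rater_b) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between-subject variation
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # systematic rater (bias) variation
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual variation
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical GA areas (mm^2): ground truth versus model prediction.
truth = [2.5, 5.0, 7.5, 10.0, 12.5]
pred = [3.0, 4.6, 8.1, 9.4, 13.0]
print(round(icc_2_1(pred, truth), 3))
print(round(icc_2_1([t + 2.0 for t in truth], truth), 3))  # constant 2 mm^2 bias lowers the ICC
```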

Other studies focused on the progression of dry AMD and GA. Gigon et al. developed a DL model to predict en face progression of retinal pigment epithelium and outer retinal atrophy (RORA) on OCT, achieving Dice scores ranging from 0.46 to 0.72 for the predicted RORA growth regions.⁴⁹ Zhang et al. developed a bi-directional long short-term memory prediction module that achieved an average Dice score of 0.86–0.92 in predicting GA growth on SD-OCT images under different scenarios, and the model gained 10% in accuracy after time-related factors were integrated.⁵⁰ Anegondi et al. developed two DL models for predicting GA growth rate.⁵¹ One model was trained with FAF images only, and the other with both FAF and OCT images. Interestingly, the model trained with FAF images only performed better, with an AUC of 0.98 (0.97–0.99) in predicting GA growth rate.⁵¹ Kalra et al. focused on predicting degeneration of the ellipsoid zone on SD-OCT images, and the at-risk ellipsoid zone areas identified by their DL model showed an intraclass correlation of 0.83 with the ground truth.⁵²
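The Dice scores above measure overlap between predicted and observed growth regions: Dice = 2|A∩B|/(|A|+|B|), ranging from 0 (no overlap) to 1 (identical regions). A minimal Python sketch on hypothetical binary en face masks follows; scoring two empty masks as 1.0 is a convention we assume here, and published implementations differ on it.

```python
from typing import Sequence

def dice(pred_mask: Sequence[Sequence[int]], true_mask: Sequence[Sequence[int]]) -> float:
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks given as nested lists of 0/1."""
    inter = size_p = size_t = 0
    for prow, trow in zip(pred_mask, true_mask):
        for p, t in zip(prow, trow):
            if p and t:
                inter += 1
            size_p += 1 if p else 0
            size_t += 1 if t else 0
    if size_p + size_t == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * inter / (size_p + size_t)

# Hypothetical 4x4 en face masks of newly atrophic pixels (growth region only).
pred_growth = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
true_growth = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(dice(pred_growth, true_growth))  # 0.8
```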

Diabetic retinopathy

Compared with AMD, fewer studies have investigated disease onset/progression in DR. Bora et al. developed a model to predict the development of DR within 2 years from baseline fundus photographs without any DR; it achieved an AUC of 0.70 (95% CI 0.67–0.74) on external validation.⁵³ Arcadu et al. used ETDRS 7-field CFPs from patients with DR at baseline to predict two-step worsening on the ETDRS severity scale at month 12, but their model achieved only modest performance, with a mean AUC of 0.61.⁵⁴
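Two-step worsening is counted in steps along the ordinal ETDRS severity scale, not by subtracting level numbers, which are not evenly spaced (under the ordering assumed below, 35 to 47 is two steps although the numbers differ by 12). The Python sketch below assumes one commonly used per-eye ordering of levels; protocols differ in which levels they include, so the `ASSUMED_LEVELS` tuple and the helper names are hypothetical and should be checked against the relevant study protocol.

```python
from typing import Sequence

# Assumed per-eye ETDRS severity levels in ascending order of severity; protocols differ
# (e.g. in how levels 14/15 or 90 are handled), so verify against the study protocol.
ASSUMED_LEVELS: Sequence[int] = (10, 20, 35, 43, 47, 53, 61, 65, 71, 75, 81, 85)

def steps_changed(baseline: int, follow_up: int,
                  levels: Sequence[int] = ASSUMED_LEVELS) -> int:
    """Signed number of scale steps between two grades (positive = worse)."""
    return levels.index(follow_up) - levels.index(baseline)

def two_step_worsening(baseline: int, follow_up: int) -> bool:
    """True if the eye worsened by at least two steps on the assumed scale."""
    return steps_changed(baseline, follow_up) >= 2

print(two_step_worsening(35, 47), two_step_worsening(47, 53), steps_changed(53, 35))
```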

Glaucoma

Several studies trained DL models with CFPs to predict the onset of glaucoma. Both Thakur et al. and Lin et al. used CFPs to predict whether a person would develop glaucoma within a certain time frame.⁵⁵,⁵⁶ While Thakur et al. reported moderate performance (AUC 0.88 [0.86–0.91] for onset in 1–3 years and AUC 0.77 [0.75–0.78] for onset in 4–7 years),⁵⁵ the Multi-scale Multi-structure Siamese Network (MMSNet) developed by Lin et al. achieved an AUC of 0.93 for predicting onset of primary open-angle glaucoma in 2 years and an AUC of 0.95 for predicting onset in 5 years.⁵⁶
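Predicting onset within a fixed horizon requires a label for every baseline photograph, and follow-up length matters: an eye that remained glaucoma-free but was followed for less than the horizon cannot safely be labelled negative. The Python sketch below shows one simple convention, excluding such censored eyes; the `Eye` record and its fields are hypothetical, and the cited studies may have handled censoring differently.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Eye:
    eye_id: str
    years_to_diagnosis: Optional[float]  # None if never diagnosed during follow-up
    years_followed: float                # time from the baseline photograph to the last visit

def onset_label(eye: Eye, horizon_years: float) -> Optional[int]:
    """1 = onset within horizon, 0 = followed past horizon without onset, None = censored."""
    if eye.years_to_diagnosis is not None and eye.years_to_diagnosis <= horizon_years:
        return 1
    if eye.years_followed >= horizon_years:
        return 0
    return None  # too little follow-up to know; exclude from training and evaluation

cohort: List[Eye] = [
    Eye("a", 1.5, 6.0),   # diagnosed early
    Eye("b", None, 7.0),  # long follow-up, never diagnosed
    Eye("c", None, 1.0),  # short follow-up, never diagnosed
    Eye("d", 4.0, 4.0),   # diagnosed at year 4
]
print([onset_label(e, 2) for e in cohort], [onset_label(e, 5) for e in cohort])
```

Survival models, such as Cox proportional hazards regression, are an alternative that retains censored eyes instead of discarding them.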

Hou et al. employed vision transformers, a recent DL architecture, to predict VF worsening from longitudinal OCT images, and their most robust model achieved an AUC of 0.97 (95% CI 0.88–1.00).⁵⁷ Similarly, Herbert et al. used vision transformers to predict rapid VF worsening (a global decrease of more than 1 dB per year) from OCT images, and their best-performing model had an AUC of 0.87 (95% CI 0.77–0.97).⁵⁸
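A rate threshold such as "more than 1 dB decrease per year" is typically applied to the least-squares slope of a global VF index over time. The index and fitting method used by Herbert et al. are not specified here; the Python sketch below assumes ordinary least-squares regression of hypothetical mean deviation (MD) values on test dates, and the function names are our own.

```python
from typing import Sequence

def ols_slope(times_years: Sequence[float], values_db: Sequence[float]) -> float:
    """Ordinary least-squares slope of values against time (dB per year)."""
    n = len(times_years)
    if n < 2 or n != len(values_db):
        raise ValueError("need at least two paired observations")
    mt = sum(times_years) / n
    mv = sum(values_db) / n
    sxx = sum((t - mt) ** 2 for t in times_years)
    if sxx == 0:
        raise ValueError("all tests on the same date; slope undefined")
    sxy = sum((t - mt) * (v - mv) for t, v in zip(times_years, values_db))
    return sxy / sxx

def is_rapid_progressor(times_years: Sequence[float], md_db: Sequence[float],
                        threshold_db_per_year: float = -1.0) -> bool:
    """Flag a decline faster than 1 dB/year (slope below -1 dB/year)."""
    return ols_slope(times_years, md_db) < threshold_db_per_year

# Hypothetical MD series (dB) from five tests over two years.
t = [0, 0.5, 1.0, 1.5, 2.0]
print(round(ols_slope(t, [-3.0, -3.8, -4.4, -5.1, -5.9]), 2))  # -1.42
print(is_rapid_progressor(t, [-2.0, -2.1, -2.0, -2.3, -2.4]))   # False
```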

Clinical implications

Among the four kinds of prediction highlighted in this review, DL models that predict disease onset and progression will likely have the most far-reaching public health impact, by identifying the most at-risk patients at the population level. In most medical conditions, early detection and timely initiation of treatment lead to better outcomes. For example, in nAMD, the presenting VA predicts the long-term VA. In glaucoma, optimal intraocular pressure control at the first sign of glaucomatous optic neuropathy can halt further damage. Predicting which patients will develop the earliest stage of diabetic eye disease (i.e. ETDRS level 20 [microaneurysms only]) does not currently affect DR management, as no therapy is approved for this stage; however, this ability may become more relevant should neuroprotective agents become available.

Treatment response prediction

Most studies published to date in this area pertain to predicting response to intravitreal anti-VEGF injections.

Age-related macular degeneration

Lee et al. trained a GAN with baseline OCT images and fundus fluorescein angiography/indocyanine green angiography images to generate post-therapeutic OCT images of patients receiving anti-VEGF injections for nAMD.⁵⁹ The synthetic images were compared with their real counterparts for the presence or absence of four biomarkers: pigment epithelial detachment, intraretinal fluid, subretinal fluid and subretinal hyper-reflective material. The best-performing model reproduced the correct biomarkers in the simulated post-therapeutic OCT images with an accuracy ranging from 80.7% to 96.3%.⁵⁹ Liu et al. also used a GAN, trained with pairs of pre- and post-therapeutic OCT images from patients with nAMD, to predict treatment response.⁶⁰ The synthetic OCT images were compared with the actual post-therapeutic OCT images, and the generative DL model achieved an accuracy of 0.85 (95% CI 0.74–0.95) in predicting the final status of the macula (wet versus dry) and 0.81 (95% CI 0.69–0.93) in predicting whether sub/intraretinal fluid would completely resolve after one injection.⁶⁰
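The biomarker comparison described for Lee et al. can be read as a per-biomarker agreement rate: for each biomarker, the fraction of image pairs in which the synthetic and real post-therapeutic images are graded the same way (present or absent). The Python sketch below assumes that reading; the grading dictionaries, the PED/IRF/SRF/SHRM abbreviations used as keys and the function name are hypothetical illustrations.

```python
from typing import Dict, List

# Abbreviations for pigment epithelial detachment, intraretinal fluid,
# subretinal fluid and subretinal hyper-reflective material.
BIOMARKERS = ("PED", "IRF", "SRF", "SHRM")

def per_biomarker_accuracy(synthetic: List[Dict[str, bool]],
                           real: List[Dict[str, bool]]) -> Dict[str, float]:
    """Fraction of image pairs whose synthetic and real gradings agree, per biomarker."""
    if len(synthetic) != len(real) or not real:
        raise ValueError("need equal, non-empty lists of graded image pairs")
    return {
        b: sum(s[b] == r[b] for s, r in zip(synthetic, real)) / len(real)
        for b in BIOMARKERS
    }

# Hypothetical gradings of four synthetic/real image pairs.
synthetic = [
    {"PED": True, "IRF": False, "SRF": False, "SHRM": False},
    {"PED": True, "IRF": True, "SRF": False, "SHRM": False},
    {"PED": False, "IRF": False, "SRF": True, "SHRM": False},
    {"PED": True, "IRF": False, "SRF": False, "SHRM": True},
]
real = [
    {"PED": True, "IRF": False, "SRF": False, "SHRM": False},
    {"PED": True, "IRF": False, "SRF": False, "SHRM": False},
    {"PED": False, "IRF": False, "SRF": False, "SHRM": False},
    {"PED": True, "IRF": False, "SRF": False, "SHRM": True},
]
print(per_biomarker_accuracy(synthetic, real))
```

For a rare biomarker, accuracy can stay high even if the generator never produces it, so per-biomarker sensitivity would be a useful complement.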

Yeh et al. aimed to predict VA improvement at month 12 after initiation of anti-VEGF injections.⁶¹ Their model, HDF-Net, was trained with unsegmented baseline OCT images and non-image clinical data from treatment-naïve patients with nAMD, and it achieved an AUC of 0.98 (95% CI 0.97–0.99) in predicting VA improvement of ≥2 lines.⁶¹ Fu et al. used a least squares regression model to predict VA at 3 and 12 months after initiation of anti-VEGF treatment. Their model achieved R²=0.80 (MAE 5.0 ETDRS letters) at month 3 and R²=0.7 (MAE 7.2 ETDRS letters) at month 12, and it could also predict incremental VA change after the first and third injections.⁶² Romo-Bucheli et al. aimed to predict treatment burden over 2 years from baseline OCT images, and their model achieved an AUC of 0.81 in predicting high treatment burden, defined as ≥16 injections over 2 years.⁶³
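The outcome definitions above become binary training labels. The Python sketch below encodes them under stated assumptions: one line is taken as 0.1 logMAR or 5 ETDRS letters, so ≥2 lines is a logMAR decrease of at least 0.2 or a gain of at least 10 letters; whether Yeh et al. counted lines in logMAR or letters is not stated here. The ≥16-injection threshold is the one reported for Romo-Bucheli et al.; the function names are hypothetical.

```python
def improved_two_lines(baseline_logmar: float, month12_logmar: float) -> bool:
    """>=2 lines of improvement: logMAR falls by at least 0.2 (assumed 0.1 logMAR per line)."""
    # Small tolerance so that e.g. 0.6 -> 0.4 (0.19999999999999996 in floating point) counts.
    return (baseline_logmar - month12_logmar) >= 0.2 - 1e-9

def improved_two_lines_letters(baseline_letters: int, month12_letters: int) -> bool:
    """Equivalent rule on the ETDRS letter scale: 5 letters per line, so >=10 letters."""
    return month12_letters - baseline_letters >= 10

def high_treatment_burden(injections_over_2_years: int) -> bool:
    """High burden as defined by Romo-Bucheli et al.: >=16 injections over 2 years."""
    return injections_over_2_years >= 16

print(improved_two_lines(0.6, 0.4), improved_two_lines_letters(55, 64), high_treatment_burden(16))
```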

Diabetic retinopathy

Studies evaluating treatment response in DMO have focused on predicting the reduction in macular thickness after anti-VEGF injections.⁶⁴⁻⁶⁶ Both Rasti et al. and Alryalat et al. used DL models to distinguish good from poor responders based on OCT criteria, and their models achieved AUCs ranging from 0.81 to 0.86.⁶⁴,⁶⁵ Furthermore, Liu et al. used an ensemble model, combining DL and classical ML techniques, to predict post-treatment CFT on OCT and post-treatment VA from baseline OCT and clinical data. In the external validation dataset, the MAE was 68.08 µm for the anatomical outcome (CFT) and 0.13 logMAR for the functional outcome (VA).⁶⁶ Lastly, Xu et al. used a GAN to generate synthetic post-treatment OCT images, which were compared with real post-treatment OCT images; the MAE for CFT between synthetic and real images was 24.51 ± 18.56 µm.⁶⁷

Clinical implications

The burden of VEGF-driven retinal diseases, such as AMD and DR, is expected to increase substantially as the population ages and the incidence of diabetes continues to rise. Accordingly, the ability to risk stratify patients requiring anti-VEGF therapy and to tailor their treatment plans will become increasingly important. At the individual level, more personalized therapy could lead to better final visual outcomes. At the systems level, given the large difference in cost between anti-VEGF agents, identifying patients who would respond equally well to inexpensive and expensive agents could improve cost-effectiveness.

Conclusion and future directions

DL applications in ophthalmology have gradually shifted from classification to predictive tasks. Such predictive DL models hold the promise of revolutionizing our field by providing insights that may elude even the most astute clinicians. From a technical point of view, we anticipate the more widespread use of vision transformers and generative DL techniques in training these predictive models.

Future structure-function models may predict multiple visual functional endpoints simultaneously from a single imaging modality. Prospective validation of models trained with retrospective data will also be invaluable. Finally, for treatment response prediction, clinical trials comparing standard of care with DL-based clinical decision-support tools will help establish whether DL tools can improve patient outcomes.
