Artificial Intelligence (AI) enabled pre-screening for Diabetic Retinopathy (DR) clinical trials

May 5, 2025

Nancy Barrett, Robert Slater, Rachel E. Linderman, Rick Voland, Claire Calhoun, Jennifer K. Sun, Barbara A. Blodi, Amitha Domalpally, with the DRCR Retina Network

Abstract

Purpose: Preventive strategies to slow the progression of diabetic retinopathy (DR) are currently of interest. Clinical trials typically enroll patients at high risk of progression based on DR Severity Scale (DRSS) level, often moderate to severe non-proliferative DR (NPDR) (DRSS 43-53). Estimated screen failure rates after reading center (RC) grading of stereoscopic 7-field color images exceed 50%, frequently due to insufficient DRSS. We developed an AI algorithm to enable real-time prescreening of color images for DRSS in order to reduce screen failure rates.

Methods: Macula-centered fundus images (field 2) from baseline visits of DRCR Retina Network Protocols AA, AC, S, T, V, and W were used to train an AI model. All included eyes met criteria for high image quality and absence of photocoagulation. A Swin-B Transformer model was trained and validated on 2,120 eyes (1,764 participants) and tested on 551 eyes (451 participants). The AI model produced binary outputs to simulate clinical trial enrollment eligibility: ineligible (DRSS ≤35 or >53) and eligible (DRSS 43-53). Ground truth RC DRSS assessment of 7-field/4-wide images was compared with the AI outputs, and disagreements were evaluated.
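The binary eligibility target described above can be sketched as a simple mapping from an RC-graded DRSS level to a label. This is a minimal illustration, not the study's code; the function name `eligibility_label` is hypothetical, and it assumes DRSS is supplied as an integer level on the standard scale, on which no levels fall between 35 and 43.

```python
# Hypothetical helper (not from the study): maps a reading-center DRSS level
# to the binary trial-eligibility label used as the model's target.
# Assumes DRSS is an integer level on the standard severity scale.

def eligibility_label(drss: int) -> int:
    """Return 1 (eligible) for DRSS 43-53, else 0 (DRSS <=35 or >53)."""
    if drss <= 0:
        raise ValueError(f"invalid DRSS level: {drss}")
    return 1 if 43 <= drss <= 53 else 0


# Example: mild NPDR (35) and PDR (61) are ineligible; 43, 47, 53 are eligible.
labels = {level: eligibility_label(level) for level in (35, 43, 47, 53, 61)}
print(labels)
```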

Results: Of the 551 test set eyes, 310 (56%) were eligible and 241 (44%) were ineligible based on RC assessment. The AI model correctly predicted 266 (86%) eligible eyes and 118 (49%) ineligible eyes. The area under the receiver operating characteristic curve was 0.759 (95% CI 0.716, 0.803). Model performance metrics were accuracy 70% (66%, 74%), sensitivity 86% (82%, 90%), specificity 49% (42%, 56%), and F1 score 76% (72%, 80%). A review of false positives and false negatives revealed two main contributors to inaccuracies: lesions outside the field 2 area, which the algorithm did not have access to (figure), and lesions within field 2 that the AI failed to detect. Other factors included image quality issues and complex presentations.
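The point estimates above follow directly from the confusion matrix implied by the reported counts: 266 true positives, 310 - 266 = 44 false negatives, 118 true negatives, and 241 - 118 = 123 false positives. The sketch below recomputes them as a consistency check; `binary_metrics` is a hypothetical helper, and the confidence intervals and AUC are not reproduced because they require per-eye model scores that the abstract does not provide.

```python
# Hypothetical consistency check (not the study's code): recompute the
# reported point estimates from the test-set confusion matrix.

def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),       # eligible eyes correctly flagged
        "specificity": tn / (tn + fp),       # ineligible eyes correctly rejected
        "precision": tp / (tp + fp),         # AI-eligible eyes that are truly eligible
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and sensitivity
    }


eligible, ineligible = 310, 241   # RC ground truth in the test set
tp, tn = 266, 118                 # correct AI predictions
m = binary_metrics(tp=tp, fn=eligible - tp, tn=tn, fp=ineligible - tn)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, these give the reported 70% accuracy, 86% sensitivity, 49% specificity, and 76% F1 score.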

Conclusions: The prescreening AI algorithm correctly classified 70% of eyes as eligible or ineligible using clinical trial data. A tiered approach, with AI prescreening for eligibility followed by expert confirmation, may lose some eligible patients but could also reduce screen failure rates. Future efforts at real-time automated assessment of potential eligibility could improve enrollment efficiency, reduce human burden, and reduce cost in DR clinical trials.
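The trade-off of the tiered approach can be made concrete with the test-set counts. This is an illustrative sketch under stated assumptions, not an analysis from the study: it assumes every eye the AI flags as eligible proceeds to RC grading, that the RC grade is final, and that the test set's case mix (56% eligible) is representative. The function `tiered_tradeoff` is hypothetical.

```python
# Hypothetical illustration (not from the study) of a tiered workflow:
# AI prescreen on field 2, then RC grading only for AI-flagged eyes.
# Assumes every flagged eye is graded and the RC grade is final.

def tiered_tradeoff(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    total = tp + fn + tn + fp
    flagged = tp + fp  # eyes the AI sends on to RC grading
    return {
        # share of eyes failing screening if every eye goes straight to the RC
        "screen_fail_rc_only": (tn + fp) / total,
        # share of AI-flagged eyes that still fail RC screening
        "screen_fail_after_ai": fp / flagged,
        # share of truly eligible eyes the AI screens out
        "eligible_lost": fn / (tp + fn),
        # fraction of RC gradings avoided by the prescreen
        "rc_gradings_avoided": 1 - flagged / total,
    }


t = tiered_tradeoff(tp=266, fn=44, tn=118, fp=123)
for name, value in t.items():
    print(f"{name}: {value:.1%}")
```

With these counts, screen failures fall from about 44% to about 32% of graded eyes, about 14% of eligible eyes are lost, and about 29% of RC gradings are avoided. In a real screening population, where screen failure exceeds 50%, the figures would differ.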