June 10, 2026

Machine learning model predicts T2D risk up to 10 years before onset

Author(s): Rose McNulty

Key Takeaways

  • A >1.2% one-year risk threshold (top two deciles) yielded 80% sensitivity and 81% specificity, supporting scalable targeting of prevention resources within an overinclusive at-risk population.
  • Discrimination was strong (AUC 0.883 in validation) with near-ideal 1-year calibration (1.03% predicted vs 1.01% observed), and performance was stable across longer horizons.

An electronic health record–based model could help health systems target diabetes prevention to the high-risk patients most likely to benefit.

A machine learning model built entirely from routine electronic health record (EHR) data accurately identified adults at the highest risk of developing type 2 diabetes up to 10 years before onset, researchers reported at the American Diabetes Association's 2026 Scientific Sessions in New Orleans.

The analysis, led by Luis A. Rodriguez, Ph.D., M.P.H., R.D., of Kaiser Permanente Northern California, points to a more precise way to find the patients who stand to gain most from prevention programs—a longstanding challenge for health systems and payers.

That challenge is fundamentally one of scale: more than 60% of U.S. adults have at least one risk factor for type 2 diabetes, a population far larger than existing prevention programs can realistically serve. And because the disease develops gradually, often without obvious warning signs, clinicians struggle to single out who needs intervention and when.

To address that, the researchers ran a retrospective cohort study of 3,365,464 adults aged 18 to 70 who received care at Kaiser Permanente Northern California between 2012 and 2024. They split the cohort 70:30 for training and validation and applied a hazard-based super learning approach (an ensemble method that blends multiple survival-analysis models) to estimate each patient's one-, three-, and 10-year risk of diabetes.

The model drew on demographics, clinical measures, comorbidities, prescriptions, and utilization, along with less conventional inputs: metabolic dysfunction–associated steatotic liver disease (MASLD) and neighborhood-level measures of socioeconomic status, walkability, and food environment.

The cohort had a median age of 39, and 55% were female. Over a median 5.4 years of follow-up, type 2 diabetes incidence was 10.7 per 1,000 person-years. The model achieved an area under the curve of 0.886 in training and 0.883 in validation, with near-ideal calibration at one year (mean predicted risk 1.03% versus observed 1.01%). At the optimal cut point, a one-year risk threshold above 1.2% that captured the top two deciles of risk, sensitivity was 80% and specificity was 81%. Performance held across the three- and 10-year windows.
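
For readers who want to see what those figures imply in practice, the short sketch below works through the arithmetic. It is an illustration, not part of the study: it assumes the reported 80% sensitivity and 81% specificity apply at the one-year horizon and treats the observed 1.01% one-year incidence as the base rate. The function name `screening_yield` is hypothetical.

```python
def screening_yield(sensitivity, specificity, incidence):
    """Return (fraction of patients flagged, positive predictive value).

    Assumes a single cut point applied to a population whose true
    event rate over the horizon equals `incidence`.
    """
    true_pos = sensitivity * incidence                # flagged and later diagnosed
    false_pos = (1 - specificity) * (1 - incidence)   # flagged but not diagnosed
    flagged = true_pos + false_pos
    ppv = true_pos / flagged
    return flagged, ppv


# Figures reported in the article; using 1.01% as the base rate is an assumption.
flagged, ppv = screening_yield(sensitivity=0.80, specificity=0.81, incidence=0.0101)
print(f"flagged: {flagged:.1%}, PPV: {ppv:.1%}")
# prints "flagged: 19.6%, PPV: 4.1%"
```

Under those assumptions, roughly one in five patients would be flagged, which is consistent with a threshold that captures the top two deciles, and about 4% of flagged patients would go on to develop diabetes within a year, roughly four times the overall one-year rate.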

One methodological choice stands out in the current treatment landscape. The investigators classified incident diabetes using diagnosis codes, glycemic test values, or diabetes medication fills but did not count patients prescribed only metformin, an SGLT2 inhibitor, or a glucagon-like peptide-1 receptor agonist without a corresponding diagnosis, lab value, or other diabetes medication. With those drugs now widely prescribed for weight management and cardiovascular and renal indications, the distinction helps guard against misclassifying patients who do not actually have diabetes.
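
The case definition described above can be sketched as a simple rule. This is a minimal illustration of the logic as reported, not the investigators' code: the function name, argument names, and drug-class labels are hypothetical, and real EHR data would also require specific code lists, lab cut points, and date logic that the article does not give.

```python
# Drug classes that, per the article, do not establish diabetes on their own.
# The string labels are illustrative placeholders, not a real coding system.
NON_SPECIFIC_CLASSES = {"metformin", "sglt2_inhibitor", "glp1_receptor_agonist"}


def is_incident_diabetes(has_diagnosis_code, has_diabetic_lab_value, medication_classes):
    """Return True if the evidence meets the case definition sketched here."""
    # A diagnosis code or a qualifying glycemic test value counts by itself.
    if has_diagnosis_code or has_diabetic_lab_value:
        return True
    # Otherwise, only a fill for some other diabetes medication counts.
    other_diabetes_meds = set(medication_classes) - NON_SPECIFIC_CLASSES
    return bool(other_diabetes_meds)


# A GLP-1 receptor agonist alone (for example, prescribed for weight management)
# is not counted as diabetes.
print(is_incident_diabetes(False, False, {"glp1_receptor_agonist"}))  # prints False
# Metformin plus another diabetes drug (insulin is used here as an example) is counted.
print(is_incident_diabetes(False, False, {"metformin", "insulin"}))   # prints True
```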

"These findings represent a potential advancement over existing approaches for identifying individuals at risk of developing type 2 diabetes by enabling earlier, more precise detection," Rodriguez said. "Our model has the potential to create an opportunity for clinicians and health systems to focus prevention efforts on the high-risk individuals often missed by traditional screening who have the most to gain from prevention and treatment."

The authors plan to test the model in a clinical setting to see whether it boosts enrollment in prevention programs and, ultimately, lowers diabetes incidence.

