Cutaneous squamous cell carcinoma (cSCC) is the second most common form of skin cancer. While most cases are treatable, a small number can become serious and spread, leading to worse outcomes.
A risk prediction tool for cutaneous squamous cell carcinoma (cSCC) built with the GPT-4 large language model performed better than current staging systems at identifying patients more likely to have poor outcomes, according to a new study published in JAMA Dermatology.
Accurately identifying which tumors are more dangerous is important for deciding how to treat patients, the report noted.
[Image: Invasive squamous cell carcinoma]
Existing tools or models, such as the AJCC8 and BWH staging systems, group tumors by certain traits, but they tend to miss important risk factors and can group very different tumors together, making it harder to predict who might do poorly.
Many factors increase the risk of developing cSCC, including immunosuppression, chronic wounds, fair skin, male gender, older age, certain genetic conditions, ultraviolet (UV) radiation exposure and a history of prior squamous cell carcinoma, according to the National Institutes of Health.
In 2012, the estimated incidence was 140 cases per 100,000 American men and 50 per 100,000 women.
To address the limitations of existing staging systems, researchers searched PubMed, Embase and the Cochrane Library for studies published from 1999 through the end of 2023.
After applying strict criteria, 10 studies that linked risk factors to serious outcomes such as recurrence, spread or death were selected.
These studies were used to inform a GPT-4-based model, called AIRIS, through a process called retrieval-augmented generation (RAG).
The AI created a new scoring system to predict which cSCC tumors are more dangerous.
AIRIS was tested using tumor data from NYU Langone Health and Mayo Clinic.
The dataset included 2,379 biopsy-proven cSCC cases with full clinical information.
The AI model's predictions were compared with those of the AJCC8 and BWH systems using statistical tests.
Researchers measured how well AIRIS could predict poor outcomes using standard metrics such as sensitivity, specificity and area under the curve (AUC). AIRIS was also tested for consistency and for its ability to separate high- and low-risk cases.
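For readers less familiar with these metrics, the following is a minimal Python sketch of how sensitivity, specificity and AUC can be computed for a two-class (high-risk vs. low-risk) system. The data, the risk scores and the cutoff of 3 are hypothetical values chosen for illustration; they are not taken from the study, and this is not the authors' analysis code.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); 1 = poor outcome / classified high risk."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # poor outcome, flagged high risk
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # poor outcome, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good outcome, flagged low risk
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good outcome, flagged high risk
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random poor-outcome case outscores a random good-outcome case.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT from the study.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]      # 1 = poor outcome occurred
risk_scores = [4, 3, 1, 2, 1, 0, 0, 3]   # hypothetical risk points per tumor
high_risk = [1 if s >= 3 else 0 for s in risk_scores]  # hypothetical cutoff

sens, spec = sensitivity_specificity(outcomes, high_risk)
area = auc(outcomes, risk_scores)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, AUC={area:.2f}")
```

Sensitivity and specificity depend on where the high-risk cutoff is placed, while AUC summarizes how well the scores rank cases across all possible cutoffs, which is why a system can have a higher AUC without being better on every other metric.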
AIRIS outperformed BWH and AJCC8 in several key areas for predicting poor outcomes in patients with cSCC.
Fewer poor outcomes fell into the AIRIS low-risk groups: 50.9% for local recurrence (LR), 26.3% for nodal metastasis (NM), 17.5% for distant metastasis (DM) and 27.8% for disease-specific death (DSD).
In comparison, the BWH and AJCC8 systems had nearly twice as many poor outcomes in their low-risk groups, indicating less consistent results.
AIRIS also showed a clearer progression of risk across its classes overall.
In the high-risk AIRIS classes, poor outcome rates increased significantly: LR (49.1%), NM (73.7%), DM (82.5%) and DSD (72.2%).
In terms of diagnostic performance, AIRIS had higher sensitivity for all outcomes, ranging from 49.1% to 82.5%, but slightly lower specificity compared with BWH and AJCC8.
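The low-risk and high-risk percentages reported above appear to be complements of one another, and the sensitivity range matches the high-risk figures. One plausible reading, which is an assumption here rather than something the report states, is that each percentage is the share of a given outcome's events that fell into that risk group, so the high-risk share equals sensitivity. A short Python check of that arithmetic, using only the figures quoted in this article:

```python
# Percentages as quoted in the article. The interpretation (share of each
# outcome's events falling in that risk group) is an assumption.
low_risk_share = {"LR": 50.9, "NM": 26.3, "DM": 17.5, "DSD": 27.8}
high_risk_share = {"LR": 49.1, "NM": 73.7, "DM": 82.5, "DSD": 72.2}

# Each low-risk/high-risk pair should account for all events of that outcome.
totals = {k: round(low_risk_share[k] + high_risk_share[k], 1) for k in low_risk_share}

# Under the assumption above, sensitivity is the high-risk share, so its range
# should match the reported sensitivity range.
sens_range = (min(high_risk_share.values()), max(high_risk_share.values()))

print(totals)
print(sens_range)
```

If this reading is right, the reported sensitivity range of 49.1% to 82.5% corresponds to local recurrence at the low end and distant metastasis at the high end.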
Although overall accuracy was lower, AIRIS demonstrated stronger predictive power, with AUC values of 0.69 (LR), 0.81 (NM), 0.85 (DM) and 0.80 (DSD), all higher than those of the traditional systems.
The study had several strengths.
For example, researchers reviewed more than 2,000 primary tumors to validate AIRIS. AIRIS also included important patient risk factors such as immunosuppression, lymphovascular invasion and in-transit metastasis, which are often missing from traditional staging systems, the study authors noted.
This helped AIRIS better predict poor outcomes, with improved sensitivity and risk discrimination compared with current standards.
However, limitations include the relatively low event rate of poor outcomes in cSCC, which can make validation challenging.
In addition, large language models such as GPT rely on probabilistic predictions and can carry biases from their training data and inputs.
While RAG helps ground the model in reliable literature, AI-generated outputs still require careful validation, the authors suggest.
The authors recommend future improvements such as weighting immunosuppression categories and integrating multimodal data, including imaging or gene profiles, to personalize risk predictions further.