Category: Research

Academic research, papers, and breakthroughs in healthcare AI

  • A Hidden Flaw in Radiation Side‑Effect Forecasts: Why “Competing Risks” Could Change ORN Prevention

    Radiation oncologists have long faced a frustrating paradox: osteoradionecrosis (ORN) of the jaw is relatively uncommon, but when it happens it can be devastating—pain, infection, fractures, and years of dental and surgical care. Now, new research suggests that some of the tools used to predict ORN risk may be structurally biased if they ignore a reality in head and neck cancer care: many patients die before ORN would ever have time to occur.

    In a study published in the Journal of Medical Systems, researchers report building and validating machine learning–based models that predict individualized ORN risk after curative head and neck radiation therapy (RT) using time-to-event methods that explicitly account for death as a competing risk. They also quantify how much risk can be overestimated when this competing risk is ignored, according to the paper.

    Why ORN prediction is harder than it looks

    ORN is a late complication: it may emerge months to years after RT, shaped by dose to mandibular bone, tumor location, dental extractions, smoking, comorbidities, and other factors. That time lag is exactly what makes prediction tricky in a population with meaningful mortality risk. If a model treats every patient as if they remain “at risk” for ORN indefinitely, it can inflate the estimated probability of ORN—especially in higher-risk cancer subsets where death is more common.

    This is the core statistical issue the study tackles. Traditional approaches that frame ORN as a simple binary outcome (did ORN occur: yes/no) can miss when it occurred and whether a patient’s follow-up ended because they died. Competing-risk modeling is designed for this situation: it estimates the chance of ORN over time while acknowledging that death can preclude the event. The authors’ emphasis on time-to-event data with death as the competing risk is therefore more than a technical tweak; it changes what “risk” actually means in clinical practice.
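
    To make the overestimation concrete, here is a minimal sketch on invented follow-up records. It is not the study's machine learning model. It compares a naive estimate (one minus Kaplan-Meier, treating deaths as censored) with a standard nonparametric cumulative incidence estimate (Aalen-Johansen) that treats death as a competing event. The data, event codes, and function name are assumptions made for illustration.

```python
# Toy illustration only: the records below are invented, not study data.
# Each record is (months of follow-up, event code).
# Event codes: 0 = censored (alive, no ORN at last contact), 1 = ORN, 2 = death.
RECORDS = [
    (6, 2), (12, 1), (18, 2), (24, 2), (30, 1),
    (36, 0), (42, 2), (48, 1), (60, 0), (60, 0),
]


def orn_risk_estimates(records, horizon):
    """Return (naive_risk, cumulative_incidence) of ORN by `horizon` months.

    naive_risk: 1 - Kaplan-Meier, with deaths treated as censored.
    cumulative_incidence: Aalen-Johansen estimate, death as a competing event.
    """
    event_times = sorted({t for t, e in records if e != 0 and t <= horizon})
    km_orn = 1.0      # "ORN-free" curve that ignores death as an outcome
    event_free = 1.0  # probability of being alive and ORN-free just before t
    cif = 0.0         # cumulative incidence of ORN
    for t in event_times:
        at_risk = sum(1 for ti, _ in records if ti >= t)
        n_orn = sum(1 for ti, e in records if ti == t and e == 1)
        n_death = sum(1 for ti, e in records if ti == t and e == 2)
        # ORN can only occur in patients who are still alive and ORN-free.
        cif += event_free * n_orn / at_risk
        km_orn *= 1 - n_orn / at_risk
        event_free *= 1 - (n_orn + n_death) / at_risk
    return 1 - km_orn, cif


naive_risk, competing_risk = orn_risk_estimates(RECORDS, horizon=60)
print(f"naive 1-KM: {naive_risk:.3f}, competing-risk CIF: {competing_risk:.3f}")
```

    On these toy records the naive figure comes out higher than the cumulative incidence, because the naive curve keeps assigning ORN risk to patients who have already died.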

    What the study did—and what stands out

    As described in the Journal of Medical Systems article, the researchers conducted a prognostic study of patients treated with curative-intent RT between 2011 and 2018, with follow-up ongoing. They assembled a dataset spanning sociodemographic characteristics, clinical variables, and dosimetric information, the kind of multimodal mix that suits modern predictive modeling. ORN was defined using the ClinRad system with a threshold of grade ≥1, a choice that captures early, clinically relevant disease rather than only the most severe cases.

    The second objective—measuring how much ORN risk is overestimated when competing risk is ignored—may be the most actionable insight for health systems. Overestimation is not a benign error. It can drive downstream decisions: extra dental procedures, intensified surveillance, altered fractionation or dose constraints, and patient counseling that frames the future in unnecessarily alarming terms.

    Why this matters for clinicians

    For radiation oncologists, the study is a reminder that “accurate” prediction is not just about discrimination (who is higher vs lower risk). Calibration—whether predicted probabilities match reality—matters even more when models are used to trigger interventions. If a clinic uses a threshold (for example, “>X% risk” to refer to dental oncology, hyperbaric oxygen consideration, or enhanced imaging follow-up), an inflated risk estimate can systematically push more patients into resource-intensive pathways.
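
    The threshold effect can be sketched with a few lines of arithmetic. Every number below (the patient risks, the 10% referral rule, and the 1.5x inflation factor) is hypothetical and chosen only to illustrate the point. None of it comes from the study.

```python
# Hypothetical numbers throughout: the risks, threshold, and inflation factor
# are invented for illustration and do not come from the study.
REFERRAL_THRESHOLD = 0.10  # assumed clinic rule: refer if predicted risk > 10%
INFLATION_FACTOR = 1.5     # assumed overestimation from a miscalibrated model

calibrated_risks = [0.02, 0.04, 0.06, 0.08, 0.09, 0.12, 0.15, 0.20]


def count_referrals(risks, threshold):
    """Number of patients whose predicted risk exceeds the referral threshold."""
    return sum(1 for r in risks if r > threshold)


# Scale every prediction by the same factor, capped at 100%.
inflated_risks = [min(1.0, r * INFLATION_FACTOR) for r in calibrated_risks]

referred_calibrated = count_referrals(calibrated_risks, REFERRAL_THRESHOLD)
referred_inflated = count_referrals(inflated_risks, REFERRAL_THRESHOLD)
print(referred_calibrated, referred_inflated)
```

    Because every prediction is scaled by the same factor, the ranking of patients is unchanged, so discrimination looks identical. Only the number of patients crossing the threshold moves, which is why calibration has to be checked separately.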

    For dental specialists and oral surgeons, better individualized risk estimates could refine timing and aggressiveness of dental extractions and restorative planning in irradiated patients. The interplay between pre-RT dental optimization, post-RT procedures, and mandibular dose is clinically complex; a model that respects time and mortality may align better with real-world decision windows.

    For multidisciplinary tumor boards, competing-risk-aware predictions also sharpen conversations about tradeoffs. If a patient’s near-term mortality risk is high, overly aggressive ORN prevention strategies could inadvertently reduce quality of life now—exactly when comfort, nutrition, and functional preservation matter most.

    What it means for patients

    Patients often hear ORN framed in broad strokes—“rare but serious”—without an individualized number anchored to their specific treatment plan and health profile. More precise, better-calibrated forecasting can make consent conversations more honest and less abstract. It can also reduce the psychological burden of being told they face a high probability of a complication that, in practice, may be less likely given their overall trajectory.

    At the same time, individualized prediction cuts both ways: some patients will learn their risk is meaningfully higher than average. In those cases, a well-validated model can legitimize proactive steps and help patients understand why extra dental follow-up or changes in care planning are being recommended.

    The bigger healthcare AI lesson: “real-world outcomes” need real-world math

    Healthcare AI has spent the last decade moving from proof-of-concept models to deployment. But many models still lean on convenient labels that don’t reflect clinical timelines—especially in oncology, where competing events (death, recurrence, treatment changes) are common. This study underscores a quiet but critical point: prediction problems are often mis-specified before an algorithm is even chosen.

    As health systems incorporate AI into radiation planning, toxicity surveillance, and supportive care pathways, competing-risk methods should become part of the standard toolkit—not an academic add-on. The practical payoff is clearer communication, more rational allocation of preventive resources, and fewer unintended harms from “predicting” events in patients who never truly remained at risk long enough for those events to occur.

    What comes next

    The next wave of ORN prediction will likely be less about squeezing out marginal performance gains and more about clinical integration: embedding risk estimates into RT planning systems, validating across institutions with different contouring and dose calculation practices, and evaluating whether model-informed interventions actually reduce ORN incidence or improve quality of life. Prospective testing will be key, as will transparency about what the model can and cannot infer when clinical practice changes.

    Longer term, competing-risk-aware toxicity models could evolve into a broader “late-effects forecasting” layer for head and neck cancer survivorship—one that accounts for feeding tube dependence, dysphagia, dental deterioration, and bone health in a unified time-aware framework. If the field gets the math right, clinicians may finally get risk tools that behave like the patients they’re meant to serve: living on a timeline, not in a binary box.

    Source: Journal of Medical Systems, “Machine Learning Models for Individualized Osteoradionecrosis Risk Prediction in Head and Neck Cancer” (as reported by the journal article at https://link.springer.com/article/10.1007/s10916-026-02359-4).

  • AI Wants to Help Prevent Teen Suicide—but the Real Test Is Trust, Workflow, and Follow‑Through

    Teen suicide prevention in the U.S. is increasingly colliding with an uncomfortable reality: the healthcare system is often where risk is detected, but not always where effective support reliably happens. A new study in Artificial Intelligence in Medicine describes an AI-based approach designed to bolster adolescent suicide prevention initiatives—an effort that signals how quickly mental health is becoming one of the most consequential proving grounds for clinical AI.

    According to the June 2026 publication in Artificial Intelligence in Medicine, the research team—including Luke Liang and colleagues—presents an artificial intelligence approach intended to support adolescent suicide prevention initiatives in the United States. While “AI for suicide risk” has been discussed for years, the framing here is notable: support for prevention initiatives, not merely prediction. That distinction matters, because prediction without action can be worse than useless—it can amplify alarm fatigue, deepen inequities, and overwhelm already-strained behavioral health pathways.

    Why this matters now: screening is common, capacity is not

    Across pediatric and adolescent care settings, screening for depression and suicidality is far more routine than it was a decade ago. Emergency departments, primary care clinics, school-linked programs, and inpatient units are all points of contact where a teen in crisis might surface. Yet detection is only the first mile. The hard part is getting from “identified risk” to “timely, appropriate, and sustained help.”

    This is the gap AI systems increasingly claim to fill: triage more accurately, identify patterns humans miss, and prioritize limited behavioral health resources. In theory, an AI layer could help clinicians decide who needs immediate intervention, who needs follow-up within days, and who may be safely supported with lower-intensity services—while also helping systems understand which prevention programs are reaching the right populations.

    But mental health AI is entering a climate of heightened scrutiny. Adolescents are uniquely vulnerable to harms from false positives (unnecessary escalation, stigma, family conflict) and false negatives (missed opportunities, delayed care). The stakes are clinical, ethical, and reputational. Any AI approach aiming to assist suicide prevention needs to show that it improves outcomes—not just model metrics.

    From “risk scores” to operational prevention

    Much of the last wave of suicide-focused AI research emphasized risk prediction from electronic health records or digital signals. The next wave—implied by the way this paper is positioned—is about embedding AI into prevention operations: making it easier for health systems and public health partners to run programs, target interventions, measure reach, and continuously improve.

    For clinicians, that shift could be critical. A risk model that produces an alert is only as valuable as the workflow behind it: who receives the alert, how quickly they can respond, what steps they take, how documentation occurs, and what happens after the visit. AI that is explicitly built to support initiatives suggests attention to implementation—how prevention is actually executed across real-world U.S. settings.

    It also reflects a broader trend: healthcare AI is moving beyond “one model in one hospital” toward platforms that interact with multiple stakeholders—clinicians, care managers, school-based counselors, community mental health services, and public health programs. That ecosystem view is particularly relevant for adolescent mental health, where care coordination often determines whether support sticks.

    Implications for healthcare professionals: better triage, new responsibilities

    If AI approaches like the one described in Artificial Intelligence in Medicine gain traction, healthcare professionals should expect changes in three areas.

    First, triage may become more structured. AI tools can encourage standardized pathways—who gets a safety plan today, who gets next-day follow-up, and who needs a higher level of care. That can reduce variability between sites and clinicians. But it may also introduce tension when an algorithm’s recommendation conflicts with clinical judgment.

    Second, documentation and accountability will tighten. When AI flags risk, systems will need clear protocols for response and escalation. Clinicians may face new medicolegal questions: What does it mean to override an AI suggestion? What constitutes “reasonable” follow-up when an AI system indicates elevated risk?

    Third, teams will need training that goes beyond clicking buttons. The most important competency may be communicating about AI-informed care with adolescents and families—explaining what the tool does, what it doesn’t do, and how privacy is protected. Trust is not optional in teen mental health; it is the intervention’s substrate.

    Implications for patients and families: earlier support—if safeguards are real

    For adolescents, the promise is earlier identification and faster linkage to support. In practice, success will hinge on safeguards that respect youth autonomy and reduce unintended harm.

    AI systems trained on historical healthcare data can inherit systemic bias: differences in who gets diagnosed, who gets referred, and who gets documented as having behavioral health concerns. If not carefully assessed, a tool could under-detect risk in some groups and over-escalate in others. Adolescents from marginalized communities may also have good reasons to fear increased surveillance without increased access to quality care.

    Families, meanwhile, will want clarity about what data is used and what happens when the system flags concern. If AI increases alerts but local services are booked out for weeks, families may experience “notification without navigation,” which can intensify distress.

    What comes next: proof, governance, and integration

    The future of AI in adolescent suicide prevention will be decided less by accuracy curves and more by implementation science: measurable reductions in crises, fewer missed follow-ups, improved engagement after ED visits, and equitable access to services. Tools must be governed with transparency, routinely audited for bias and drift, and evaluated in the messy reality of clinical operations.

    The study reported in Artificial Intelligence in Medicine arrives at a pivotal moment: healthcare is finally investing in mental health infrastructure, and AI is searching for its most meaningful use cases. If AI can help systems consistently deliver the right intervention at the right time—without eroding trust—it could become an accelerant for prevention. If it can’t, the field will learn an equally important lesson: in adolescent suicide prevention, technology is never the product. Follow-through is.

    Source: Artificial Intelligence in Medicine (June 2026), “An artificial intelligence approach to support adolescent suicide prevention initiatives in the United States.”

  • Google DeepMind’s AlphaFold 3 Predicts Drug-Protein Interactions with Near-Experimental Accuracy

    Google DeepMind has published results from AlphaFold 3, the latest version of its protein structure prediction system, showing that the model can now predict how drug molecules interact with protein targets with near-experimental accuracy. The findings, published in Nature, represent a significant advance over the original AlphaFold system, which predicted static protein structures.

    Beyond Static Structures

    While AlphaFold 2 revolutionized structural biology by predicting single-protein structures with remarkable accuracy, it could not model the dynamic interactions between proteins and other molecules — precisely the kind of information needed for drug design.

    AlphaFold 3 uses a diffusion-based architecture that can predict the 3D structure of molecular complexes including proteins, DNA, RNA, and small-molecule drugs. In benchmark tests, the system predicted binding poses for drug-like molecules with a median RMSD of 1.4 angstroms — close to the resolution of experimental methods like X-ray crystallography.
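
    For readers unfamiliar with the metric, the sketch below shows how a pose RMSD is computed from atom coordinates and how a benchmark-level median is taken over many complexes. The coordinates and the list of per-complex values are invented. The function assumes both poses list the same atoms in the same order and already share a coordinate frame, and it ignores refinements such as symmetry correction that real benchmarks typically apply.

```python
import math
from statistics import median


def pose_rmsd(predicted, reference):
    """RMSD (same units as the inputs) between two equally ordered atom lists."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(reference))


# Invented three-atom ligand (angstroms); each predicted atom is shifted
# by 1.4 along x relative to the reference pose.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted_pose = [(1.4, 0.0, 0.0), (2.9, 0.0, 0.0), (2.9, 1.5, 0.0)]
example_rmsd = pose_rmsd(predicted_pose, reference_pose)

# A benchmark figure is a median over many complexes (other values invented).
benchmark_median = median([0.6, 1.1, example_rmsd, 2.3, 4.0])
print(f"{example_rmsd:.2f} {benchmark_median:.2f}")
```

    A median summarizes the typical case, so a benchmark can report a low median RMSD while still containing individual complexes with much larger errors.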

    Impact on Drug Discovery

    “This changes the economics of early-stage drug discovery,” said Dr. Patrick Walters, VP of Computation at Relay Therapeutics. “Structure-based drug design has always been bottlenecked by the need for experimental structures of drug-target complexes. If AlphaFold 3 predictions are reliable enough, you can screen millions of compounds computationally before touching a single test tube.”

    Several pharmaceutical companies have already begun integrating AlphaFold 3 predictions into their discovery pipelines. Isomorphic Labs, DeepMind’s drug discovery spinoff, has active partnerships with Eli Lilly and Novartis that are leveraging the technology.

    Limitations and Caveats

    Researchers caution that AlphaFold 3 predictions are not a complete replacement for experimental validation. The system struggles with certain classes of targets, including highly flexible proteins and membrane-bound receptors. Additionally, prediction accuracy drops for novel chemical scaffolds that are dissimilar to compounds in the training data.

    Perhaps most significantly, AlphaFold 3 predicts binding poses but not binding affinity — it can show where a drug binds, but not how tightly. Binding affinity prediction remains one of the hardest unsolved problems in computational chemistry.

  • Study Reveals Persistent Racial Bias in Dermatology AI Models Trained on Public Datasets

    A comprehensive evaluation of 22 commercially available and research-grade dermatology AI models found that diagnostic accuracy drops by an average of 18 percentage points when evaluated on patients with Fitzpatrick skin types V and VI, according to a study published this week in Nature Medicine.

    The study, conducted by researchers at MIT, Harvard Medical School, and Emory University, tested models against a newly curated dataset of 12,000 biopsy-confirmed skin lesion images with balanced representation across all six Fitzpatrick skin types.

    Key Findings

    Across the 22 models tested:

    • Average sensitivity for melanoma on Fitzpatrick I-II skin: 91.4%
    • Average sensitivity for melanoma on Fitzpatrick V-VI skin: 73.2%
    • The gap was smallest (8 points) in models trained on diverse datasets and largest (29 points) in models trained primarily on data from European and North American populations
    • Three models, all trained on intentionally balanced datasets, showed no statistically significant performance difference across skin types
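
    The arithmetic behind the headline gap is simple to reproduce. The sketch below uses hypothetical true-positive and false-negative counts, chosen only so that the resulting sensitivities match the reported averages of 91.4% and 73.2%. The study's actual counts are not given here.

```python
def sensitivity(true_positives, false_negatives):
    """Share of actual melanomas that the model correctly flags."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical (true positives, false negatives) per Fitzpatrick group,
# picked to reproduce the reported average sensitivities.
counts = {"I-II": (914, 86), "V-VI": (732, 268)}

group_sensitivity = {
    group: sensitivity(tp, fn) for group, (tp, fn) in counts.items()
}
gap_points = (group_sensitivity["I-II"] - group_sensitivity["V-VI"]) * 100
print(f"gap: {gap_points:.1f} percentage points")
```

    The result is a gap in percentage points, meaning an absolute difference between two rates. It is not a relative decline.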

    The Dataset Problem

    “This is fundamentally a data problem, not an algorithm problem,” said Dr. Roxana Daneshjou, a dermatologist at Stanford and co-author of the study. “The most widely used public dermatology datasets are over 80% Fitzpatrick I-III. If you train on biased data, you get biased models. It’s that simple.”

    The study found that even state-of-the-art foundation models, when fine-tuned on imbalanced dermatology datasets, inherit and sometimes amplify existing biases. This challenges the assumption that larger, more capable base models automatically produce fairer downstream performance.

    Regulatory Response

    The findings come as the FDA is developing updated guidance on demographic performance reporting for AI medical devices. Currently, manufacturers are not required to report disaggregated performance data across racial or ethnic groups, though the FDA has signaled this may change.

    The authors recommend mandatory reporting of model performance across skin types for any dermatology AI seeking FDA clearance, as well as minimum performance thresholds that must be met across all demographic groups, not just in aggregate.
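
    A per-group floor behaves differently from an aggregate check, as the following sketch illustrates. The 80% floor, the counts, and the intermediate III-IV group are all hypothetical. Neither the study nor the FDA is stated to have proposed these specific values.

```python
MIN_SENSITIVITY = 0.80  # hypothetical floor, not a regulatory figure

# Hypothetical (true positives, false negatives) per Fitzpatrick group.
melanoma_counts = {
    "I-II": (914, 86),
    "III-IV": (850, 150),
    "V-VI": (732, 268),
}


def sensitivity(true_positives, false_negatives):
    return true_positives / (true_positives + false_negatives)


def failing_groups(counts, floor):
    """Groups whose sensitivity falls below the required floor."""
    return [g for g, (tp, fn) in counts.items() if sensitivity(tp, fn) < floor]


total_tp = sum(tp for tp, _ in melanoma_counts.values())
total_fn = sum(fn for _, fn in melanoma_counts.values())

aggregate_passes = sensitivity(total_tp, total_fn) >= MIN_SENSITIVITY
below_floor = failing_groups(melanoma_counts, MIN_SENSITIVITY)
print(aggregate_passes, below_floor)
```

    In this toy case the pooled sensitivity clears the floor while one subgroup does not, which is the situation that aggregate-only reporting would hide.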