Category: Clinical AI

AI applications in clinical settings, diagnostics, and patient care

  • Poison in the Training Set: Why Medical LLMs Need a Supply-Chain Mindset

    Medical AI has a new, uncomfortable reality to contend with: you don’t have to “hack” a medical large language model (LLM) in the traditional sense to make it dangerous. You may only need to subtly contaminate what it learns from. Research published in Nature Medicine suggests that poisoning a very small fraction of training data can nudge medical LLMs toward generating convincing misinformation, raising fresh concerns about the integrity of the data pipelines feeding clinical-grade AI.

    The work lands at a moment when LLMs are rapidly moving from pilots into production workflows—summarizing charts, drafting patient instructions, supporting coding, and answering clinician questions. In other words, these models are increasingly positioned as “soft infrastructure” in care delivery. If their knowledge can be quietly reshaped upstream, the effects could show up downstream as incorrect clinical guidance, flawed patient education, or distorted medical consensus.

    A new threat model for healthcare AI

    In cybersecurity terms, data poisoning is a supply-chain attack: instead of breaking into the model at runtime, an adversary influences what the model becomes during training. The Nature Medicine paper highlights a key point that should worry healthcare leaders—these attacks don’t necessarily require large-scale access or dramatic tampering. The idea is to introduce small, targeted distortions so that the model later produces specific kinds of wrong answers while still appearing broadly competent.

    That’s particularly relevant in medicine, where “mostly correct” can still be unsafe. A model that performs well on general benchmarks but occasionally slips into confident falsehoods about drug interactions, contraindications, or screening recommendations can create a risk profile that’s hard to detect with conventional validation. Most health systems test models on curated datasets and expected use cases; they rarely test the model’s behavior under adversarially influenced training distributions.

    Why misinformation from medical LLMs is uniquely sticky

    Clinicians and patients don’t interact with LLMs the way they interact with a journal article. The output arrives in a conversational format that feels personalized and authoritative. That interface—combined with the speed and apparent fluency—can compress skepticism. In a busy clinic, a plausible but wrong answer can become a cognitive shortcut; for a patient, it can feel like a second opinion that “speaks human.”

    Poisoning attacks amplify this problem because the misinformation can be tailored. Rather than producing random errors, a compromised model could systematically misstate facts about a specific medication class, a public health topic, or a controversial therapy. In the worst case, that could be operationalized for financial fraud (steering toward unnecessary tests), reputational sabotage (undermining trust in guidelines), or public health manipulation.

    A mitigation approach: grounding models in structured biomedical knowledge

    The encouraging part of the Nature Medicine report is that it doesn’t just diagnose a vulnerability—it explores a potential countermeasure. According to the authors, using biomedical knowledge graphs as a harm-mitigation layer can help identify or dampen the effects of poisoned training signals. In plain terms, a knowledge graph can act like a structured reference map of biomedical entities and relationships—drugs, diseases, genes, contraindications—against which a model’s claims can be checked for consistency.

    That’s important because it reframes “alignment” in medicine. Generic guardrails—like refusing to answer certain questions—are blunt tools for a domain where nuanced, evidence-based answers are the goal. Knowledge-graph-driven mitigation points to a more clinical approach: not just blocking outputs, but validating biomedical plausibility and flagging statements that contradict established relationships.
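
    To make the idea of checking claims against a knowledge graph concrete, here is a minimal sketch. It is not the method from the Nature Medicine paper: the triple format, the toy facts in KNOWN_FACTS, the CONTRADICTS table, and the check_claim function are all hypothetical names chosen for illustration, and the step that extracts claims from free text is assumed to happen elsewhere.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical toy knowledge graph: a small set of vetted facts.
# A real graph would be curated, versioned, and far larger.
KNOWN_FACTS = {
    Triple("warfarin", "interacts_with", "aspirin"),
    Triple("metformin", "treats", "type 2 diabetes"),
}

# Hypothetical pairs of relations that cannot both hold for the same
# subject and object.
CONTRADICTS = {
    "treats": "contraindicated_in",
    "contraindicated_in": "treats",
    "interacts_with": "no_interaction_with",
    "no_interaction_with": "interacts_with",
}


def check_claim(claim: Triple) -> str:
    """Classify one claim extracted from model output against the graph."""
    if claim in KNOWN_FACTS:
        return "supported"
    opposite = CONTRADICTS.get(claim.relation)
    if opposite and Triple(claim.subject, opposite, claim.obj) in KNOWN_FACTS:
        # The graph holds the opposite relation, so flag the claim.
        return "contradicted"
    # The graph is silent; route to human review rather than trusting it.
    return "unverified"
```

    In this sketch a claim such as ("warfarin", "no_interaction_with", "aspirin") comes back as "contradicted", while anything the graph does not cover comes back as "unverified" instead of being silently accepted.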

    What this means for clinicians, health systems, and patients

    For healthcare professionals: expect more emphasis on provenance and validation. If an LLM is used for clinical decision support or patient communication, health systems will need to ask: What data trained this model? How was it filtered? What are the controls that prevent contamination? This is a shift from evaluating model performance to evaluating model lineage—akin to checking a medication supply chain, not just measuring outcomes.

    For health system leaders and AI governance teams: the findings argue for adversarial testing and continuous monitoring, not one-time model approval. Poisoning can be subtle, and model updates can reintroduce risk. Procurement processes may need to require documentation of training data governance, red-team results, and post-deployment surveillance—especially for models that influence clinical decisions or patient instructions.

    For patients: the big risk is misplaced trust. If patient-facing chatbots or after-visit-summary generators rely on compromised models, misinformation could affect adherence, self-triage, or medication use. The practical implication: patient-facing AI should be designed with transparent sourcing, easy escalation to humans, and conservative behavior around high-risk topics like dosing and emergent symptoms.

    The forward path: from “model safety” to “data integrity”

    The broader industry lesson is that medical LLM safety can’t be reduced to prompt rules and disclaimers. As reported by Nature Medicine, small training-data manipulations may be enough to produce harmful behavior, meaning the attack surface includes everything upstream: data acquisition, licensing, scraping, labeling, and preprocessing.
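
    One small piece of an upstream integrity control can be sketched in code: recording a hash manifest of training documents at ingestion and re-checking it before training. This is an illustrative assumption, not something described in the paper, and the function names (build_manifest, find_tampered) are hypothetical. It only detects changes made after the manifest was recorded; content that was already poisoned at capture time would pass.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a document as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(documents: dict[str, bytes]) -> dict[str, str]:
    """Record one digest per document ID at ingestion time."""
    return {doc_id: sha256_hex(content) for doc_id, content in documents.items()}


def find_tampered(documents: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return IDs that changed, appeared, or disappeared since the manifest."""
    changed_or_new = [
        doc_id
        for doc_id, content in documents.items()
        if manifest.get(doc_id) != sha256_hex(content)
    ]
    missing = [doc_id for doc_id in manifest if doc_id not in documents]
    return sorted(changed_or_new + missing)
```

    A training job could refuse to start when find_tampered returns a non-empty list, which turns lineage from a documentation exercise into an enforced check.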

    Over the next year, expect three shifts. First, more hybrid systems that combine LLMs with structured biomedical sources—knowledge graphs, drug databases, guideline repositories—to constrain outputs. Second, a rise in “model auditability” as a differentiator: vendors that can prove data provenance and demonstrate resilience to poisoning will have an edge in regulated workflows. Third, regulators and accrediting bodies may start treating training data governance as a clinical safety issue, not merely an engineering detail.

    Medical AI is entering an era where the integrity of what models learn is as critical as the sophistication of the models themselves. The organizations that treat data as a protected clinical asset—monitored, traceable, and validated—will be best positioned to deploy LLMs responsibly at scale.

    Source: Nature Medicine (Nature)

  • Inside MUSC Health’s Push to Reduce OR Gridlock with AI-Driven Scheduling

    MUSC Health is turning to AI analytics to squeeze more capacity out of operating rooms that were already running near their limits—an effort aimed at reducing delays, improving schedule reliability, and creating breathing room in a surgical system strained by rising demand. The South Carolina health system’s experience underscores a broader reality across U.S. hospitals: when OR utilization stays consistently high, even small inefficiencies compound into late starts, cascading overruns, staff burnout, and patient frustration.

    According to Healthcare IT News, MUSC Health had seen surgical demand grow significantly over recent years, while its main OR location operated at persistently high utilization, leaving little flexibility to absorb day-to-day variability. That “tight” environment is exactly where analytics—especially models that can detect patterns invisible to manual review—can have outsized impact.

    Why OR scheduling has become a make-or-break operational problem

    The OR is one of the hospital’s most expensive and revenue-critical assets. Every minute of idle time is a cost; every minute of delay reverberates across anesthesia, nursing, sterile processing, inpatient bed management, and post-acute transitions. Yet OR scheduling remains notoriously difficult because it’s a classic “high stakes, high variability” system: case duration estimates are imperfect, emergent cases arrive unpredictably, staffing constraints shift, equipment availability changes, and downstream beds may not be ready.

    When a health system runs close to full utilization, these uncertainties stop being manageable exceptions and become routine disruptions. The result is a fragile schedule—one that looks fine on paper but breaks under real-world conditions. MUSC Health’s move to AI analytics, as reported by Healthcare IT News, reflects an operational shift: rather than relying solely on historical averages, manual tweaks, and institutional memory, organizations are increasingly applying data science to quantify variability and redesign schedules for resilience.

    What “AI analytics” can do differently

    While the phrase “AI” can mean many things in healthcare, OR optimization typically centers on advanced analytics and machine learning that improve predictions and decision-making. In practical terms, that can include:

    • More accurate case-duration forecasting: Models can incorporate surgeon-specific patterns, procedure mix, patient factors, and historical variance, often outperforming blanket time blocks or simple averages.
    • Identifying bottlenecks and root causes: Analytics can reveal whether delays primarily stem from late starts, turnover time, documentation workflows, equipment constraints, or inpatient bed availability.
    • Smarter block utilization and release rules: Systems can highlight underused blocks earlier and recommend how and when to reallocate time to reduce unused capacity without triggering chaos.
    • Scenario planning: Leaders can simulate operational changes (e.g., staffing shifts, new rooms, altered turnover workflows) before implementing them, using data rather than intuition alone.
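
    As a rough illustration of the first item, the sketch below estimates case duration from historical records per surgeon and procedure, and falls back to procedure-level history when a surgeon has too few cases. It is a simple statistical baseline, not MUSC Health’s system, whose internals the source does not describe. The record layout, the build_estimator name, and the min_cases cutoff are assumptions.

```python
from collections import defaultdict
from statistics import median, quantiles


def build_estimator(history, min_cases=5):
    """history: iterable of (surgeon, procedure, minutes) tuples."""
    by_pair = defaultdict(list)
    by_procedure = defaultdict(list)
    for surgeon, procedure, minutes in history:
        by_pair[(surgeon, procedure)].append(minutes)
        by_procedure[procedure].append(minutes)

    def estimate(surgeon, procedure):
        samples = by_pair.get((surgeon, procedure), [])
        if len(samples) < min_cases:
            # Too little surgeon-specific data: use all cases of this procedure.
            samples = by_procedure.get(procedure, [])
        if not samples:
            return None
        if len(samples) < 2:
            p80 = float(samples[0])
        else:
            # quantiles(n=5) returns the 20th, 40th, 60th, and 80th percentiles.
            p80 = quantiles(samples, n=5)[3]
        return {"median": median(samples), "p80": p80, "n": len(samples)}

    return estimate
```

    Scheduling against a high percentile such as p80 rather than the median is one way to build variance into the schedule, at the cost of booking slightly fewer cases per block.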

    The key point: in a near-maxed OR environment like MUSC Health’s, marginal gains matter. Small improvements in start-time adherence or turnover predictability can translate into real increases in throughput—or, just as importantly, reduce the need for overtime and weekend catch-up.

    Implications for clinicians: less firefighting, more predictability

    For surgeons, anesthesiologists, nurses, and perioperative leaders, the promise of AI-enabled scheduling is not simply “more cases.” It’s fewer surprises. A schedule that reflects real variability can reduce last-minute room changes, decrease pressure to rush turnovers, and improve coordination with pre-op and PACU teams. Over time, this can support workforce sustainability—an underappreciated outcome in perioperative services, where burnout is fueled by chronic unpredictability and frequent late days.

    However, algorithmic scheduling also raises cultural and governance challenges. Clinicians may distrust models that appear to “black box” their workflows, especially if recommendations conflict with lived experience. Successful programs tend to pair analytics with transparency: clear performance metrics, the ability to audit model outputs, and shared accountability for process changes.

    Implications for patients: fewer delays and cancellations

    Patients experience OR inefficiency as a human problem: long waits, rescheduled procedures, and stressful day-of-surgery uncertainty. When systems run hot, a single delayed first case can cascade into afternoon cancellations—forcing patients to repeat fasting, time off work, travel logistics, and caregiver coordination.

    By improving schedule reliability, AI analytics can help reduce same-day cancellations and shorten time-to-procedure for elective surgeries. That matters clinically as well as emotionally. For some patients, delayed surgery can mean prolonged pain, limited mobility, and disease progression—especially in areas like oncology, cardiovascular care, and complex orthopedics.

    A sign of where “clinical AI” is headed

    MUSC Health’s initiative is a reminder that some of the most immediate ROI for healthcare AI may come from the intersection of operations and clinical care, not only from diagnostic algorithms. OR scheduling is a ripe target because the data is abundant (cases, times, staffing, outcomes), the financial stakes are high, and the improvements are measurable.

    Looking ahead, expect health systems to connect OR optimization with broader hospital flow—bed management, ED boarding, staffing models, and supply chain. The next wave won’t just predict how long a case will take; it will coordinate the entire perioperative “supply chain” from pre-admission testing to post-op discharge. As more organizations adopt these tools, the differentiator will be less about having AI and more about operational readiness: clean data, aligned incentives, and clinician trust.

    Source: Healthcare IT News — “MUSC Health uses AI analytics to gain OR scheduling efficiencies”

  • AI in Dermatology for Melanoma Detection: From Smartphone Scans to Clinic-Grade Decision Support

    Melanoma accounts for a small fraction of skin cancer cases but a disproportionate share of skin cancer deaths, largely because outcomes depend heavily on catching disease early. Dermatology has long relied on visual pattern recognition—making it a natural fit for machine learning (ML) systems trained to detect malignancy from images. Over the past decade, AI for melanoma detection has matured from research prototypes into a growing ecosystem of tools that support triage, documentation, and clinical decision-making. Still, the best results come not from “AI replacing dermatologists,” but from careful integration into workflows—paired with rigorous validation, bias testing, and clear guardrails for patient safety.

    Why dermatology is an early proving ground for clinical AI

    Dermatology is image-rich and comparatively standardized: lesions can be photographed with consumer smartphones, dermatoscopes, or high-resolution clinical cameras. That makes it possible to build large labeled datasets for supervised learning and to evaluate model performance in controlled test sets. In practice, melanoma detection AI tends to fall into three overlapping categories:

    • Consumer-facing risk assessment: smartphone apps and camera-based tools that estimate whether a mole looks suspicious.
    • Clinical decision support: AI that helps clinicians triage lesions, prioritize referrals, or support biopsy decisions.
    • Workflow and documentation: tools that standardize imaging, track lesions over time, and integrate with the EHR.

    Most melanoma-focused AI systems use deep convolutional neural networks (CNNs) or vision transformers trained on dermoscopic images, clinical photos, or both. Performance is usually reported using metrics like sensitivity (catching true melanomas), specificity (avoiding false alarms), and area under the ROC curve (AUC).
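
    For readers who want to see how these three metrics are computed, here is a small dependency-free sketch. The function names are illustrative, labels use 1 for melanoma and 0 for benign, and the AUC is computed with the pairwise (Mann-Whitney) formulation, which is practical only for small evaluation sets.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

    Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds. That difference is why a single AUC figure says little about how a tool will behave at its deployed operating point.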

    What the research says: strong benchmarks, messy real-world deployment

    Academic momentum accelerated after widely cited work showed that deep learning models could match or exceed dermatologists’ performance on curated image sets. A landmark example is the 2017 Nature paper by Esteva and colleagues, which trained a CNN on a large dataset of skin lesion images and reported performance comparable to dermatologists on benchmark tasks. That study helped set the narrative, but it also highlighted a recurring challenge: models can look excellent on benchmark datasets yet stumble when exposed to the variability of real-world practice.

    More recent research has focused on generalizability (does the model work across different devices, lighting, and clinical sites?), fairness (does performance hold across skin tones and demographic groups?), and prospective validation (does it improve outcomes in real clinical workflows?). Researchers have also explored hybrid approaches that combine dermoscopic images with clinical metadata (age, lesion location, personal history), which can improve discrimination but complicates deployment because structured data quality varies across settings.

    Key technical and clinical friction points

    • Dataset shift: Training images are often captured under ideal conditions; real images include blur, glare, occlusion, and inconsistent framing.
    • Label noise: Even biopsy-confirmed labels have nuance (atypical nevi, borderline lesions), while many datasets rely on clinician assessment rather than histopathology.
    • Skin tone representation: Underrepresentation of darker skin in dermatology datasets can degrade accuracy and increase missed diagnoses in groups already facing disparities.
    • Clinical thresholds: A “good” AUC can still be unsafe if the chosen operating point misses melanoma or generates unmanageable false positives.
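
    The last point, on clinical thresholds, can be illustrated with a short sketch that picks the highest score threshold still meeting a target sensitivity and then reports the specificity that results. The pick_threshold name and the example targets are assumptions for illustration; a real operating point would be set on a held-out validation set with clinical input.

```python
import math


def pick_threshold(labels, scores, min_sensitivity):
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity is at least min_sensitivity. Labels: 1 = melanoma."""
    positives = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    # Number of melanomas that must score at or above the threshold.
    needed = max(1, math.ceil(min_sensitivity * len(positives)))
    threshold = positives[needed - 1]
    sensitivity = sum(s >= threshold for s in positives) / len(positives)
    specificity = sum(s < threshold for s in negatives) / len(negatives)
    return threshold, sensitivity, specificity
```

    Running this with two different sensitivity targets on the same scores shows the trade-off directly: demanding that fewer melanomas be missed pushes the threshold down and lets more benign lesions through as false positives.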

    Google Lens and the consumerization of visual search

    When people find something unfamiliar on their skin, many now start with a smartphone. While Google Lens is not a regulated medical device and is marketed as a general-purpose visual search tool, it has become part of the de facto consumer health pathway: users take photos and search for visually similar images. This matters clinically for two reasons. First, it can influence patient anxiety, self-triage, and timing of care-seeking. Second, it underscores a broader trend: image-based AI is increasingly ambient—embedded in everyday tools rather than limited to clinical software.

    From a safety perspective, general-purpose image search is not the same as clinical AI: it may return look-alike images without calibrated risk estimates, clinical context, or guidance on urgency. Dermatology clinics are already seeing the downstream effect—patients arrive with screenshots and strong expectations. The opportunity for clinical AI is to provide a safer bridge: validated tools that can prompt urgent evaluation for high-risk lesions, while discouraging false reassurance.

    Notable projects and new directions in melanoma AI

    Melanoma detection has moved beyond “single-image classification” toward systems that better reflect clinical practice.

    1) Multi-modal and longitudinal models

    Newer projects aim to combine dermoscopy with standard clinical photos and patient metadata, and to track lesions over time. Longitudinal comparison—detecting change in size, color, or border irregularity—mirrors how dermatologists monitor atypical moles and can reduce unnecessary biopsies. This also aligns with the growing interest in foundation models for medical imaging, which can be fine-tuned for specific tasks like pigmented lesion classification.

    2) Prospective and workflow-integrated evaluation

    The field is increasingly emphasizing prospective studies and clinic-based pilots over retrospective benchmark performance. These evaluations ask practical questions: Does AI reduce time-to-biopsy for true melanomas? Does it change clinician decision-making? Does it overload clinics with false positives? And how does it perform on diverse populations and devices?

    3) Dermoscopy quality control and “human-in-the-loop” design

    A quiet but important innovation is AI that checks image quality (focus, illumination, framing) before analysis. Another is decision support that explains model attention (e.g., saliency maps) and communicates uncertainty. In many clinics, the safest pattern is a human-in-the-loop approach: AI flags concerning lesions and supports documentation, while clinicians retain diagnostic responsibility and determine whether biopsy is warranted.
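
    A focus check of this kind can be approximated with a common blur heuristic, the variance of the Laplacian: sharp images have strong local intensity changes, blurry ones do not. The sketch below is a plain-Python illustration on a grayscale pixel grid, not the method of any specific product, and the min_variance cutoff is a hypothetical value that would need tuning per device.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """gray: 2D list of pixel intensities, at least 3 rows by 3 columns."""
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            # 4-neighbour Laplacian: large magnitude where intensity changes sharply.
            lap = (
                gray[r - 1][c] + gray[r + 1][c]
                + gray[r][c - 1] + gray[r][c + 1]
                - 4 * gray[r][c]
            )
            responses.append(lap)
    return pvariance(responses)


def passes_focus_check(gray, min_variance=100.0):
    """Hypothetical gate: reject images with too little edge content."""
    return laplacian_variance(gray) >= min_variance
```

    An image that fails the gate would prompt a retake before any lesion analysis runs, which keeps poor inputs from producing confident-looking outputs.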

    Clinical impact: where AI helps today

    When deployed responsibly, melanoma AI can deliver measurable benefits:

    • Triage support: prioritizing high-risk referrals and reducing time-to-specialist for suspicious lesions.
    • Decision support: helping clinicians—especially non-dermatologists—decide when to refer or biopsy.
    • Access and scalability: supporting teledermatology by standardizing image capture and pre-screening large volumes of cases.
    • Consistency: reducing variability in assessments between clinicians and across sites.

    These gains are especially relevant in primary care and underserved areas where dermatology shortages can delay evaluation.

    Safety, regulation, and the risk of overconfidence

    Melanoma is a high-stakes target: the cost of a false negative can be life-threatening, while excessive false positives can drive unnecessary biopsies, scarring, anxiety, and system burden. For publication-grade and clinical-grade AI, the most important questions are not just “How accurate is it?” but:

    • Validated on what population? Including a representative range of skin tones, ages, and lesion types.
    • Validated in what setting? Dermoscopy images from specialty clinics may not match smartphone photos from primary care.
    • What’s the intended use? Consumer triage vs. clinician decision support require different thresholds and messaging.
    • How is uncertainty handled? Systems should fail safely, prompting clinical evaluation when confidence is low.
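
    The last question, failing safely, can be sketched as a routing rule that never offers reassurance when the image is poor or the score falls in a gray zone. The triage function, its return labels, and both cutoffs are hypothetical values for illustration, not clinically validated thresholds.

```python
def triage(melanoma_probability, image_quality_ok,
           refer_at=0.30, reassure_below=0.05):
    """Map a model score to a next step, escalating whenever the system is unsure."""
    if not 0.0 <= melanoma_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if not image_quality_ok:
        # A score computed on a poor image is not trusted in either direction.
        return "retake_image_or_clinician_review"
    if melanoma_probability >= refer_at:
        return "urgent_clinician_review"
    if melanoma_probability < reassure_below:
        return "routine_follow_up"
    # Gray zone between the two cutoffs: a human decides.
    return "clinician_review"
```

    The design choice is that three of the four outcomes involve a person. Only a good-quality image with a very low score avoids escalation, and even then the output is routine follow-up rather than an all-clear.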

    Regulators have increasingly emphasized transparency around intended use, performance evidence, and post-market monitoring for AI/ML-based software. For healthcare organizations, governance also includes model monitoring (detecting performance drift), cybersecurity protections for image data, and clear patient communication that AI is assistive—not definitive.

    What to watch next

    Three developments will shape the next phase of melanoma detection AI:

    • Foundation models in dermatology: larger pre-trained vision models fine-tuned for lesion analysis, potentially improving robustness across devices and settings.
    • Better equity benchmarks: standardized reporting across Fitzpatrick skin types and demographic groups, moving fairness from a footnote to a requirement.
    • Integrated care pathways: AI that links detection to action—streamlined referral, telederm consult, and follow-up—rather than standalone “risk scores.”
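
    Stratified reporting of the kind described in the equity item is simple to compute once predictions are tagged with a group label. The sketch below reports melanoma sensitivity per group; the record layout and the sensitivity_by_group name are assumptions, and the group labels are whatever grouping a study uses, such as bands of Fitzpatrick skin types.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: iterable of (group, true_label, predicted_label); 1 = melanoma.

    Returns {group: sensitivity} for every group with at least one true melanoma.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, truth, predicted in records:
        if truth != 1:
            continue  # sensitivity only considers true melanomas
        if predicted == 1:
            true_pos[group] += 1
        else:
            false_neg[group] += 1
    groups = sorted(set(true_pos) | set(false_neg))
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}
```

    Reporting the per-group values side by side, along with the number of melanomas in each group, makes it visible when a strong overall figure hides weaker performance in an underrepresented population.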

    In parallel, consumer tools like Google Lens will continue to influence how patients interpret skin changes. That makes it even more important for clinical AI developers—and healthcare systems—to provide validated, context-aware alternatives that encourage timely care without amplifying misinformation or false reassurance.

    Bottom line

    AI for melanoma detection is one of the most promising and visible applications of clinical computer vision. The science has advanced well beyond proof-of-concept, with strong benchmark performance and an expanding range of real-world pilots. The next leap will depend on prospective evidence, equitable performance across skin tones, and practical integration into care pathways—so AI improves outcomes, not just accuracy charts.

    References (selected): Esteva A. et al., “Dermatologist-level classification of skin cancer with deep neural networks,” Nature (2017). Google Lens (Google) as a general-purpose visual search product frequently used by consumers for image-based queries; not a regulated diagnostic tool.

  • FDA Clears First AI System for Autonomous Stroke Detection in Emergency Departments

    The U.S. Food and Drug Administration has granted De Novo marketing authorization to NeuralStroke AI, a deep learning system that autonomously detects large vessel occlusion (LVO) strokes on CT angiography (CTA) scans without requiring radiologist confirmation before alerting the stroke team.

    This marks the first time the FDA has authorized an AI system to operate fully autonomously in the acute stroke pathway — a significant departure from previous clearances that positioned AI as a decision-support tool requiring physician oversight.

    How It Works

    The system integrates directly with hospital CT scanners. When a CTA scan is completed, NeuralStroke AI processes the images in under 90 seconds. If an LVO is detected with high confidence, the system simultaneously alerts the on-call neurointerventionalist, activates the stroke protocol, and sends annotated images to the care team’s mobile devices.

    “Time is brain in stroke care,” said Dr. Maria Chen, chief of neurology at Boston General Hospital and principal investigator of the pivotal trial. “Every minute of delay in treatment means roughly 1.9 million neurons lost. Having AI cut through the traditional notification chain can be the difference between a patient walking out of the hospital and a patient needing lifelong care.”

    Clinical Trial Results

    The FDA clearance was based on a multicenter prospective trial involving 4,200 patients across 28 emergency departments. Key findings include:

    • Sensitivity: 97.3% for LVO detection (compared to 94.1% for on-call radiologists)
    • Specificity: 96.8% (false positive rate of 3.2%)
    • Time to notification: Median 4.2 minutes from scan completion vs. 38 minutes in standard workflow
    • Clinical impact: 26% reduction in door-to-groin-puncture time at sites using the system
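
    Sensitivity and specificity alone do not say how many alerts will be true positives; that also depends on how common LVO is among the patients scanned. The arithmetic below uses the reported sensitivity and specificity together with an assumed prevalence of 10%, which is a hypothetical figure chosen only to show the calculation and is not a trial result.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive alerts that are true positives (Bayes' rule)."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive alerts expected with these inputs")
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.973   # reported in the trial summary above
SPECIFICITY = 0.968   # reported in the trial summary above
ASSUMED_LVO_PREVALENCE = 0.10  # hypothetical, for illustration only

ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, ASSUMED_LVO_PREVALENCE)
# Expected false alerts per 1,000 scans under the assumed prevalence.
false_alerts_per_1000 = (1 - SPECIFICITY) * (1 - ASSUMED_LVO_PREVALENCE) * 1000
```

    Under that assumption roughly three in four alerts would be true LVOs. At a lower prevalence the same sensitivity and specificity yield a lower share, which is one reason real-world performance monitoring matters after deployment.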

    Regulatory Implications

    The De Novo pathway classification creates a new regulatory category for autonomous AI in emergency settings. This could pave the way for similar autonomous AI systems in other time-critical diagnoses, including pulmonary embolism, aortic dissection, and intracranial hemorrhage.

    The FDA has specified post-market surveillance requirements, including mandatory reporting of false negatives and a real-world performance study across 50 additional sites over the next three years.

    Industry Reaction

    The clearance has drawn attention from both enthusiasts and skeptics of autonomous AI. Dr. James Park, a neuroradiologist at Stanford, called it “a carefully validated step forward,” noting that the LVO detection use case is particularly well-suited for autonomy because of the unambiguous imaging findings and the extreme time sensitivity.

    Others have raised concerns about liability and the potential for over-reliance on AI in settings where image quality varies widely. NeuralStroke AI includes a confidence calibration system that flags borderline cases for immediate radiologist review rather than acting autonomously.