United States: According to the latest findings from the United States, adding generative AI to clinical workflows does not, as was expected, reduce burnout among health professionals.
Earlier observations had pointed to the stress doctors suffer because of heavy use of electronic health record (EHR) systems and administrative duties.
Artificial intelligence has therefore been touted as a potential solution to these modern-day problems. Nevertheless, recent experience within US healthcare systems has revealed the limited effectiveness of large language models (LLMs) in supporting clinicians’ daily work.
Exploring Artificial Intelligence
A 2023 observational study at Brigham and Women’s Hospital in Boston, Massachusetts, examined the use of AI for electronic patient messaging.
Researchers employed an LLM to respond to simulated queries from cancer patients, comparing its output with responses crafted by six board-certified radiation oncologists.
Medical professionals then edited the AI-generated drafts into “clinically acceptable” answers before they were sent to patients.
Published in The Lancet Digital Health, the study found that the unedited LLM drafts posed “a risk of severe harm in 11 of 156 survey responses and death in one survey response.”
“The majority of harmful responses were due to incorrectly determining or conveying the acuity of the scenario and recommended action,” the researchers wrote, as reported by Fox News.
Furthermore, the study concluded, “These early findings … indicate the need to thoroughly evaluate LLMs in their intended clinical contexts, reflecting the precise task and level of human oversight.”
Medical Coding Challenges
Another investigation, conducted at New York’s Mount Sinai Health System, evaluated the performance and error patterns of four LLMs when queried for medical billing codes.
Published in NEJM AI, the study found that the models performed poorly at medical code querying, often generating imprecise or erroneous codes.
The study therefore concluded that LLMs are unsuitable for medical coding tasks without further research, citing implications for billing accuracy, clinical decision-making, and healthcare policy.
Patient Communication and Physician Time
A study published in JAMA Network Open, conducted at the University of California San Diego School of Medicine, examined AI-generated responses to patient messages and the editing time they required of physicians.
The study found that AI-drafted replies increased physicians’ reading time and reply length but did not significantly change reply time; physicians perceived only partial benefits.
Physician Insights on AI
David Atashroo, Chief Medical Officer of Qventus, a maker of AI-driven surgical management software in Mountain View, California, commented on the research findings.
Atashroo emphasized the potential of AI to streamline routine tasks traditionally handled by healthcare support staff, thereby enhancing overall clinical efficiency.
However, he cautioned against expecting perfection, highlighting the fallibility of both AI systems and human operators.
Atashroo stressed the importance of transparency in AI development and implementation to foster trust among healthcare stakeholders and patients alike.