Abstract 4145543: ChatGPT-4 Improves Readability of Institutional Heart Failure Patient Education Materials

Circulation, Volume 150, Issue Suppl_1, Page A4145543-A4145543, November 12, 2024. Introduction: Heart failure management is complex, involving lifestyle modifications such as daily weights, fluid and sodium restriction, and blood pressure monitoring, which place additional responsibility on patients and caregivers. Successful adherence requires comprehensive counseling and understandable patient education materials (PEMs). Prior research has shown that many PEMs related to cardiovascular disease exceed the American Medical Association’s recommended 5th-6th grade reading level. The large language model (LLM) Chat Generative Pre-trained Transformer (ChatGPT) may be a useful adjunct resource for patients with heart failure to bridge this gap. Research Question: Can ChatGPT-4 improve institutional heart failure PEMs to meet the AMA’s recommended 5th-6th grade reading level while maintaining accuracy and comprehensiveness? Methods: A total of 143 heart failure PEMs were collected from the websites of the top 10 institutions listed in the 2022-2023 US News & World Report ranking of “Best Hospitals for Cardiology, Heart & Vascular Surgery”. The PEMs of each institution were entered into ChatGPT-4 (version updated 20 July 2023) preceded by the prompt “please explain the following in simpler terms”. The readability of each institutional PEM and the corresponding ChatGPT-prompted response was assessed using the Textstat library in Python and the Textstat readability package in R. The accuracy and comprehensiveness of each response were also assessed by a board-certified cardiologist. Results: The average Flesch-Kincaid grade reading level was 10.3 (IQR: 7.9, 13.1) vs 7.3 (IQR: 6.1, 8.5) for institutional PEMs and ChatGPT responses, respectively (p< 0.001). Only 13/143 (9.1%) institutional PEMs met a 6th grade reading level, which improved to 33/143 (23.1%) after prompting by ChatGPT-4. A significant difference was also found for each readability metric assessed when comparing institutional PEMs with ChatGPT-4 responses (p
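
For readers who want to reproduce this kind of readability scoring, the sketch below (Python, assuming the textstat package is installed) shows how a Flesch-Kincaid grade level could be compared between an institutional PEM and a simplified response; the two sample texts are illustrative placeholders, not study materials.

```python
# Sketch: comparing readability of an institutional PEM and a simplified
# version with the textstat package (pip install textstat).
# The example texts are placeholders, not actual study materials.
import textstat

institutional_pem = (
    "Patients with congestive heart failure should adhere to a sodium-restricted "
    "diet and monitor daily weights to detect fluid retention early."
)
simplified_response = (
    "If you have heart failure, eat less salt and weigh yourself every day "
    "so you can notice extra fluid early."
)

for label, text in [("Institutional PEM", institutional_pem),
                    ("Simplified response", simplified_response)]:
    grade = textstat.flesch_kincaid_grade(text)   # Flesch-Kincaid grade level
    ease = textstat.flesch_reading_ease(text)     # Flesch Reading Ease score
    print(f"{label}: FK grade {grade:.1f}, reading ease {ease:.1f}")
```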

November 2024

Abstract 4142343: My AI Ate My Homework: Measuring ChatGPT Performance on the American College of Cardiology Self-Assessment Program

Circulation, Volume 150, Issue Suppl_1, Page A4142343-A4142343, November 12, 2024. Background: Artificial intelligence (AI) is a rapidly growing field with promising utility in health care. ChatGPT is a large language model by OpenAI trained on extensive data to comprehend and answer a variety of questions, commands, and prompts. Despite the promise AI offers, there are still glaring deficiencies. Methods: 310 questions from the American College of Cardiology Self-Assessment Program (ACCSAP) question bank were queried. 50% of the questions from each of the following sections were randomly selected: coronary artery disease, arrhythmias, valvular disease, vascular disease, systemic hypertension and hypotension, pericardial disease, systemic disorders affecting the circulatory system, congenital heart disease, heart failure and cardiomyopathy, and pulmonary circulation. Questions were fed into the ChatGPT Legacy 3.5 version with and without answer choices, and the accuracy of its responses was recorded. Statistical analysis was performed using the Microsoft Excel statistical package. Results: Human respondents averaged 77.86% ± 16.01% accuracy, with an IQR of 21.08%. Without answer-choice prompting, ChatGPT was correct 57.93% and inconclusive 7.77% of the time. When prompted with answer choices, ChatGPT was correct only 20.91% and inconclusive 14.55% of the time. Additionally, an average of 55.47% ± 35.55% of human respondents (IQR 73.19%) selected the same answer choice as ChatGPT. Finally, on a scale of 1 to 5, with 1 being the most picked and 5 the least picked, the answer choice ChatGPT selected ranked an average of 1.66 among human respondents. 94 of the 310 questions (30.32%) contained images in the question stem; only 2 of the 310 (0.65%) contained images in the answer choices. Conclusion: To our knowledge, data on ChatGPT's performance in cardiology board preparation are limited. Our analysis shows that while AI software has become increasingly comprehensive, further progress is needed to accurately answer complex medical questions.
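
A minimal sketch of the per-section 50% sampling and accuracy tally described above; the question records, field names, and values are hypothetical placeholders, not the authors' data or code.

```python
# Sketch: 50% per-section random sampling and accuracy tally over a list of
# question records with hypothetical fields; not the authors' actual pipeline.
import random
from collections import defaultdict

questions = [
    {"section": "coronary artery disease", "id": 1, "chatgpt_correct": True},
    {"section": "arrhythmias", "id": 2, "chatgpt_correct": False},
    # ... one record per ACCSAP question ...
]

by_section = defaultdict(list)
for q in questions:
    by_section[q["section"]].append(q)

sampled = []
for section, items in by_section.items():
    k = max(1, len(items) // 2)          # take roughly 50% of each section
    sampled.extend(random.sample(items, k))

accuracy = sum(q["chatgpt_correct"] for q in sampled) / len(sampled)
print(f"ChatGPT accuracy on sampled questions: {accuracy:.1%}")
```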

November 2024

Abstract 4116844: Simplifying Cardiology Research Abstracts: Assessing ChatGPT's Readability and Comprehensibility for Non-Medical Audiences

Circulation, Volume 150, Issue Suppl_1, Page A4116844-A4116844, November 12, 2024. Background: Artificial Intelligence (AI)-powered chatbots like ChatGPT are increasingly used in academic medical settings to help with tasks such as evidence synthesis and manuscript drafting. They have shown potential in simplifying complex medical texts for non-medical audiences like patients and journalists. However, less is known about whether simplified texts omit important information or remain of interest to patients and other non-medically trained readers such as journalists. Objective: This study aims to assess ChatGPT’s capacity to simplify cardiology research abstracts by eliminating jargon and enhancing universal comprehension. Methods: We analyzed all abstracts and scientific statements published from July to November 2023 in Circulation (n=113). These abstracts were processed through ChatGPT with the prompt: “Please rewrite the following text to be comprehensible at a 5th-grade reading level. Retain all original information and exclude nothing”. We assessed the readability of both original and simplified texts using Flesch-Kincaid Grade Level (FKGL) and Reading Ease (FKRE) scores. Additionally, a panel of five physicians and five laypeople evaluated these texts for completeness, accuracy, and understandability. Results: ChatGPT transformation of abstracts reduced the required reading level from college graduate to 8th-9th grade by both FKGL (18.3 to 8.6; p
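
A minimal sketch of how an abstract might be run through the study's simplification prompt and re-scored, assuming the openai and textstat Python packages; the client usage and model name are illustrative assumptions, not the authors' actual pipeline.

```python
# Sketch: sending an abstract through the simplification prompt and scoring
# readability before and after. Assumes the openai and textstat packages and
# an API key in the environment; the model name is illustrative only.
import textstat
from openai import OpenAI

PROMPT = ("Please rewrite the following text to be comprehensible at a "
          "5th-grade reading level. Retain all original information and "
          "exclude nothing.")

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def simplify_and_score(abstract: str) -> dict:
    reply = client.chat.completions.create(
        model="gpt-4",  # illustrative model name, not necessarily the study's
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{abstract}"}],
    )
    simplified = reply.choices[0].message.content
    return {
        "original_fkgl": textstat.flesch_kincaid_grade(abstract),
        "simplified_fkgl": textstat.flesch_kincaid_grade(simplified),
        "original_fkre": textstat.flesch_reading_ease(abstract),
        "simplified_fkre": textstat.flesch_reading_ease(simplified),
    }
```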

November 2024

Abstract 4134824: Evaluation of ChatGPT-4.0 and Google Bard’s Capabilities in Clinical Decision Support in Cardiac Electrophysiology

Circulation, Volume 150, Issue Suppl_1, Page A4134824-A4134824, November 12, 2024. Background: ChatGPT-4.0 and Bard have shown clinical decision support (CDS) potential in general medicine, but their role in electrophysiology (EP) is unknown. This study aims to evaluate ChatGPT’s and Bard’s CDS potential by assessing their accuracy in multiple-choice questions (MCQs), guideline recommendations (GRs), and treatment (Tx) suggestions. Methods: The two chatbots were tested with 15 clinical vignettes (CVs) and 47 case-related MCQs from Heart Rhythm Case Reports, focusing on ablation, arrhythmia, and CIED management. CVs included narrative diagnostic imaging results. Three tasks were performed: 1) generating GRs, rated 0 if incorrect or correct but irrelevant to the primary problem, 0.5 if correct for the primary problem, and 1 if case-specific (CS), i.e., relevant to both the primary problem and concomitant conditions (e.g., afib with HF); 2) suggesting Tx steps, scored 0 for incorrect, 0.5 for correct, and 1 for CS. Tx was deemed correct if referenced in the case or guidelines, and CS if actually used in the case. For Tx responses that were not CS, a prompt containing one similar CV and its Tx from PubMed case reports was provided before reassessment; 3) answering MCQs, rated 1 for correct and 0 for incorrect. Welch’s t-test was used for analysis. Results: Bard outperformed ChatGPT in generating CS-GRs (P = 0.01). However, there was no significant difference in CS-Tx suggestions with a prompt (P = 0.12, Figure 1C) or without a prompt (P = 0.59, Figure 1A). When prompted for non-CS-Tx responses, ChatGPT improved significantly from 0.66 to 0.93 (P = 0.02), suggesting an enhanced ability to provide CS-Tx plans post-prompt. In contrast, Bard showed no notable improvement (0.73 vs. 0.76, P = 0.79, Figure 1B). Both chatbots demonstrated similar MCQ accuracy, with scores below 70%, indicating EP training gaps or the need for prompts to activate existing knowledge. Conclusion: This study showed Bard’s superiority in generating GRs and ChatGPT’s marked improvement in suggesting Tx when external knowledge is provided, revealing their CDS potential in specialized fields.
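
A minimal sketch of how the two chatbots' rubric scores could be compared with Welch's t-test using scipy; the score vectors below are placeholders on the 0 / 0.5 / 1 rubric, not study data.

```python
# Sketch: Welch's t-test (unequal variances) on per-vignette rubric scores.
# The score lists are illustrative placeholders, not the study's data.
from scipy import stats

chatgpt_gr_scores = [0.5, 1.0, 0.5, 0.0, 1.0, 0.5]   # guideline-recommendation scores
bard_gr_scores    = [1.0, 1.0, 0.5, 1.0, 1.0, 0.5]

t_stat, p_value = stats.ttest_ind(chatgpt_gr_scores, bard_gr_scores,
                                  equal_var=False)    # Welch's t-test
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```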

November 2024