GPT-4o accurate in evaluating examinees' performance on CPR skills tests, finds study
Research on large language models (LLMs) in the healthcare sector has shown promising results. For instance, since the launch of ChatGPT, notable advances have been made in answering medical questions on cancer screening, pathological classification, and public health topics in medical Q&A settings. A recent study aimed to evaluate the suitability of GPT-4o for scoring examinees' performance on cardiopulmonary resuscitation (CPR) skills tests. Six experts reviewed the CPR skills test videos of 103 examinees, which were also automatically assessed by GPT-4o across four sections: patient assessment, chest compressions, rescue breathing, and repeated operations. The experts rated GPT-4o's reliability on a Likert scale, and the agreement between GPT-4o's scores and the experts' scores was assessed.
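The article does not state which agreement statistic the investigators used. As a minimal illustrative sketch only, agreement between GPT-4o's scores and an expert's scores for the same examinees could be quantified with a quadratic-weighted Cohen's kappa, assuming ordinal per-section ratings; the variable names and score values below are hypothetical.

```python
# Minimal sketch (not the study's analysis): agreement between GPT-4o's
# section scores and one expert's scores, assuming ordinal 0-5 ratings.
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-examinee scores for one section (e.g., chest compressions)
expert_scores = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
gpt4o_scores  = [5, 4, 3, 3, 5, 2, 4, 4, 3, 4]

# Quadratic weighting penalises large disagreements more than near-misses
kappa = cohen_kappa_score(expert_scores, gpt4o_scores, weights="quadratic")
print(f"Quadratic-weighted kappa: {kappa:.2f}")  # values near 1.0 indicate strong agreement
```

Other chance-corrected or reliability statistics (e.g., an intraclass correlation coefficient across all six experts) would serve the same purpose; the choice here is illustrative.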
Evaluation of GPT-4o Performance
The results showed that GPT-4o achieved accuracy similar to that of senior experts in patient assessment, chest compressions, and rescue breathing, but lower accuracy in repeated operations. Experts generally rated GPT-4o's reliability highly. The study highlighted the potential of using GPT-4o in medical examination settings, given its accuracy and reliability in evaluating CPR skills exam videos.
Utility of Large Language Models in Healthcare
The use of large language models (LLMs) in healthcare, such as GPT-4o, has shown progress in various medical tasks, including responding to medical queries, generating clinical records, and achieving proficiency in text-based medical scenarios. Previous studies assessing LLMs on medical examinations have reported mixed results in meeting passing requirements. While opinions on LLMs in medicine vary, the study demonstrated the potential of GPT-4o in medical examination scenarios.
AI Technology in Medical Education
The study applied AI technology, in the form of an LLM, to medical education and examination processes. GPT-4o's ability to assess CPR skills videos accurately and reliably suggests its potential as an examiner in clinical skill practice exams. The findings indicate that GPT-4o could improve the efficiency and accuracy of examination scoring, particularly for practical assessments such as CPR skills tests. Overall, this research sheds light on the promising role of AI, specifically GPT-4o, in medical examination settings for evaluating examinees' practical skills.
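The article does not describe the authors' scoring pipeline. Purely as an illustrative sketch, one plausible workflow is to sample frames from a CPR skills video and send them to GPT-4o together with a scoring rubric via the OpenAI Python SDK; the rubric wording, frame files, and output format below are assumptions, not the study's method.

```python
# Illustrative sketch only -- not the study's pipeline. Assumes frames have
# already been extracted from the CPR video as JPEG files, and that the
# OpenAI Python SDK (>=1.x) is installed with an API key configured.
import base64
from openai import OpenAI

client = OpenAI()

def encode_frame(path: str) -> str:
    """Base64-encode a single video frame for the vision input."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Hypothetical rubric covering the four sections named in the study.
RUBRIC = (
    "Score the examinee from 0-5 on each section: patient assessment, "
    "chest compressions, rescue breathing, repeated operations. "
    "Return one line per section as 'section: score'."
)

frame_paths = ["frame_001.jpg", "frame_002.jpg", "frame_003.jpg"]  # sampled frames
content = [{"type": "text", "text": RUBRIC}] + [
    {"type": "image_url",
     "image_url": {"url": f"data:image/jpeg;base64,{encode_frame(p)}"}}
    for p in frame_paths
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```

In practice, the frame-sampling rate, the rubric granularity, and how the returned text is parsed into section scores would all need to be validated against expert ratings, as the study did.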
Key Points
1. The study assessed the performance of GPT-4o in scoring examinees' CPR skills test videos, using a methodology where six experts reviewed the videos and compared their ratings with those generated by GPT-4o. The evaluation covered four sections: patient assessment, chest compressions, rescue breathing, and repeated operations.
2. GPT-4o demonstrated accuracy levels similar to senior experts in patient assessment, chest compressions, and rescue breathing, albeit with lower accuracy in repeated operations. Experts generally rated the reliability of GPT-4o highly, indicating its potential for medical examination settings, specifically in evaluating CPR skills exam videos.
3. The study discussed the utility of large language models (LLMs) in healthcare, exemplified by GPT-4o, which has shown promise in various medical tasks like responding to medical queries, generating clinical records, and performing well in text-based medical scenarios. Previous evaluations of LLMs in medical examinations have yielded mixed results, but the study showcased the potential of GPT-4o in such scenarios.
4. Utilizing AI technology and large language models like GPT-4o can enhance medical education and examination processes. GPT-4o's ability to accurately and reliably assess CPR skills videos suggests its suitability as an examiner for clinical skill practice exams, potentially improving the efficiency and accuracy of examination scoring, especially for practical assessments such as CPR skills tests.
5. The research highlighted the promising role of artificial intelligence, specifically GPT-4o, in medical examination settings for evaluating the practical skills of examinees. By leveraging AI technology, institutions can potentially streamline and standardize assessment processes, providing a more objective and consistent evaluation of medical skills.
6. Overall, the study underscored the potential benefits of incorporating AI technology, particularly large language models like GPT-4o, in medical education and examination settings. The findings suggest that AI-driven assessment tools can enhance the objectivity, accuracy, and efficiency of evaluating practical skills in medical scenarios, paving the way for advancements in medical education and assessment practices.
Reference –
Lu Wang et al. (2024). Suitability of GPT-4o as an evaluator of cardiopulmonary resuscitation skills examinations. *Resuscitation*, 110404. https://doi.org/10.1016/j.resuscitation.2024.110404