GPT-4o found accurate for evaluating examinees' performance on CPR skills tests, claims research
Research on large language models (LLMs) in the healthcare sector has shown promising advantages. For instance, following the launch of ChatGPT, notable advances have been made in addressing medical inquiries concerning cancer screening, pathological classification, and public health topics in medical Q&A settings. A recent study aimed to evaluate the suitability of GPT-4o for scoring examinees' performance on cardiopulmonary resuscitation (CPR) skills tests. Six experts reviewed CPR skills test videos of 103 examinees, which were also automatically assessed by GPT-4o across four sections: patient assessment, chest compressions, rescue breathing, and repeated operations. The experts rated GPT-4o's reliability on a Likert scale, and the agreement between GPT-4o's scores and the experts' scores was compared.
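The paper's statistical code is not reproduced here, but agreement between an automated rater and human experts on ordinal section scores is commonly quantified with a weighted Cohen's kappa. The sketch below is a minimal illustration of that idea; the scores and variable names are invented for the example and are not data from the study.

```python
# Illustrative sketch only (not the authors' code): measuring agreement between
# GPT-4o's section scores and an expert score for the same examinees using
# quadratic-weighted Cohen's kappa from scikit-learn.
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal scores (0-10) for a few examinees on one section,
# e.g. "chest compressions"; the real study covered 103 examinees.
expert_scores = [8, 6, 9, 7, 5, 10, 4, 8]
gpt4o_scores  = [8, 7, 9, 6, 5, 10, 5, 8]

kappa = cohen_kappa_score(expert_scores, gpt4o_scores, weights="quadratic")
print(f"Quadratic-weighted kappa: {kappa:.2f}")  # values near 1.0 indicate strong agreement
```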
Evaluation of GPT-4o Performance
The results showed that GPT-4o achieved accuracy scores similar to those of senior experts in patient assessment, chest compressions, and rescue breathing, with lower accuracy in repeated operations. The reliability ratings given by the experts were generally high for GPT-4o. The study highlighted the potential of using GPT-4o in medical examination settings, based on its accuracy and reliability in evaluating CPR skills exam videos.
Utility of Large Language Models in Healthcare
The use of large language models (LLMs) in healthcare, such as GPT-4o, has shown progress in various medical tasks, including responding to medical queries, generating clinical records, and achieving proficiency in text-based medical scenarios. Previous studies have assessed LLMs in medical examinations, revealing mixed results in meeting passing requirements for certain exams. While opinions on LLMs in medicine vary, the study demonstrated the potential for GPT-4o in medical examination scenarios.
AI Technology in Medical Education
The study employed AI technology and LLMs to enhance medical education and examination processes. GPT-4o's ability to assess CPR skills videos accurately and reliably suggests its potential as an examiner in clinical skill practice exams. The findings indicate that GPT-4o could improve the efficiency and accuracy of examination scoring, particularly for practical assessments like CPR skills tests. Overall, this research sheds light on the promising role of AI, specifically GPT-4o, in medical examination settings for evaluating practical skills of examinees.
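The article does not describe the exact pipeline used to feed exam videos to GPT-4o. As one plausible illustration of how such automated scoring could be wired up, the sketch below samples frames from a recorded exam and sends them to GPT-4o with a rubric prompt via the OpenAI Python SDK. The file name, frame-sampling rate, and rubric wording are assumptions made for the example, not details from the study.

```python
# A hypothetical sketch of rubric-based video scoring with GPT-4o,
# not the study's actual method.
import base64
import cv2  # opencv-python, used here for frame extraction
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def sample_frames(path: str, every_n: int = 30) -> list[str]:
    """Grab every n-th frame from the exam video as base64-encoded JPEGs."""
    cap = cv2.VideoCapture(path)
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            ok_jpg, buf = cv2.imencode(".jpg", frame)
            if ok_jpg:
                frames.append(base64.b64encode(buf).decode("utf-8"))
        i += 1
    cap.release()
    return frames

RUBRIC = (
    "Score this CPR skills exam on four sections - patient assessment, chest "
    "compressions, rescue breathing, and repeated operations - each from 0 to 10, "
    "and give one line of justification per section."
)

frames = sample_frames("examinee_001.mp4")[:20]  # keep the request small
content = [{"type": "text", "text": RUBRIC}] + [
    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
    for f in frames
]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```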
Key Points
1. The study assessed the performance of GPT-4o in scoring examinees' CPR skills test videos, using a methodology where six experts reviewed the videos and compared their ratings with those generated by GPT-4o. The evaluation covered four sections: patient assessment, chest compressions, rescue breathing, and repeated operations.
2. GPT-4o demonstrated accuracy levels similar to senior experts in patient assessment, chest compressions, and rescue breathing, albeit with lower accuracy in repeated operations. Experts generally rated the reliability of GPT-4o highly, indicating its potential for medical examination settings, specifically in evaluating CPR skills exam videos.
3. The study discussed the utility of large language models (LLMs) in healthcare, exemplified by GPT-4o, which has shown promise in various medical tasks like responding to medical queries, generating clinical records, and performing well in text-based medical scenarios. Previous evaluations of LLMs in medical examinations have yielded mixed results, but the study showcased the potential of GPT-4o in such scenarios.
4. Utilizing AI technology and large language models like GPT-4o can enhance medical education and examination processes. GPT-4o's ability to accurately and reliably assess CPR skills videos suggests its suitability as an examiner for clinical skill practice exams, potentially improving the efficiency and accuracy of examination scoring, especially for practical assessments such as CPR skills tests.
5. The research highlighted the promising role of artificial intelligence, specifically GPT-4o, in medical examination settings for evaluating the practical skills of examinees. By leveraging AI technology, institutions can potentially streamline and standardize assessment processes, providing a more objective and consistent evaluation of medical skills.
6. Overall, the study underscored the potential benefits of incorporating AI technology, particularly large language models like GPT-4o, in medical education and examination settings. The findings suggest that AI-driven assessment tools can enhance the objectivity, accuracy, and efficiency of evaluating practical skills in medical scenarios, paving the way for advancements in medical education and assessment practices.
Reference –
Lu Wang et al. (2024). Suitability of GPT-4o as an evaluator of cardiopulmonary resuscitation skills examinations. *Resuscitation*, 110404. https://doi.org/10.1016/j.resuscitation.2024.110404
MBBS, MD (Anaesthesiology), FNB (Cardiac Anaesthesiology)
Dr Monish Raut is a practicing Cardiac Anesthesiologist. He completed his MBBS at Government Medical College, Nagpur, and pursued his MD in Anesthesiology at BJ Medical College, Pune. Further specializing in Cardiac Anesthesiology, Dr Raut earned his FNB in Cardiac Anesthesiology from Sir Ganga Ram Hospital, Delhi.