ChatGPT-3.5 Shows Moderate Accuracy on Medical Genetics Exam, Research Reveals

Written By :  Dr Riya Dave
Medically Reviewed By :  Dr. Kamal Kant Kohli
Published On 2025-08-01 14:45 GMT   |   Update On 2025-08-01 14:45 GMT

Researchers have found in a new study that ChatGPT-3.5 performed with moderate accuracy and stable results on a specialist medical genetics exam, indicating potential for educational support. However, its limitations in handling complex, domain-specific reasoning highlight the need for continued advancement before wider application. The study was published in the journal Laboratory Medicine by Klaudia Paruzel and colleagues.

To test the performance of ChatGPT-3.5, the researchers selected 456 questions from the Polish national specialist exam in medical laboratory genetics that are available online. The questions were classified by topic (genetic changes, diagnostic techniques, clinical cases, and calculations) and by complexity (simple or complex). Each question was submitted to ChatGPT three times to assess not only the correctness but also the consistency of responses across repeated rounds of interaction. Statistical tests were then used to compare differences in performance by category, complexity level, and repetition.


Key Findings

  • Overall Accuracy: ChatGPT correctly answered 59% of the questions, a result that was statistically significant (P < .001).

By Category:

  • Calculation-based questions: 71% accuracy

  • Genetic methods and genetic changes: ~60% accuracy

  • Clinical case–based questions: 37% accuracy

By Complexity:

  • Simple questions: 63% accuracy

  • Complex questions: 43% accuracy (P = .001)

  • Consistency: The AI model performed consistently across the three repeated sessions (P = .43), reflecting reliable output even under repeated questioning.

The research concluded that although ChatGPT-3.5 demonstrates moderate accuracy and stable performance in answering medical laboratory genetics exam questions, it falls short in complex and clinical case–based reasoning. Consequently, the current version may assist in education but is not yet adequate for advanced or high-stakes applications in genetic medicine. Further advances in AI reasoning and domain adaptation will be required before such tools can be introduced into professional medical education or practice with confidence.

Reference:

Klaudia Paruzel, Michał Ordak. Assessment of ChatGPT-3.5 performance on the medical genetics specialist exam. Laboratory Medicine. 2025;lmaf038. https://doi.org/10.1093/labmed/lmaf038

Article Source : Laboratory Medicine
