Publication: Who Is More Successful in a Spinal Surgery Examination? ChatGPT-3.5/4.0 or a Resident Doctor
Abstract
Objective: Artificial intelligence (AI) is now widely used across all work sectors, and its use has grown particularly in medicine as technology has advanced. The aim of this study was to compare the responses given by Chat Generative Pre-trained Transformer (ChatGPT)-4.0, ChatGPT-3.5, and orthopaedics and traumatology residents to the Turkish Orthopedics and Traumatology Education Council (TOTEK) questions about the spine.
Materials and Methods: A total of 15 residents in the orthopaedics and traumatology clinic of a tertiary-level university hospital took an examination consisting solely of spine-related questions. The same questions were posed to ChatGPT-3.5 and ChatGPT-4.0 on two different days. The examination comprised true/false, theoretical/classical, and diagram/visual sections, each scored out of 100 points. The average score was calculated, and the results were evaluated by two instructors.
Results: The mean score was 72.88 for ChatGPT-3.5 (p=0.005) and 69.38 for ChatGPT-4.0 (p=0.001), showing a 5.87% difference in success. The mean score of the orthopaedic residents was 69.90 (p=0.779). Both ChatGPT-3.5 and ChatGPT-4.0 were observed to have a knowledge level equivalent to that of a 3rd-year resident.
Conclusion: The 4th- and 5th-year orthopaedic residents answered more of the spine assessment questions correctly than ChatGPT-3.5 and GPT-4. Both ChatGPT-3.5 and GPT-4 performed better on text-only questions than on visual questions. It is unlikely that GPT-4 or ChatGPT-3.5 would pass the TOTEK written examination.
WoS Q
Scopus Q
Q4
Source
Journal of Turkish Spinal Surgery
Volume
36
Issue
2
Start Page
88
End Page
91
