Who Is More Successful in a Spinal Surgery Examination? ChatGPT-3.5/4.0 or a Resident Doctor

Kaya, Ozcan; Dincer, Recep; Coskun, Huseyin Sina; Karapınar, Sefa Erdem

doi:10.4274/jtss.galenos.2025.15870

dc.contributor.author	Kaya, Ozcan
dc.contributor.author	Dincer, Recep
dc.contributor.author	Coskun, Huseyin Sina
dc.contributor.author	Karapınar, Sefa Erdem
dc.date.accessioned	2025-12-11T01:44:12Z
dc.date.issued	2025
dc.department	Ondokuz Mayıs Üniversitesi	en_US
dc.department-temp	Sağlık Bilimleri Üniversitesi, Süleyman Demirel Üniversitesi, Ondokuz Mayıs Üniversitesi, Süleyman Demirel Üniversitesi	en_US
dc.description.abstract	Objective: As in other sectors, the use of artificial intelligence (AI) has grown with advances in technology, particularly in medicine. The aim of this study was to compare the responses of Chat Generative Pre-trained Transformer (ChatGPT)-4.0, ChatGPT-3.5, and orthopaedics and traumatology residents to spine-related questions from the Turkish Orthopedics and Traumatology Education Council (TOTEK) examination. Materials and Methods: A total of 15 residents in the orthopaedics and traumatology clinic of a tertiary-level university hospital took an examination consisting only of spine-related questions. The same questions were posed to ChatGPT-3.5 and ChatGPT-4.0 on two different days. The examination comprised true/false, theoretical/classical, and diagram/visual sections, each scored out of 100 points. Average scores were calculated, and the results were evaluated by two instructors. Results: The mean score was 72.88 for ChatGPT-3.5 (p=0.005) and 69.38 for ChatGPT-4.0 (p=0.001), a 5.87% difference in success. The mean score of the orthopaedic residents was 69.90 (p=0.779). Both ChatGPT-3.5 and ChatGPT-4.0 showed a knowledge level equivalent to that of a third-year resident. Conclusion: Fourth- and fifth-year orthopaedic residents answered more of the spine assessment questions correctly than ChatGPT-3.5 and ChatGPT-4.0. Both models performed better on text-only questions than on visual questions. It is unlikely that either ChatGPT-3.5 or ChatGPT-4.0 would pass the TOTEK written examination.	en_US
dc.identifier.doi	10.4274/jtss.galenos.2025.15870
dc.identifier.endpage	91	en_US
dc.identifier.issn	2147-5903
dc.identifier.issue	2	en_US
dc.identifier.scopusquality	Q4
dc.identifier.startpage	88	en_US
dc.identifier.trdizinid	1316544
dc.identifier.uri	https://doi.org/10.4274/jtss.galenos.2025.15870
dc.identifier.uri	https://search.trdizin.gov.tr/en/yayin/detay/1316544/who-is-more-successful-in-a-spinal-surgery-examination-chatgpt-3540-or-a-resident-doctor
dc.identifier.uri	https://hdl.handle.net/20.500.12712/45687
dc.identifier.volume	36	en_US
dc.language.iso	en	en_US
dc.relation.ispartof	Journal of Turkish Spinal Surgery	en_US
dc.relation.publicationcategory	Article - National Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.title	Who Is More Successful in a Spinal Surgery Examination? ChatGPT-3.5/4.0 or a Resident Doctor	en_US
dc.type	Article	en_US
dspace.entity.type	Publication

Collections

TR-Dizin Indexed Publications Collection