Publication: System-Based Comparison of the Knowledge Level of Popular AI Chatbots on Human Anatomy: A Multiple-Choice Exam Analysis of GPT-4.1, Deepseek, Co-Pilot, and Gemini Models
Abstract
Purpose
This study aimed to comparatively assess the knowledge level of AI-based chatbots on human anatomy systems using multiple-choice questions, and to analyze their potential contribution to medical education.

Methods
Seventy multiple-choice questions covering seven major anatomical systems (musculoskeletal, respiratory, circulatory, digestive, urinary, genital, and nervous) were translated in accordance with Terminologia Anatomica and presented to GPT-4.1, DeepSeek, Co-Pilot, and Gemini. Questions were selected from first- and second-year medical student exams and distributed according to the item difficulty index (Pj). All bots were tested under identical conditions to minimize bias. Success rates and statistical differences were evaluated using the Kruskal-Wallis and Cochran's Q tests, and the relationship between accuracy and item difficulty was assessed with the point-biserial correlation.

Results
GPT-4.1 showed the highest accuracy (95.7%), followed by Co-Pilot (94.3%), DeepSeek (92.9%), and Gemini (91.4%). System-based results showed that Co-Pilot reached 100% on musculoskeletal questions and GPT-4.1 reached 90% on nervous system questions. All bots scored 100% on respiratory and circulatory system questions; in the remaining systems, success rates ranged from 80% to 100%. No significant correlation was found between item difficulty and chatbot accuracy.

Conclusion
The chatbots achieved high accuracy on anatomy questions, but notable differences were observed across systems. While their supportive role in medical education is growing, expert supervision is still recommended. These results suggest that AI-based systems can serve as complementary educational tools, though further improvement is needed for full reliability.
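The point-biserial correlation mentioned in the Methods relates a dichotomous variable (whether a chatbot answered an item correctly) to a continuous one (the item difficulty index Pj). A minimal sketch of the standard computation, using hypothetical item responses and Pj values (not data from the study):

```python
import math

def point_biserial(binary, values):
    """Point-biserial correlation between a dichotomous variable
    (e.g. item answered correctly: 1/0) and a continuous one
    (e.g. item difficulty index Pj)."""
    n = len(binary)
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    p = len(group1) / n          # proportion of correct answers
    q = 1 - p
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    mean_all = sum(values) / n
    # population standard deviation of the continuous variable
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    return (mean1 - mean0) / sd * math.sqrt(p * q)

# Hypothetical data: one bot's correctness on 10 items and their Pj values
correct = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
pj      = [0.9, 0.8, 0.3, 0.7, 0.85, 0.4, 0.6, 0.75, 0.95, 0.35]
r = point_biserial(correct, pj)
```

Here a positive r would indicate the bot tends to succeed on easier items (higher Pj); a value near zero, as the study reports, would indicate accuracy is largely independent of item difficulty.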
WoS Q: Q3
Scopus Q: Q3
Source: Surgical and Radiologic Anatomy
Volume: 48
Issue: 1
