Publication:
Evaluation of Research Methodology Generation by Large Language Models in Laryngology: A Comparative Analysis of ChatGPT-4.0 and Gemini 1.5 Flash

dc.authorscopusid56866396600
dc.authorscopusid60109528500
dc.authorscopusid57190227169
dc.authorwosidTüre, Nurullah/Gyq-5355-2022
dc.authorwosidTahir, Emel/Aad-1634-2019
dc.contributor.authorTure, Nurullah
dc.contributor.authorUmurhan, Elif
dc.contributor.authorTahir, Emel
dc.date.accessioned2025-12-11T00:45:26Z
dc.date.issued2025
dc.departmentOndokuz Mayıs Üniversitesien_US
dc.department-temp[Ture, Nurullah; Umurhan, Elif] Kutahya Hlth Sci Univ, Dept Otorhinolaryngol, Kutahya, Turkiye; [Tahir, Emel] Ondokuz Mayis Univ, Dept Otorhinolaryngol, Samsun, Turkiyeen_US
dc.description.abstractObjectives: This study aimed to compare the ability of two major large language models, ChatGPT-4.0 and Gemini 1.5 Flash, to establish a research methodology based on scientific publications in laryngology. Methods: We screened 80 articles selected from five prestigious otolaryngology journals and included the 60 articles that had a methods section and a statistical analysis. These were classified into six research types: cell culture, animal experiments, prospective studies, retrospective studies, systematic reviews, and artificial intelligence. Five articles were randomly selected from each group, giving a total of 30 studies for analysis. For each article, both language models were asked to produce a research methodology, and the responses were evaluated by two independent raters. Results: There was no statistically significant difference between the mean scores of the two models (p > 0.05). ChatGPT-4.0 had a numerically higher mean score (5.17 ± 1.12), especially in the data collection and measurement-assessment category. The Gemini model showed relatively more balanced performance in the statistical analysis category. The weighted kappa values ranged from 0.54 to 0.71, indicating moderate to high agreement between the raters. In the analysis by article type, Gemini's performance in Q1 showed significant variation (p = 0.038). Conclusion: Large language models such as ChatGPT and Gemini provide similarly consistent results in establishing the methodology of scientific studies in laryngology. Both models can be considered supportive tools; however, expert supervision is needed, especially for complex components such as statistical analysis. This study makes original contributions on the usability of LLMs for study design in laryngology.en_US
dc.description.sponsorshipThe authors express their gratitude to Assoc. Prof. Engin Başer for his valuable input during the review and evaluation of the manuscript.en_US
dc.description.woscitationindexScience Citation Index Expanded
dc.identifier.doi10.1007/s00405-025-09656-7
dc.identifier.endpage5749en_US
dc.identifier.issn0937-4477
dc.identifier.issn1434-4726
dc.identifier.issue11en_US
dc.identifier.pmid40968205
dc.identifier.scopus2-s2.0-105016724263
dc.identifier.scopusqualityQ1
dc.identifier.startpage5739en_US
dc.identifier.urihttps://doi.org/10.1007/s00405-025-09656-7
dc.identifier.urihttps://hdl.handle.net/20.500.12712/38966
dc.identifier.volume282en_US
dc.identifier.wosWOS:001575228900001
dc.identifier.wosqualityQ1
dc.language.isoenen_US
dc.publisherSpringeren_US
dc.relation.ispartofEuropean Archives of Oto-Rhino-Laryngologyen_US
dc.relation.publicationcategoryMakale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanıen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.subjectLaryngologyen_US
dc.subjectLarge Language Modelsen_US
dc.subjectChatGPTen_US
dc.subjectGeminien_US
dc.subjectMethodologyen_US
dc.subjectArtificial Intelligenceen_US
dc.subjectOtolaryngologyen_US
dc.titleEvaluation of Research Methodology Generation by Large Language Models in Laryngology: A Comparative Analysis of ChatGPT-4.0 and Gemini 1.5 Flashen_US
dc.typeArticleen_US
dspace.entity.typePublication
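The abstract reports inter-rater agreement as weighted kappa values of 0.54 to 0.71. As a rough illustration of how such a statistic is obtained, the sketch below implements Cohen's weighted kappa for two raters scoring the same items on an ordinal scale, together with the widely used Landis and Koch interpretation bands. The abstract does not state the rating scale, the weighting scheme (linear or quadratic), or which interpretation bands the authors applied, so all of these are assumptions here. The function names `weighted_kappa` and `landis_koch` and the example ratings are hypothetical and do not reproduce the study's data.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same items.

    rater_a, rater_b: equal-length sequences of ratings, one per item.
    categories: all possible rating values, in ordinal order (assumed scale).
    weights: "linear" or "quadratic" disagreement weights (assumed choice).
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("need at least two ordinal categories")

    k = len(categories)
    n = len(rater_a)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions: observed[i][j] is the share of items that
    # rater A placed in category i and rater B placed in category j.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions per rater; their product is the joint table
    # expected if the two raters scored independently of each other.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal, 1 between the two extreme categories.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs_dis = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    exp_dis = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    if exp_dis == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1 - obs_dis / exp_dis


def landis_koch(kappa):
    """Map a kappa value to its Landis and Koch agreement label."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical scores from two raters on an assumed 1-7 scale (not study data).
    rater_1 = [5, 6, 4, 7, 5, 3, 6, 5]
    rater_2 = [5, 5, 4, 6, 6, 3, 6, 4]
    kappa = weighted_kappa(rater_1, rater_2, categories=list(range(1, 8)))
    print(f"weighted kappa = {kappa:.2f} ({landis_koch(kappa)})")
```

On the Landis and Koch scale, the reported lower bound of 0.54 falls in the "moderate" band (0.41 to 0.60) and the upper bound of 0.71 in the "substantial" band (0.61 to 0.80). Weighted kappa is used instead of plain kappa because, on an ordinal score, a one-point disagreement between raters should count less than a disagreement spanning the whole scale.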
