Evaluation of Research Methodology Generation by Large Language Models in Laryngology: A Comparative Analysis of ChatGPT-4.0 and Gemini 1.5 Flash

Ture, Nurullah; Umurhan, Elif; Tahir, Emel

doi:10.1007/s00405-025-09656-7

dc.authorscopusid	56866396600
dc.authorscopusid	60109528500
dc.authorscopusid	57190227169
dc.authorwosid	Türe, Nurullah/Gyq-5355-2022
dc.authorwosid	Tahir, Emel/Aad-1634-2019
dc.contributor.author	Ture, Nurullah
dc.contributor.author	Umurhan, Elif
dc.contributor.author	Tahir, Emel
dc.date.accessioned	2025-12-11T00:45:26Z
dc.date.issued	2025
dc.department	Ondokuz Mayıs Üniversitesi	en_US
dc.department-temp	[Ture, Nurullah; Umurhan, Elif] Kutahya Hlth Sci Univ, Dept Otorhinolaryngol, Kutahya, Turkiye; [Tahir, Emel] Ondokuz Mayis Univ, Dept Otorhinolaryngol, Samsun, Turkiye	en_US
dc.description.abstract	Objectives: This study aimed to compare the ability of two major language models, ChatGPT-4.0 and Gemini 1.5 Flash, to establish a research methodology based on scientific publications in laryngology. Methods: We screened 80 articles selected from five prestigious otolaryngology journals and included 60 articles with a methods section and statistical analysis. These were classified according to six research types: cell culture, animal experiments, prospective, retrospective, systematic review, and artificial intelligence. A total of 30 studies were analyzed, with five articles randomly selected from each group. For each article, both language models were asked to produce research methodologies, and the responses were evaluated by two independent raters. Results: There was no statistically significant difference between the mean scores of the models (p > 0.05). ChatGPT-4.0 had a higher mean score (5.17 ± 1.12), especially in the data collection and measurement-assessment category. The Gemini model showed relatively more balanced performance in the statistical analysis category. The weighted kappa values were between 0.54 and 0.71, indicating moderate to high agreement between the raters. In the analysis by article type, Gemini's performance in Q1 showed significant variation (p = 0.038). Conclusion: Large language models such as ChatGPT and Gemini provide similarly consistent results in establishing the methodology of scientific studies in laryngology. Both models can be considered supportive tools; however, expert supervision is needed, especially for complex constructs such as statistical analysis. This study makes original contributions to the usability of LLMs for study design in laryngology.	en_US
dc.description.sponsorship	The authors express their gratitude to Assoc. Prof. Engin Başer for his valuable input during the review and evaluation of the manuscript.	en_US
dc.description.woscitationindex	Science Citation Index Expanded
dc.identifier.doi	10.1007/s00405-025-09656-7
dc.identifier.endpage	5749	en_US
dc.identifier.issn	0937-4477
dc.identifier.issn	1434-4726
dc.identifier.issue	11	en_US
dc.identifier.pmid	40968205
dc.identifier.scopus	2-s2.0-105016724263
dc.identifier.scopusquality	Q1
dc.identifier.startpage	5739	en_US
dc.identifier.uri	https://doi.org/10.1007/s00405-025-09656-7
dc.identifier.uri	https://hdl.handle.net/20.500.12712/38966
dc.identifier.volume	282	en_US
dc.identifier.wos	WOS:001575228900001
dc.identifier.wosquality	Q1
dc.language.iso	en	en_US
dc.publisher	Springer	en_US
dc.relation.ispartof	European Archives of Oto-Rhino	en_US
dc.relation.publicationcategory	Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Laryngology	en_US
dc.subject	Large Language Models	en_US
dc.subject	ChatGPT	en_US
dc.subject	Gemini	en_US
dc.subject	Methodology	en_US
dc.subject	Artificial Intelligence	en_US
dc.subject	Otolaryngology	en_US
dc.title	Evaluation of Research Methodology Generation by Large Language Models in Laryngology: A Comparative Analysis of ChatGPT-4.0 and Gemini 1.5 Flash	en_US
dc.type	Article	en_US
dspace.entity.type	Publication

Collections

WoS İndeksli Yayınlar Koleksiyonu
PubMed İndeksli Yayınlar Koleksiyonu
Scopus İndeksli Yayınlar Koleksiyonu
