Publication:
Evaluation of Research Methodology Generation by Large Language Models in Laryngology: A Comparative Analysis of ChatGPT-4.0 and Gemini 1.5 Flash

Abstract

Objectives: This study aimed to compare the ability of two large language models, ChatGPT-4.0 and Gemini 1.5 Flash, to construct a research methodology based on scientific publications in laryngology.

Methods: We screened 80 articles selected from five prestigious otolaryngology journals and included 60 that contained a methods section and statistical analysis. These were classified into six research types: cell culture, animal experiment, prospective, retrospective, systematic review, and artificial intelligence. A total of 30 studies were analyzed, with five articles randomly selected from each group. For each article, both language models were asked to produce a research methodology, and the responses were evaluated by two independent raters.

Results: There was no statistically significant difference between the models' mean scores (p > 0.05). ChatGPT-4.0 had a higher mean score (5.17 ± 1.12), particularly in the data collection and measurement-assessment category, while Gemini showed relatively more balanced performance in the statistical analysis category. Weighted kappa values ranged from 0.54 to 0.71, indicating moderate to high inter-rater agreement. In the analysis by article type, Gemini's performance in Q1 showed significant variation (p = 0.038).

Conclusion: Large language models such as ChatGPT and Gemini provide similarly consistent results in establishing the methodology of scientific studies in laryngology. Both models can be considered supportive tools; however, expert supervision is needed, especially for complex constructs such as statistical analysis. This study makes an original contribution to the usability of LLMs for study design in laryngology.

WoS Q

Q1

Scopus Q

Q1

Source

European Archives of Oto-Rhino-Laryngology

Volume

282

Issue

11

Start Page

5739

End Page

5749
