Large language models in neuro-ophthalmology diseases: ChatGPT vs Bard vs Bing

doi:10.18240/ijo.2025.07.05

Corresponding Author:Ungsoo Samuel Kim. Gwangmyeong Hospital, Deokan-ro 110, Gwangmyeong-si, Gyeonggi-do 14353, Republic of Korea. ungsookim@cau.ac.kr

Abstract:

AIM: To investigate the capabilities of large language models (LLMs) for providing information and diagnoses in neuro-ophthalmology by comparing the performance of ChatGPT-3.5, ChatGPT-4.0, Bard, and Bing.

METHODS: Each chatbot was evaluated on four criteria: diagnostic success rate for the described case, answer quality, response speed, and critical keywords for diagnosis. The selected topics were optic neuritis, non-arteritic anterior ischemic optic neuropathy, and Leber hereditary optic neuropathy.

RESULTS: Bard was unable to provide a diagnosis for the described cases. The success rates for the described cases increased in the order of Bing, ChatGPT-3.5, and ChatGPT-4.0. ChatGPT-4.0 and -3.5 provided the most satisfactory answer quality as judged by neuro-ophthalmologists, with their sets of answers most closely resembling the sample set. Bard was only able to provide ten differential diagnoses in three trials. Bing scored the lowest on the satisfaction standard. A Mann-Whitney test indicated that Bard was significantly faster than ChatGPT-4.0 (Z=-3.576, P<0.001), ChatGPT-3.5 (Z=-3.576, P<0.001), and Bing (Z=-2.517, P=0.011). Because ChatGPT-3.5 and -4.0 far exceeded the other two interfaces at providing diagnoses, they were used to identify the critical keywords for diagnosis.

CONCLUSION: ChatGPT-3.5 and -4.0 are better than Bard and Bing in terms of answer success rate, answer quality, and critical keywords for diagnosis in ophthalmology. This study has broad implications for the field of ophthalmology, providing further evidence that artificial intelligence LLMs can aid clinical decision-making through free-text explanations.

Citation:

Dong Hee Ha, Ungsoo Samuel Kim. Large language models in neuro-ophthalmology diseases: ChatGPT vs Bard vs Bing. Int J Ophthalmol, 2025, 18(7): 1231-1236.

Publication History

Received: March 26, 2024
Revised: April 9, 2025
Online: June 20, 2025
