Assessing the proficiency of large language models on funduscopic disease knowledge
Corresponding Author:

Yi Shao. Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, National Clinical Research Center for Eye Diseases, Shanghai 200080, China. freebee99@163.com

Fund Project:

Supported by the National Natural Science Foundation of China (No. 82160195); the Science and Technology Project of Jiangxi Provincial Department of Education (No. GJJ200169); the Science and Technology Project of Jiangxi Province Health Commission of Traditional Chinese Medicine (No. 2020A0087); and the Science and Technology Project of Jiangxi Health Commission (No. 202130210).

    Abstract:

    AIM: To assess the performance of five large language models (LLMs; ChatGPT-3.5, ChatGPT-4, PaLM2, Claude 2, and SenseNova) against two human cohorts (funduscopic disease experts and ophthalmologists) on the specialized subject of funduscopic disease.

    METHODS: The five LLMs and the two human groups each independently completed a 100-item funduscopic disease test. Performance was assessed by comparing average scores, response stability, and answer confidence.

    RESULTS: Among the LLMs, ChatGPT-4 and PaLM2 showed the highest average correlation. ChatGPT-4 also achieved the highest average score and the highest answer confidence on the exam. Compared with the human cohorts, ChatGPT-4 performed on par with ophthalmologists but fell short of funduscopic disease specialists.

    CONCLUSION: The study provides evidence of the exceptional performance of ChatGPT-4 in the domain of funduscopic disease. With continued improvement, validated LLMs have the potential to bring unforeseen benefits to healthcare for both patients and physicians.

Get Citation

Jun-Yi Wu, Yan-Mei Zeng, Xian-Zhe Qian, et al. Assessing the proficiency of large language models on funduscopic disease knowledge. Int J Ophthalmol, 2025,18(7):1205-1213

Publication History
  • Received: November 12, 2024
  • Revised: March 3, 2025
  • Online: June 20, 2025