Abstract: AIM: To build a prediction model for dry eye using data mining techniques.
METHODS: From March 2020 to January 2021, 218 patients (436 eyes) with dry eye were enrolled as the dry eye group, and 212 patients (424 eyes) without dry eye were enrolled as the control group. The Schirmer Ⅰ test (SⅠt), fluorescein staining tear film break-up time (FBUT), non-contact tear film break-up time (NI-BUT), tear meniscus height (TMH), corneal fluorescein staining (FL) and meibomian gland function score (MG-SCORE) were assessed in both groups. From each of the dry eye and control groups, 100 samples (200 eyes) were randomly selected, forming a test set of 200 samples (400 eyes). The remaining 118 samples (236 eyes) in the dry eye group and 112 samples (224 eyes) in the control group were used as the training set. The correlation feature searching (CFS) feature selection algorithm was used to identify the factors relevant to the detection of dry eye. C4.5, Random Forest, Random Tree, Naïve Bayes, KNN, SVM, Decision Stump and Bagging were each used to construct a prediction model.
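The test/training split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that sampling is done per patient so that both eyes of a patient fall into the same subset (our reading of "100 samples (200 eyes)"), and the patient identifiers, the helper names `split_by_patient` and `eyes_of`, and the fixed random seed are all hypothetical. Only the group sizes come from the abstract.

```python
import random

def split_by_patient(patient_ids, n_test, rng):
    """Randomly pick n_test patients for the test set; the rest form the training set."""
    test_ids = set(rng.sample(sorted(patient_ids), n_test))
    train_ids = [p for p in patient_ids if p not in test_ids]
    return train_ids, sorted(test_ids)

def eyes_of(patient_ids):
    """Expand each patient into two eye records so both eyes stay in the same subset."""
    return [(p, side) for p in patient_ids for side in ("OD", "OS")]

rng = random.Random(0)  # fixed seed, chosen only to make the sketch reproducible

# Hypothetical patient identifiers; the group sizes are taken from the abstract.
dry_eye = [f"D{i:03d}" for i in range(218)]
control = [f"C{i:03d}" for i in range(212)]

# Assumption: 100 patients are drawn independently from each group.
dry_train, dry_test = split_by_patient(dry_eye, 100, rng)
ctl_train, ctl_test = split_by_patient(control, 100, rng)

train_eyes = eyes_of(dry_train) + eyes_of(ctl_train)
test_eyes = eyes_of(dry_test) + eyes_of(ctl_test)

print(len(dry_train), len(ctl_train))    # patients per group in the training set
print(len(train_eyes), len(test_eyes))   # eyes in the training and test sets
```

Splitting at the patient level rather than the eye level keeps the two eyes of one person from appearing on both sides of the split, which would otherwise tend to inflate the measured test accuracy.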
RESULTS: The CFS feature selection algorithm yielded an optimal feature subset consisting of SⅠt, NI-BUT, TMH and FL. Based on these four features, eight machine learning algorithms were each used to build a prediction model. The prediction accuracies were all higher than 75%. Among the eight models, Random Forest had the highest prediction accuracy, reaching 91.8% and 88.3%, respectively, with a total prediction accuracy of 90.1%. In addition, single-factor modeling showed that FL and NI-BUT had the highest prediction accuracies, both exceeding 74%.
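As a rough consistency check on the Random Forest figures, the total accuracy can be computed as a size-weighted mean of per-group accuracies. The sketch below rests on an assumption the abstract does not state: that 91.8% and 88.3% are the accuracies for the two groups and that each group contributes 200 test eyes. The function name `overall_accuracy` is hypothetical.

```python
def overall_accuracy(group_sizes, group_accuracies):
    """Size-weighted mean of per-group accuracies (fraction of all eyes classified correctly)."""
    correct = sum(n * acc for n, acc in zip(group_sizes, group_accuracies))
    return correct / sum(group_sizes)

# Assumption: 91.8% and 88.3% are per-group accuracies and each group
# contributes 200 test eyes; the abstract does not state this explicitly.
total = overall_accuracy([200, 200], [0.918, 0.883])
print(f"{total:.4f}")
```

Under that assumption the weighted mean is close to the reported total of 90.1%. With unequal group sizes the same function would weight the larger group more heavily.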
CONCLUSION: Random Forest can be considered a stable algorithm with good generalization for building a prediction model for dry eye. NI-BUT and FL correlate strongly with dry eye and can be considered as standards for the clinical examination of dry eye.