Objectives
Occupational noise-induced hearing loss (ONIHL) is a prevalent occupational health condition whose diagnosis traditionally requires multiple pure-tone audiometry assessments. We developed and validated a machine learning model that uses routine haematological and biochemical parameters to predict the risk of ONIHL.
Design, setting and participants
This study analysed data from 3297 noise-exposed workers in Shenzhen, including 160 ONIHL cases. The data set was divided into D1 (2868 samples, 107 ONIHL cases) and D2 (429 samples, 53 ONIHL cases). The inclusion criteria were based on GBZ49-2014 Diagnosis of Occupational Noise-Induced Hearing Loss. Models were trained on D1 and validated on D2. Routine blood and biochemical indicators were extracted from the case data, and a range of machine learning algorithms, including extreme gradient boosting (XGBoost), were used to construct predictive models. The model was then refined by selecting the most representative variables, and decision curve analysis was conducted to evaluate the net benefit of the model across a range of threshold probabilities.
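For readers unfamiliar with decision curve analysis, the following is a minimal sketch of the standard net benefit calculation at a single threshold probability. It is not the authors' code: the function names (`net_benefit`, `net_benefit_treat_all`) and the toy labels and risks are hypothetical and chosen only for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging everyone whose predicted risk >= threshold.

    Standard decision-curve formula: TP/n - (FP/n) * pt / (1 - pt),
    where pt is the threshold probability.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among those flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: flag every worker regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Hypothetical toy data (1 = ONIHL case, 0 = non-case); not from the study.
y_true = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.05, 0.15, 0.25]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
```

A decision curve repeats this calculation over a grid of thresholds and compares the model against the "flag everyone" and "flag no one" (net benefit 0) strategies.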
Primary outcome measures
ONIHL status in the model creation and validation data sets.
Results
The prediction model, developed using XGBoost, performed strongly, achieving an area under the receiver operating characteristic curve (AUC) of 0.942, a sensitivity of 0.875 and a specificity of 0.936 on the validation data set. On the test data set, the model achieved an AUC of 0.990. After feature selection, the model was reduced to 16 features while maintaining strong performance on a newly acquired independent data set, with an AUC of 0.872, a balanced accuracy of 0.798, a sensitivity of 0.755 and a specificity of 0.840. Feature importance analysis revealed that serum albumin (ALB), platelet distribution width (PDW), coefficient of variation in red cell distribution width (RDW-CV), serum creatinine (Scr) and lymphocyte percentage (LYMPHP) are critical factors for risk stratification in patients with ONIHL.
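The reported metrics are related by simple definitions: balanced accuracy is the mean of sensitivity and specificity. The sketch below shows those definitions and checks that the reported sensitivity (0.755) and specificity (0.840) agree with the reported balanced accuracy (0.798) to rounding. The function names and the confusion-matrix counts are hypothetical, used only for illustration.

```python
def sensitivity(tp, fn):
    """True positive rate: proportion of cases correctly flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: proportion of non-cases correctly cleared."""
    return tn / (tn + fp)


def balanced_accuracy(sens, spec):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    return (sens + spec) / 2.0


# Consistency check using the values reported in the Results.
reported_ba = balanced_accuracy(0.755, 0.840)

# Hypothetical confusion-matrix counts, not taken from the study.
toy_sens = sensitivity(tp=30, fn=10)
toy_spec = specificity(tn=90, fp=10)
toy_ba = balanced_accuracy(toy_sens, toy_spec)
```

Balanced accuracy is a useful summary here because ONIHL cases form a small minority of the workers, so plain accuracy would be dominated by the non-case class.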
Conclusion
The analysis of feature importance identified ALB, PDW, RDW-CV, Scr and LYMPHP as pivotal factors for risk stratification in patients with ONIHL. The XGBoost-based machine learning model effectively distinguishes patients with ONIHL among individuals exposed to noise, thereby facilitating early diagnosis and intervention.