Using machine learning to develop and validate a prognostic model for predicting cervical cancer treatment outcomes among Ugandan women
Abstract
Background: In the management of cervical cancer (CC), information about patient prognosis is essential because it informs both treatment choices and expected survival outcomes. Machine learning can play a significant role in developing prognostic models for cancer prognosis and prediction while accounting for the heterogeneity of tumor staging and patient characteristics. The lack of accurate survival predictions for Ugandan cervical cancer patients has left clinicians, patients, and families without clear information about prognosis when patients present with common risk factors; as a result, timely prevention and treatment cannot be delivered, leading to poor outcomes. This study therefore aimed to use machine learning methods to develop and validate a cervical cancer prognostic model (CCPM) for predicting survival outcomes among Ugandan patients.
Material and Methods: A retrospective cohort of 413 patients who received CC treatment at the Uganda Cancer Institute (UCI) between 2015 and 2020 was analyzed, split into a training set (n = 330) and a test set (n = 83). Twenty-five demographic and pathological variables deemed important in determining patient prognosis were collected. Multiple imputation by chained equations (MICE) and k-nearest neighbors (KNN) were used to impute missing data. Models predicting survival outcomes were developed using Cox proportional hazards regression, logistic regression, and PNN, and prognostic performance was assessed using RMSE, AIC, and BIC. The models were validated using 5-fold cross-validation, and predictive ability was evaluated using accuracy, sensitivity, specificity, and AUROC. All analyses were done using Stata version 16.1 and Python version 3.10.7 for Windows.
Results: Eleven independent prognostic factors were identified for CCPM development: region, age, BMI, histological grade, lymph node status, LVSI, lymph node metastasis, distant metastasis, FIGO stage, and adjuvant therapy (immunotherapy and chemotherapy).
On the training set, the logistic model emerged with lower AIC and BIC values than the Cox proportional hazards model and was therefore used as the reference model for training the PNN model. When applied to the test set, the PNN model produced better predictive values than the logistic model, with an accuracy of 0.76, sensitivity of 0.87, specificity of 0.72, and AUROC of 0.80.
Conclusion: The PNN model identified 11 independent prognostic factors determining survival among Ugandan CC patients with fairly good prediction metrics. This gives hope for the future use of machine learning models in CC prediction and prognosis and, in turn, for improved patient outcomes.
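For readers less familiar with the evaluation metrics named in this abstract, the following minimal sketch shows how accuracy, sensitivity, and specificity are derived from confusion-matrix counts. The function name `classification_metrics` and the counts used are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp: true positives, fn: false negatives,
    tn: true negatives, fp: false positives.
    """
    total = tp + fn + tn + fp
    return {
        # Share of all patients classified correctly
        "accuracy": (tp + tn) / total,
        # Share of actual positives correctly identified (true positive rate)
        "sensitivity": tp / (tp + fn),
        # Share of actual negatives correctly identified (true negative rate)
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for illustration only (NOT taken from the study).
metrics = classification_metrics(tp=40, fn=10, tn=30, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can be misleading when one outcome class is much more common than the other.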