Abstract
Background: Colorectal adenoma resection is an effective way to reduce the incidence of colorectal cancer. Currently, the recurrence rate of advanced colorectal neoplasia (ACRN) within 1 year after colorectal adenoma resection is high, and there is a lack of research on the construction of predictive models for early recurrence of ACRN after colorectal adenoma resection.
Objective: To explore the influencing factors of early recurrence of ACRN in patients after colorectal adenoma resection using machine learning methods, and to construct a predictive model for early recurrence of ACRN in patients after colorectal adenoma resection.
Methods: A total of 222 patients who underwent colorectal adenoma resection and had three or more colonoscopies at the First Affiliated Hospital of Zhengzhou University from January 2017 to August 2023 were retrospectively included as research subjects. According to whether ACRN occurred within 1 year after surgery, they were divided into an early recurrence group ($n=68$) and a non-early recurrence group ($n=154$). General data and laboratory test indicators of the patients were collected. The subjects were divided into a training set and a test set at a ratio of 8:2. Predictive factors were jointly screened using the Boruta and Lasso regression methods. Four machine learning methods, namely Categorical Boosting (Catboost), Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM), were used to construct predictive models. Receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA) curves were plotted to evaluate the performance of the predictive models. Feature importance and SHAP interpretability analysis were used to examine the risk factors associated with early recurrence of ACRN in patients after colorectal adenoma resection.
Results: There were statistically significant differences between the early recurrence group and the non-early recurrence group in terms of the number of adenomas, adenoma size, adenoma location, degree of adenoma dysplasia, abdominal distension, number of clinical symptoms, history of alcohol consumption, platelet count, and neutrophil-to-lymphocyte ratio (NLR) ($P<0.05$). Based on Boruta and Lasso methods, seven predictive factors were jointly screened: adenoma size, platelet count, degree of adenoma dysplasia, number of clinical symptoms, triglyceride-glucose (TyG) index, history of alcohol consumption, and number of adenomas. Based on these seven predictive factors, four predictive models (Catboost, RF, LR, and SVM) for early recurrence of ACRN after colorectal adenoma resection were constructed. ROC curve analysis showed that in the training set, the AUCs of the Catboost, RF, LR, and SVM models were 0.802, 0.836, 0.788, and 0.860, respectively; in the test set, the AUCs of the four models were 0.772, 0.749, 0.705, and 0.685, respectively. Delong test results showed that there were no statistically significant differences in the pairwise comparisons of the AUCs of the four models (all $P>0.05$). Calibration curve analysis showed that the Brier scores of the Catboost, RF, LR, and SVM models in the training set were 0.178, 0.197, 0.169, and 0.153, respectively, and the Brier scores of the four models in the test set were 0.188, 0.201, 0.191, and 0.198, respectively. DCA curve analysis showed that higher clinical net benefits were obtained based on the Catboost, LR, and SVM models in the training set, and the Catboost and SVM models achieved better clinical net benefits in the test set. SHAP interpretability analysis based on the Catboost model showed that the number of clinical symptoms, adenoma size, and number of adenomas were the top three important features for predicting early recurrence of ACRN after surgery. 
Among them, the number of clinical symptoms, adenoma size, number of adenomas, degree of adenoma dysplasia, TyG, and platelet count (SHAP values: 0.043, 0.042, 0.025, 0.020, 0.012, 0.005, respectively) were positively correlated with early postoperative ACRN recurrence, while history of alcohol consumption (SHAP value: 0.015) was negatively correlated with early postoperative ACRN recurrence.
Conclusion: The risk prediction model constructed based on the Catboost method has good predictive performance and clinical utility, and can be used to predict the early recurrence of ACRN after colorectal adenoma resection.
Full Text
Preamble
Research on Risk Prediction for Early Recurrence of Advanced Neoplasia after Colorectal Adenoma Resection
Abstract
Objective: To investigate the risk factors for the early recurrence of advanced neoplasia (AN) after colorectal adenoma (CRA) resection and to develop a clinical prediction model.
Methods: A retrospective analysis was conducted on clinical data from patients who underwent colorectal adenoma resection at our hospital. Patients were divided into a recurrence group and a non-recurrence group based on follow-up colonoscopy results within three years. Multivariate logistic regression analysis was used to identify independent risk factors for early AN recurrence. Based on these factors, a nomogram prediction model was constructed. The performance of the model was evaluated using the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the Hosmer-Lemeshow test.
Results: A total of [N] patients were included in the study. Multivariate analysis showed that age $\ge 60$ years, male gender, multiple adenomas ($\ge 3$), adenoma size $\ge 10$ mm, and high-grade intraepithelial neoplasia were independent risk factors for early AN recurrence. The developed nomogram demonstrated good predictive accuracy with an AUC of [Value]. The Hosmer-Lemeshow test indicated good calibration of the model ($P > 0.05$).
Conclusion: The risk of early AN recurrence after colorectal adenoma resection is associated with various clinical and pathological factors. The established nomogram model can effectively predict the risk of early recurrence, providing a scientific basis for personalized follow-up strategies.
Introduction
Colorectal cancer (CRC) is one of the most common malignant tumors of the digestive tract worldwide. Most CRCs develop through the "adenoma-carcinoma sequence," making the detection and endoscopic resection of colorectal adenomas (CRA) a critical measure for preventing CRC. However, even after complete resection, patients remain at risk for recurrent adenomas or even advanced neoplasia (AN).
Advanced neoplasia, defined as adenomas $\ge 10$ mm in diameter, those with villous components, or those exhibiting high-grade intraepithelial neoplasia, carries a significantly higher risk of progressing to malignancy. Early recurrence (typically defined as recurrence within 3 years) of AN poses a substantial challenge to clinical management and patient prognosis. Current guidelines provide general recommendations for follow-up intervals, but these often fail to account for individual
1. School of Nursing and Health, Zhengzhou University, Zhengzhou 450001, Henan Province, China
Health Management Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan Province; Medical Administration Department, Gaoping City People's Hospital, Jincheng, Shanxi Province.
Background
Colorectal adenoma resection is an effective strategy for reducing the incidence of colorectal cancer. Currently, however, there is a high recurrence rate of advanced colorectal neoplasia (ACRN) within one year following the procedure. Despite this clinical challenge, there is a significant lack of research focused on developing predictive models for the early recurrence of ACRN after colorectal adenoma resection.
This study utilizes machine learning methods to investigate the influential factors associated with the early recurrence of ACRN in patients following colorectal adenoma resection. Furthermore, we aim to construct a robust predictive model to identify patients at high risk for early ACRN recurrence after their initial procedure.
Methods
This retrospective study included 222 patients who underwent surgical resection for colorectal adenomas and had received three or more colonoscopies at the First Affiliated Hospital of Zhengzhou University between January 2017 and August 2023. Based on whether advanced colorectal neoplasia (ACRN) recurred within one year post-surgery, patients were divided into an early recurrence group ($n=68$) and a non-early recurrence group ($n=154$). General clinical data and laboratory parameters were collected for all subjects.
The study population was partitioned into a training set and a testing set using an 8:2 ratio. Predictive factors were identified through the combined application of Boruta and Least Absolute Shrinkage and Selection Operator (LASSO) regression methods. Subsequently, four machine learning algorithms—Categorical Boosting (CatBoost), Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM)—were employed to construct predictive models. The performance of these models was evaluated using Receiver Operating Characteristic (ROC) curves, calibration curves, and Decision Curve Analysis (DCA). Furthermore, feature importance and SHAP (SHapley Additive exPlanations) interpretability analyses were utilized to discuss the risk factors associated with early ACRN recurrence in patients following colorectal adenoma resection.
Results
Statistically significant differences were observed between the early recurrence group and the non-early recurrence group regarding the number of adenomas, adenoma size, adenoma location, degree of adenoma dysplasia, abdominal distension, number of clinical symptoms, history of alcohol consumption, platelet count, and the neutrophil-to-lymphocyte ratio (NLR) ($P < 0.05$).
Based on the combined feature selection using Boruta and Lasso methods, seven predictive factors were identified: adenoma size, platelet count, degree of adenoma dysplasia, number of clinical symptoms, triglyceride-glucose (TyG) index, alcohol consumption history, and number of adenomas. Utilizing these seven predictors, four machine learning models—Catboost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM)—were constructed to predict the early recurrence of advanced colorectal neoplasia (ACRN) following colorectal adenoma resection.
Receiver Operating Characteristic (ROC) curve analysis demonstrated that in the training set, the Area Under the Curve (AUC) for the Catboost, RF, LR, and SVM models were 0.802, 0.836, 0.788, and 0.860, respectively. In the test set, the AUC values for the four models were 0.772, 0.749, 0.705, and 0.685, respectively. Delong test results indicated no statistically significant differences in pairwise comparisons of the AUCs among the four models (all $P > 0.05$).
Calibration curve analysis showed that in the training set, the Brier scores for the Catboost, RF, LR, and SVM models were 0.178, 0.197, 0.169, and 0.153, respectively. In the test set, the Brier scores were 0.188, 0.201, 0.191, and 0.198, respectively. Decision Curve Analysis (DCA) revealed that the Catboost, LR, and SVM models achieved higher clinical net benefits in the training set, while the Catboost and SVM models provided superior clinical net benefits in the test set.
SHAP (SHapley Additive exPlanations) interpretability analysis based on the Catboost model identified the number of clinical symptoms, adenoma size, and number of adenomas as the top three most important features for predicting early ACRN recurrence. Specifically, the number of clinical symptoms, adenoma size, number of adenomas, degree of adenoma dysplasia, TyG index, and platelet count (with SHAP values of 0.043, 0.042, 0.025, 0.020, 0.012, and 0.005, respectively) were all positively correlated with early postoperative ACRN recurrence. Conversely, a history of alcohol consumption (SHAP value of 0.015) was negatively correlated with early ACRN recurrence.
Conclusion
The risk prediction model constructed based on the CatBoost method demonstrates good predictive performance and clinical utility. It can be used to predict the early recurrence of advanced colorectal neoplasm (ACRN) following colorectal adenoma resection.
Keywords: Advanced colorectal neoplasm; Early recurrence; Influencing factors; Prediction model; Interpretability analysis
CLC Number: R 735.34
Document Code: A
Risk Prediction of Early Recurrence of Advanced Colorectal Neoplasm After Colorectal Adenoma Resection
Authors: Yujie, Jiaoyan, Jingfeng, Suying
Chinese General Practice
Zhengzhou University, Zhengzhou 450001, China
The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450052, China
Gaoping People's Hospital, Jincheng 048400, China
DING Suying, Chief superintendent nurse
Background
Colorectal adenoma resection is an effective method to reduce colorectal cancer incidence.
However, the recurrence rate of advanced colorectal neoplasm (ACRN) within one year after resection is high, and research on predictive models for early ACRN recurrence is lacking.
Objective To use machine learning to identify risk factors and develop a prediction model for early ACRN recurrence after colorectal adenoma resection.
Methods
A total of 222 patients who underwent three or more colonoscopies and surgical resection of colorectal adenomas at the First Affiliated Hospital of Zhengzhou University from January 2017 to August 2023 were retrospectively included as research subjects. Patients were divided into an early recurrence group ($n=68$) and a non-early recurrence group ($n=154$) based on ACRN occurrence within one year post-surgery. Clinical characteristics were compared. Subjects were split 8:2 into training and test sets. Boruta and Lasso regression methods jointly selected predictive features. Four machine learning models, Categorical Boosting (Catboost), Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM), were built. Model performance was evaluated using sensitivity, specificity, AUC, calibration curves, and Decision Curve Analysis (DCA). Feature importance and SHAP analysis identified key risk factors.
Results
Significant differences ($P<0.05$) were found between groups in adenoma number, size, location, dysplasia, bloating, number of clinical symptoms, drinking history, platelet count, and neutrophil-to-lymphocyte ratio (NLR). Based on the combined Boruta and Lasso methods, seven predictors were selected: adenoma size, platelet count, degree of adenoma dysplasia, number of clinical symptoms, TyG, drinking history, and adenoma number. Using these seven predictors, four prediction models (Catboost, RF, LR, and SVM) for early ACRN recurrence after colorectal adenoma resection were constructed. In the training set, the AUCs of the Catboost, RF, LR, and SVM models were 0.802, 0.836, 0.788, and 0.860, respectively; in the testing set, the AUCs of the four models were 0.772, 0.749, 0.705, and 0.685, respectively. The Delong test showed no statistically significant difference in the pairwise comparison of AUCs among the four models (all $P>0.05$). Calibration curve analysis showed that in the training set, the Brier scores of the Catboost, RF, LR, and SVM models were 0.178, 0.197, 0.169, and 0.153, respectively; in the testing set, the Brier scores of the four models were 0.188, 0.201, 0.191, and 0.198, respectively. DCA curve analysis showed that in the training set, relatively high clinical net benefits were obtained with the Catboost, LR, and SVM models; in the testing set, the Catboost and SVM models achieved good clinical net benefits. Based on the SHAP interpretability analysis of the Catboost model, the number of clinical symptoms, adenoma size, and adenoma number were identified as the top three most important features for predicting early postoperative ACRN recurrence. Among these, the number of clinical symptoms, adenoma size, adenoma number, degree of adenoma dysplasia, TyG, and platelet count (with SHAP values of 0.043, 0.042, 0.025, 0.020, 0.012, and 0.005, respectively) were all positively associated with early postoperative ACRN recurrence. In contrast, a history of alcohol consumption (SHAP value: 0.015) was negatively associated with early postoperative ACRN recurrence.
Conclusion
The risk prediction model developed using Catboost demonstrates good predictive performance and clinical applicability, making it suitable for predicting early postoperative ACRN recurrence following colorectal adenoma resection.
Key words: Advanced colorectal neoplasia; Early recurrence; Influence factor; Prediction model; Interpretability analysis
Colorectal cancer (CRC) is a highly prevalent malignancy of the digestive system and has become one of the major diseases seriously threatening human health. According to the Global Cancer Statistics 2022 (GLOBOCAN 2022), CRC ranks third in global incidence and second in mortality, surpassed only by lung cancer. Without timely intervention, the incidence and mortality of CRC are expected to continue rising, with new cases and deaths projected to reach 3.2 million and 1.6 million, respectively, by 2040. In China, CRC shows a significant upward epidemiological trend, with its incidence ranking second among all malignant tumors nationwide, imposing a substantial burden on society. Most CRC cases progress from colorectal adenomas, which serve as the primary precancerous lesions. With the widespread use of colonoscopy, resection of colorectal adenomatous polyps can effectively reduce the risk of malignant transformation, thereby lowering the incidence of CRC. However, studies have shown that at the first follow-up colonoscopy after adenoma resection, 36% to 61% of patients are found to have new or recurrent adenomas [6-7]. Research also indicates that earlier recurrence after surgery is associated with shorter median survival: median survival lengthens from 9.9 months for patients whose disease recurs within one year to 19.1 months for those whose disease recurs after more than four years. To date, the risk of metachronous advanced colorectal neoplasia (ACRN) has been regarded as a surrogate indicator for the risk of CRC events. Previous studies have extensively explored recurrence following colorectal adenoma resection, with some research defining recurrence within one year as early recurrence.
Based on a broad range of patient characteristics—including basic demographic data, baseline adenoma features, and common laboratory test indicators—this study aims to clarify the timing of postoperative ACRN recurrence (defined as advanced adenoma and/or CRC) through patient follow-up. Furthermore, this research seeks to identify the risk factors influencing early postoperative recurrence of ACRN and to establish a risk prediction model. Such a model will facilitate the precise assessment of a patient's risk for early ACRN recurrence.
Methods
1.1 Study Subjects
This study employed a retrospective cohort design, selecting patients who underwent surgical resection for colorectal adenomas and had received three or more colonoscopies at the First Affiliated Hospital of Zhengzhou University between January 2017 and August 2023. Based on the inclusion and exclusion criteria, 222 patients were ultimately enrolled as study subjects. Advanced Colorectal Neoplasia (ACRN) was defined to include advanced adenomas (adenomas meeting at least one of the following criteria: diameter $>10$ mm, presence of villous components, or high-grade intraepithelial neoplasia) and colorectal cancer (CRC). Inclusion criteria were: (1) age $\ge 18$ years; (2) completion of a high-quality colonoscopy, defined by adequate bowel preparation and successful cecal intubation; (3) a follow-up duration of $\ge 6$ months; and (4) subsequent occurrence of ACRN. All subjects had a Boston Bowel Preparation Scale (BBPS) score $>6$ and a withdrawal time $\ge 6$ minutes. Exclusion criteria included: (1) a prior history of CRC or adenomatosis; (2) comorbid inflammatory bowel disease, familial adenomatous polyposis, or other major colonic diseases; and (3) missing key variables such as baseline pathology reports or laboratory data.
This study was approved by the Ethics Committee of the First Affiliated Hospital of Zhengzhou University (2022-KY-0018-002), and all study participants provided written informed consent.
Data Collection
General clinical data were collected, including gender, age, number of adenomas, adenoma size, histology, location, degree of dysplasia, and clinical symptoms (abdominal pain, abdominal distension, changes in bowel habits, and changes in stool consistency), as well as the total count of these clinical symptoms. Additional data included comorbidities (diabetes, hypertension, hyperlipidemia), history of cancer, smoking history (self-reported continuous or cumulative smoking for more than 6 months within the past year, or an average of $>1$ cigarette/day), alcohol history (self-reported daily ethanol intake $\ge 25$ g for men and $\ge 15$ g for women within the past year), family history (self-reported history of CRC or colorectal adenoma in first-degree relatives), and Body Mass Index (BMI) (normal range: $18.5\text{--}23.9$ kg/m$^2$). Laboratory indicators included red blood cell count (normal: $4.3\text{--}5.8 \times 10^{12}$/L for men, $3.8\text{--}5.1 \times 10^{12}$/L for women), white blood cell count (normal: $3.5\text{--}9.5 \times 10^9$/L), platelet count (normal: $125\text{--}350 \times 10^9$/L), total cholesterol (normal: $<5.2$ mmol/L), triglycerides (normal: $<1.7$ mmol/L), high-density lipoprotein (normal: $>0.91$ mmol/L), and low-density lipoprotein (normal: $<3.61$ mmol/L). Derived indices included the neutrophil-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), systemic immune-inflammation index (SII), and the triglyceride-glucose (TyG) index. These indices (NLR, PLR, SII, and TyG) were categorized into four groups (Q1, Q2, Q3, and Q4) based on their respective quartiles.
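The text names the derived indices but does not give their formulas. The sketch below uses the definitions commonly found in the literature, which is an assumption about what the authors computed: NLR and PLR as simple ratios, SII as platelets × neutrophils / lymphocytes, and TyG as ln(triglycerides × fasting glucose / 2) with both inputs in mg/dL. Fasting glucose is not listed among the laboratory indicators above, so its use here is also an assumption, and all function names are hypothetical.

```python
import math
from statistics import quantiles


def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio (both counts in the same unit)."""
    return neutrophils / lymphocytes


def plr(platelets, lymphocytes):
    """Platelet-to-lymphocyte ratio (both counts in the same unit)."""
    return platelets / lymphocytes


def sii(platelets, neutrophils, lymphocytes):
    """Systemic immune-inflammation index: platelets * neutrophils / lymphocytes."""
    return platelets * neutrophils / lymphocytes


def tyg(triglycerides_mg_dl, glucose_mg_dl):
    """Triglyceride-glucose index, assuming both inputs are in mg/dL."""
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)


def quartile_labels(values):
    """Label each value Q1..Q4 using the sample quartiles as cut points."""
    q1, q2, q3 = quantiles(values, n=4)

    def label(v):
        if v <= q1:
            return "Q1"
        if v <= q2:
            return "Q2"
        if v <= q3:
            return "Q3"
        return "Q4"

    return [label(v) for v in values]
```

Whether boundary values fall in the lower or upper quartile group is a design choice the text does not specify; here a value equal to a cut point is assigned to the lower group.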
Follow-up and Grouping
Colorectal adenomas were diagnosed and resected during the initial colonoscopy. A second colonoscopy was performed within one year to identify any potentially missed adenomas. Subsequent follow-up continued until the detection of ACRN or the date of the final colonoscopy, with follow-up durations ranging from 6 to 77.5 months. Based on whether ACRN recurred within $\le 1$ year after the initial resection, subjects were divided into an early recurrence group ($n=68$) and a non-early recurrence group ($n=154$).
Statistical Analysis
Data processing was performed using SPSS 27.0 and R 4.3.3 software. Missing values were imputed using the Random Forest algorithm. Categorical data were expressed as relative numbers, and intergroup comparisons were conducted using the $\chi^2$ test or Fisher's exact test. The dataset was randomly partitioned into a training set ($n=179$) and a testing set ($n=43$) at an 8:2 ratio. Boruta and Lasso regression methods were jointly applied to the training set to screen for predictive factors. Collinearity tests were performed, and predictors with a Variance Inflation Factor (VIF) $>5$ were excluded. Four machine learning methods—Catboost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM)—were utilized to construct predictive models for early ACRN recurrence. Model performance was evaluated using Receiver Operating Characteristic (ROC) curves, calibration curves, and Decision Curve Analysis (DCA). The Delong test was used to compare the Area Under the Curve (AUC) differences among the four models, and SHAP (SHapley Additive exPlanations) values were employed to assess feature importance and model interpretability.
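The text reports an 8:2 random split yielding 179 training and 43 testing cases but does not say how the split was drawn. A plain 80% of 222 would be about 178, so one possible explanation, offered here only as an assumption, is a split stratified by outcome group in which each group's training size is rounded up. The sketch below (with a hypothetical helper `stratified_split`) shows that this rule reproduces the reported set sizes from the group sizes of 68 and 154.

```python
import math
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split indices per class, rounding each class's training size up."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = math.ceil(train_fraction * len(idx))
        train_idx += idx[:n_train]
        test_idx += idx[n_train:]
    return sorted(train_idx), sorted(test_idx)


# 1 = early recurrence (n=68), 0 = non-early recurrence (n=154), as reported.
labels = [1] * 68 + [0] * 154
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Under this assumption the training set would hold 55 early-recurrence and 124 non-early-recurrence cases; the source does not report the per-group composition, so these two numbers are illustrative only.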
2.1 Clinical Data and Laboratory Tests of the Early and Non-Early Recurrence Groups
Among the 222 patients included in the clinical indicator comparison, 153 were male (68.9%) and 69 were female (31.1%), with a mean age of $57.2 \pm 11.4$ years. The cohort was divided into an early recurrence group (68 cases, 30.6%) and a non-early recurrence group (154 cases, 69.4%). Statistically significant differences ($P < 0.05$) were observed between the two groups regarding the number of adenomas, adenoma size, adenoma location, degree of adenoma dysplasia, presence of abdominal distension, number of clinical symptoms, history of alcohol consumption, and platelet count. No statistically significant differences ($P > 0.05$) were found between the groups in terms of sex, age, adenoma histology, abdominal pain, changes in bowel habits, changes in stool consistency, diabetes, hypertension, hyperlipidemia, history of cancer, smoking history, family history, BMI, white blood cell count, red blood cell count, total cholesterol, triglycerides, high-density lipoprotein (HDL), low-density lipoprotein (LDL), NLR, PLR, SII, or the TyG index, as shown in [TABLE:1].
Characteristic variables of early postoperative recurrence in patients with colorectal adenoma.
[TABLE:1] Comparison of clinical characteristics and laboratory findings between the early and non-early recurrence groups.
Note: BMI = body mass index, NLR = neutrophil-to-lymphocyte ratio, PLR = platelet-to-lymphocyte ratio, SII = systemic immune-inflammation index, TyG = triglyceride-glucose index, Q1–Q4 = quartile groups 1–4.
The dataset was randomly partitioned into a training set (179 cases) and a testing set (43 cases) using an 8:2 ratio. Feature selection was performed on the training set using a combination of the Boruta method and Lasso regression. Based on feature importance, the Boruta method identified seven key features: adenoma size, platelet count, degree of adenoma dysplasia, number of clinical symptoms, TyG index, history of alcohol consumption, and number of adenomas, as illustrated in [FIGURE:1]. In the Lasso regression analysis, the optimal penalty coefficient was selected by ten-fold cross-validation using the one-standard-error rule ($\lambda = 0.0356$). This process retained 17 features associated with the outcome variable: adenoma location, number of adenomas, adenoma size, degree of dysplasia, changes in bowel habits, white blood cell count, high-density lipoprotein, adenoma histology, number of clinical symptoms, hypertension, hyperlipidemia, history of alcohol consumption, red blood cell count, platelet count, low-density lipoprotein, NLR, and the TyG index, as shown in [FIGURE:2].
By synthesizing the results of the two selection methods, seven characteristic variables were ultimately determined for model construction: adenoma size, platelet count, degree of adenoma dysplasia, number of clinical symptoms, TyG index, history of alcohol consumption, and number of adenomas. Multicollinearity testing confirmed that there was no significant collinearity among these seven variables (VIF < 5).
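The joint selection described above can be read as a set intersection: a variable is kept only if both methods retain it. The sketch below uses the feature names listed in the text; the variable names themselves and the reading of "jointly screened" as an intersection are assumptions, since the source does not state the combination rule explicitly.

```python
# Features retained by Boruta, as listed in the text.
boruta_selected = {
    "adenoma size", "platelet count", "degree of dysplasia",
    "number of clinical symptoms", "TyG index",
    "history of alcohol consumption", "number of adenomas",
}

# Features retained by Lasso at the reported penalty, as listed in the text.
lasso_selected = {
    "adenoma location", "number of adenomas", "adenoma size",
    "degree of dysplasia", "changes in bowel habits",
    "white blood cell count", "high-density lipoprotein",
    "adenoma histology", "number of clinical symptoms", "hypertension",
    "hyperlipidemia", "history of alcohol consumption",
    "red blood cell count", "platelet count", "low-density lipoprotein",
    "NLR", "TyG index",
}

# Keep only the variables both methods agree on.
joint_features = sorted(boruta_selected & lasso_selected)
print(len(joint_features), joint_features)
```

With these lists every Boruta feature also appears in the Lasso list, so the intersection equals the Boruta set, which matches the seven variables the text carries forward.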
Construction of the predictive models: Using the seven characteristic variables jointly selected by Boruta and Lasso regression, Catboost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM) models were constructed. Grid search was employed to optimize the models using the Area Under the Curve (AUC) as the primary evaluation metric.
[FIGURE:1] Predictor variable selection based on the Boruta method. Note: Green indicates important features, while red indicates unimportant features; PLR = platelet-to-lymphocyte ratio, SII = systemic immune-inflammation index, NLR = neutrophil-to-lymphocyte ratio.
The optimal parameters for each model were determined within the training set as follows: Catboost: 32; Random Forest (RF): number=5, repeats=5, mtry=1; Logistic Regression (LR) was trained using default parameters; and Support Vector Machine (SVM): number=5, pre-processing=("center", "scale"), sigma=0.059, C=1.
Comparison of Predictive Model Performance
Receiver Operating Characteristic (ROC) curves were plotted for the Catboost, RF, LR, and SVM models to predict early advanced colorectal neoplasia (ACRN) recurrence in patients after colorectal adenoma resection. The Area Under the Curve (AUC), sensitivity, and specificity were calculated to compare the performance of the four models. The results showed that in the training set, the AUCs for the Catboost, RF, LR, and SVM models were 0.802, 0.836, 0.788, and 0.860, respectively. In the testing set, the AUCs for the four models were 0.772, 0.749, 0.705, and 0.685, respectively (see [FIGURE:3] and [TABLE:2]). Delong’s test results indicated that the pairwise differences between the AUCs of the four models were not statistically significant (all $P > 0.05$), as shown in [TABLE:3].
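For readers less familiar with the metrics compared above, the following is a minimal sketch of how AUC, sensitivity, and specificity can be computed from predicted probabilities. It uses the rank-based (Mann-Whitney) definition of AUC and a fixed 0.5 cut-off; the toy labels and scores are invented for illustration and are not data from this study, and the source does not state which cut-off was used for sensitivity and specificity.

```python
def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Each positive/negative pair scores 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold count as positive."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (hypothetical values, not study data).
y_toy = [1, 1, 1, 0, 0, 0, 0]
s_toy = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = auc(y_toy, s_toy)
toy_sens, toy_spec = sensitivity_specificity(y_toy, s_toy)
```

Because AUC depends only on the ranking of scores, it is unaffected by the cut-off, whereas sensitivity and specificity change with the threshold chosen.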
Calibration curve analysis showed that the Brier scores for the Catboost model in the training and testing sets were 0.178 and 0.188, respectively. For the RF model, the Brier scores were 0.197 and 0.201; for the LR model, 0.169 and 0.191; and for the SVM model, 0.153 and 0.198.
[FIGURE:2] Predictor variable selection based on LASSO regression. Note: A shows the LASSO coefficient curves, and B shows the cross-validation for the LASSO regression analysis.
[FIGURE:3] ROC curves of the Catboost, RF, LR, and SVM models predicting early ACRN recurrence after colorectal adenoma resection. Note: A shows the ROC curves for the training set, and B shows the ROC curves for the testing set. Catboost = Categorical Boosting, RF = Random Forest, LR = Logistic Regression, SVM = Support Vector Machine.
[TABLE:2] Predictive performance of the Catboost, RF, LR, and SVM models.
[FIGURE:4] Calibration curves of the Catboost, RF, LR, and SVM models. Note: A1, B1, C1, and D1 show the calibration curves for the Catboost, RF, LR, and SVM models in the training set, respectively. A2, B2, C2, and D2 show the calibration curves for the Catboost, RF, LR, and SVM models in the testing set, respectively.
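The Brier scores reported in the calibration analysis are mean squared differences between predicted probabilities and observed outcomes, so lower is better. As a rough point of orientation, the sketch below computes the score of an uninformative model that assigns every patient the whole-cohort recurrence rate (68/222). Using the whole cohort rather than the training or testing set is a simplification made here, not something taken from the source.

```python
def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Outcomes as reported: 68 early recurrences among 222 patients.
y = [1] * 68 + [0] * 154
prevalence = sum(y) / len(y)

# A constant prediction at the prevalence scores prevalence * (1 - prevalence).
baseline = brier_score(y, [prevalence] * len(y))
print(round(baseline, 3))
```

Under this simplification the baseline is about 0.212, so the testing-set scores of 0.188 to 0.201 reported above sit only modestly below what a constant prediction would achieve.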
Decision Curve Analysis (DCA) showed that, in the training set, the Catboost, LR, and SVM models provided higher clinical net benefits when the risk probability threshold was $< 75\%$. In the testing set, the Catboost and SVM models achieved better clinical net benefits when the risk probability threshold was $< 50\%$ (see [FIGURE:5]).
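The net benefit plotted in a decision curve is conventionally defined, for a threshold probability $p_t$, as the true-positive rate per patient minus the false-positive rate per patient weighted by the odds $p_t / (1 - p_t)$. The sketch below implements that standard definition together with the "treat all" reference strategy; the toy labels and probabilities are invented for illustration, and the source does not describe its DCA implementation.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    pairs = list(zip(y_true, probs))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example (hypothetical values, not study data).
y_toy = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
p_toy = [0.9, 0.7, 0.2, 0.6, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1]
nb_model = net_benefit(y_toy, p_toy, threshold=0.5)
nb_all = net_benefit_treat_all(y_toy, threshold=0.5)
```

A model is clinically useful at a given threshold only if its net benefit exceeds both the "treat all" value and zero, which is the net benefit of treating no one.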
Interpretability Analysis of the Catboost Model
The Catboost model was analyzed using SHAP (SHapley Additive exPlanations) values. Features were ranked by importance, revealing that the number of clinical symptoms, adenoma size, number of adenomas, degree of adenoma dysplasia, history of alcohol consumption, TyG index, and platelet count were significant factors influencing early ACRN recurrence. The feature summary (beeswarm) plot showed that the number of clinical symptoms, adenoma size, number of adenomas, degree of adenoma dysplasia, TyG index, and platelet count (SHAP values: 0.043, 0.042, 0.025, 0.020, 0.012, and 0.005, respectively) were positively correlated with early ACRN recurrence. Conversely, a history of alcohol consumption (SHAP value: 0.015) was negatively correlated with early recurrence (see [FIGURE:6]).
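The importance ordering above can be reproduced directly from the reported values. The sketch below sorts the seven features by their reported SHAP value and, separately, shows how a global importance is often obtained as the mean absolute SHAP value across patients. Whether the reported figures are mean absolute values is an assumption, and the small per-patient matrix is invented purely to illustrate the calculation.

```python
# SHAP values as reported in the text for the Catboost model.
reported_shap = {
    "number of clinical symptoms": 0.043,
    "adenoma size": 0.042,
    "number of adenomas": 0.025,
    "degree of adenoma dysplasia": 0.020,
    "history of alcohol consumption": 0.015,
    "TyG index": 0.012,
    "platelet count": 0.005,
}

# Rank features from most to least important.
ranking = sorted(reported_shap, key=reported_shap.get, reverse=True)


def mean_abs_shap(shap_rows):
    """Global importance per feature: mean |SHAP| over patients (rows)."""
    n_rows = len(shap_rows)
    n_features = len(shap_rows[0])
    return [sum(abs(row[j]) for row in shap_rows) / n_rows
            for j in range(n_features)]


# Hypothetical per-patient SHAP values for two features (not study data).
toy_rows = [[0.10, -0.02], [-0.06, 0.01], [0.08, -0.03]]
toy_importance = mean_abs_shap(toy_rows)
```

Note that a magnitude of this kind carries no sign: the direction of each association, such as the negative one reported for alcohol history, has to be read from the beeswarm plot rather than from the importance value.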
3 Discussion
Colorectal adenoma resection is a critical procedure for interrupting the progression of adenomas to cancer, significantly reducing the incidence of colorectal cancer. To ensure data accuracy, this study used a second colonoscopy (within one year) to identify potentially missed adenomas. A total of 222 subjects were included, of whom 30.63% (68/222) experienced an early recurrence of advanced colorectal neoplasia (ACRN), a rate similar to the findings of Chen et al. Some studies have indicated that the early recurrence rate after colorectal adenoma resection can reach 59.46%, with the peak of recurrence concentrated almost entirely within the first year.
Currently, there is a lack of research on the construction of predictive models for early ACRN recurrence following colorectal adenoma resection. Developing such a model would provide a valuable reference for gastroenterologists, thereby helping to reduce the incidence of colorectal cancer.
In this study, four models, namely Catboost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM), were constructed using machine learning methods. Among these, Catboost demonstrated superior predictive performance compared to the other models. Specifically, in the training set, the Catboost model achieved an AUC of 0.802 and a Brier score of 0.178; in the test set, it yielded an AUC of 0.772 and a Brier score of 0.188. Decision Curve Analysis (DCA) indicated that Catboost provided a high clinical net benefit in both the training and test sets.
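For readers less familiar with the two metrics quoted above, a minimal sketch of how AUC and the Brier score are computed from predicted probabilities is given below. It uses the rank-based (Mann-Whitney) definition of AUC and the mean squared error definition of the Brier score; the toy labels and probabilities are hypothetical and unrelated to the study cohort.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive case is ranked above
    a random negative case; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = early ACRN recurrence); not study data.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]

print("AUC:", round(roc_auc(y_true, y_prob), 3))
print("Brier:", round(brier_score(y_true, y_prob), 3))
```

AUC measures discrimination only, while the Brier score also reflects calibration, which is why the two are reported together.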
Chinese General Practice
Note: A represents the test set, B represents the training set.
DCA of the Catboost, RF, LR, and SVM models.
Note: A is the feature importance ranking plot, B is the SHAP summary (beeswarm) plot; TyG = triglyceride-glucose index, SHAP = SHapley Additive exPlanations.
Interpretability results based on the Catboost model.
Interpretability analysis using the SHAP method identified several key factors among 31 indicators that significantly influence early ACRN recurrence: the number of clinical symptoms, adenoma size, number of adenomas, degree of adenoma dysplasia, TyG index, and platelet count. Higher values of these factors were associated with higher SHAP values, that is, with a higher predicted risk, while a history of alcohol consumption showed a negative correlation.
This study demonstrates that the number of clinical symptoms has a positive impact on model prediction; an increase in symptoms correlates with a higher predicted risk of early adenoma recurrence. In this study, clinical symptoms primarily included abdominal pain, bloating, and changes in bowel habits or stool consistency. Changes in bowel habits were defined mainly as constipation (fewer than three bowel movements per week, hard stools, or difficulty defecating) or diarrhea (more than three movements per day, stool volume exceeding 200 g/d, loose consistency, and water content >85%). Changes in stool consistency included bloody stools, melena, or loose stools. An increase in the number of symptoms reflects a change in the severity of the patient's condition. Zhang et al. quantified clinical symptoms (e.g., abdominal pain, hematochezia, and changes in stool consistency) using a "symptom score" and found that for every 1-point increase in the score, the risk of recurrence rose by 18%, which is consistent with our findings.
Furthermore, this study found a positive correlation between adenoma size and early recurrence. Existing guidelines generally categorize adenomas >10 mm as high-risk factors while treating adenomas 1–9 mm as having the same recurrence risk. However, several studies have challenged this, arguing that patients with 6–9 mm adenomas have a significantly higher risk of ACRN recurrence than those with 1–5 mm adenomas and should not be considered equivalent; surveillance intervals for the 1–5 mm group could potentially be extended \cite{14-16}. Hartstein et al. further subdivided small adenomas and found that the absolute risk of ACRN in patients with at least one 6–9 mm adenoma was significantly higher than in those with 1–5 mm adenomas. Existing research also confirms that adenoma size is negatively correlated with the time to recurrence after resection; the median recurrence time shortened from 20.1 months for adenomas $\leq$ 10 mm to 7.7 months for those >10 mm. This confirms that larger colorectal adenomas not only increase the recurrence rate but also shorten the time to recurrence \cite{17-18}.
The number of adenomas is also positively correlated with early ACRN recurrence; the more adenomas present, the higher the risk. Studies have shown that recurrence rates for patients with 1, 2, and $\geq$ 3 adenomas were 19.6%, 23.2%, and 30.8%, respectively. Research has identified $\geq$ 3 polyps as an independent risk factor for recurrence, consistent with the findings of Ge et al. \cite{22-23}. The underlying mechanism may be that a high number of adenomas indicates a genetic phenotype and gut microecology conducive to polyp growth.
Domestic and international guidelines recommend shorter follow-up intervals for patients with adenomas exhibiting high-grade intraepithelial neoplasia (HGIN). This is due to the higher recurrence rates, shorter recurrence times, and increased likelihood of progressing to advanced adenomas in these patients. In this study, the degree of dysplasia was categorized into HGIN and low-grade intraepithelial neoplasia (LGIN). Dysplasia severity was positively correlated with early ACRN recurrence, aligning with current guidelines \cite{24-26}. Dysplasia refers to cellular differences in morphology, size, and arrangement compared to normal cells and is a critical step in tumorigenesis.
The higher the degree of dysplasia, the greater the malignant potential of the cells and the higher the risk of ACRN recurrence. Studies have shown that patients with high-grade dysplasia have a higher risk of ACRN recurrence than those with low-grade dysplasia. Baile-Max et al. found that high-grade dysplasia is an independent risk factor for metachronous colorectal cancer or advanced adenomas after endoscopic resection of high-risk adenomas. However, a few results differ slightly from ours \cite{29-30}. Zhang et al. found that although HGIN was a risk factor in univariate analysis, it was not an independent risk factor in multivariate regression, possibly because of different pathological classification methods or an insufficient sample size (283 cases). Liu et al. categorized pathology into non-neoplastic polyps, tubular adenomas, tubulovillous adenomas, and adenomas with LGIN, HGIN, or non-invasive cancer; this overlap between histological type and pathological grade might also influence the results.
The TyG index, a combined measure of triglycerides and glucose, is a simple and reliable surrogate marker for insulin resistance. Studies have shown that a high TyG index increases the risk of gastrointestinal tumors. Li et al. found that the highest quartile of the TyG index was associated with a higher risk of advanced adenoma recurrence; for every 1-unit increase in the TyG index, the risk of colorectal adenoma increased by 22.5% (95% CI = 1.027–1.460), likely due to insulin resistance. The insulin and insulin-like growth factor axis promotes tumor formation by directly stimulating cell proliferation and indirectly altering glucose metabolism. Insulin may also promote tumor formation by up-regulating acyl-CoA:cholesterol acyltransferase-1, which mediates the proliferation and metastasis of colorectal cancer cells, and by increasing the expression of vascular cell adhesion molecule-1 in tumor endothelial cells, thereby altering the homing of other immune cells to the tumor microenvironment.
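The TyG index discussed above is not defined in this section. The sketch below assumes the commonly used definition, TyG = ln[fasting triglycerides (mg/dL) x fasting glucose (mg/dL) / 2], together with the conventional conversion factors from mmol/L (88.57 for triglycerides, 18.0 for glucose). Both the formula and the function names are assumptions made for illustration and should be checked against the study's Methods.

```python
import math

# Conventional conversion factors from mmol/L to mg/dL (assumed, not from the source).
TG_MGDL_PER_MMOLL = 88.57
GLU_MGDL_PER_MMOLL = 18.0


def tyg_index(tg_mgdl, fpg_mgdl):
    """TyG index from fasting triglycerides and fasting glucose, both in mg/dL."""
    if tg_mgdl <= 0 or fpg_mgdl <= 0:
        raise ValueError("triglyceride and glucose values must be positive")
    return math.log(tg_mgdl * fpg_mgdl / 2.0)


def tyg_index_from_mmol(tg_mmoll, fpg_mmoll):
    """Same index for laboratory values reported in mmol/L."""
    return tyg_index(tg_mmoll * TG_MGDL_PER_MMOLL,
                     fpg_mmoll * GLU_MGDL_PER_MMOLL)


# Hypothetical patient: triglycerides 150 mg/dL, fasting glucose 100 mg/dL.
print(round(tyg_index(150, 100), 2))
```

Because the index is a logarithm, a 1-unit increase corresponds to roughly a 2.7-fold increase in the triglyceride-glucose product, which helps put the per-unit risk estimate of Li et al. in context.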
Platelet count (PLT) is an important factor in early ACRN recurrence; abnormal PLT is positively correlated with recurrence. Research by Koper-Lenkiewicz et al. showed that for every $10 \times 10^3/\mu\text{L}$ increase in PLT, the average concentration of serum interleukin-6 in colorectal cancer patients increased by 4%. This suggests that while platelets play a role in thrombosis, they are also directly involved in inflammation, promoting tumorigenesis. In this study, a history of alcohol consumption appeared as a protective factor for early ACRN recurrence. However, when our research group previously constructed a risk prediction model for advanced tumors after resection, alcohol history was not included. A possible reason is that this study defines early recurrence as occurring within one year, a timeframe in which the cumulative effects of alcohol may not be significant. During long-term follow-up, alcohol consumption is generally not a protective factor.
Furthermore, the relationship between alcohol consumption and early ACRN recurrence is complex and likely dose-dependent. The protective effect observed in this study may be limited to a specific range of low-to-moderate consumption.
Moderate alcohol consumption, particularly beverages like red wine (rich in antioxidants such as resveratrol), may possess certain anti-inflammatory or antioxidant properties that could theoretically offer a weak protective effect. However, this remains controversial in the scientific community, and alcohol consumption is by no means recommended for disease prevention. More importantly, long-term excessive drinking is a globally recognized risk factor for colorectal cancer. Subsequent studies need to analyze the type, dose, and frequency of alcohol consumption more deeply to clarify the nature of this association.
The predictive model developed in this study is primarily applicable to patients who have undergone colorectal adenoma resection, aiming to predict their risk of early ACRN recurrence within one year. The effective application of this model requires complete clinical data. It is not intended for primary screening of the general population but rather as a clinical decision support tool for the follow-up management of post-resection patients. It helps clinicians identify high-risk individuals requiring intensive monitoring, enabling more precise follow-up interventions and risk management during the critical first postoperative year.
This study has certain limitations. First, the retrospective design may introduce selection and information bias, and the sample size is small relative to the follow-up period, which may affect the accuracy of statistical inferences. Second, although internal validation was performed using a test set, the lack of external cohort validation limits the generalizability of the model. We plan to expand the sample size through multi-center collaboration to enhance the level of evidence. Additionally, regarding inflammatory markers, we only used baseline SII values and lacked dynamic postoperative monitoring data, making it difficult to fully assess the association between temporal changes in inflammation and recurrence risk. Future studies will adopt multi-timepoint data collection strategies.
In summary, this study identified seven influential factors (number of clinical symptoms, adenoma size, number of adenomas, degree of dysplasia, alcohol history, platelet count, and TyG index) using Boruta and Lasso feature selection. A machine learning-based model was constructed to predict the risk of early ACRN recurrence after colorectal adenoma resection. Validation via ROC curves, calibration curves, and DCA curves demonstrated the model's strong predictive capability. Endoscopists can use this model to identify high-risk populations and develop targeted intervention strategies.
Author Contributions: Sun Yujie, Li Jiaoyan, Chen Jingfeng, and Ding Suying designed the experiments, implemented the research, and collected/organized the data. Sun Yujie performed the statistical analysis, analyzed the results, and wrote the manuscript. Li Jiaoyan, Chen Jingfeng, and Ding Suying provided critical intellectual review and research guidance. The authors declare no conflicts of interest.
References
[1] BRAY F, LAVERSANNE M, SUNG H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries [J]. CA Cancer J Clin, 2024, 74(3): 229-263. DOI: 10.3322/caac.21834.
[2] MORGAN E, ARNOLD M, GINI A, et al. Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from GLOBOCAN [J]. Gut, 2023, 72(2): 338-344.
[3] CAO W, CHEN H D, YU Y W, et al. Changing profiles of cancer burden worldwide and in China: a secondary analysis of the global cancer statistics 2020 [J]. Chin Med J (Engl), 2021, 134(7): 783-791. DOI: 10.1097/CM9.0000000000001474.
[4] CARDOSO R, GUO F, HEISSER T, et al. Colorectal cancer incidence, mortality, and stage distribution in European countries in the colorectal cancer screening era: an international population-based study [J]. Lancet Oncol, 2021, 22(7): 1002-1013.
[5] ZENG H M, CHEN W Q, ZHENG R S, et al. Changing cancer survival in China during 2003-15: a pooled analysis of 17 population-based cancer registries [J]. Lancet Glob Health, 2018, 6(5): e555-e567. DOI: 10.1016/S2214-109X(18)30127-X.
[6] ZHANG B P, WEI W, LI A M, et al. Expert consensus on the diagnosis and treatment of colorectal adenoma and early colorectal cancer with integrated traditional Chinese and Western medicine (2021) [J]. Journal of Traditional Chinese Medicine, 2022, 63(10): 989-997. DOI: 10.13288/j.11-2166/r.2022.10.017.
[7] GAO Q Y, CHEN H M, SHENG J Q, et al. The first year follow-up after colorectal adenoma polypectomy is important: a multiple-center study in symptomatic hospital-based individuals in China [J]. Front Med China, 2010, 4(4): 436-442. DOI: 10.1007/s11684-010-0103-y.
[8] O'CONNELL M J, CAMPBELL M E, GOLDBERG R M, et al. Survival following recurrence in stage II and III colon cancer: findings from the ACCENT data set [J]. J Clin Oncol, 2008, 26(14): 2336-2341. DOI: 10.1200/JCO.2007.15.8261.
[9] GUPTA S, LIEBERMAN D, ANDERSON J C, et al. Recommendations for follow-up after colonoscopy and polypectomy: a consensus update by the US multi-society task force on colorectal cancer [J]. Gastrointest Endosc, 2020, 91(3): 463-485.e5.
[10] Cui Wenming. Analysis of risk factors and construction of a clinical prediction model for early recurrence of initially resectable colorectal cancer [D]. Zhengzhou University, 2023.
[11] WOLF A M D, FONTHAM E T H, CHURCH T R, et al. Colorectal cancer screening for average-risk adults: 2018 guideline update from the American Cancer Society [J]. CA Cancer J Clin, 2018, 68(4): 250-281. DOI: 10.3322/caac.21457.
[12] CHEN Y X, GAO Q Y, ZOU T H, et al. Berberine versus placebo for the prevention of recurrence of colorectal adenoma: a multicentre, double-blinded, randomised controlled study [J]. Lancet Gastroenterol Hepatol, 2020, 5(3): 267-275. DOI: 10.1016/S2468-1253(19)30409-1.
[13] ZHANG Beiping, ZHONG Cailing, LIANG Baoyi, et al. Clinical observation of Tiaochang Xiaoliu Decoction in preventing the recurrence of colorectal adenoma one year after surgery: A randomized controlled trial of 176 cases [J]. Journal of Traditional Chinese Medicine, 2020, 61(22): 1971-1976. DOI: 10.13288/j.11-2166/r.2020.22.012.
[14] HARTSTEIN J D, VEMULAPALLI K C, REX D K. The predictive value of small versus diminutive adenomas for subsequent advanced neoplasia [J]. Gastrointest Endosc, 2020, 91(3): 614-621.
[15] JUNG Y S, KIM T J, NAM E, et al. Comparative systematic review and meta-analysis of 1- to 5-mm versus 6- to 9-mm adenomas on the risk of metachronous advanced colorectal neoplasia [J]. Gastrointest Endosc, 2020, 92(3): 692-701.
[16] KIM N H, JUNG Y S, PARK J H, et al. Risk of developing metachronous advanced colorectal neoplasia after colonoscopic polypectomy in patients aged 30 to 39 and 40 to 49 years [J]. Gastrointest Endosc, 2018, 88(4): 715-723. DOI: 10.1016/
[17] GAO H, ZHANG C, YAN X Y, et al. Clinical analysis of polyp recurrence and endoscopic surveillance after colorectal adenoma resection [J]. Journal of Gastroenterology and Hepatology, 2014, 23(3):
[18] GAO X, XIAO J. Discussion on endoscopic and pathological factors affecting recurrence after colorectal adenoma surgery [J]. Chinese Journal of Integrated Traditional and Western Medicine on Digestion, 2025, 33(05): 567-
[19] LI Y, ZHAO L P, MA H, et al. Influencing factors of recurrence after endoscopic resection of colorectal adenomas [J]. Chinese Journal of Digestion, 2020, 40(12): 850-855. DOI: 10.3760/cma.j.cn311367-20200404-00205
[20] GENG R L. Analysis of risk factors for recurrence after endoscopic resection of colorectal adenomas [D].
[21] GE J, HUA M, ZHAO B, et al. Analysis of risk factors for recurrence after endoscopic resection of colorectal polyps [J]. Chinese Journal of Endoscopy, 2020, 26(08): 20-24.
[22] KAY M, ENG K, WYLLIE R. Colonic polyps and polyposis syndromes in pediatric patients [J]. Curr Opin Pediatr, 2015, 27(5): 634-641. DOI: 10.1097/MOP.0000000000000265.
[23] MO C, YANG Y S. Intestinal microecological dysbiosis and the formation of colorectal adenomas [J]. Chinese Journal of Clinical Gastroenterology, 2016, 28(6): 383-386. DOI: 10.3870/lcxh.
[24] Early Diagnosis and Treatment Group of the Oncology Branch of the Chinese Medical Association. Expert consensus on early diagnosis and treatment of colorectal cancer in China (2023 edition) [J]. National Medical Journal of China, 2023, 103(48): 3896-3908. DOI: 10.3760/cma.j.cn112137-20230804-00164
[25] RUTTER M D, EAST J, REES C J, et al. British Society of Gastroenterology/Association of Coloproctology of Great Britain and Ireland/Public Health England post-polypectomy and post-colorectal cancer resection surveillance guidelines [J]. Gut, 2020, 69(2): 201-223. DOI: 10.1136/gutjnl-2019-319858
[26] HASSAN C, ANTONELLI G, DUMONCEAU J M, et al. Post-polypectomy colonoscopy surveillance: European Society of Gastrointestinal Endoscopy (ESGE) Guideline - Update 2020 [J]. Endoscopy, 2020, 52(8): 687-700. DOI: 10.1055/a-1185-3109
[27] BAILE-MAX A S, JOVER R. Surveillance after colorectal polyp resection [J]. Best Pract Res Clin Gastroenterol, 2023, 66:
[28] BAILE-MAX A S, MANGAS-SANJU N C, LADABAUM U, et al. Risk factors for metachronous colorectal cancer or advanced adenomas after endoscopic resection of high-risk adenomas [J]. Clin Gastroenterol Hepatol, 2023, 21(3): 630-643. DOI: 10.1016/
[29] ZHANG L M, LIU Y L, ZHU Y M, et al. Analysis of recurrence after high-frequency electrocoagulation resection of colonic adenomas [J]. Chinese Journal of Digestive Endoscopy, 2012, 29(8):
[30] LIU N, LIU F G, SUN L J, et al. Study on the risk of recurrence after colorectal polypectomy [J]. Chinese Journal of Digestive Endoscopy, 2017, 34(12): 861-865.
[31] FRITZ J, BJØRGE T, NAGEL G, et al. The triglyceride-glucose index as a measure of insulin resistance and risk of obesity-related cancers [J]. Int J Epidemiol, 2020, 49(1): 193-204. DOI: 10.1093/ije/dyz053.
[32] LI J Y, CHEN J F, LIU H S, et al. Association of the triglyceride-glucose index with the occurrence and recurrence of colorectal adenomas: a retrospective study from China [J]. BMC Public Health, 2024, 24(1): 579. DOI: 10.1186/s12889-024-18076-x.
[33] KASPRZAK A. Insulin-like growth factor 1 (IGF-1) signaling in glucose metabolism in colorectal cancer [J]. Int J Mol Sci, 2021, 22(12): 6434. DOI: 10.3390/ijms22126434.
[34] CHEN X, LIANG H L, SONG Q B, et al. Insulin promotes progression of colon cancer by upregulation of ACAT1 [J]. Lipids Health Dis, 2018, 17(1): 122. DOI: 10.1186/s12944-018-0773-x.
[35] WANG X, HÄRING M F, RATHJEN T, et al. Insulin resistance in vascular endothelial cells promotes intestinal tumour formation [J]. Oncogene, 2017, 36(35): 4987-4996. DOI: 10.1038/onc.2017.107.
[36] KOPER-LENKIEWICZ O M, DYMICKA-PIEKARSKA V, MILEWSKA A J, et al. The relationship between inflammation markers (CRP, IL-6, sCD40L) and colorectal cancer stage, grade, size and location [J]. Diagnostics (Basel), 2021, 11(8): 1382. DOI: 10.3390/diagnostics11081382.
[37] LI J Y. Risk prediction model for advanced neoplasia after colorectal adenoma resection [D]. Zhengzhou University, 2024.
(Received: 2025-05-07; Revised: 2025-08-11) (Editor: LI Weixia)