Chinese Journal of Contemporary Neurology and Neurosurgery ›› 2022, Vol. 22 ›› Issue (10): 841-849. doi: 10.3969/j.issn.1672-6731.2022.10.003

• Big Data in Neurosurgical Diseases •

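2 Clinical study of a big data model for predicting the prognosis of patients with aneurysmal subarachnoid hemorrhage after aneurysm clipping: model development and evaluation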

YANG Chong1, LI Xu-dong2, LÜ Liang-fu3, YUAN Heng-jie4, ZHANG Yi1

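  1. Department of Pharmacy, Tianjin Huanhu Hospital, Tianjin 300350, China;
  2. Department of Neurosurgery, Tianjin Huanhu Hospital, Tianjin 300350, China;
  3. School of Mathematics, Tianjin University, Tianjin 300072, China;
  4. Department of Pharmacy, Tianjin Medical University General Hospital, Tianjin 300052, China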
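  • Received: 2022-10-12 Online: 2022-10-25 Published: 2022-11-04
  • Corresponding author: ZHANG Yi, E-mail: 103841540@qq.com
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China (No. 81102447)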

Clinical study of a big data model for predicting the prognosis of patients with aneurysmal subarachnoid hemorrhage after aneurysm clipping: model development and evaluation

YANG Chong1, LI Xu-dong2, LÜ Liang-fu3, YUAN Heng-jie4, ZHANG Yi1

  1. Department of Pharmacy, Tianjin Huanhu Hospital, Tianjin 300350, China;
  2. Department of Neurosurgery, Tianjin Huanhu Hospital, Tianjin 300350, China;
  3. School of Mathematics, Tianjin University, Tianjin 300072, China;
  4. Department of Pharmacy, Tianjin Medical University General Hospital, Tianjin 300052, China
  • Received: 2022-10-12 Online: 2022-10-25 Published: 2022-11-04
  • Supported by:
    This study was supported by the Young Scientists Fund of the National Natural Science Foundation of China (No. 81102447)

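Abstract: Objective To screen for factors associated with prognosis after aneurysm clipping in patients with aneurysmal subarachnoid hemorrhage (aSAH) and to build a prediction model based on machine learning algorithms. Methods The clinical data, laboratory indices, surgery-related data and medication use of 182 patients with aSAH who underwent aneurysm clipping at Tianjin Huanhu Hospital from October 2020 to July 2021 were reviewed. Patients were randomly divided at a ratio of 7:3 into a training set, used to build the prediction model, and a test set, used to evaluate its predictive performance. The synthetic minority oversampling technique (SMOTE) was used to handle the class imbalance. Feature variables were selected by recursive feature elimination, Spearman rank correlation analysis and XGBoost feature importance analysis, and prognosis prediction models were built on the optimal feature subset with six machine learning algorithms: logistic regression (LR), random forest (RF), support vector machine (SVM), decision tree (DT), K-nearest neighbor (KNN) and naive Bayes (NB). Receiver operating characteristic (ROC) curves were plotted, and the area under the curve (AUC), accuracy, precision, recall and F1 value were calculated. Results The training set included 127 patients, 103 with good prognosis [Glasgow Outcome Scale (GOS) grade 4-5] and 24 with poor prognosis (GOS grade 1-3); SMOTE generated 79 synthetic poor-prognosis cases, bringing both the good-prognosis and poor-prognosis groups to 103 cases. The test set comprised 55 patients, 44 with good prognosis and 11 with poor prognosis. Feature selection and feature importance analysis yielded an optimal subset of 17 features. Number of aneurysms, alkaline phosphatase, creatinine, and use of lysine, heparin sodium and nitroglycerin were positively correlated with good prognosis; age, Hunt-Hess score, absolute mature neutrophil count, serum sodium, uric acid, total bilirubin, absolute basophil count, creatine kinase, use of furosemide and human albumin, and length of hospital stay were negatively correlated with good prognosis. On ROC analysis, the LR model had an AUC of 0.75 ± 0.08 (95%CI: 0.615-0.857, P = 0.001), with accuracy 0.764, precision 0.919, recall 0.773 and F1 value 0.840; the RF model had an AUC of 0.57 ± 0.08 (95%CI: 0.428-0.701, P = 0.283), with accuracy 0.745, precision 0.826, recall 0.864 and F1 value 0.845; the SVM model had an AUC of 0.65 ± 0.08 (95%CI: 0.507-0.772, P = 0.034), with accuracy 0.764, precision 0.860, recall 0.841 and F1 value 0.850; the DT model had an AUC of 0.61 ± 0.09 (95%CI: 0.473-0.742, P = 0.135), with accuracy 0.709, precision 0.850, recall 0.773 and F1 value 0.810; the KNN model had an AUC of 0.66 ± 0.08 (95%CI: 0.519-0.782, P = 0.060), with accuracy 0.618, precision 0.897, recall 0.591 and F1 value 0.712; and the NB model had an AUC of 0.56 ± 0.08 (95%CI: 0.417-0.691, P = 0.458), with accuracy 0.673, precision 0.825, recall 0.750 and F1 value 0.786. The LR model had the best predictive performance (all P < 0.05). Conclusions Models developed with the six machine learning algorithms can predict prognosis after aneurysm clipping in patients with aSAH reasonably well; the LR model performed best and can be used for preoperative assessment, helping neurosurgeons make clinical decisions.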

Key words: Intracranial aneurysm, Subarachnoid hemorrhage, Prognosis, Artificial intelligence, Algorithms, ROC curve
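The oversampling step described in the abstract (24 poor-prognosis training cases raised to 103 by adding 79 synthetic ones) can be illustrated with a minimal sketch of the SMOTE idea. The abstract does not say which implementation or settings were used, so everything here is an assumption for illustration: the function name `smote`, the neighbour count `k=5`, the random seeds, and the two made-up numeric features standing in for real patient data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy stand-in for the 24 poor-prognosis training cases (two invented features).
toy_rng = random.Random(42)
poor = [(toy_rng.gauss(60, 10), toy_rng.gauss(3, 1)) for _ in range(24)]

n_good = 103  # good-prognosis cases in the training set, from the abstract
new_cases = smote(poor, n_new=n_good - len(poor))
print(len(new_cases), len(poor) + len(new_cases))  # 79 103
```

Because each synthetic case lies on a segment between two real minority cases, oversampling of this kind adds no new information about poor-prognosis patients; it only rebalances the training set, which is why the test set is left at its original 44:11 split.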

Abstract: Objective To explore the risk factors for the prognosis of patients with aneurysmal subarachnoid hemorrhage (aSAH) after clipping, and to construct a predictive model based on machine learning algorithms to guide early identification of high-risk patients. Methods A total of 182 patients with aSAH who underwent clipping in Tianjin Huanhu Hospital from October 2020 to July 2021 were reviewed. The data were randomly divided at a ratio of 7:3 into a training set (to construct the prediction model) and a test set (to evaluate the prediction model). The synthetic minority oversampling technique (SMOTE) was used to deal with the imbalanced data. Recursive feature elimination, Spearman rank correlation analysis and XGBoost feature importance analysis were used to select the optimal variables. Six machine learning algorithms, logistic regression (LR), random forest (RF), support vector machine (SVM), decision tree (DT), K-nearest neighbor (KNN) and naive Bayes (NB), were used to construct prediction models. The receiver operating characteristic (ROC) curve was plotted and the area under the curve (AUC) was calculated, as well as accuracy, precision, recall and F1 value. Results The training set comprised 127 cases, including 103 cases with good prognosis [Glasgow Outcome Scale (GOS) grade 4-5] and 24 cases with poor prognosis (GOS grade 1-3). The data were balanced by generating 79 cases of poor prognosis with SMOTE (103 cases of good prognosis and 103 cases of poor prognosis). The test set consisted of 55 cases, including 44 cases with good prognosis and 11 cases with poor prognosis. A total of 17 optimal features were obtained by feature selection and feature importance analysis. The number of aneurysms, alkaline phosphatase, creatinine, and use of lysine, heparin sodium and nitroglycerin were positively correlated with good prognosis, while age, Hunt-Hess score, mature neutrophil count, serum sodium, uric acid, total bilirubin, basophil count, creatine kinase, use of furosemide and human albumin, and length of hospital stay were negatively correlated with good prognosis. The LR model had an AUC of 0.75 ± 0.08 (95%CI: 0.615-0.857, P = 0.001), an accuracy of 0.764, a precision of 0.919, a recall of 0.773 and an F1 value of 0.840; the RF model had an AUC of 0.57 ± 0.08 (95%CI: 0.428-0.701, P = 0.283), an accuracy of 0.745, a precision of 0.826, a recall of 0.864 and an F1 value of 0.845; the SVM model had an AUC of 0.65 ± 0.08 (95%CI: 0.507-0.772, P = 0.034), an accuracy of 0.764, a precision of 0.860, a recall of 0.841 and an F1 value of 0.850; the DT model had an AUC of 0.61 ± 0.09 (95%CI: 0.473-0.742, P = 0.135), an accuracy of 0.709, a precision of 0.850, a recall of 0.773 and an F1 value of 0.810; the KNN model had an AUC of 0.66 ± 0.08 (95%CI: 0.519-0.782, P = 0.060), an accuracy of 0.618, a precision of 0.897, a recall of 0.591 and an F1 value of 0.712; and the NB model had an AUC of 0.56 ± 0.08 (95%CI: 0.417-0.691, P = 0.458), an accuracy of 0.673, a precision of 0.825, a recall of 0.750 and an F1 value of 0.786. The LR model had the best predictive performance (all P < 0.05). Conclusions Machine learning algorithms performed well in predicting prognosis after clipping in patients with aSAH; the LR model had the best predictive performance and could be used for preoperative prediction to help neurosurgeons make better clinical decisions.

Key words: Intracranial aneurysm, Subarachnoid hemorrhage, Prognosis, Artificial intelligence, Algorithms, ROC curve
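The test-set metrics reported for the LR model can be related back to case counts with a short worked check. The abstract does not report a confusion matrix; the sketch below infers one, assuming that good prognosis is the positive class (44 positives, 11 negatives in the test set) and that each metric was rounded to three decimals. The function name `matching_confusion_matrices` and the tolerance `tol` are hypothetical choices for this illustration, and only the LR figures are checked.

```python
def matching_confusion_matrices(n_pos, n_neg, accuracy, precision, recall, f1,
                                tol=0.0005):
    """List every confusion matrix on n_pos positives and n_neg negatives
    whose metrics agree with the reported values to within tol."""
    reported = (accuracy, precision, recall, f1)
    matches = []
    for tp in range(n_pos + 1):
        for fp in range(n_neg + 1):
            if tp + fp == 0:
                continue  # precision is undefined with no positive predictions
            fn, tn = n_pos - tp, n_neg - fp
            computed = (
                (tp + tn) / (n_pos + n_neg),   # accuracy
                tp / (tp + fp),                # precision
                tp / n_pos,                    # recall
                2 * tp / (2 * tp + fp + fn),   # F1
            )
            if all(abs(c - r) < tol for c, r in zip(computed, reported)):
                matches.append({"TP": tp, "FP": fp, "FN": fn, "TN": tn})
    return matches


# LR model on the test set: 44 good-prognosis and 11 poor-prognosis cases.
lr_matches = matching_confusion_matrices(44, 11, 0.764, 0.919, 0.773, 0.840)
print(lr_matches)  # [{'TP': 34, 'FP': 3, 'FN': 10, 'TN': 8}]
```

Under these assumptions the reported LR values correspond to 34 of 44 good-prognosis cases and 8 of 11 poor-prognosis cases classified correctly: precision 34/37 ≈ 0.919, recall 34/44 ≈ 0.773, accuracy 42/55 ≈ 0.764 and F1 68/81 ≈ 0.840.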