Author:
Mohammad Motaghianfar
Abstract
Coronary Artery Disease (CAD) presents very differently from patient to patient, which complicates personalized treatment. Standard classifications focus on the degree of arterial blockage and miss broader differences between patients. We applied Deep Embedded Clustering (DEC), an unsupervised deep learning technique, to a dataset of 509 CAD patients from multiple hospitals, analyzing 18 health factors such as age and symptoms. The approach identified three CAD groups: Cluster 0 (60.1%), younger men with typical symptoms; Cluster 1 (21.4%), patients with varied symptoms; and Cluster 2 (18.5%), older, high-risk patients. Key factors such as age and cholesterol differed significantly across groups (p<0.001). DEC achieved a silhouette score of 0.155, slightly below K-means (0.184), but produced groups that were easier to interpret clinically. These groups could inform tailored treatments and clinical trial design, supporting more personalized heart care.
Keywords: Coronary Artery Disease, Deep Learning, Phenotyping, Precision Medicine, Unsupervised Learning, Clinical Subtypes
1. Introduction
Coronary Artery Disease (CAD) is the leading cause of death globally, affecting millions of people with varied symptoms and outcomes. Current classifications focus on arterial blockage but overlook patient factors such as symptoms or lifestyle. This can lead to one-size-fits-all treatments that do not always work.
While AI has been widely used to predict cardiac risk, unsupervised learning that groups patients by shared patterns is less common. Deep Embedded Clustering (DEC) learns a low-dimensional representation and the cluster assignments jointly, which can suit complex health data better than methods that cluster the raw features directly.
Our study uses DEC to find distinct CAD patient types. We aim to: (1) group CAD patients using AI, (2) describe each group’s unique traits, and (3) check if these groups add new insights beyond standard classifications.
2. Methods
Data Source and Study Population
We used the UCI Heart Disease dataset, covering 920 patients from four hospitals. We focused on 509 patients with confirmed CAD (significant artery blockage).
Feature Selection and Preprocessing
Features with more than 40% missing data were dropped, leaving 10 features: age, sex, chest pain type, blood pressure, cholesterol, fasting blood sugar, ECG results, maximum heart rate, exercise-induced angina, and ST depression. Remaining missing values (<15%) were imputed with the median (numerical features) or the mode (categorical features). We one-hot encoded categorical features and standardized all features.
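The imputation, encoding, and scaling steps can be illustrated with a minimal, dependency-free sketch. The helper names (impute, one_hot, standardize) and the toy records are hypothetical and are not taken from the study's code, which used scikit-learn; the population standard deviation is assumed for scaling.

```python
from statistics import mean, median, mode, pstdev

def impute(values, numeric):
    """Replace None with the median (numeric) or mode (categorical) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]

def one_hot(values):
    """Return the sorted categories and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows

def standardize(values):
    """Scale to zero mean and unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - mu) / sigma for v in values]

# Hypothetical toy records, not study data.
age = [63, None, 41, 56]
chest_pain = ["typical", "asymptomatic", None, "asymptomatic"]

age_imputed = impute(age, numeric=True)
age_z = standardize(age_imputed)
cats, cp_onehot = one_hot(impute(chest_pain, numeric=False))
```

In this sketch only the numerical column is standardized; applying the same scaling to the indicator columns, as the text describes for all features, would reuse standardize on each column of cp_onehot.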
Deep Embedded Clustering Model
Our DEC model used an autoencoder with a 10-unit bottleneck, trained to minimize reconstruction error. Cluster centers were initialized with K-means, and the model was then refined by minimizing a KL divergence objective that sharpens the cluster assignments. Hyperparameters were tuned with clinical interpretability in mind.
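The clustering objective can be sketched as follows, using the formulation from the original DEC paper by Xie et al. (2016): a Student's t kernel gives soft assignments q, a sharpened target distribution p is derived from q, and training minimizes KL(P || Q). This is a plain-Python illustration rather than the study's TensorFlow implementation; the function names, the 2-D toy embeddings, and the choice alpha = 1 are assumptions.

```python
import math

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q_ij of each embedding to each cluster center."""
    q = []
    for point in z:
        row = []
        for c in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(point, c))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q

def target_distribution(q):
    """Sharpened targets: p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    f = [sum(col) for col in zip(*q)]
    p = []
    for row in q:
        w = [v * v / fj for v, fj in zip(row, f)]
        total = sum(w)
        p.append([v / total for v in w])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all patients and clusters."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0)

# Hypothetical 2-D embeddings and centers; the study's bottleneck has 10 units.
z = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
centers = [[0.1, 0.0], [3.0, 3.0]]  # would come from K-means in practice

q = soft_assign(z, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

Because p squares and renormalizes q, it puts more weight on confident assignments; minimizing the loss with respect to the encoder weights and the centers therefore pulls each embedding toward its most likely cluster.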
Evaluation Metrics
We used the silhouette score to measure cluster quality. For clinical validation, we: (1) tested feature differences across clusters with ANOVA and Kruskal-Wallis tests, (2) described each group's traits, and (3) compared cluster membership with standard severity scores.
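For reference, the silhouette score averages, over all patients, the quantity (b - a) / max(a, b), where a is the mean distance to the other members of the patient's own cluster and b is the smallest mean distance to any other cluster. The sketch below is a plain-Python illustration with Euclidean distance and hypothetical toy points; it assumes at least two clusters and scores singleton clusters as 0, which is the usual convention.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # singleton cluster: score defined as 0
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(math.dist(point, points[j]) for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy points forming two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest that points sit closer to a neighboring cluster than to their own.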
Statistical Analysis
We used Python 3.8 with scikit-learn, TensorFlow, and statistical tools, setting significance at p<0.05.
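The two test statistics named above can be computed by hand, as in the following sketch of the one-way ANOVA F statistic and the tie-corrected Kruskal-Wallis H statistic. The function names and the toy groups are hypothetical. The sketch does not compute p-values; in practice scipy.stats.f_oneway and scipy.stats.kruskal return both the statistic and the p-value, although the source does not state which statistical library was used.

```python
def anova_f(groups):
    """One-way ANOVA F = (SSB / (k - 1)) / (SSW / (N - k))."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

def average_ranks(values):
    """1-based ranks with ties averaged, plus the sum of t^3 - t over tie groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with tie correction (assumes not all values are identical)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks, tie_term = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum for this group
        total += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    return h / (1.0 - tie_term / (n ** 3 - n))

# Hypothetical ages in three clusters, not study data.
ages = [[50, 52, 54], [60, 62, 64], [70, 72, 74]]
f_stat = anova_f(ages)
h_stat = kruskal_wallis_h(ages)
```

ANOVA compares group means and assumes roughly normal residuals with similar variances, while Kruskal-Wallis works on ranks and is the safer choice for skewed features such as ST depression.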
3. Results
DEC identified three CAD groups with a silhouette score of 0.155, which indicates modest separation with some overlap between clusters. Cluster 0 (60.1%) consisted mostly of younger men (98.7% male) with typical angina. Cluster 1 (21.4%) had mixed symptoms, while Cluster 2 (18.5%) comprised older patients, a higher share of women (37.2%), and elevated risk factors (e.g., blood pressure 140 mmHg, cholesterol 249 mg/dL).
All five numerical features (age, blood pressure, cholesterol, maximum heart rate, ST depression) differed significantly across groups (p<0.001), supporting distinct clinical profiles. Cluster membership was not significantly associated with traditional severity scores (p=0.482), suggesting that the groups capture information those scores do not reflect. K-means achieved a slightly higher silhouette score (0.184), but the DEC groups were easier to interpret clinically.
4. Discussion
Our approach revealed three CAD patient types that go beyond simple artery-based classifications. Cluster 2, with older, high-risk patients and a larger share of women, may need more aggressive care. This is consistent with known challenges in treating women with CAD, who often have worse outcomes.
DEC handled complex health data well and produced groups that were easier to interpret clinically than those from K-means, despite a slightly lower silhouette score. This suggests that a numerical cluster-quality score alone does not capture clinical usefulness. The results could guide personalized treatments, such as targeting risk factors in Cluster 2 patients.
Limitations include the study’s retrospective nature, potential hospital-specific biases, and lack of long-term outcome data. Future work should test these groups in real-world settings, explore biological causes, and check treatment responses.
5. Conclusion
Using Deep Embedded Clustering, we identified three distinct CAD patient types from routine health data, offering a new way to understand heart disease. These groups could lead to tailored treatments and better clinical trials, pushing precision cardiology forward by addressing patient diversity.