
Heart Disease Diagnosis Project

AIO2025: Module 03.

🫀 About this demo
Predict heart disease risk from patient data with optimized ML models trained on the Cleveland dataset.
Dataset: Cleveland Heart Disease · Models: Decision Tree, k-NN, Naive Bayes, Random Forest, AdaBoost, Gradient Boosting, XGBoost
⚠️ Educational Use Only
This interactive heart disease prediction demo is provided strictly for educational purposes. It is not intended for clinical use and must not be relied upon for medical advice, diagnosis, treatment, or decision-making. Always consult a qualified healthcare professional.

🫀 How to Use: Enter patient features → Run prediction → View ensemble results!



Cleveland Preview (first rows)

Model Performance Comparison (Validation Set Results)

sex (0=female, 1=male)
cp (chest pain type 1..4)
fbs (>120 mg/dl? 1/0)
restecg (0..2)
exang (exercise angina 1/0)
slope (1..3)
ca (major vessels 0..3)
thal (3=normal, 6=fixed, 7=reversible)
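The categorical inputs above only accept a few codes each. The sketch below shows one way a front end could reject out-of-range values before running a prediction. It is an assumption, not the demo's actual code: `VALID_CODES` and `invalid_fields` are hypothetical names, and the allowed sets are copied from the form labels.

```python
# Hypothetical validator for the categorical form inputs (not the app's code).
# Allowed codes are copied from the input labels: e.g. cp 1..4, thal 3/6/7.
VALID_CODES = {
    "sex": {0, 1},
    "cp": {1, 2, 3, 4},
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
    "exang": {0, 1},
    "slope": {1, 2, 3},
    "ca": {0, 1, 2, 3},
    "thal": {3, 6, 7},
}


def invalid_fields(patient: dict) -> list:
    """Return the names of categorical fields that are missing or out of range."""
    return [
        name
        for name, allowed in VALID_CODES.items()
        if patient.get(name) not in allowed  # a missing key yields None -> flagged
    ]


example = {"sex": 1, "cp": 4, "fbs": 0, "restecg": 2,
           "exang": 1, "slope": 2, "ca": 3, "thal": 5}
print(invalid_fields(example))  # ['thal']
```

Continuous fields such as age or chol would need range checks of their own; they are left out here because the page does not state their bounds.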
Select Example Patient

📈 Model Predictions

Individual Model Results

All Model Predictions

📋 Notes

  • Models are trained at launch on data/cleveland.csv using a customizable train/validation split (default 80/20).
  • Target is binarized automatically (0 = no disease, >0 = disease).
  • Retraining: adjust the split ratio and click "🔄 Retrain Models" to see how the amount of training data affects performance.
  • Seven optimized models are compared: Decision Tree, k-NN, Naive Bayes, Random Forest, AdaBoost, Gradient Boosting, and XGBoost.
  • Hyperparameters were tuned for heart disease prediction following common best practices (see the optimization highlights).
  • The ensemble combines the models by weighted soft voting, with each model's weight based on its performance.
  • The best-performing model on the validation set is marked with 🏆 in the validation metrics table.
  • Optimization highlights:
    • Decision Tree: entropy criterion, balanced class weights, tuned maximum depth
    • k-NN: distance weighting, Manhattan metric, tuned number of neighbors
    • Random Forest: 200 trees, class balancing, feature subsampling
    • Gradient Boosting: regularization, subsampling, lower learning rate
    • AdaBoost: SAMME algorithm, more estimators
    • XGBoost: L1/L2 regularization, tuned depth and learning rate
  • Feature descriptions:
    • age: Patient age in years
    • sex: Sex (0=female, 1=male)
    • cp: Chest pain type (1-4)
    • trestbps: Resting blood pressure (mmHg)
    • chol: Serum cholesterol (mg/dl)
    • fbs: Fasting blood sugar >120 mg/dl (1=true, 0=false)
    • restecg: Resting ECG results (0-2)
    • thalach: Maximum heart rate achieved (beats per minute)
    • exang: Exercise induced angina (1=yes, 0=no)
    • oldpeak: ST depression induced by exercise relative to rest
    • slope: Slope of peak exercise ST segment (1-3)
    • ca: Number of major vessels colored by fluoroscopy (0-3)
    • thal: Thallium stress test result (3=normal, 6=fixed defect, 7=reversible defect)
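The notes describe two data-preparation steps: binarizing the target (0 stays 0, anything above 0 becomes 1) and splitting the data 80/20 into training and validation sets. A minimal standard-library sketch of both follows. It assumes a plain shuffled split with a fixed seed; whether the app stratifies the split, and which library it uses, is not stated, and `binarize_target` and `train_val_split` are hypothetical names.

```python
import random


def binarize_target(labels):
    """Map diagnosis codes to 0 (no disease) or 1 (any value > 0 means disease)."""
    return [1 if y > 0 else 0 for y in labels]


def train_val_split(rows, val_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it; val_fraction=0.2 gives the 80/20 default."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)


# Toy labels for illustration only, not rows from cleveland.csv.
labels = [0, 2, 1, 0, 4, 3, 0, 1, 0, 0]
y = binarize_target(labels)
print(y)  # [0, 1, 1, 0, 1, 1, 0, 1, 0, 0]

train, val = train_val_split(list(zip(range(10), y)))
print(len(train), len(val))  # 8 2
```

Changing `val_fraction` plays the same role as the retrain slider: a larger validation share leaves fewer rows for training.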
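Weighted soft voting averages each model's predicted probability of disease, weighting better models more heavily, and then thresholds the average. The notes do not say how the weights are derived, so this sketch assumes weights equal to a validation score; the model scores and probabilities below are made-up numbers, and `soft_vote` is a hypothetical helper rather than the app's implementation. The same scores are used to pick the model that would get the 🏆 marker.

```python
def soft_vote(probas, weights, threshold=0.5):
    """Weighted average of each model's P(disease); returns (probability, label)."""
    if set(probas) != set(weights):
        raise ValueError("every model needs both a probability and a weight")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    p = sum(weights[m] * probas[m] for m in probas) / total
    return p, int(p >= threshold)


# Hypothetical validation scores used as weights (not results from the app).
val_scores = {"Random Forest": 0.85, "XGBoost": 0.80, "k-NN": 0.75}
# Hypothetical per-model P(disease) for one patient.
patient_probas = {"Random Forest": 0.70, "XGBoost": 0.60, "k-NN": 0.20}

best = max(val_scores, key=val_scores.get)  # model that would get the trophy
p, label = soft_vote(patient_probas, val_scores)
print(best, round(p, 3), label)  # Random Forest 0.51 1
```

Here k-NN disagrees strongly, but because the weights only differ modestly, the two more confident models still pull the average just above 0.5.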