Explainable AI Reveals Healthcare Cost Determinants When Accurate Forecasting Fails (2025)

Author:
Mohammad Motaghianfar

Abstract

Background: Healthcare costs are rising quickly, and understanding what drives them has become essential. Traditional prediction models often miss the mark because medical billing is highly unpredictable. Explainable AI (XAI), however, can uncover useful insights even when forecasting isn’t reliable.

Methods: We worked with 13,851 patient records that included demographics, clinical information, and administrative details. Using Extreme Gradient Boosting (XGBoost) together with SHAP (SHapley Additive exPlanations), we built an interpretable framework to identify what influences healthcare costs most. The data included features like age, gender, blood type, medical conditions, insurance provider, admission type, and length of stay.

Results: Prediction performance was poor across the board. Linear regression had an R² of -0.0034 and XGBoost an R² of -0.0116, showing that individual billing amounts are extremely hard to forecast. SHAP analysis nonetheless revealed consistent cost drivers: length of stay (mean absolute impact $432), age ($414), insurance provider ($192), and certain blood types ($118). The analysis also highlighted non-linear effects and feature interactions that a linear model cannot represent.

Conclusion: Even when forecasting fails, XAI provides valuable insights. By shifting the focus from predicting individual bills to understanding overall cost drivers, healthcare organizations can make smarter, population-level decisions. This makes XAI a practical tool for administrators trying to manage costs in complex environments.

Keywords: Explainable AI, Healthcare Costs, SHAP, Machine Learning, Healthcare Analytics

1. Introduction

Managing healthcare costs isn’t just about predicting how much a patient will be billed. Hospitals and administrators also need to understand why costs vary. Traditional statistics usually assume simple linear relationships, while modern machine learning models are often “black boxes” that provide little explanation. Both approaches leave a gap: they tell us how much costs might be, but not why they change.

Past research has mostly focused on making predictions more accurate using electronic health records [1,2]. But few studies have looked into what happens when predictions themselves fail, which is common because of the high variability in healthcare billing. Even if individual costs are hard to predict, understanding overall patterns and drivers is still essential for planning and resource management.

This study addresses two key gaps: (1) showing how XAI can still deliver insights when prediction accuracy is limited, and (2) identifying hidden, non-linear cost drivers using SHAP analysis. We hypothesize that XAI can uncover valuable business intelligence even when forecasting isn’t reliable.

2. Methods

2.1 Data

We used a synthetic healthcare dataset of 13,851 patient records, including 15 features that captured demographic details, clinical conditions, and administrative factors. The data was designed to mimic real-world hospital records while maintaining privacy.

2.2 Preprocessing

Before analysis, we:

Adjusted negative billing values with capping,
Calculated length of stay from admission and discharge dates,
Removed records with inconsistent timing,
One-hot encoded categorical variables, and
Standardized numerical features.

The dataset was split into training (80%) and testing (20%) sets so that model performance could be evaluated on held-out records.
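The preprocessing steps listed above can be sketched as follows. This is a minimal illustration, not the study's code: the column names (`billing_amount`, `admission_date`, `discharge_date`, `age`, `insurance`), the zero floor used for capping, the toy records, and the choice to fit the scaler on the training split only are all assumptions made for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df):
    """Apply the listed cleaning and encoding steps to a raw records table."""
    df = df.copy()
    # Cap negative billing values; a floor of zero is an assumption.
    df["billing_amount"] = df["billing_amount"].clip(lower=0)
    # Length of stay in days, from admission and discharge dates.
    admitted = pd.to_datetime(df["admission_date"])
    discharged = pd.to_datetime(df["discharge_date"])
    df["length_of_stay"] = (discharged - admitted).dt.days
    # Remove records with inconsistent timing (discharge before admission).
    df = df[df["length_of_stay"] >= 0]
    df = df.drop(columns=["admission_date", "discharge_date"])
    y = df.pop("billing_amount")
    # One-hot encode every non-numeric column.
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    X = pd.get_dummies(df, columns=categorical).astype(float)
    return X, y


def split_and_scale(X, y, numeric_cols, seed=42):
    """80/20 split, then standardize numeric columns using training statistics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    X_train, X_test = X_train.copy(), X_test.copy()
    scaler = StandardScaler().fit(X_train[numeric_cols])
    X_train[numeric_cols] = scaler.transform(X_train[numeric_cols])
    X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])
    return X_train, X_test, y_train, y_test


# Hypothetical toy records; the third row has a discharge before admission.
raw = pd.DataFrame({
    "age": [34, 71, 52, 45, 63],
    "insurance": ["A", "B", "A", "C", "B"],
    "admission_date": ["2023-01-01", "2023-02-10", "2023-03-05", "2023-04-01", "2023-05-20"],
    "discharge_date": ["2023-01-04", "2023-02-15", "2023-03-01", "2023-04-11", "2023-05-22"],
    "billing_amount": [1200.0, -50.0, 800.0, 15000.0, 2300.0],
})
X, y = preprocess(raw)
X_train, X_test, y_train, y_test = split_and_scale(X, y, ["age", "length_of_stay"])
```

Fitting the scaler on the training rows alone keeps test-set statistics out of the model inputs; the paper does not state which variant was used.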

2.3 Models and Tools

We compared two models:

Linear regression (baseline)
XGBoost (main model)

We then applied SHAP analysis [4] to interpret feature importance and uncover interactions. All work was carried out in Python using scikit-learn, XGBoost, and SHAP.

2.4 Evaluation

We assessed performance using R-squared (R²), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). For SHAP, we ranked features by their mean absolute SHAP value, which is expressed in the same units as the predicted billing amount.
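The evaluation metrics and the SHAP-based ranking can be written out directly, as in the sketch below. It uses only NumPy; the function names and the toy numbers are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R-squared, MAE, and RMSE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                    # model squared error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # squared error of the mean
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": float(np.mean(np.abs(residuals))),
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
    }


def rank_features(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, largest first."""
    importance = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]


# Predicting the mean bill for every record gives R-squared of exactly zero.
metrics = regression_metrics([100, 200, 300, 400], [250, 250, 250, 250])

# Toy SHAP matrix: rows are records, columns are features.
toy_shap = [[400, -100], [-500, 150], [300, -50]]
ranking = rank_features(toy_shap, ["length_of_stay", "insurance"])
```

Taking the absolute value before averaging matters: a feature that pushes some bills up and others down would otherwise average out toward zero and look unimportant.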

3. Results

3.1 Model Performance

Both models struggled to predict costs. Linear regression had R² = -0.0034 (RMSE = $13,894.75), while XGBoost had R² = -0.0116 (RMSE = $13,950.91). A negative R² means that a model's squared error on the test set exceeds that of a constant prediction equal to the mean bill, so both models performed slightly worse than that naive baseline.
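The link between a negative R² and the mean-bill baseline follows from the definition of R². In the notation below (introduced here for illustration), y_i is the actual bill, \hat{y}_i the predicted bill, \bar{y} the mean test-set bill, and \sigma_y the population standard deviation of test-set bills.

```latex
\[
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{\mathrm{RMSE}^2}{\sigma_y^2},
\qquad
R^2 < 0 \iff \mathrm{RMSE} > \sigma_y .
\]

\[
\sigma_y = \frac{\mathrm{RMSE}}{\sqrt{1 - R^2}}:
\qquad
\frac{13{,}894.75}{\sqrt{1.0034}} \approx 13{,}871,
\qquad
\frac{13{,}950.91}{\sqrt{1.0116}} \approx 13{,}871 .
\]
```

Both reported results therefore imply a test-set standard deviation of roughly $13,871, which is the RMSE a constant mean-bill prediction would achieve. Each model's RMSE sits only slightly above that figure.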

3.2 XAI Insights

Despite the weak predictions, SHAP analysis uncovered meaningful patterns. Key cost drivers included:

Length of stay ($431.66 mean absolute SHAP value),
Age ($413.90),
Insurance provider ($192.19), and
Blood type (notably O- and AB-, around $118 each).

3.3 Feature Interactions

SHAP dependence plots revealed non-linear effects, particularly with age, showing that billing impacts varied across life stages. Interactions between insurance providers and medical conditions also suggested payer-specific billing differences.
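A tabular stand-in for a dependence plot is to average a feature's SHAP values within bands of that feature. The sketch below does this for age using NumPy only; the function name, band edges, and toy values are hypothetical and are not taken from the study's results.

```python
import numpy as np


def mean_shap_by_band(feature_values, shap_column, edges, labels):
    """Average one feature's SHAP values within bands of the feature itself."""
    feature_values = np.asarray(feature_values, dtype=float)
    shap_column = np.asarray(shap_column, dtype=float)
    # Band index i means edges[i-1] <= value < edges[i]; 0 is below the first edge.
    band_index = np.digitize(feature_values, edges)
    result = {}
    for i, label in enumerate(labels):
        in_band = band_index == i
        if in_band.any():
            result[label] = float(shap_column[in_band].mean())
    return result


# Invented ages and SHAP values (in dollars) for six records.
ages = [20, 25, 45, 50, 70, 80]
age_shap = [-300, -200, 100, 50, 400, 500]
bands = mean_shap_by_band(ages, age_shap, edges=[40, 65], labels=["<40", "40-65", ">=65"])
```

If the per-band means do not change by a constant amount from one band to the next, the model has learned a non-linear relationship between the feature and the predicted bill.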

4. Discussion

4.1 Key Takeaways

The poor predictive results align with prior research showing that individual healthcare costs are too variable to forecast reliably [3]. But SHAP added value by identifying patterns that matched both clinical and administrative logic.

Length of stay confirmed its well-known role as a cost driver.
Age reflected established trends in healthcare use over time.
Insurance differences pointed to contractual or population-based effects worth deeper investigation.
Blood type emerged unexpectedly and may be a proxy for other unmeasured health factors.

4.2 Contributions

This study shows how XAI can be useful even when predictions fail. Practically, the framework provides administrators with a way to analyze cost drivers without depending on accurate forecasting. Methodologically, it demonstrates the importance of moving beyond accuracy metrics and focusing on interpretability.

4.3 Limitations

Because we used synthetic data, the findings may not generalize to real patient populations. Future work should use richer real-world datasets with lab results, medications, and treatment details. Incorporating causal inference methods could also help test whether the relationships uncovered are causal.

5. Conclusion

Explainable AI offers a way forward when prediction models fall short. By highlighting what drives costs at a population level, SHAP and similar tools provide hospitals with strategic insights for cost management. Instead of chasing perfect forecasts, healthcare leaders can use these methods to better understand and plan for the factors that truly influence costs.

Future research should explore real-world data and examine causal links between cost drivers and billing outcomes. The same framework can also be applied to other healthcare challenges where prediction is difficult but understanding remains crucial.

Acknowledgments

This study was made possible in part thanks to the publicly available “Healthcare Dataset” published by Prasad (2022) on Kaggle, which provided synthetic patient records with demographic, clinical, and administrative features. The use of this open dataset enabled us to develop and test our explainable AI framework without compromising privacy. We extend our gratitude to the authors and contributors who made the dataset accessible to the research community.

References

[1] Rajkomar, A., et al. (2018). Scalable and accurate deep learning with electronic health records. npj Digital Medicine, 1(1), 18.
[2] Shwartz, M., et al. (2020). Estimating the costs of primary care transformation: a model for practice management. Health Affairs, 39(3), 255-263.
[3] Folland, S., Goodman, A. C., & Stano, M. (2018). The economics of health and health care. Pearson.
[4] Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.

Supplementary Materials

To access the project code link in Google Colab, please enter the password:

My Research Projects