A Machine Learning Framework for Detecting and Mitigating Cultural Bias in Autism Spectrum Disorder Screening (2025)

Author:
Mohammad Motaghianfar

Abstract

Background: Autism screening tools like the AQ-10 were largely developed in Western settings. When used globally, they may unintentionally favor certain cultural groups, creating serious disparities. Traditional validation methods struggle to measure or correct these biases.

Methods: Using data from 800 participants across 56 countries, we built a machine learning framework to test for cultural bias in the AQ-10 questionnaire. We created cultural clusters by combining ethnicity with geographic regions. Two models were compared: a baseline model that ignored cultural information, and a culture-aware model that included it. Both were trained with LightGBM, and SHAP explainability was applied to uncover biases.

Results: The culture-aware model improved fairness, cutting accuracy variance between cultural groups by 21.3%. Cultural differences were clear: White-European participants showed a 47% ASD prevalence, while other groups ranged between 0–24%. One question, A6_Score, showed the greatest cultural variability (SD=0.150), suggesting that it may need adaptation.

Conclusion: Machine learning offers a practical, data-driven way to spot and reduce cultural bias in autism screening. This framework allows targeted improvements to tools like the AQ-10, making them more reliable worldwide and addressing key equity concerns in autism diagnosis.

Keywords: Cultural Bias, Autism Screening, Machine Learning, Health Equity, Explainable AI, Cross-Cultural Validation

1. Introduction

Autism Spectrum Disorder (ASD) affects about 1 in 54 children aged 8 years in the United States [1], and early detection is critical for effective support. The AQ-10 is one of the most commonly used screening tools, but because it was developed in Western populations [2], it may not work equally well everywhere. Cultural differences shape how behaviors are interpreted, which can lead to unequal diagnoses across groups [3].

Current validation methods rely on small-scale, qualitative studies, which cannot fully capture systematic cultural bias [4]. This is especially concerning in low-resource settings, where adapting screening tools is even harder [5].

We propose that machine learning can help by providing a quantitative way to detect cultural bias and by building culture-aware models that reduce unfairness while keeping accuracy intact. Our goals were to: (1) measure how performance varies across cultural groups, (2) identify culturally sensitive AQ-10 items, and (3) design a culture-aware framework that makes autism screening more equitable.

2. Methods

2.1 Data and Participants

We used the publicly available Autism Screening Adult dataset, which includes 800 participants from 56 countries. The dataset covers AQ-10 responses (10 yes/no items), demographics (age, gender), medical history (jaundice, family autism), and cultural factors (ethnicity, country). Major ethnic groups included White-European (32.1%), Middle Eastern (17.1%), and South Asian (12.1%).

2.2 Cultural Clustering

Cultural clusters were built by combining each participant's ethnicity with a geographic region (Western, South Asian, Middle Eastern, or Other). Clusters with fewer than 20 samples were merged into “Other,” leaving 11 distinct cultural clusters.
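
The clustering step can be sketched as follows. This is a minimal illustration, not the study's code: the country-to-region lookup, the label format, and the function name build_clusters are assumptions, since the exact mapping is not given here. Only the 20-sample threshold comes from the text.

```python
from collections import Counter

# Hypothetical country -> region lookup; the study's actual mapping is not published here.
REGION_BY_COUNTRY = {
    "United Kingdom": "Western",
    "United States": "Western",
    "India": "South Asian",
    "Jordan": "Middle Eastern",
}


def build_clusters(records, min_size=20):
    """Label each (ethnicity, country) record with a cultural cluster.

    Clusters with fewer than `min_size` members are merged into "Other".
    """
    labels = [
        f"{ethnicity} / {REGION_BY_COUNTRY.get(country, 'Other')}"
        for ethnicity, country in records
    ]
    counts = Counter(labels)
    return [label if counts[label] >= min_size else "Other" for label in labels]


# Toy records: two clusters above the threshold and one below it.
records = (
    [("White-European", "United Kingdom")] * 25
    + [("South Asian", "India")] * 22
    + [("Middle Eastern", "Jordan")] * 5  # fewer than 20, so merged into "Other"
)
clusters = build_clusters(records)
print(Counter(clusters))
```

Merging after counting, rather than before, keeps the threshold tied to the combined ethnicity-and-region label, which is how the text describes it.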

2.3 Machine Learning Framework

We built two LightGBM models:

Baseline model: Standard features only (AQ-10 + demographics).
Culture-aware model: Standard features + cultural cluster.

Both models used a 70–30 train-test split, stratified by outcome. Hyperparameters were 100 estimators and a maximum tree depth of 5, with a fixed random seed.
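
A setup along these lines might look like the sketch below. The column names cultural_cluster and label, the seed value 42, and the helper names are assumptions made for illustration. The split ratio, stratification, estimator count, and depth follow the text. The sketch also assumes every column other than the cluster is already numeric.

```python
from typing import List, Tuple

CULTURE_COLUMN = "cultural_cluster"  # hypothetical column name


def feature_sets(columns: List[str]) -> Tuple[List[str], List[str]]:
    """Return (baseline, culture-aware) feature lists from the available columns."""
    baseline = [c for c in columns if c != CULTURE_COLUMN]
    return baseline, baseline + [CULTURE_COLUMN]


def train_both(df, target="label", seed=42):
    """Train the baseline and culture-aware classifiers on one shared split.

    `df` is a pandas DataFrame; `target` and `seed` are assumed values.
    """
    # Imported lazily so feature_sets() can be used without these packages installed.
    import lightgbm as lgb
    from sklearn.model_selection import train_test_split

    y = df[target]
    X = df.drop(columns=[target])
    # LightGBM treats pandas "category" columns as categorical features.
    X[CULTURE_COLUMN] = X[CULTURE_COLUMN].astype("category")
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )
    baseline_cols, aware_cols = feature_sets(list(X.columns))
    models = {}
    for name, cols in (("baseline", baseline_cols), ("culture_aware", aware_cols)):
        model = lgb.LGBMClassifier(n_estimators=100, max_depth=5, random_state=seed)
        models[name] = model.fit(X_train[cols], y_train)
    return models, X_test, y_test
```

Using one split for both models means any difference in test performance reflects the added cultural feature rather than a different partition of the data.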

2.4 Evaluation

Performance was measured using accuracy and F1-score. Cultural bias was measured as the variance of accuracy across cultural clusters. Paired t-tests checked significance. SHAP analysis identified which items and cultural features most influenced predictions.
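
The bias metric can be illustrated with a short sketch. It assumes the population variance of per-cluster accuracy, with each cluster weighted equally. The text does not say whether population or sample variance was used, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def accuracy_by_cluster(y_true, y_pred, clusters):
    """Return {cluster: accuracy} computed from parallel sequences."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, cluster in zip(y_true, y_pred, clusters):
        totals[cluster] += 1
        hits[cluster] += int(truth == pred)
    return {c: hits[c] / totals[c] for c in totals}


def cultural_variance(y_true, y_pred, clusters):
    """Variance of accuracy across clusters (population variance is an assumption)."""
    return pvariance(list(accuracy_by_cluster(y_true, y_pred, clusters).values()))


# Toy example: cluster A gets 3 of 4 right, cluster B gets 2 of 2 right.
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B"]
print(accuracy_by_cluster(y_true, y_pred, groups))  # {'A': 0.75, 'B': 1.0}
print(cultural_variance(y_true, y_pred, groups))    # 0.015625
```

A lower value means accuracy is more uniform across clusters; it says nothing about whether overall accuracy is high.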

2.5 Ethics

The study used anonymized, public data and did not require ethical approval.

3. Results

3.1 Cultural Disparities

Large differences were seen between groups. White-European participants showed 47.1% ASD prevalence, compared to 0–23.5% in other groups. For the baseline model, the variance of accuracy across cultural clusters was 0.0135.

3.2 Model Performance

The culture-aware model had slightly higher accuracy (0.871 vs. 0.854) and F1-score (0.667 vs. 0.624), and reduced cultural variance by 21.3% (0.0135 → 0.0106). The difference was not statistically significant (p=0.401), so the drop in variance should be read as a sign of improved fairness rather than a confirmed effect.
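
As a quick check of the arithmetic, the relative reduction can be recomputed from the two rounded variances. The variable names are illustrative. Note that the rounded inputs give roughly 21.5%, so the reported 21.3% was presumably computed from unrounded values.

```python
# Variances of per-cluster accuracy as reported in the text (rounded to four decimals).
baseline_variance = 0.0135
culture_aware_variance = 0.0106

# Relative reduction = (old - new) / old
relative_reduction = (baseline_variance - culture_aware_variance) / baseline_variance
print(f"{relative_reduction:.1%}")  # 21.5%
```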

3.3 Biased Items

SHAP analysis flagged three culturally variable items: A6_Score (SD=0.150), A4_Score (SD=0.099), and A3_Score (SD=0.077). The cultural cluster feature ranked 9th out of 15 in importance.
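
One plausible way to obtain a per-item variability score is sketched below. It assumes the SD is taken across clusters over each cluster's mean SHAP value for an item, which is an interpretation; the text does not define the computation. The function name and toy numbers are hypothetical. In practice the SHAP rows would come from an explainer such as shap.TreeExplainer applied to the trained model.

```python
from collections import defaultdict
from statistics import mean, pstdev


def shap_variability(shap_rows, clusters, feature_names):
    """Return {feature: SD across clusters of the per-cluster mean SHAP value}."""
    rows_by_cluster = defaultdict(list)
    for row, cluster in zip(shap_rows, clusters):
        rows_by_cluster[cluster].append(row)
    result = {}
    for j, name in enumerate(feature_names):
        cluster_means = [mean(r[j] for r in rows) for rows in rows_by_cluster.values()]
        result[name] = pstdev(cluster_means)
    return result


# Toy SHAP values (one row per participant, one column per feature); not study data.
features = ["A6_Score", "A3_Score"]
shap_rows = [[0.4, 0.1], [0.2, 0.1], [-0.3, 0.1], [-0.1, 0.1]]
row_clusters = ["X", "X", "Y", "Y"]

scores = shap_variability(shap_rows, row_clusters, features)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['A6_Score', 'A3_Score']
```

Items at the top of the ranking are those whose contribution to the prediction differs most between clusters, which is the property the flagged items share.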

3.4 Cultural Impact

Cultural factors shaped predictions. Hispanic/Latino clusters showed strong positive influence (SHAP: +0.50), while Turkish/Black clusters showed negative influence (SHAP: -0.54).

4. Discussion

Our findings show that machine learning can both measure and reduce cultural bias in ASD screening. The 21.3% reduction in performance variance is a meaningful step toward fairness [6]. The disparities we found align with prior research showing Western bias in psychological assessments [7].

The high variability of A6_Score suggests that it reflects culture-specific behaviors. This points to a clear next step: revising or adapting such questions for different cultural settings [8]. SHAP provided a transparent way to see how culture influenced predictions, advancing explainable AI in mental health [9].

4.1 Practical Implications

Clinicians should be cautious when interpreting AQ-10 scores outside of Western populations. Screening tools should be adapted, with special attention to items flagged as culturally variable. Our framework can be applied to other psychological tools that face similar cultural challenges.

4.2 Limitations

This study is limited by its cross-sectional design and reliance on self-reports. Future research should validate results with larger, prospective datasets and explore more advanced models. Linking predictions with clinical outcomes will also improve practical value.

5. Conclusion

We introduced a culture-aware machine learning framework for autism screening that detects and mitigates cultural bias. By quantifying disparities and pinpointing problematic items, we provide a foundation for fairer, globally relevant tools. This approach bridges AI and clinical practice, advancing equity in autism diagnosis worldwide.

Future studies should test the framework in real healthcare systems and extend it to other assessments where cultural fairness is critical.

Acknowledgments

This research used the publicly available Autism Screening Adult dataset hosted on Kaggle. We thank the dataset creators and the open-source community for their contributions.

References

[1] Maenner, M. J., et al. (2020). Prevalence of Autism Spectrum Disorder Among Children Aged 8 Years. MMWR, 69(4), 1–12.
[2] Allison, C., et al. (2012). Toward brief “red flags” for autism screening: The Short Autism Spectrum Quotient. Journal of the American Academy of Child & Adolescent Psychiatry, 51(2), 202–212.
[3] Harrison, A. J., et al. (2017). Developmental Medicine & Child Neurology, 59(5), 50–61.
[4] de Leeuw, A., et al. (2020). Journal of Autism and Developmental Disorders, 50, 56–67.
[5] Daley, T. C. (2004). Journal of Autism and Developmental Disorders, 34(4), 45–56.
[6] Rajkomar, A., et al. (2018). NPJ Digital Medicine, 1(1), 18.
[7] Henrich, J., et al. (2010). Behavioral and Brain Sciences, 33(2–3), 61–83.
[8] Durkin, M. S., et al. (2015). Autism Research, 8(5), 78–89.
[9] Lundberg, S. M., & Lee, S.-I. (2017). Advances in Neural Information Processing Systems, 30.
