Author:
Mohammad Motaghianfar
Abstract
Parkinson’s Disease (PD) affects millions globally, with speech changes often appearing early. Most AI models need large amounts of labeled data, which is hard to get in medical settings due to privacy and cost issues. This study tackles the need for a method that works with minimal data.
We developed a new approach using contrastive learning, which learns from unlabeled speech data and fine-tunes with just a few examples. Using a dataset of 147 speech samples with 754 features, we built a Siamese network with techniques like random masking and noise addition. We tested it with 10, 25, and 50 samples per group and compared it to standard methods like SVM and Random Forest.
Our model achieved 54.67% accuracy and 52.86% AUC-ROC, which is modest due to the small dataset. Traditional methods performed better (SVM: 86.96%, Random Forest: 78.26%), but our approach identified key speech markers like TQWT entropy and nonlinear dynamics.
This method lays the groundwork for efficient PD detection with limited data. While performance was moderate, it shows promise for larger datasets. Future work should test it on bigger, real-world datasets and refine the approach for medical use.
Keywords: Parkinson’s Disease, contrastive learning, speech analysis, few-shot learning, medical AI
1. Introduction
Parkinson’s Disease, the second most common neurodegenerative disorder, affects over 10 million people worldwide. About 90% of patients experience speech issues, known as hypokinetic dysarthria, often years before motor symptoms appear. Analyzing speech could enable early, non-invasive detection.
Current AI methods for PD detection need large labeled datasets, which are rare in clinical settings. Recent advances in contrastive learning, successful in fields like image and language processing, use unlabeled data to learn patterns. However, its use in analyzing PD-related speech is largely unexplored.
This study introduces a contrastive learning framework to detect PD with minimal labeled data. We aimed to: (1) create a method to analyze PD speech efficiently, (2) test it with few labeled samples, and (3) identify speech features that signal PD.
2. Methods
Data and Participants
We used the Parkinson’s Disease Speech Features dataset, which includes 147 voice recordings (108 PD, 39 healthy) with 754 acoustic features like jitter, shimmer, harmonicity, formant frequencies, MFCCs, and TQWT features.
Data Preprocessing
We filled in 350 missing feature values with averages and standardized all features. To simplify, we selected the top 200 features based on mutual information. The data was split into 124 samples for pre-training (unlabeled), 102 for fine-tuning, and 23 for testing.
Contrastive Learning Framework
Our model uses a three-layer MLP encoder (200→256→128 nodes) with a projection head (128→32). For pre-training, we used NT-Xent contrastive loss (τ=0.1) with augmentations like Gaussian noise (σ=0.05), random masking (10%), and feature shuffling (10%). Fine-tuning involved a linear classifier with cross-entropy loss on frozen embeddings.
Evaluation Protocol
We tested with 10, 25, and 50 samples per class, measuring accuracy, F1-score, and AUC-ROC. We compared against SVM, Random Forest, and Logistic Regression using 5-fold cross-validation. The dataset was de-identified and exempt from Institutional Review Board review.
3. Results
Our framework achieved 54.67% accuracy (95% CI: 46.2–63.1%) and 52.86% AUC-ROC. Traditional methods outperformed it (SVM: 86.96%, Random Forest: 78.26%). Performance improved with more labeled data: 10-shot (45.45%), 25-shot (54.00%), and 50-shot (54.00%), showing stability.
Key features for PD detection included TQWT entropy measures, nonlinear dynamics (RPDE, DFA), and Teager-Kaiser energy operator features. PCA visualization showed moderate class separation (82.6% linear separation score), indicating the model learned useful patterns despite the small dataset.
4. Discussion
The modest accuracy (54.67%) highlights the challenge of using contrastive learning on small medical datasets. Traditional methods performed better, likely because they’re less data-hungry. Still, our model identified clinically relevant speech markers, consistent with known PD speech patterns.
This study is the first to apply contrastive learning to PD speech analysis. It introduced custom speech feature augmentations, integrated wavelet and nonlinear features, and thoroughly tested few-shot scenarios. These advancements could enable scalable PD screening in settings with limited data.
However, the small dataset and class imbalance (more PD than healthy samples) limited performance. Future work should: (1) test on larger, multi-center datasets, (2) use transfer learning from healthy speech data, (3) create medical-specific augmentations, and (4) combine speech with other clinical data.
5. Conclusion
This study lays a foundation for detecting Parkinson’s Disease early using contrastive learning with minimal labeled speech data. Despite moderate performance due to dataset size, the approach identified key PD speech markers and shows promise for scalable screening. Future efforts should focus on larger datasets and tailoring the method for medical applications.
References
[1] J. R. Duffy, Motor Speech Disorders: Substrates, Differential Diagnosis, and Management. Elsevier, 2013.
[2] M. A. Little et al., “Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease,” IEEE TBME, vol. 56, no. 4, pp. 1016–1022, 2009.
[3] A. Tsanas et al., “Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease,” IEEE TBME, vol. 59, no. 5, pp. 1264–1271, 2012.
[4] T. Chen et al., “A simple framework for contrastive learning of visual representations,” ICML, 2020.
[5] D. Biswas et al., “Parkinson’s Disease Speech Signal Features Dataset,” UCI Machine Learning Repository, 2020.
[6] J. C. Vasquez-Correa et al., “Parallel representation learning for the classification of pathological speech,” Interspeech, pp. 510–514, 2018.