Temporal-Transformer Framework for Credit Card Fraud Detection: Leveraging Sequential Patterns in Highly Imbalanced Data (2022)

Author:
Mohammad Motaghianfar

Abstract

Credit card fraud costs billions of dollars each year and is hard to detect because fraudulent transactions are rare and fraud tactics keep evolving. Most detection methods treat each transaction separately, missing patterns that unfold over time. Our study introduces a Transformer-based model that analyzes sequences of transactions within time windows. Using a dataset of 284,807 European cardholder transactions (0.17% fraudulent), we split the data chronologically to avoid leakage. The model uses multi-head attention and focal loss to handle the imbalance. On the validation set it reached an ROC-AUC of 0.9970 and perfect recall (1.000), catching every fraud case, with an AUC-PR of 0.6625 and low precision (0.004). Key features (V10, V4, V12) drove predictions, and time-based patterns contributed to performance. This framework offers a sequence-based approach to fraud detection and sets the stage for adding network-based analysis in the future.

Keywords: Fraud Detection, Transformer Networks, Temporal Sequences, Class Imbalance, Financial Security, Anomaly Detection

1. Introduction

Credit card fraud drains over $28 billion globally each year, with fraudsters constantly finding new ways to dodge detection systems. Traditional tools like logistic regression or random forests struggle to spot complex, time-linked fraud patterns. Most methods look at transactions one by one, ignoring how fraud often shows up as coordinated moves across multiple transactions.

This study addresses that gap with a Transformer model that tracks transaction sequences over time. We hypothesize that this approach outperforms methods that score transactions one at a time, because it can capture time-based patterns. Our goals are to: (1) apply Transformers to financial transaction sequences, (2) use a time-sensitive setup to avoid data leakage, and (3) show how time patterns improve fraud detection.

2. Methods

Data/Materials

We used the Credit Card Fraud Detection dataset from Kaggle, with 284,807 transactions from European cardholders in September 2013. It has 28 anonymized PCA features (V1–V28), timestamps, amounts, and fraud labels (492 fraud cases, 0.17%). The extreme imbalance makes detection challenging.
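To make the imbalance concrete, the short calculation below derives the fraud rate and the legitimate-to-fraud ratio from the two counts quoted above. It is only a sanity check on the stated figures; the variable names are illustrative.

```python
# Counts quoted in the text above.
total_transactions = 284_807
fraud_cases = 492

# Share of transactions labeled as fraud.
fraud_rate = fraud_cases / total_transactions

# Number of legitimate transactions for every fraudulent one.
legit_per_fraud = (total_transactions - fraud_cases) / fraud_cases

print(f"fraud rate: {100 * fraud_rate:.2f}%")
print(f"legitimate transactions per fraud: {legit_per_fraud:.0f}")
```

A classifier that always predicts "legitimate" would therefore be right more than 99.8% of the time, which is why plain accuracy is not a useful metric here.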

Preprocessing/Procedures

We sorted transactions by time to keep their order intact and standardized time and amount features. To avoid leaks, we split the data chronologically: 80% for training, 20% for validation. For each transaction, we built a sequence of all transactions in the prior 24 hours, handling varying lengths with dynamic padding.
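The preprocessing steps above can be sketched in plain Python. This is a minimal illustration under several assumptions, not the exact pipeline used in the study: timestamps are in seconds, the current transaction is included as the last element of its own window, and padding is done per batch to the longest sequence in that batch. The function names (`chronological_split`, `window_bounds`, `pad_batch`) are hypothetical.

```python
from bisect import bisect_left

WINDOW_SECONDS = 24 * 60 * 60  # 24-hour look-back, as described in the text


def chronological_split(rows, train_fraction=0.8):
    """Sort rows by timestamp and cut once, so validation is never earlier than training."""
    ordered = sorted(rows, key=lambda row: row["time"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def window_bounds(times, window=WINDOW_SECONDS):
    """For each index i, return (start, i + 1) covering timestamps in [t_i - window, t_i].

    `times` must already be sorted in ascending order.
    """
    bounds = []
    for i, t in enumerate(times):
        # First position whose timestamp is not older than the window start.
        start = bisect_left(times, t - window, 0, i + 1)
        bounds.append((start, i + 1))
    return bounds


def pad_batch(sequences, pad_value=0.0):
    """Pad variable-length sequences to the longest one in this batch.

    Returns the padded sequences and a mask that is True for real positions.
    """
    max_len = max(len(seq) for seq in sequences)
    width = len(sequences[0][0])
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [[pad_value] * width for _ in range(n_pad)])
        mask.append([True] * len(seq) + [False] * n_pad)
    return padded, mask


# Toy usage: four transactions, the last two more than 24 hours after the first.
times = [0, 3600, 90000, 100000]
print(window_bounds(times))
```

Because each window only looks backward from its own transaction, no sequence can contain information from later transactions. The mask lets an attention layer ignore the padded positions.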

Models/Algorithms/Tools

Our Transformer encoder has:

30 input features (V1–V28 plus time and amount), 64 hidden units, 4 attention heads, 2 encoder layers.
Learnable positional encoding.

We trained it with focal loss (α=0.75, γ=2.0) to tackle imbalance, using the Adam optimizer (learning rate 0.001) and step decay scheduling.
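For readers unfamiliar with focal loss, the sketch below computes it for a single prediction using the standard binary form from Lin et al., FL = -α_t (1 - p_t)^γ log(p_t), with the α and γ values given above. It assumes that α weights the fraud class and (1 - α) the legitimate class; the text does not state which convention was used, and the function name is illustrative.

```python
import math


def focal_loss(prob_fraud, label, alpha=0.75, gamma=2.0, eps=1e-12):
    """Binary focal loss for one example.

    prob_fraud: predicted probability of the fraud class (0..1).
    label: 1 for fraud, 0 for legitimate.
    Assumption: alpha weights the fraud class, (1 - alpha) the legitimate class.
    """
    # p_t is the probability the model assigned to the true class.
    p_t = prob_fraud if label == 1 else 1.0 - prob_fraud
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # guard against log(0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy legitimate example contributes far less than a missed fraud.
easy_legit = focal_loss(prob_fraud=0.01, label=0)
missed_fraud = focal_loss(prob_fraud=0.01, label=1)
print(easy_legit, missed_fraud)
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary cross-entropy, so the two parameters can be read as adjustments on top of that baseline: γ down-weights easy examples and α rebalances the classes.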

Evaluation Metrics

We focused on AUC-PR and ROC-AUC due to imbalance, plus precision, recall, and F1-score. AUC-PR was key for evaluating performance on rare fraud cases.
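As a reference for how these metrics are defined, the sketch below computes precision, recall, F1, and a step-wise area under the precision-recall curve (average precision) from labels and scores. It is an assumption that AUC-PR was computed as average precision; the text does not say which estimator was used. The helper names are hypothetical, and tied scores are not treated specially.

```python
def precision_recall_f1(labels, predictions):
    """Precision, recall, and F1 for binary labels and hard 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_positives = sum(labels)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / total_positives if total_positives else 0.0


# Toy example: two frauds among four transactions.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(labels, scores))
print(precision_recall_f1(labels, [1, 1, 1, 1]))
```

Precision, recall, and F1 depend on the decision threshold, while average precision summarizes the ranking across all thresholds, which is why the latter is the more informative single number on rare-event data.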

3. Results

Our Transformer model reached an ROC-AUC of 0.9970, above the 0.95 benchmark, and achieved perfect recall (1.000), catching every fraud case in the validation set. Despite the 0.4% fraud rate, it scored an AUC-PR of 0.6625. Precision was low (0.004), meaning that about 99.6% of the transactions flagged at this operating point were legitimate, which reflects the difficulty of avoiding false positives in such imbalanced data.
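The reported figures can be unpacked with a little arithmetic. The sketch below uses only the numbers quoted above (precision 0.004, recall 1.000, fraud rate 0.4%, AUC-PR 0.6625). The helper names are illustrative, and since the inputs are rounded, the outputs are approximate.

```python
def false_alarms_per_detection(precision):
    """From precision = TP / (TP + FP): FP / TP = (1 - precision) / precision."""
    return (1.0 - precision) / precision


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.004, 1.0
fraud_rate = 0.004   # the 0.4% rate quoted in the text
auc_pr = 0.6625

# Roughly how many legitimate transactions are flagged per fraud caught.
print(round(false_alarms_per_detection(precision)))

# F1 at this operating point.
print(round(f1_score(precision, recall), 4))

# A ranker with no skill has an AUC-PR close to the fraud rate,
# so this ratio shows how far the model's ranking is above chance.
print(round(auc_pr / fraud_rate))
```

These numbers describe two different things: the AUC-PR reflects the quality of the ranking over all thresholds, while the precision and recall describe one particular threshold, at which there are roughly 249 false alarms for each detected fraud.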

Features V10, V4, and V12 were the most predictive, aligning with fraud detection insights. Temporal features mattered moderately, supporting the sequence approach. The predicted fraud probabilities showed clear separation between the classes: fraudulent transactions averaged 0.408, legitimate ones 0.012.

4. Discussion

Our Transformer model is effective at spotting fraud by analyzing transaction sequences, and its perfect recall means no fraud cases in the validation set were missed, which matters for financial systems. The high ROC-AUC (0.9970) shows strong separation of fraud from normal transactions, though the moderate AUC-PR (0.6625) and low precision (0.004) highlight the difficulty of minimizing false positives when fraud is this rare.

This study makes three contributions: (1) using Transformers for financial transaction sequences, (2) showing that time patterns alone can catch fraud effectively, and (3) laying groundwork for combining with network-based models. Financial institutions could use this approach to spot coordinated fraud attacks in real time, thanks to its ability to handle varying sequence lengths.

Limitations include the dataset’s age and the model’s focus on sequences alone. Future work should add network connections (like shared accounts), adapt for live data streams, and improve handling of extreme imbalance.

5. Conclusion

The Temporal-Transformer model uses time-based patterns to achieve perfect recall and a high ROC-AUC on the validation set, at the cost of low precision. It is a promising tool for catching fraud and a starting point for future hybrid models. Next steps include adding graph-based analysis, testing on real-time data, and refining methods for imbalanced data. This approach could also work for other time-based anomaly detection tasks.
