Automatic Bank Fraud Detection Using Support Vector Machines

Presented by

Kanika Chopra, Yush Rajcoomar

Introduction

Automatic Bank Fraud Detection Using Support Vector Machines is a 2014 paper by Djeffal Abdelhamid, Soltani Khaoula, and Ouassaf Atika. It proposes data mining methods that extract relevant information about various fraudulent activities, and a hybridization of supervised and unsupervised algorithms to improve fraud detection. Fraud detection is very important for financial companies, and fraud has become increasingly common with the growth of e-commerce. Earlier institutional safeguards such as PINs, passwords and identification systems are no longer sufficient, so a more data-driven approach is required. The paper discusses credit card fraud, money laundering and mortgage fraud from a data mining perspective and proposes a support vector machine (SVM) variant for each. A binary SVM (supervised learning) is first used to distinguish between normal and fraudulent activities. Where data identifying fraudulent transactions is not available, a single-class SVM model is used instead to obtain the decision boundary separating the two. Transactions classified as normal are then investigated further for strange trends (unsupervised learning), as shown in Figure 1. These methods were tested on various bank databases to determine how effective they are at detecting fraud.

Figure 1: Method using Supervised and Unsupervised Learning

Previous Work

Support vector machines have been a robust and reliable approach to statistical learning for several years. As financial institutions have gained access to more data, they have been able to use it to predict patterns of behaviour that are more likely to be fraudulent. Several methods have been applied to this problem, including Bayesian networks [1], k-nearest neighbours [2] and artificial neural networks [3]. Throughout the literature, there are two approaches to fraud detection:

1. Supervised Learning

2. Unsupervised Learning

With supervised learning, models are constructed from both legitimate and fraudulent transactions. Examples of supervised approaches are Bayesian networks and support vector machines. However, fraudsters are able to bypass security and prevention methods; hence, unsupervised methods are used to pick out abnormal behaviour and unusual transactions from among the normal transactions. Examples of unsupervised methods are k-nearest neighbours and self-organizing maps [4].

Motivation

Identifying behavioural patterns is not an efficient or effective approach to fraud detection, because fraud techniques change rapidly and make current models obsolete.

Therefore, single-class SVM [5], an unsupervised algorithm, is used to learn a decision function for novelty detection. It classifies new data as either similar to the training set or anomalous. Binary SVM, a supervised algorithm, is used to find a separating hyperplane between the fraudulent and non-fraudulent classes. The proposed method combines supervised and unsupervised learning to flag fraudulent behaviour: the supervised component learns from previous transactions, and the unsupervised component detects strange behaviour.
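To make the supervised component more concrete, the following is a minimal sketch of a linear binary SVM trained by full-batch subgradient descent on the soft-margin (hinge loss) objective. It is not the authors' implementation: the toy transactions, the two feature meanings, and the hyperparameters `lam`, `lr` and `epochs` are all invented for illustration, and the paper may well use a kernel SVM and a different solver.

```python
# Toy, invented data: two standardized features per transaction
# (hypothetical meaning: amount, and deviation from the usual spending hour).
LEGIT = [(-1.0, -0.8), (-0.9, -1.1), (-1.2, -0.7), (-0.8, -1.0)]
FRAUD = [(1.0, 0.9), (0.8, 1.2), (1.1, 0.7), (0.9, 1.0)]
X = LEGIT + FRAUD
y = [-1] * len(LEGIT) + [1] * len(FRAUD)  # +1 = fraudulent, -1 = legitimate


def decision_value(w, b, x):
    """Signed score w . x + b; its sign is the side of the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))
    with full-batch subgradient descent (hyperparameters are assumptions)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision_value(w, b, xi) < 1:
                # The point is inside the margin or misclassified,
                # so its hinge loss contributes to the subgradient.
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (flag as fraudulent) or -1 (treat as legitimate)."""
    return 1 if decision_value(w, b, x) >= 0 else -1


w, b = train_linear_svm(X, y)
print("weights:", w, "bias:", b)
print("prediction for (1.0, 1.0):", predict(w, b, (1.0, 1.0)))
```

In the hybrid scheme described above, a classifier of this kind would handle the cases where labelled fraud is available, while the single-class SVM would be trained on normal transactions only and used to flag points that fall outside the learned region.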

Model Architecture

Data

The results are based on three databases of normal transactions: General Ledger, Payables Data and Revenue Data, corresponding to credit card fraud, money laundering and mortgage fraud. These datasets do not indicate whether transactions or users are fraudulent. In addition, the German and Australian credit card databases were used to obtain data for the binary SVM; these datasets include fraudulent transactions.

Results

Conclusion

Critiques

This method achieves higher precision than previous work, and the hybridization narrows down the data on which the binary SVM model has to predict. The hybridization technique can also be adapted to various types of fraud, such as credit card fraud and mortgage fraud, in which abnormal behaviour presents itself differently. However, it is important to note that less data with fraud indicators is available. In addition, the paper gives no clear indication of the distribution of fraudulent versus non-fraudulent data. It is likely that these datasets are imbalanced, with the majority of the data being non-fraudulent. If so, the reported precision would be inflated, since a model is more likely to be correct when it predicts the more common, non-fraudulent case. Furthermore, the credit score is used to determine precision and to measure the efficacy of the hybridization method, whereas customer behaviour would be a more accurate basis. Access to more real databases would be beneficial in determining how successful the proposed method is.
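The concern about class imbalance can be illustrated with a small worked example. The sketch below assumes that the reported "precision" behaves like an overall recognition rate (accuracy); that reading is an assumption, and all counts are invented rather than taken from the paper. With 10 fraudulent transactions out of 1000, a model that never flags fraud still scores 99% accuracy, while a model that catches 8 of the 10 frauds scores slightly lower.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (fraud) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy, precision and recall; None where a ratio is undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Invented, imbalanced ground truth: 10 frauds followed by 990 legitimate.
y_true = [1] * 10 + [0] * 990

# Hypothetical model A: never flags anything as fraud.
pred_never_flags = [0] * 1000

# Hypothetical model B: catches 8 of 10 frauds and raises 12 false alarms.
pred_flags_some = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

print("never flags:", summarize(y_true, pred_never_flags))
print("flags some: ", summarize(y_true, pred_flags_some))
```

Reporting the class distribution together with fraud-class precision and recall would make results on such datasets easier to interpret than a single overall figure.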

References

[1] Maes, Sam, Karl Tuyls, Bram Vanschoenwinkel, and Bernard Manderick. “Credit Card Fraud Detection Using Bayesian and Neural Networks.” (2002).

[2] Nami, Sanaz, and Mehdi Shajari. “Cost-Sensitive Payment Card Fraud Detection Based on Dynamic Random Forest and k-Nearest Neighbors.” Expert Systems with Applications, vol. 110, 2018, pp. 381–92. Crossref, doi:10.1016/j.eswa.2018.06.011.

[3] Patidar, R. D. and Lokesh Sharma. “Credit Card Fraud Detection Using Neural Network.” (2011).

[4] Olszewski, Dominik. “Fraud Detection Using Self-Organizing Map Visualizing the User Profiles.” Knowledge-Based Systems, vol. 70, 2014, pp. 324–34. Crossref, doi:10.1016/j.knosys.2014.07.008.

[5] Schölkopf, Bernhard, Robert Williamson, Alex Smola, John Shawe-Taylor, and John Platt. “Support Vector Method for Novelty Detection.” NIPS, vol. 12, 1999, pp. 582–88.
