Automatic Bank Fraud Detection Using Support Vector Machines
Presented by
Kanika Chopra, Yush Rajcoomar
Introduction
Automatic Bank Fraud Detection Using Support Vector Machines is a paper written by Djeffal Abdelhamid, Soltani Khaoula, and Ouassaf Atika in 2014. The paper proposes data mining methods for extracting information relevant to various fraudulent activities, and a hybridization of supervised and unsupervised algorithms to improve fraud detection. Fraud detection is very important for financial companies, and fraud has become increasingly common with the growth of e-commerce. Earlier institutional safeguards such as PINs, passwords and identification systems are no longer sufficient, so a more data-driven approach is required. The paper discusses credit card fraud, money laundering and mortgage fraud from a data mining perspective and proposes a support vector machine (SVM) variant for each. Where labelled data are available, a binary SVM (supervised learning) is first used to distinguish normal from fraudulent activities. Where data identifying fraudulent transactions are not available, a single-class SVM model is used to obtain the decision boundary distinguishing the two. The normal transactions are then investigated further to detect unusual trends (unsupervised learning), as shown in Figure 1. These methods were tested on various bank databases to determine how effective they were in detecting fraud.
Figure 1: Method using Supervised and Unsupervised Learning
Previous Work
Support vector machines have been a robust and reliable approach to statistical learning for several years. As financial institutions have gained access to more data, they have been able to use it to identify patterns of behaviour that are more likely to be fraudulent. Different methods have been used to tackle this problem, such as Bayesian Networks [1], K Nearest Neighbours [2] and Artificial Neural Networks [3]. Throughout the literature, there are two approaches to fraud detection:
1. Supervised Learning
2. Unsupervised Learning
With supervised learning, models are constructed from both legitimate and fraudulent transactions. Examples of supervised approaches are Bayesian Networks and Support Vector Machines. However, fraudsters are able to bypass security and prevention methods; hence, unsupervised methods are used to identify abnormal behaviour and unusual transactions among the normal transactions. Examples of unsupervised methods are k-nearest neighbours and self-organizing maps [4].
Motivation
Identifying fixed behavioural patterns is not an efficient or effective approach to fraud detection, because fraud techniques change rapidly and make current models obsolete.
For this reason, single-class SVM [5], an unsupervised algorithm, is used to learn a decision function for novelty detection: it classifies new data as either similar to the training set or anomalous. Binary SVM, a supervised algorithm, is used to find a separating hyperplane between the fraudulent and non-fraudulent classes. The proposed method combines supervised and unsupervised learning to flag fraudulent behaviour: the supervised component learns from previous transactions, and the unsupervised component detects unusual behaviour.
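The combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's SVC and OneClassSVM as stand-ins for the binary and single-class SVMs, uses synthetic two-feature data in place of real bank records, and the helper name flag_transactions and all parameter values (kernel, gamma, nu) are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption): two numeric features per transaction.
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
fraud = rng.normal(loc=6.0, scale=1.0, size=(40, 2))

# Supervised component: binary SVM trained on labelled transactions
# (0 = normal, 1 = fraudulent).
X_labelled = np.vstack([normal, fraud])
y_labelled = np.array([0] * len(normal) + [1] * len(fraud))
binary_svm = SVC(kernel="rbf", gamma="scale").fit(X_labelled, y_labelled)

# Unsupervised component: single-class SVM trained on normal transactions only.
# nu is an upper bound on the fraction of training points treated as outliers.
one_class_svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(normal)


def flag_transactions(X):
    """Flag known fraud first, then screen the remainder for novelties."""
    X = np.asarray(X, dtype=float)
    flags = binary_svm.predict(X) == 1
    passed = ~flags  # transactions the binary SVM considers normal
    if passed.any():
        # OneClassSVM.predict returns -1 for points outside the learned region.
        flags[passed] = one_class_svm.predict(X[passed]) == -1
    return flags


new_transactions = np.array([
    [0.1, -0.2],   # close to the normal cluster
    [6.2, 5.8],    # close to the known-fraud cluster
    [-8.0, 9.0],   # unlike anything seen in training
])
flags = flag_transactions(new_transactions)
print(flags)
```

In this sketch, a transaction is flagged if the binary SVM assigns it to the fraud class, or if it passes that check but falls outside the region the single-class SVM learned from normal transactions. The second case is what allows a previously unseen pattern to be caught.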
Model Architecture
Data
The results are based on three databases containing normal transactions: General Ledger, Payables Data and Revenue Data, corresponding to credit card fraud, money laundering and mortgage fraud. These datasets do not indicate whether transactions or users are fraudulent. In addition, the German and Australian credit card databases were used to obtain data for the binary SVM; these datasets include fraudulent transactions.