Data Mining Techniques To Detect And Prevent Credit Card Fraud
Abstract
Cashless transactions such as online payments, credit card transactions, and mobile wallets are becoming increasingly popular in financial systems. With the increased number of such cashless transactions, the number of fraudulent transactions is also increasing. Fraud can be detected by analyzing the spending behavior of customers (users) from previous transaction data: if a transaction deviates from the established spending patterns, it is possibly fraudulent. To detect fraudulent behavior, banks and credit card companies use various data mining methods such as decision trees, rule-based mining, neural networks, fuzzy clustering, hidden Markov models, or hybrid combinations of these methods. Any of these methods can be applied to learn the normal usage pattern of customers (users) based on their past activities. The objective of this paper is to provide a comparative study of different techniques for detecting fraud. Data mining and machine learning techniques help us gain a better and deeper understanding of the collected data.
Keywords: credit card fraud, fraud detection, data mining
I. Introduction
A great deal of study and research has gone into the topic of fraud detection and prevention. Banking fraud can be defined as "the unauthorized use of an individual's confidential information to make purchases, or to remove funds from the user's account." According to a Statista survey [1], 41% of global internet users had purchased products online in 2013. In 2011, the number of digital buyers worldwide reached 792.6 million; a year later, the number rose to 903.6 million. In 2013, 41.3% of global internet users had purchased products online, and by 2017 this figure was expected to grow to 46.4%. According to a BBC News report, losses from online banking fraud rose by 48% in 2014 compared with 2013 as consumers increasingly conducted their financial activities on the internet. Thus, with the increased number of cashless transactions and online shopping, fraudulent transactions are also increasing. Fraud can be committed by stealing or compromising banking details through email phishing, telephone phishing, malware, weak security practices, social networking sites, and shoulder surfing. Fraudulent transactions can be detected either by a classification approach or by detecting transactions that are outliers with respect to normal transactions. In the classification approach, a model is first trained on training data; features are extracted and transformed from the raw data before it is given to the model [15]. In this paper, various methods used to detect fraudulent transactions are described and compared.

The financial services industry, and industries that involve financial transactions more broadly, suffer from fraud-related losses and damages. 2016 was a banner year for financial scammers: in the US alone, the number of customers who experienced fraud hit a record 15.4 million people, 16 percent higher than in 2015, and fraudsters stole about $6 billion from banks that year. The shift to the digital space opens new channels for distributing financial services, but it has also created a rich environment for fraudsters. Where criminals once had to counterfeit client IDs, obtaining a person's account password may now be all that is needed to steal money. Customer loyalty and conversions are affected in both the digital and the physical environment. According to Javelin Strategy & Research, it takes brick-and-mortar financial institutions more than 40 days to detect fraud. Fraud also affects banks that provide online payment services; for instance, 20 percent of customers change their banks after experiencing scams. The challenge for industry players is therefore to implement real-time claim assessment and improve the accuracy of fraud detection.
II. Related Work
Meta-learning techniques provide methods for automating the knowledge discovery process. Meta-learning introduces various interesting concepts, including data meta-features, meta-knowledge, algorithm recommendation systems, autonomous process builders, and so on. All of these techniques aim to reduce the cost of otherwise expensive and demanding data mining analysis. With the rise of machine learning and artificial intelligence, researchers have developed a number of predictive and classification models for detecting fraudulent activity, ranging from conventional supervised machine learning models to deep learning. There are also many review papers describing types of fraud and different fraud detection techniques.
One of the earliest papers to explore a data-mining-based approach to fraud detection was by Lu Q and Ju C [1]. Ghosh and Reilly [9] used a three-layer feed-forward neural network to detect fraud in 1994; the network was trained on examples of fraud involving stolen cards, application fraud, counterfeit fraud, Non-Received Issue (NRI) fraud, and mail-order fraud. Abhinav and Amlan [7] proposed a Hidden Markov Model to detect credit card fraud; their model does not require fraud signatures and can still detect fraud by considering a cardholder's spending habits, and the system is scalable to a large number of transactions. Y. Sahin and E. Duman [6] proposed an approach to detect credit card fraud using decision trees and Support Vector Machines; their study compares the performance of classifiers built with several decision tree methods (C5.0, C&RT and CHAID) and a number of different SVM methods (SVM with polynomial, sigmoid, linear and RBF kernel functions). An approach to fraud detection in banking transactions using fuzzy clustering and neural networks is proposed in [2].
In that approach, fraud detection is performed in three phases. The first phase is initial user authentication and verification of card details. After this phase completes successfully, a fuzzy c-means clustering algorithm is applied to determine the user's normal usage behavior from past transactions. If a new transaction appears doubtful in this phase, a neural-network-based mechanism is applied to determine whether it is actually fraudulent. Kang Fu, Dawei Cheng, Yi Tu, and Liqing Zhang [3] proposed a convolutional neural network (CNN) based approach to find fraudulent transactions. A convolutional neural network is a deep learning model, a type of feed-forward neural network that consists of more than one hidden layer. In that paper, a new feature, trading entropy, is proposed to capture more complex fraud patterns and improve classification accuracy, and a cost-based sampling method is used to generate additional fraud examples to relieve the problem of the imbalanced dataset. CNNs are generally used for image recognition, character recognition, image processing, video recognition, and recommender systems; in that paper, a CNN is used to detect fraud for the first time. Different outlier detection techniques [13] can also be used to identify fraudulent transactions as outliers.
III. Problems With Credit Card Fraud Detection
One of the biggest problems researchers face in fraud detection is the lack of real-life data, because of the sensitivity of the data and privacy issues. Many researchers have worked with real-life bank data [3], [9], [6], [11] under agreements. To deal with this problem, many tools are available for generating synthetic data. A second problem is dealing with imbalanced data (a skewed class distribution), because the number of fraudulent transactions is very small compared to the number of legitimate transactions. To overcome this problem, synthetic minority oversampling methods are used to increase the number of low-incidence records in the dataset by generating synthetic fraudulent transactions related to the original data; in [3], cost-based sampling is used to generate synthetic fraudulent transactions to balance the dataset. A minimal oversampling sketch is shown below. Overlapping data is another problem: some transactions look fraudulent when they are actually legitimate, and it is also possible for fraudulent transactions to appear normal.
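As an illustration of the oversampling idea, here is a minimal sketch using the SMOTE implementation from the imbalanced-learn library; the dataset is synthetic, and the fraud rate, feature count and random seeds are arbitrary choices made for the example.

```python
# Minimal sketch: oversampling a skewed fraud dataset with SMOTE.
# Assumes a feature matrix X and binary labels y, where 1 = fraud.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic stand-in for real transaction data: roughly 1% fraud.
X, y = make_classification(n_samples=10_000, n_features=10,
                           weights=[0.99, 0.01], random_state=42)
print("before:", Counter(y))

# Generate synthetic minority (fraud) samples until the classes are balanced.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after:", Counter(y_res))
```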
IV. Various Techniques To Detect Fraud
A. Decision Tree
A decision tree is a graphical representation of possible solutions to a decision based on certain conditions. A decision tree starts with a root node and divides into separate branches; these branches connect to further nodes, and so on, until the tree ends in nodes called leaf nodes. Each internal node of a decision tree represents a test, the branches leaving it represent the possible outcomes of that test, and each leaf node carries a class label. With this tactic of splitting and deciding, decision trees break a complex problem into simple ones. A simple example of a decision tree that distinguishes whether a transaction is likely legitimate or fraudulent is shown in Figure 5.
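A minimal sketch of this idea using scikit-learn's DecisionTreeClassifier is shown below; the three features (amount, CVV verified, foreign merchant) and the tiny training set are invented purely for illustration.

```python
# Minimal sketch: a decision tree on hand-made transaction features.
# Labels: 1 = fraudulent, 0 = legitimate.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[120, 1, 0],   # [amount, cvv_verified, foreign_merchant]
     [900, 0, 1],
     [35,  1, 0],
     [650, 0, 1],
     [80,  1, 1],
     [700, 0, 0]]
y = [0, 1, 0, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["amount", "cvv_verified", "foreign"]))
print(tree.predict([[500, 0, 1]]))   # classify a new transaction
```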
B. Rule-based method
Association rules are generated to distinguish fraudulent transactions from normal ones. In fraud detection, the generated rules are used to classify transactions as fraudulent or legitimate, so the rules are derived from behavior. This method is similar to the decision tree approach. Examples of such rules might be:
R1: (transaction amount = low) -> legitimate
R2: (transaction amount = high) ^ (is cvv verified = yes) -> legitimate
R3: (transaction amount = high) ^ (is cvv verified = no) ^ (income=high) -> legitimate
R4: (transaction amount = high) ^ (is cvv verified = no) ^ (income = low) -> fraudulent
The ultimate goal of the rule-based method is to mine such a set of rules; a sketch encoding these rules in code follows.
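As a rough illustration (not taken from any of the cited papers), the four rules above could be encoded directly as predicates; the field names amount, cvv_verified and income are assumptions made for the example.

```python
# Minimal sketch: the rules R1-R4 above encoded as plain Python predicates.
# Field names (amount, cvv_verified, income) are illustrative assumptions.
def classify(txn):
    if txn["amount"] == "low":                                   # R1
        return "legitimate"
    if txn["amount"] == "high" and txn["cvv_verified"]:          # R2
        return "legitimate"
    if txn["amount"] == "high" and not txn["cvv_verified"]:
        # R3 if income is high, otherwise R4
        return "legitimate" if txn["income"] == "high" else "fraudulent"
    return "fraudulent"   # default: flag anything not covered for review

print(classify({"amount": "high", "cvv_verified": False, "income": "low"}))
# -> fraudulent
```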
C. Hidden Markov Model
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov chain with hidden (unobserved) states. An HMM is a doubly embedded stochastic process with two levels of hierarchy. A fraud detection approach using an HMM is proposed in [7]. The authors consider three price ranges, low, medium and high {l, m, h}, as the set of possible observation symbols; for example, l = (0, $100], m = ($100, $500], h = ($500, credit card limit]. If a user makes a transaction of $320, the resulting observation symbol is m. Each transaction amount usually depends on the corresponding type of purchase, so the set of all possible types of purchase (the lines of business of merchants) forms the set of hidden states of the HMM. The HMM-based credit card fraud detection system proposed in [7] does not require fraud signatures and can still detect fraud by considering the user's spending pattern.
Figure 1: Probabilistic parameters of a hidden Markov model.
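The sketch below is a simplified illustration of this idea in plain NumPy: transaction amounts are mapped to the symbols l/m/h, the forward algorithm scores the cardholder's recent sequence, and a large drop in likelihood when the new transaction is appended is treated as suspicious. The transition, emission and initial probabilities and the threshold are invented for the example; in practice they would be estimated from the cardholder's history (for instance with Baum-Welch), as in [7].

```python
# Simplified sketch of the idea in [7]: amounts become observation symbols
# l/m/h and the forward algorithm scores the cardholder's sequence.
# All probabilities and the threshold below are invented for illustration.
import numpy as np

def symbol(amount):
    """Map a transaction amount to an observation symbol: 0=l, 1=m, 2=h."""
    return 0 if amount <= 100 else 1 if amount <= 500 else 2

A = np.array([[0.7, 0.2, 0.1],      # hidden-state transition probabilities
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.80, 0.15, 0.05],   # emission probabilities P(symbol | state)
              [0.20, 0.60, 0.20],
              [0.10, 0.30, 0.60]])
pi = np.array([0.6, 0.3, 0.1])      # initial state distribution

def forward_prob(obs):
    """Probability of an observation sequence (forward algorithm)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

history = [symbol(a) for a in (40, 75, 320, 60, 90)]   # past transactions
new_txn = symbol(4500)                                 # incoming transaction
drop = 1 - forward_prob(history + [new_txn]) / forward_prob(history)
print("suspicious" if drop > 0.8 else "accepted")      # threshold is arbitrary
```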
D. Neural networks and Deep Neural Networks
A neural network is a model that can capture non-linear relations between records. Its structure is built on principles loosely modeled on the neurons of the human brain. The model is trained on a labeled dataset by passing the input data through several layers (i.e., sets of mathematical functions). Models of this type typically employ one or two hidden layers.
Deep neural networks work in the same way but employ many more layers than a usual neural network. This yields more accurate results, but also requires more computing power and time for data processing. Deep learning has created a revolution in data science over the past years and, consequently, has also had an impact on the financial services industry. Currently, neural networks are applied both for transaction verification and for insurance claims. Neural networks, and especially deep neural networks, are powerful at finding non-linear and very complex relations in large datasets. This works both for transactional data and for text and image analysis, which may be useful in insurance cases. They usually provide high accuracy, which makes neural networks a necessary part of a modern fraud detection ensemble. However, neural networks are state-of-the-art systems that are difficult to build and tune for efficiency; they require highly skilled professionals and a powerful computing architecture, so we do not recommend using the method for express analysis of all transactions. Another major problem with deep neural networks is the lack of interpretability: while they may be highly accurate, it is nearly impossible to explain how the system arrived at a particular conclusion.
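A minimal Keras sketch of such a network is given below; the layer sizes, class weights and the randomly generated stand-in data are illustrative assumptions, not values from any cited study.

```python
# Minimal sketch: a small feed-forward network for binary fraud
# classification. Layer sizes, epochs and class weights are illustrative.
import numpy as np
import tensorflow as tf

# Stand-in for preprocessed transaction features and 0/1 fraud labels.
X = np.random.rand(1000, 30).astype("float32")
y = (np.random.rand(1000) < 0.05).astype(int)        # ~5% fraud

model = tf.keras.Sequential([
    tf.keras.Input(shape=(30,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # fraud probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# class_weight penalizes missed frauds more heavily than false alarms.
model.fit(X, y, epochs=5, batch_size=64,
          class_weight={0: 1.0, 1: 10.0}, verbose=0)
```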
E. K-Nearest Neighbors
K-nearest neighbors is an algorithm that classifies records by similarity, based on distance in a multidimensional space. A new record is assigned the class that is most common among its k nearest neighbors, with each neighbor's vote optionally weighted by its distance.
K-nearest neighbors is another common approach used to analyze credit card transactions. The method is relatively insensitive to missing and noisy data, which allows larger datasets to be used with less preparation. It is also considered highly accurate and does not require much engineering effort to tune. Like neural networks, however, k-nearest neighbors requires a powerful infrastructure and also lacks interpretability.
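A minimal scikit-learn sketch, with invented feature values (amount and number of recent transactions), might look as follows; note the scaling step, since k-NN distances are otherwise dominated by the largest-valued feature.

```python
# Minimal sketch: k-nearest-neighbours classification of transactions
# with distance-weighted voting. Feature values are illustrative.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = [[25, 1], [40, 2], [30, 1],      # [amount, transactions in last hour]
     [900, 6], [1200, 8], [750, 5]]
y = [0, 0, 0, 1, 1, 1]               # 0 = legitimate, 1 = fraud

knn = make_pipeline(StandardScaler(),
                    KNeighborsClassifier(n_neighbors=3, weights="distance"))
knn.fit(X, y)
print(knn.predict([[800, 7], [35, 1]]))   # -> [1 0]
```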
F. Support vector machine
A support vector machine (SVM) is a supervised machine learning model that uses a non-probabilistic binary linear classifier to separate the records in a dataset. In other words, the algorithm divides the data into two categories separated by as clear a gap as possible: candidate hyperplanes are constructed in the multidimensional space, and the algorithm selects the hyperplane that separates the records better than the others.
As some studies show, SVMs can be inferior to random forests on small credit card transaction datasets, but can approach their accuracy once the datasets are large enough. Support vector machines are particularly good at working with complex multidimensional data, and they help avoid the overfitting problem that random forests may experience. Generally, SVM is a very common method in credit card fraud detection, and the abundance of published work makes adjusting SVM models for credit card fraud detection simpler for a data science team. On the other hand, the complexity of SVM models requires considerable engineering effort to fine-tune the algorithm and achieve high accuracy, and SVMs are slow and computationally heavy, so they require a powerful computing architecture. The advantages of SVM are highlighted below:
1) Regularization parameter
SVMs have a regularization parameter. In mathematics and statistics, and particularly in machine learning and inverse problems, regularization is the process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting. Hence, while training models to detect fraudulent activity, data scientists need not worry as much about overfitting, which is otherwise a major cause of concern.
2) Uses the kernel trick
The kernel trick lets the SVM operate in a high-dimensional feature space implicitly, which is often computationally cheaper than explicitly computing the coordinates in that space. Since the models are trained on huge datasets, the kernel trick helps reduce the amount of computation required, saving time.
3) Convex optimization
Since training an SVM is a convex optimization problem, efficient techniques exist to solve it. A convex optimization problem is one in which all of the constraints are convex functions and the objective is a convex function if minimizing (or a concave function if maximizing). Efficient techniques such as SMO (Sequential Minimal Optimization) are already available to solve such problems. A minimal SVM sketch follows.
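The sketch below trains an SVM with an RBF kernel on a synthetic, imbalanced dataset using scikit-learn; the kernel, the value of the regularization parameter C and the class-weighting choice are illustrative assumptions.

```python
# Minimal sketch: an RBF-kernel SVM for fraud classification.
# class_weight="balanced" compensates for the skewed class distribution;
# C is the regularization parameter discussed above.
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.98, 0.02], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

svm = SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced")
svm.fit(X_tr, y_tr)
print(classification_report(y_te, svm.predict(X_te), digits=3))
```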
G. Random forests
A random forest (an ensemble of decision trees) is an algorithm that builds many decision trees to classify data objects. Each tree repeatedly selects the variable that best splits the records and splits on it, so that a visualization of the model looks like a tree. To make predictions more precise, data scientists train multiple decision trees on random subsets of the overall dataset. To decide whether a transaction looks like fraud, the trees vote, and the model returns the consensus judgment.
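A minimal scikit-learn sketch is shown below; the synthetic dataset and the number of trees are illustrative.

```python
# Minimal sketch: a random forest (ensemble of decision trees) whose trees
# vote on whether a transaction is fraudulent. Parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.98, 0.02], random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

forest = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                random_state=7).fit(X_tr, y_tr)
# Consensus of the trees, expressed as a fraud probability per transaction.
print(forest.predict_proba(X_te[:5])[:, 1])
```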
V. Metrics To Evaluate The System
Because the data is highly imbalanced, overall accuracy is not an appropriate way to evaluate a model: a model can achieve very high accuracy while misclassifying almost all fraudulent transactions. Precision, recall, F1 score, and the counts of true positives, true negatives, false positives and false negatives are therefore used to evaluate binary classification. A true positive (tp) is a fraudulent transaction correctly predicted as fraudulent. A true negative (tn) is a legitimate transaction correctly predicted as legitimate. A false positive (fp) is a legitimate transaction wrongly predicted as fraudulent. A false negative (fn) is a fraudulent transaction wrongly predicted as legitimate.
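In terms of these counts, the standard definitions of the metrics mentioned above are:

```latex
\text{precision} = \frac{tp}{tp + fp}, \qquad
\text{recall} = \frac{tp}{tp + fn}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
```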
VI. Discussion
Choosing the right machine learning method depends on the problem type, the size of the dataset, the available resources, and so on. A good practice is to use several models together to both streamline assessment and achieve higher accuracy. For example, PayPal implements express assessment using linear models to separate uncertain transactions from ordinary ones; all transactions that look suspicious are then run through an ensemble of three models comprising a linear model, a neural network, and a deep neural network, which vote to arrive at a final, more accurate result. As of today, anti-fraud systems should detect fraud in real time, improve data credibility, analyze user behavior, and uncover hidden correlations. While these qualities can be offered by machine learning algorithms, two serious drawbacks must be kept in mind. Machine learning models still require large and carefully prepared datasets for training, and they still need some features of rule-based engines, such as checking legal limits on cash transactions. Machine learning solutions also usually require substantial data science skills to build complex and robust ensemble algorithms, which sets a high barrier for small and medium companies trying to adopt the technique with internal talent alone. The task requires deep technological and domain expertise, so a common practice is to engage third-party data science experts; data consultancy and engineering services accelerate development and cost less than building an in-house data science team from the ground up. A sketch of the layered screening-plus-ensemble idea is given below.
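The following sketch illustrates the layered idea described above with scikit-learn: a fast linear model screens all transactions, and a small voting ensemble re-examines the most suspicious ones. It is an invented illustration under those assumptions, not a description of PayPal's actual system.

```python
# Minimal sketch: a fast linear model screens transactions; a voting
# ensemble (linear model + shallow and deeper neural nets) re-checks
# the most suspicious ones. Thresholds and sizes are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.97, 0.03], random_state=3)

# Express screening: rank all transactions by a linear model's fraud score.
screen = LogisticRegression(max_iter=1000).fit(X, y)
scores = screen.predict_proba(X)[:, 1]
suspicious = np.argsort(scores)[-100:]          # 100 most suspicious

# Ensemble of three models votes on the suspicious transactions.
ensemble = VotingClassifier(
    estimators=[("linear", LogisticRegression(max_iter=1000)),
                ("nn", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)),
                ("deep", MLPClassifier(hidden_layer_sizes=(64, 32, 16),
                                       max_iter=500))],
    voting="soft")
ensemble.fit(X, y)
print(ensemble.predict(X[suspicious][:10]))     # final verdicts
```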
VII. Conclusion
In this paper, we bring together various methods for detecting fraudulent transactions and compare them. One of these methods, or a combination of them, can be used to detect fraudulent transactions; new features can be added and various sampling methods can be used to train the models more accurately. The paper explores different fraud detection data mining techniques across different areas. Data mining is a well-known field concerned with analyzing, predicting, and deriving rules from large amounts of data and finding true, previously unknown patterns. This research focuses on data mining techniques as an effective approach for detecting fraud patterns and curbing the activities of fraudsters.
With the advent of neural networks and deep learning, in which the model learns from data and improves its classification through experience, data-mining-based fraud detection has advanced considerably. Deep learning, in which there is more than one layer of neural network and each layer is associated with a task, has been in the news for its surprisingly good classification results. Recently, a new class of deep learning model, the generative adversarial network (GAN), has emerged. Generative adversarial networks are a type of artificial intelligence algorithm used in unsupervised machine learning, implemented as a system of two neural networks competing against each other in a zero-sum game framework. A GAN, which consists of two deep learning models competing with each other to obtain the best results, is one proposed measure to combat the fraud detection problem.
By applying a GAN to the proposed problem, we can address two issues. First, because two neural networks compete with each other to maximize classification performance, there is no need to re-train the model again and again as in the earlier approaches. Second, classification performance also improves due to the presence of the two networks. The only downside of such a model is its time consumption: the time taken is higher than for a conventional neural network model. In this application, however, the quality of the classification matters more than the time. Future work is to implement the model suggested above and to compare its classification results against the SVM model and the deep learning models.