Machine learning – an approach to fraud detection and protection

This article continues “The rise of machine learning and artificial intelligence in fraud detection” and shares more of what you should know about these emerging technologies.

Machine learning is being used at many levels in the online fraud detection market. Some solutions are designed to run alongside existing capabilities, taking in structured and unstructured data to identify anomalies, while others are designed to provide a score and information codes that can be used by a real-time policy and decision engine.
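As a rough illustration of the second kind of solution, the sketch below shows how a real-time policy and decision engine might consume a score and information codes. The thresholds, the code names, and the `ModelOutput` and `decide` names are hypothetical and are not taken from any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class ModelOutput:
    score: float        # assumed scale: 0.0 (low risk) to 1.0 (high risk)
    codes: tuple = ()   # information codes explaining the score


# Hypothetical policy settings, chosen only for illustration.
DECLINE_AT = 0.90
REVIEW_AT = 0.60
HARD_BLOCK_CODES = {"DEVICE_ON_BLOCKLIST"}


def decide(output: ModelOutput) -> str:
    """Turn a model score and its codes into a policy decision."""
    # Some codes trigger a decline regardless of the score.
    if HARD_BLOCK_CODES & set(output.codes):
        return "decline"
    if output.score >= DECLINE_AT:
        return "decline"
    if output.score >= REVIEW_AT:
        return "review"
    return "approve"


print(decide(ModelOutput(0.20)))
print(decide(ModelOutput(0.70, ("NEW_DEVICE",))))
print(decide(ModelOutput(0.10, ("DEVICE_ON_BLOCKLIST",))))
```

Keeping the policy outside the model means that thresholds can be adjusted without retraining anything.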

A machine learning solution needs access to a large store of historical data to train its models and increase the probability that it will uncover new patterns of suspicious activity. This technology has the potential to fight card-not-present fraud, chargebacks, account takeover, transaction laundering, and more. Machine learning is also built into solutions such as device assessment, passive behavioural biometrics, bot detection, phone printing, and voice biometrics.

With new and evolving waves of fraud, Gartner has observed that financial institutions and enterprise-scale merchants increasingly need to make rapid, complex risk decisions, and that businesses are turning to machine learning to make those decisions quickly and effectively. However, as the number of machine learning systems grows, clients are demanding explanations as well as decisions, with the aim of:

controlling the machine – a model that explains its logic empowers security managers to adapt the model to evolving fraud patterns with more speed and accuracy;

auditing the machine – financial institutions and large merchants operate in highly regulated environments. These organisations need to provide trails of explanations for compliance, to demonstrate that the basis for their decisions is lawful and ethical;

trusting the machine – a system is only as powerful as the decisions we entrust it to make. How can we trust that the machine is finding the delicate balance between good risk management and good customer experience (CX)?

To achieve these goals, Gartner suggests two methods. The first is to ensure that each model a business develops incorporates a capability to explain its decisions and, moreover, has a loop that provides feedback on the quality of the explanation. The second is to develop two systems: one that makes decisions and another that takes its input from the first system and generates an explanation.
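The two-system idea can be sketched in a few lines. In this minimal example, which is an assumption-laden illustration rather than Gartner's design, the first system is a small logistic model with made-up weights, and the second system receives the same features plus the first system's output and reports which features pushed the score up the most. The feature names, `WEIGHTS`, and `BIAS` are all hypothetical, and the explainer is assumed to have access to the model's weights.

```python
import math

# Hypothetical weights for illustration only.
WEIGHTS = {"amount_zscore": 0.8, "new_device": 1.5, "ip_country_mismatch": 1.2}
BIAS = -3.0


def decide(features):
    """System 1: return a fraud probability from a logistic model."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def explain(features, probability, top_n=2):
    """System 2: take the input and System 1's output, return the main reasons."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    # Only features that raised the score count as reasons.
    reasons = [name for name, value in ranked[:top_n] if value > 0]
    return {"probability": round(probability, 3), "top_reasons": reasons}


transaction = {"amount_zscore": 2.0, "new_device": 1, "ip_country_mismatch": 0}
probability = decide(transaction)
print(explain(transaction, probability))
```

Per-feature contributions are only this easy to read off for a linear model; a more complex decision system would need a different explanation technique.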

Here are some types of machine learning that can be deployed:

Deep Learning – a class of machine learning algorithms that use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. These algorithms can be trained in a supervised (e.g. classification) and/or unsupervised (e.g. pattern analysis) manner, and they learn multiple levels of representation that correspond to different levels of abstraction; the levels form a hierarchy of concepts.

Ensemble Learning – ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.

Unsupervised Learning – does not require labelled outcomes, so it can learn without waiting for the completion of a three-month chargeback reporting cycle, for example. This type of learning often relies on clustering, peer group analysis, breakpoint analysis, or a combination of these. This enables fraud prevention solutions to detect patterns and anomalies rapidly within extremely large sets of data.

Supervised Learning – uses outcome-labelled training data sets to learn. Models include neural networks, Bayesian classifiers, regression, decision trees, or an ensemble combination. Massive amounts of data run through defined models to assess risk outcomes.
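To make two of these types concrete, the sketch below combines supervised learning with ensemble learning on a toy dataset. Each "stump" is a one-level decision tree trained on outcome-labelled rows for a single feature, and the ensemble takes a majority vote. The feature names and the six transactions are invented for illustration, and the stumps assume that higher feature values are riskier.

```python
def train_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows.

    The stump predicts fraud (1) when the feature value is >= the threshold.
    """
    best_threshold, best_errors = None, len(rows) + 1
    for candidate in sorted({row[feature] for row in rows}):
        errors = sum(
            (1 if row[feature] >= candidate else 0) != label
            for row, label in zip(rows, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return feature, best_threshold


def predict_stump(stump, row):
    feature, threshold = stump
    return 1 if row[feature] >= threshold else 0


def predict_ensemble(stumps, row):
    """Majority vote across all stumps."""
    votes = sum(predict_stump(stump, row) for stump in stumps)
    return 1 if votes * 2 > len(stumps) else 0


# Invented, outcome-labelled training data (1 = fraud, 0 = not fraud).
rows = [
    {"amount": 20, "orders_last_hour": 1, "distance_km": 5},
    {"amount": 35, "orders_last_hour": 1, "distance_km": 900},
    {"amount": 500, "orders_last_hour": 2, "distance_km": 10},
    {"amount": 40, "orders_last_hour": 6, "distance_km": 1200},
    {"amount": 800, "orders_last_hour": 7, "distance_km": 15},
    {"amount": 950, "orders_last_hour": 1, "distance_km": 2000},
]
labels = [0, 0, 0, 1, 1, 1]

stumps = [train_stump(rows, labels, f) for f in ("amount", "orders_last_hour", "distance_km")]
print(stumps)
print([predict_ensemble(stumps, row) for row in rows])
```

On this toy data each individual stump misclassifies one training row, while the majority vote classifies all six correctly. That is the effect ensemble methods aim for, although a real evaluation would use held-out data rather than the training set.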

The power of supervised and unsupervised machine learning

Fraud prevention vendors mostly use two approaches, supervised and unsupervised learning, with the former being the more common and widespread.

Maxpay briefly explains how these systems work to identify anomalies (outliers). With the supervised approach, a risk analyst first builds a machine learning model from historical data. The algorithm then sorts new transactions into two provisional baskets: fraud and not fraud. Next, the system collects external signals such as fraud alerts, chargebacks, and complaints. Based on that information, the algorithm looks for new, previously unrecorded dependencies, and the model is retrained. Because the model learns only from fraud that has already been reported, risk analysts remain one step behind; the cycle continues, and in time new fraud techniques emerge.
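The supervised cycle described above can be sketched as a loop: train on history, sort new transactions into baskets, collect external signals, retrain. The example below is a deliberately simple stand-in and not Maxpay's system. The model is just a smoothed fraud rate per segment, retraining is an incremental count update, and the segment names, the `RateModel` class, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


class RateModel:
    """Toy supervised model: smoothed fraud rate per (hypothetical) segment."""

    def __init__(self):
        self.fraud = defaultdict(int)
        self.total = defaultdict(int)

    def train(self, labelled):
        """labelled: iterable of (segment, is_fraud) pairs."""
        for segment, is_fraud in labelled:
            self.total[segment] += 1
            self.fraud[segment] += int(is_fraud)

    def score(self, segment):
        # Laplace smoothing: a segment with no history scores 0.5.
        return (self.fraud[segment] + 1) / (self.total[segment] + 2)


def bucket(model, segment, threshold=0.5):
    """Sort a transaction into a provisional basket."""
    return "fraud" if model.score(segment) > threshold else "not fraud"


# Step 1: build the model from historical data.
history = [
    ("gift_cards", True), ("gift_cards", True), ("gift_cards", False),
    ("groceries", False), ("groceries", False),
    ("electronics", False), ("electronics", False),
]
model = RateModel()
model.train(history)

# Step 2: sort a new transaction into a basket.
before = bucket(model, "electronics")

# Steps 3 and 4: chargebacks arrive as external signals, and the model is retrained.
chargebacks = [("electronics", True)] * 3
model.train(chargebacks)
after = bucket(model, "electronics")

print(before, "->", after)
```

The model only changes its mind about the segment after the chargebacks have arrived, which is the lag the paragraph above describes.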

Unsupervised learning, by contrast, is regarded as an alternative to supervised learning. These algorithms infer patterns from a dataset without reference to known or labelled outcomes. Unsupervised learning allows risk analysts to approach problems with no exact idea of what the result will look like. One can derive structure from data without necessarily knowing the effect of the variables. With unsupervised learning, there is no feedback based on the prediction results. It can, however, separate out data that shows anomalous behaviour, and risk analysts can afterwards apply well-known supervised approaches to that data.
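As a small illustration of separating out anomalous behaviour without labels, the sketch below compares each account with its peer group using the median and the median absolute deviation (MAD). This particular statistic is a choice made for the example, not one named in the article; the 0.6745 scaling factor and the 3.5 cutoff follow a common convention for modified z-scores, and the spending figures are invented.

```python
from statistics import median


def robust_outliers(values, cutoff=3.5):
    """Flag values that sit far from the peer-group median.

    Uses a modified z-score based on the median absolute deviation,
    so a single extreme value does not distort the baseline.
    """
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        # No spread in the peer group: nothing can be ranked as unusual.
        return [False] * len(values)
    return [abs(0.6745 * (v - centre) / mad) > cutoff for v in values]


# Invented daily spend for seven accounts in the same peer group.
daily_spend = [42, 38, 45, 40, 41, 39, 400]
flags = robust_outliers(daily_spend)
print(flags)

# The flags can then serve as provisional labels for a supervised model.
provisional_labels = list(zip(daily_spend, flags))
```

No outcome data is needed to produce the flags, which is why this kind of check can run before any chargeback report exists.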

Unsupervised machine learning is therefore well suited to real-world problems in which labelled outcomes are not yet available, and it can help solve them when risk managers are one step behind the fraudsters.

Fraud prevention services use both rule-based and machine learning approaches, including unsupervised techniques, and there is a significant difference between fraud detection systems that use machine learning directly and those that are essentially static and rule-based. The former are more flexible in responding to new fraud attack patterns. The latter keep a human element in the change control process, which makes them more resistant to skilfully crafted attacks that try to poison the model.

Some banks, merchants, and retailers have traditionally relied on rules-based fraud detection systems to counter threats such as coordinated attacks that exploit weak points, but advances in fraud have outpaced the capabilities of these systems.

According to Feedzai, rules-based systems tend to be either too broad or too narrow in scope to adequately address fraud attack vectors, requiring financial institutions to combine multiple solutions into a single system to cover their bases.

Machine learning does not replace rules completely; rather, it complements them to expand the capabilities of the risk management platform. When applied to large datasets, like those found in account opening analyses, these algorithms can pinpoint surprising and unintuitive fraud signals.

About Mirela Ciobanu

Mirela Ciobanu is a Senior Editor at The Paypers and has been actively involved in covering digital payments and related topics, especially in the cryptocurrency, online security and fraud prevention space. She is passionate about finding the latest news on data breaches, machine learning, digital identity, and blockchain, and she is an active advocate of the need to keep our online data/presence protected. Mirela has a bachelor’s degree in English language and holds a Master’s degree in Marketing.