Data and its role in the finance and banking ecosystem

What is the story of Synthesized? What inspired you to start the company? What is the underlying technology?

While pursuing my PhD, I identified a fundamental business problem: innovation is impossible without data. I found a new way to share data in a compliant manner.

At Synthesized, my team and I have built a next-generation DataOps platform that gives data-driven organisations a new and secure way of using data when innovating. It automates all stages of data provisioning and curation, enabling the creation of new, high-quality synthesized data scenarios for test and development purposes, all with zero risk to customer privacy.

The platform provides CTOs, CIOs and CDOs with a solution to fast-track development in a secure and compliant manner.

Additionally, Synthesized has made accessibility a part of its DNA and recently we launched a free-to-use Community Edition.

You say that your mission is to transform the way our society works with data using AI. What are the challenges in the financial field when it comes to data and how is Synthesized responding to these?

DataOps is probably the most important thing for any organisation looking to move forward in the new world. It’s a fundamental shift in how we use data to drive innovation, to create competitive advantage, to reduce costs, and to make decisions critical to the future of any business over the next five years. Any company that fails to adopt DataOps is unlikely to survive for long, because today real-time business intelligence is not just a competitive advantage, it is a competitive necessity.

DataOps is about moving on from the heavily centralised data strategies we’ve been investing in and trying to deliver over the last 5-10 years. What has happened is that all our critical data is still trapped in our data silos, our data warehouses and our data lakes, and the vast investments every financial organisation has made in these data strategies simply aren’t generating the ROI we were promised. Yes, there are pockets of value and success created from these investments, but most CTOs and CDOs I’m talking with have realised these architectures simply aren’t agile enough to deliver the new paradigm of ‘Data as a Service’.

To deliver this new world of data self-service, a few things are needed. Data needs to be easily found by the teams that need it, and it needs to be readily available rather than held up by approval delays from internal control functions. It also needs to be easily consumed and flexible, so it can be shaped for the specific task at hand, whether that’s testing new risk models or collaborating with external partners.

What are the common mistakes that lead to data bias? What are the risks for financial reasons due to biased data?

Compliance regulations significantly constrain how financial organisations can use their customer data. Regulations like GDPR, and other more industry-specific rules, prevent them from sharing customer data with third parties and, in some cases, internally. Additionally, sanitising data is a time-intensive task, and without the proper tools it can take months. This results in product teams spending over 60% of their time manually creating just 20% of possible test cases.

Even when data is sanitised correctly, the dataset itself may be unrepresentative, making it ill-suited to accurately developing and training AI solutions. This bias within data can result in incorrect outputs at scale, which can negatively affect customers and damage the reputation of the financial organisation.

These data limitations are the most commonly cited barriers preventing finance organisations from utilising their data assets, with over 60% of company data remaining unused for analytics.

Can you walk us through some examples of data access and curation in the finance and banking ecosystems? What are the biases in banking?

One of the major challenges organisations face when accessing data is sanitising it so that it can be used for development purposes and shared with third parties. Research has shown that data teams waste up to 80% of their time finding, collecting, and curating data, while analysts waste up to 40% of their time validating data. Because of the time and resources it takes for companies to get access to data they can actually use, 65% of organisations’ data is underutilised, which leads to lost revenue and failure to innovate.

Companies often try to bypass data sanitisation and instead use anonymisation techniques, but this can actually put data at risk. Firstly, research has shown the data is never totally anonymous, and secondly, mistakes are easy to make which can impact privacy and lead to the financial organisation falling foul of compliance regulations.
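One reason anonymisation can fall short is that removing names alone leaves combinations of other attributes that may still single a person out. The following is a minimal sketch of that idea, not a description of how the Synthesized platform works. The records and the field names (`postcode`, `birth_year`, `gender`) are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical "anonymised" records: names removed, other attributes kept.
records = [
    {"postcode": "CB2", "birth_year": 1985, "gender": "F", "balance": 1200},
    {"postcode": "CB2", "birth_year": 1985, "gender": "F", "balance": 300},
    {"postcode": "CB4", "birth_year": 1990, "gender": "M", "balance": 4500},
    {"postcode": "E1", "birth_year": 1972, "gender": "F", "balance": 80},
]

# Attributes that are not names but can identify someone when combined.
quasi_identifiers = ["postcode", "birth_year", "gender"]


def group_sizes(rows, keys):
    """Count how many rows share each combination of the given attributes."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def smallest_group(rows, keys):
    """Size of the smallest group; a value of 1 means someone is unique."""
    return min(group_sizes(rows, keys).values())


def unique_rows(rows, keys):
    """Number of rows whose attribute combination appears only once."""
    return sum(1 for n in group_sizes(rows, keys).values() if n == 1)


k = smallest_group(records, quasi_identifiers)
exposed = unique_rows(records, quasi_identifiers)
print(f"smallest group size: {k}, uniquely identifiable rows: {exposed}")
```

In this toy dataset two of the four rows are unique on the chosen attributes, so anyone who knows a customer's postcode, birth year and gender could pick out their record even though no names are present.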

Another challenge organisations face is bias in datasets. Until five years ago, the topic of data bias was not clearly understood or even investigated. Yet now, as technology has developed, specifically AI, we've seen a profound shift towards better assessing the potential impact that data could have, both positive and negative. The most significant misconception about AI is that it discriminates. The central point to understand is not that AI tools are biased before even being used; it is that bias arises when the data fed into this technology isn’t representative or is of poor quality. When bias occurs in financial datasets, it can lead to unfair decisions being made, which can result in lawsuits against the institution.

The industry therefore needs a new solution, one which is completely secure and allows financial organisations to access clean data that complies with data regulations. We designed the Synthesized platform to understand a wide array of regulatory and legal definitions regarding contextual bias, across data attributes like gender, age, race, religion, sexual orientation, and more. The implications for broader society cannot be overstated: unbiased data can be used to create fairer credit ratings and to assess insurance claims more equitably.

Fairness and ethical use of data are essential when it comes to data sharing. Can you tell us how institutions can make sure they comply with regulations while using data to its fullest potential?

Machine learning and AI are increasingly used to assess applicants for a variety of services within finance, with advocates highlighting the speed and precision they bring to what have traditionally been time-consuming, paper-based processes. However, despite the obvious efficiency gains, there is a rising tide of concern about the possibility of unintentional (and potentially illegal) discrimination through the large-scale deployment of automated decision making. An application can be outwardly unbiased at the design level, but if the data used to develop it or to make decisions is imbalanced, the decisions made could themselves have an inherent, unintended, indirectly discriminatory effect.

What’s more, as many such systems ‘learn’ as they go forward, refining their analysis of data and therefore changing how they make decisions, even a balanced model can have a negative impact over time. An important step to prevent this is to ensure that the datasets used to build new applications are not in themselves contributing to the problem, either through imbalanced data that leads to disproportionate decisions taken against underrepresented groups, or through the use of historical data that contains a pattern of discrimination previously applied to protected groups. It’s important to remember that, as well as taking steps to ensure AI is not harmful, organisations can also choose to create and deploy machine learning solutions that improve outcomes for all members of society.
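The two dataset problems described above, under-representation and historical patterns in past decisions, can each be checked with simple counts before a model is built. The sketch below is a minimal illustration under stated assumptions, not the Synthesized platform's method: the records, the `group` and `approved` field names, and the choice of a lowest-to-highest approval rate ratio as the summary measure are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical historical loan decisions (illustrative data only).
records = (
    [{"group": "A", "approved": True}] * 6
    + [{"group": "A", "approved": False}] * 2
    + [{"group": "B", "approved": True}] * 1
    + [{"group": "B", "approved": False}] * 1
)


def representation_share(rows, group_key):
    """Share of the dataset that each group makes up."""
    counts = Counter(row[group_key] for row in rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def approval_rates(rows, group_key, outcome_key):
    """Fraction of positive outcomes within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        if row[outcome_key]:
            positives[row[group_key]] += 1
    return {group: positives[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Lowest group approval rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group was ever approved, so the rates are equal
    return min(rates.values()) / highest


shares = representation_share(records, "group")
rates = approval_rates(records, "group", "approved")
ratio = rate_ratio(rates)
print(f"shares: {shares}, approval rates: {rates}, ratio: {ratio:.2f}")
```

Here group B makes up only 20% of the records and was approved at a lower rate than group A, so a model trained on this data would see few examples of group B and would learn from a history that already favours group A. A ratio well below 1 is a prompt for closer review, not proof of discrimination on its own.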

What is your vision for the future of data?

My vision for Synthesized is to transform the way financial organisations interact with data using AI, and to make data go from being a part of the problem to a part of the solution.

About Nicolai Baldin

Nicolai Baldin is CEO of Synthesized.io, the all-in-one DataOps platform. Nicolai has led the company’s growth from a simple idea to a service being used by tech companies in the UK, Europe, and the US. Nicolai is responsible for the direction and product strategy of Synthesized. He holds a PhD in Machine Learning from the University of Cambridge.

About Synthesized

The Synthesized DataOps platform enables data-driven, regulated organisations to automate data provisioning for research and development while staying compliant with data privacy requirements, using AI-curated simulated data streams.
