What are the Big Data Challenges in Insurance and Banking?


The emergence of Big Data has deeply transformed the way companies exploit their data, especially in the insurance and banking sectors. The goal is to analyze data in order to minimize the financial risks these institutions routinely take. A few explanations follow. Today's fierce competition in the banking sector, decreasing yields, and increasing regulatory scrutiny are leading banks to change their approach to both data and clients.

Lower Churn Rate

At a time when banks face very strong competition, analyzing customer behavior through data is crucial: income, trends, consumption habits, and so on. Banks can now make their customers personalized offers with real added value, aiming to lower their churn rate. When a large number of clients is leaving, this is an essential step. Thanks to Big Data, insurers' and banks' churn rates tend to fall while their capture rates increase.

Minimizing Financial Risks

A deep, large-scale data analysis can also give banks the keys to minimizing their financial risks. The predictive models built by Data Scientists allow banks to identify risky assets and then adjust the risk premium and performance rate accordingly. In the banking and insurance sectors, data must be exploited in compliance with the law so that customers' personal details stay safe.
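As a rough illustration of how a model's risk estimates can feed into pricing, the standard expected-loss decomposition (probability of default × loss given default × exposure at default) sets a floor on the risk premium. The figures below are invented for illustration, not taken from any real portfolio:

```python
def expected_loss(pd_, lgd, ead):
    """Expected loss = probability of default x loss given default x exposure at default."""
    return pd_ * lgd * ead

# Hypothetical loan: 2% default probability, 45% loss given default,
# 100,000 EUR exposure -> the risk premium should at least cover this amount.
el = expected_loss(0.02, 0.45, 100_000)
print(round(el, 2))  # 900.0
```

A predictive model refines the first input (the default probability) per client, which is what lets a bank adapt the premium instead of charging a flat rate.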

Personalizing the Offer

Insurers increasingly need Big Data to build accurate knowledge of their customers: sociodemographic features, consumption habits, sports practice, driving behavior, health condition, and so on. The aim is to track clients' habits in real time and identify those most likely to leave.

Data Analysts have strong technical skills applied to the business they target: their 360-degree views help them spot customers' "moments of life" and offer personalized insurance policies at the right time. Analyzing data also helps protect clients from risky situations and build loyalty over a long-term relationship. Individualizing offers can also improve customer satisfaction. Clearly, deep data analysis now lets insurance companies and banks get to know their clientele and update their offers.

Saagie and Caisse d’Epargne: How to Fight Churn?

Our work with Caisse d’Epargne aimed to address a recurring problem faced by CEN: churn among 15-to-26-year-old clients. Thanks to our platform's large storage capacity, we were able to work within a very short time on big volumes: our experts analyzed more than a billion lines (i.e. 800 billion operations, 200 billion profiles…). After data preprocessing, one day was enough to design and build the first model. Our Data Scientists used several technologies to collect, analyze, and visualize the data: Sqoop, Talend, Impala, and Python.

They then tested and combined several algorithms (such as Random Forest, Gradient Boosting, and PCA) to compensate for the scarcity of data on churners, who represented a very low percentage of the clientele. Analyzing customer data helped segment the clientele by actions and bank transfers, in order to accurately target clients likely to leave, make them better offers, and optimize customer service.
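One simple way to compensate for the scarcity of churners in a training set is to randomly oversample the minority class before fitting a model. The sketch below is illustrative only (the data and labeling convention are invented, not CEN's actual pipeline):

```python
import random

def oversample_churners(rows, seed=42):
    """rows: list of (features, label) pairs, where label 1 = churner (rare class).
    Randomly duplicates churner rows until both classes are the same size."""
    rng = random.Random(seed)
    churners = [r for r in rows if r[1] == 1]
    others = [r for r in rows if r[1] == 0]
    extra = [rng.choice(churners) for _ in range(len(others) - len(churners))]
    return others + churners + extra

# Toy data: 8 loyal clients, only 2 churners
data = [((i, i % 3), 0) for i in range(8)] + [((9, 9), 1), ((10, 8), 1)]
balanced = oversample_churners(data)
print(sum(1 for _, y in balanced if y == 1))  # 8 churners after oversampling
```

In practice one would also consider class weighting or synthetic techniques such as SMOTE, but the principle is the same: keep the model from ignoring the rare class.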

With the Saagie Big Data platform you can manage your data easily and quickly: from collection to visualization, our experts and the best new technologies are at your fingertips. You can handle your own data, including the most sensitive, in the best possible way on our platform.

Smart Data in Banking and Insurance

Net banking income and customer profitability are now key goals for banks and insurance companies as interest rates stay low. In this context, data is a meaningful lever for getting to know customers better and increasing business efficiency. Being able to anticipate claims or daily-life events such as moving, having a child, or retiring creates opportunities to offer new services and develop both cross-selling and up-selling.

Another great challenge ahead is churn. When customers cut ties with a firm or the services it offers, they are said to have "churned". Plenty of reasons can cause customers to churn:

  • Sociodemographic causes: social, marital, professional conditions
  • Psychographic causes: way of life, values
  • Life events: buying a car, moving, getting married…
  • Lack of interest in, or dissatisfaction with, the subscribed services or the customer relationship

Several of these situations can occur at the same time, which makes churn hard to measure. Understanding exactly why customers left is the hardest part, and a two-step analysis helps find out. Before that, it is necessary to define which customers count as churners (which means having an efficient way to measure churn rate) and to have a learning sample with clean data.
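Before any modelling, churn itself must be made measurable. A minimal sketch of such a definition follows; the 90-day inactivity rule and the field layout are illustrative assumptions, not an actual bank's definition:

```python
from datetime import date

def is_churner(last_transaction, as_of, inactive_days=90):
    """Label a client as churned after a given period without any transaction."""
    return (as_of - last_transaction).days > inactive_days

def churn_rate(last_transactions, as_of):
    """Fraction of clients labelled as churned on a given observation date."""
    flags = [is_churner(last, as_of) for last in last_transactions]
    return sum(flags) / len(flags)

# One last-transaction date per client (toy data)
clients = [date(2023, 1, 5), date(2023, 5, 20), date(2023, 6, 1), date(2023, 6, 10)]
print(churn_rate(clients, date(2023, 6, 30)))  # 0.25 -> only the January client churned
```

Fixing such a rule up front is what produces the labelled learning sample the two steps below rely on.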

The first, unsupervised step consists in separating churners from the others. This segmentation usually starts with dimensionality reduction using PCA (Principal Component Analysis) or MCA (Multiple Correspondence Analysis) to lower the number of features, followed by partitioning (K-Means, Kohonen maps) and/or hierarchical clustering. Clusters, each representing a different churner type, then appear more clearly.
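The partitioning step can be sketched with a tiny K-Means implementation (in practice one would use a library such as scikit-learn; the two-feature toy data below is invented):

```python
import math

def kmeans(points, k, iters=20):
    """Minimal K-Means: returns cluster centers and point assignments.
    Centers are initialised on the first k points, for determinism."""
    centers = [list(points[i]) for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: math.dist(p, centers[c]))
        # Move each center to the mean of its assigned points
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centers, assign

# Toy reduced features (e.g. activity level, balance trend): two visible groups
pts = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, assign = kmeans(pts, k=2)
print(assign)  # [0, 0, 0, 1, 1, 1]: a low-activity and a high-activity cluster
```

Each resulting cluster is then profiled to see whether it concentrates churners.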

The next step is usually to apply supervised scoring techniques individually. It is now easy to compare traditional methods such as logistic regression and discriminant analysis with more accurate ones like random forests or gradient boosting. It remains important to keep an eye on the churner / non-churner distribution in the learning base: if it is unbalanced, one-class SVM models can be used.
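The supervised scoring step can be illustrated with a minimal logistic-regression scorer trained by gradient descent. The single feature and its labels are invented toy data; a production model would use a library, many features, and proper validation:

```python
import math

def train_logreg(xs, ys, lr=0.5, epochs=300):
    """Fit w, b so that P(churn) = sigmoid(w*x + b) on a 1-D toy feature."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))
            # Gradient of the log-loss for one sample
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# Toy feature: months since the client's last product subscription; 1 = churned
xs = [0.5, 1.0, 1.5, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logreg(xs, ys)
score = lambda x: 1 / (1 + math.exp(-(w * x + b)))
print(round(score(0.5), 2), round(score(6.0), 2))  # low score vs high score
```

The output of such a model is a churn score per client, which is exactly what lets the bank rank and target the clients most likely to leave.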

All the methods discussed above were mere theory a few years ago. They can now be put to work thanks to the plurality of customer data sources and to frameworks powerful enough to execute complex algorithms.