What is Data Mining?

Data mining has been booming for several years. It consists of extracting and processing data to turn it into useful information. It reveals structures or patterns shared across the data, exposing the links between different phenomena and helping to forecast trends.

Data mining draws on complex algorithms from fields such as artificial intelligence, computer science, and statistics. It is a sequence of algorithms that exploit deep data (deep learning, weak signals, and precise data) to find similar patterns, for example in customer relationships, which can bring the business more revenue and lower costs.

The Origins of Data Mining

Although its name was coined around 1990, data mining has existed since the 1960s. Businesses used data gathering and processing technologies long before big data was born. However, the practice has grown enormously in recent years. A few years ago, businesses relied on broad marketing strategies; nowadays they opt for a more surgical approach. Big data helps here, because it allows data miners to better anticipate customers' needs and wants, or mechanical failures in the case of predictive or preventive maintenance. Machine learning (read: statistical learning) would not have become so well known without the transition from data warehouses to data lakes, which are more efficient. These large storage facilities act like databases and are used to gather, sort, and store all the data from big data platforms.

How to Mine Data?

Data mining grew out of machine learning. Machine learning is the way an artificial intelligence learns from its failures and errors, as a human would. Five types of data mining are usually distinguished: association, classification, sequential analysis (all three of which rely on patterns), clustering (grouping data by kind), and prediction (predictive analysis).

Here are some data mining techniques:

Incomplete data comparison: analyzing several incomplete data sets, comparing them with complete data sets, and trying to fill in the gaps.

Text analysis: often used on big data platforms in combination with deep learning, it automatically detects patterns in any given text. Universities often use it to detect plagiarism.

Genetic algorithms: algorithms modeled on natural selection, mutation, and genetic recombination.

Decision trees: a structure of successive choices and decisions used to classify data.
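As a rough illustration of incomplete data comparison, mentioned above, here is a minimal Python sketch. It assumes purely numeric records, and the function name `fill_gaps`, the field names, and the sample values are all hypothetical, invented for this example. Copying the missing value from the closest complete record is only one simple way to "compare with complete data sets"; it is not a method prescribed by this article.

```python
from math import sqrt


def fill_gaps(incomplete, complete_rows):
    """Fill the None fields of `incomplete` using the closest complete row.

    Hypothetical helper: "closest" means the smallest Euclidean distance
    over the fields that are known in the incomplete record.
    """
    known = [key for key, value in incomplete.items() if value is not None]

    def distance(row):
        return sqrt(sum((row[key] - incomplete[key]) ** 2 for key in known))

    nearest = min(complete_rows, key=distance)
    # Keep known values; borrow only the missing ones from the nearest row.
    return {
        key: (nearest[key] if value is None else value)
        for key, value in incomplete.items()
    }


# Made-up customer records, for illustration only.
complete = [
    {"age": 25, "visits": 2, "spend": 40.0},
    {"age": 47, "visits": 9, "spend": 310.0},
    {"age": 62, "visits": 4, "spend": 120.0},
]
partial = {"age": 45, "visits": 8, "spend": None}

filled = fill_gaps(partial, complete)
print(filled)  # {'age': 45, 'visits': 8, 'spend': 310.0}
```

In practice the fields would usually be rescaled first so that no single one dominates the distance, and several neighbors would be averaged rather than copying from just one.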

Use Cases and Jobs in Data Mining

Machine learning has significant advantages in various fields. Like most inventions, data mining was first used by businesses to maximize profits, through customer targeting and by analyzing the business itself. But it quickly proved useful in fields other than marketing.

Retail stores are a key target for big data companies, right after banks and insurance firms. Loyalty programs are effectively a database of your food and buying habits. They allowed Target, for example, to use AI and big data analysis to predict which families were about to have a baby, even before the official announcement. Target was then able to offer those families special discounts on diapers and baby food.

Another example of big data’s effectiveness can be found in policing. In many American cities, the authorities use big data processing software to predict the probability, time, and place of a crime, which allows them to intervene before it happens.

Big data is also creating jobs: as part of the digital revolution, engineers and mathematicians, for example, are needed to develop this new opportunity. For the moment, data mining is largely confined to marketing and advertising, but we can hope big data will enable extraordinary breakthroughs in nuclear chemistry, medical research, and beyond. Despite all these advantages, let us not forget that, as with the Internet of Things (IoT), these technologies require massive data collection. That could lead to a world where privacy is no more than a dying utopia. Who knows who will have access to your personal data? What we do know is that our data will never be completely safe from hackers, governments, or businesses. Despite all these worries, progress can’t and won’t be stopped.