Data Mining

Data mining is an analytical process designed to explore large amounts of data in search of consistent patterns and systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data.

The main goal of data mining is prediction. Predictive data mining is the most common type of data mining and the one with the most direct business application.

What Can Data Mining Do

Companies in a wide range of industries are already using data mining tools and techniques to take advantage of historical data. By using pattern recognition technology together with statistical and mathematical techniques to sift through warehoused information, data mining helps analysts recognise relationships, trends, patterns, exceptions and anomalies. Businesses use data mining to discover sales trends, develop smarter marketing campaigns, and predict customer loyalty more accurately.

Specific Uses Of Data Mining

  • Market segmentation – Identify common characteristics of customers who buy the same products from your company.
  • Fraud detection – Identify which transactions are most likely to be fraudulent.
  • Direct marketing – Decide which prospects should be included in a mailing list to obtain the highest response rate.
  • Interactive marketing – Predict what each individual accessing a web site is most likely interested in seeing.
  • Market basket analysis – Understand which products or services are commonly purchased together, e.g. bread and butter.
  • Trend analysis – Reveal the difference between a typical customer this month and last month.

Data Mining Process


Problem Definition

Data mining projects are often structured around the specific needs of an industry sector, or even tailored and built for a single organisation. A successful data mining project starts from a well-defined question or need.

Data Gathering and Preparation

Data gathering and preparation is about constructing a dataset from one or more data sources and getting familiar with the data. Data preparation is usually time-consuming and prone to errors.

Model Building And Evaluation

Predictive modelling is the process by which a model is created to predict an outcome. If the outcome is categorical it is called classification, and if the outcome is numerical it is called regression.
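To make the difference concrete, here is a minimal sketch in plain Python. The data (monthly visits, spend, churn) is made up for illustration, and the two methods used, a least-squares line and a nearest-neighbour lookup, are just simple stand-ins I picked; they are not the only way to do either task.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (a regression model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbour_label(xs, labels, query):
    """Return the label of the training point closest to query (a classifier)."""
    best = min(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return labels[best]


# Hypothetical data: monthly visits per customer.
visits = [1, 2, 3, 4, 5]
spend = [12.0, 22.0, 32.0, 42.0, 52.0]       # numerical outcome
churned = ["yes", "yes", "no", "no", "no"]    # categorical outcome

# Regression: the prediction is a number.
slope, intercept = fit_line(visits, spend)
predicted_spend = slope * 6 + intercept

# Classification: the prediction is a category.
predicted_churn = nearest_neighbour_label(visits, churned, 1.4)
```

The same input (visits) feeds both models; only the type of the outcome decides whether the task is regression or classification.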

Descriptive modelling or clustering is the assignment of observations into clusters so that observations in the same cluster are similar. Association rules can find interesting associations amongst observations.

Knowledge Deployment

The knowledge gained will need to be organised and presented in a way the customer can use. It will be mainly up to the customer to decide on and carry out the deployment steps. A model can typically be deployed through one of the following:


  1. Data Mining Tools
  2. Programming Language (Java, C, VB, R)
  3. Database SQL Script
  4. PMML (Predictive Model Markup Language)

Data Mining Algorithms

There are many different algorithms that organisations can use in data mining. I am going to list just a few of them.

Decision Trees

Decision trees are commonly used in data mining with the objective of creating a model that predicts the value of a target (dependent) variable based on the values of several input (independent) variables. The structure of a decision tree reflects structure that may be hidden in your data.
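As a rough illustration, the sketch below builds the smallest possible decision tree: a single split (often called a decision stump) chosen by Gini impurity. Real tree learners repeat this split recursively; the data, the function names and the choice of Gini as the split measure are my own assumptions for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def partition(rows, labels, feature, threshold):
    """Split the labels by whether the row's feature value is <= threshold."""
    left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
    right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
    return left, right


def best_split(rows, labels):
    """Find the (impurity, feature, threshold) with the lowest weighted Gini impurity."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left, right = partition(rows, labels, feature, threshold)
            if not left or not right:
                continue  # a split must send rows to both sides
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold)
    return best


def fit_stump(rows, labels):
    """Build a one-split tree: a test at the root and a majority class in each leaf."""
    _, feature, threshold = best_split(rows, labels)
    left, right = partition(rows, labels, feature, threshold)
    return {
        "feature": feature,
        "threshold": threshold,
        "left": Counter(left).most_common(1)[0][0],
        "right": Counter(right).most_common(1)[0][0],
    }


def predict(stump, row):
    """Follow the single test and return the class stored in the matching leaf."""
    branch = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[branch]


# Hypothetical data: [age, purchases last year] -> responded to a mailing?
rows = [[25, 1], [32, 2], [47, 8], [51, 9], [38, 7], [29, 3]]
labels = ["no", "no", "yes", "yes", "yes", "no"]

stump = fit_stump(rows, labels)
answer = predict(stump, [45, 6])
```

The learned rule ("is the value of this input at or below this threshold?") is exactly the kind of structure a full tree would stack several levels deep.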

Clustering – the K-means Algorithm

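Clustering is the process of grouping data into classes or clusters so that objects within a cluster are highly similar to one another, but very dissimilar to objects in other clusters. There are a large number of clustering algorithms. K-means, one of the most widely used, partitions the data into k clusters by assigning each observation to the cluster with the nearest centre (centroid).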
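Here is a minimal sketch of k-means (the standard iterative version, usually called Lloyd's algorithm) in plain Python. The points are made up, and I pass in fixed starting centroids so the run is repeatable; in practice the starting centroids are normally chosen at random.

```python
import math


def kmeans(points, centroids, max_iterations=100):
    """Alternate between assigning points to centroids and moving the centroids."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignments


# Hypothetical 2-D observations forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, centroids=[points[0], points[3]])
```

Each entry in assignments is the index of the cluster that the corresponding point ended up in, so points with the same index are the ones the algorithm judged most similar.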

Association Analysis

Association Analysis is the task of uncovering relationships among data.

Association rules – An association rule model identifies how data items are associated with each other. It is used in retail sales to identify which items are frequently purchased together, and is sometimes referred to as market basket analysis.
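Association rules are usually scored with two measures: support (how often the items appear together) and confidence (how often the second item appears when the first one does). The sketch below computes both for pairs of items in plain Python; the baskets and the two threshold values are made up for illustration, and real tools handle larger item sets with algorithms such as Apriori.

```python
from itertools import combinations


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) rules between single items."""
    all_items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(all_items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence of "lhs -> rhs": share of lhs baskets that also hold rhs.
            confidence = pair_support / support(transactions, {lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
rules = pair_rules(baskets)
```

With these baskets only bread and butter pass both thresholds: they appear together in three of the five baskets, and three of the four baskets containing one of them also contain the other.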
