What are the top data mining startups

Data mining: close connection with predictive analytics

Data mining and predictive analytics are often used synonymously. Data mining methods and tools do play an essential role in predictive analytics solutions, but predictive analytics goes beyond data mining and draws on other methods such as machine learning, elements of game theory and simulation. Predictive analytics also uses text mining, a set of algorithm-based analysis methods that extract structure from unstructured text data (articles, blogs, tweets, Facebook content, etc.).

What is data mining? Data mining applies sophisticated statistical and mathematical methods and algorithms to identify hidden patterns, trends and relationships in large amounts of data. Classic data mining methods include:

• Clustering: data is segmented into groups that are not specified in advance (e.g. customers grouped by income level).

• Classification: the groups (classes) are specified in advance, and data elements are automatically assigned to them (for example, high-turnover and low-turnover branches). Decision tree analysis is one classification method.

• Regression analysis: relationships between (several) dependent and independent variables are identified (for example: product sales depend on product price and customer income).

• Association analysis: search for patterns in which one event is linked to another event; the dependencies between data items are described using if-then rules (for example, if a customer buys cola, they also buy pretzel sticks).
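
To make the if-then rule from the association analysis example more concrete, here is a minimal Python sketch. It assumes a handful of invented shopping baskets and uses two common measures for such rules, support and confidence. The basket contents and the helper names `support` and `confidence` are hypothetical and only serve as illustration.

```python
# Hypothetical shopping baskets, invented for illustration only.
transactions = [
    {"cola", "pretzel sticks", "chips"},
    {"cola", "pretzel sticks"},
    {"cola", "bread"},
    {"bread", "butter"},
    {"cola", "pretzel sticks", "butter"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the "if" part never occurs, so the rule has no evidence
    return support(set(antecedent) | set(consequent), baskets) / base


# Rule under test: "if a customer buys cola, they also buy pretzel sticks".
rule_support = support({"cola", "pretzel sticks"}, transactions)
rule_confidence = confidence({"cola"}, {"pretzel sticks"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, three of the five baskets contain both items (support 0.60), and three of the four cola baskets also contain pretzel sticks (confidence 0.75). Real association analysis tools search for such rules across all item combinations instead of checking a single rule by hand.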

Data mining also uses neural networks, which are loosely modeled on the way the human brain works and which learn existing structures or patterns over many passes through the data. Data mining is therefore closely related to machine learning, i.e. applications and methods in which computer programs acquire new knowledge independently. While data mining focuses on finding previously unknown patterns that are already present in the existing data, machine learning is about deriving new calculation functions from existing data. The algorithms are trained so that they learn from the available data, independently generate a model and use it for forecasts or decisions. Example: an insurance company uses historical cancellation data to build a model that predicts which customers are likely to cancel in the future.
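
The insurance example can be sketched in a few lines of Python. This is a deliberately small illustration under stated assumptions, not how an insurer would actually build such a model: the single feature (complaints in the last year), the data values and the function names `train` and `cancel_probability` are all hypothetical, and logistic regression trained by gradient descent is simply one possible learning method.

```python
import math

# Hypothetical history: (complaints in the last year, cancelled? 1 = yes).
# The feature and all values are invented for illustration.
history = [(0, 0), (0, 0), (1, 0), (1, 0), (2, 1), (3, 1), (4, 1), (5, 1)]


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(data, learning_rate=0.1, epochs=2000):
    """Fit a one-feature logistic regression by batch gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = sigmoid(weight * x + bias) - y  # prediction minus label
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def cancel_probability(model, complaints):
    """Estimated probability that a customer with this many complaints cancels."""
    weight, bias = model
    return sigmoid(weight * complaints + bias)


model = train(history)  # the "data model" learned from historical data
for complaints in (0, 5):
    print(complaints, round(cancel_probability(model, complaints), 2))
```

The model is never told where the boundary between loyal and cancelling customers lies; it derives the weight and bias from the historical records and can then score customers it has not seen before.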