An intro to business analytics
In this article I want to lay the basis for some upcoming lessons and tutorials on Data Science. I give an overview of the topics that will be covered in the series and shed some light on different technical terms and other tricks data scientists tend to pull out of their wizard hat.
We start with Analytics, a catch-all term covering a wide variety of data processing techniques. It is a toolbox containing a variety of instruments and methodologies that allow users to analyze data for a diverse range of well-specified purposes. Closely related is Business Intelligence, which provides insight through customized reporting. It is an umbrella term that includes the applications, infrastructure, tools and best practices that enable access to and analysis of information to improve and optimize decisions and performance.
The difference between data and information is that data fundamentally consists of zeroes and ones, whereas information additionally implies a certain utility or value to the end user or recipient.
When we take a closer look at the hocus pocus of analytics, we can clearly draw a line between:
- Predictive Analytics: based on observed variables, the aim is to accurately estimate or predict an unobserved value.
- Descriptive Analytics: the aim is to identify specific types of patterns. For example:
  - Clustering aims at grouping entities of a similar nature.
  - Association Analysis aims at finding groups of events that frequently co-occur.
The main techniques in each category are:
Predictive:
- Classification
- Regression
- Survival Analysis
- Most deep learning models
Descriptive:
- Clustering
- Association Analysis
- Sequence Analysis
- Text Mining
- Encoding/Decoding
It is important to note that most analytical techniques are applied to structured data, organized in rows and columns. Rows are typically called observations, instances or records. Columns are typically called variables, predictors, characteristics, attributes or features. Specialized techniques exist to deal with unstructured or semi-structured data such as texts and graphs. Given that, by commonly cited estimates, roughly 90% of all data is unstructured, there is clearly a large potential for these types of analytics to be applied in businesses.
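To make the row and column vocabulary concrete, here is a minimal sketch of a structured dataset in plain Python. The customer table, its column names and its values are all invented for illustration.

```python
# Hypothetical customer table: each dict is one observation (row),
# each key is one variable (column). All names and values are made up.
observations = [
    {"customer_id": 1, "age": 34, "monthly_spend": 120.0, "churned": False},
    {"customer_id": 2, "age": 51, "monthly_spend": 45.5, "churned": True},
    {"customer_id": 3, "age": 27, "monthly_spend": 80.0, "churned": False},
]

# Columns used as predictors (features) and the column to be predicted (target).
feature_names = ["age", "monthly_spend"]
target_name = "churned"

# One feature vector per observation, plus the matching target values.
X = [[row[name] for name in feature_names] for row in observations]
y = [row[target_name] for row in observations]
```

Most predictive techniques expect the data in roughly this shape: a matrix of features `X` and a vector of targets `y`.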
Profit Driven Business Analytics
Businesses adopt analytics to make better decisions, that is, decisions that maximize the net profit or value resulting from the insights obtained from data analysis. Analytics facilitates the optimization of fine-grained decision-making activities, leading to lower costs or losses and higher revenues and profit.
The quality of data-driven decision making depends on the extent to which the actual use of the predictions, estimates or patterns is accounted for in the development and application of the analytical approaches. The actual goal, that is to generate profits, should be central when applying analytics. There is a tangible difference between a statistical approach to analytics and a profit-driven approach.
Data scientists tend to be pragmatic when designing analytical models. From a purely statistical perspective, no distinction is made between high- and low-value customers, whereas from a profit-driven perspective the aim is to steer or tune the predictive model so that it accounts for customer value.
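The difference between the two perspectives can be illustrated with a small sketch. The customers, their values and the two models below are all hypothetical. Both models make exactly one mistake, so a purely statistical measure cannot tell them apart, yet they differ strongly in the customer value they help retain.

```python
# Hypothetical retention campaign: (will_churn, customer_value) per customer.
customers = [(True, 1000.0), (True, 50.0), (False, 300.0), (False, 40.0)]

# Two made-up models; True means "flag this customer as a likely churner".
model_a = [True, False, False, False]  # misses the low-value churner
model_b = [False, True, False, False]  # misses the high-value churner


def accuracy(predictions):
    """Statistical view: share of correct predictions, every customer counts equally."""
    hits = sum(pred == actual for pred, (actual, _) in zip(predictions, customers))
    return hits / len(customers)


def value_captured(predictions):
    """Profit-driven view: total value of the churners that were correctly flagged."""
    return sum(
        value
        for pred, (actual, value) in zip(predictions, customers)
        if pred and actual
    )
```

Here `accuracy(model_a)` and `accuracy(model_b)` are identical, while `value_captured` clearly favors `model_a`. A real profit measure would also account for campaign costs and retention probabilities, which this sketch leaves out.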
An additional difference concerns the choice between explaining and predicting. The aim of estimating a model may be either of these two goals:
- To establish the relation or detect dependencies between different predictors and a target variable.
- To estimate or predict a target variable as a function of different predictors.
In applications where the aim is to predict, we are essentially not interested in which drivers explain the value of the target variable; we mainly wish to predict as accurately as possible. This is the case in many business settings.
Predictive Model | Description |
---|---|
Classification | A classification model partitions observations into sets based on the target variable. |
Regression | A regression model estimates a continuous target variable. |
Survival Analysis | Survival Analysis, in comparison with classification, is mainly concerned with when the event will occur rather than whether it will occur. |
Forecasting | Forecasting or time series modeling techniques allow an accurate prediction of the short-term evolution of demand based on historical demand patterns. |
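As a rough illustration of a regression model, the sketch below fits a straight line to a handful of made-up points using ordinary least squares. The variable names (`ad_spend`, `revenue`) and the numbers are assumptions chosen so that the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: revenue grows by 2 units per unit of ad spend, plus 1.
ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, revenue)


def predict(x):
    """Estimate the continuous target for a new value of the predictor."""
    return slope * x + intercept
```

A classification model would instead return a class label, for example by comparing a predicted score against a cut-off.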
Descriptive Model | Description |
---|---|
Clustering | Clustering facilitates automated decision making by comparing a new transaction to clusters or groups of historical transactions. |
Association Analysis | Often applied for detecting patterns within transactional data. |
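Both descriptive techniques can be sketched in a few lines. The baskets, the cluster names and the centroid coordinates below are invented. The first part computes the support and confidence measures commonly used in association analysis, the second part assigns a new transaction to the closest existing cluster.

```python
import math

# Association analysis on made-up shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    items = set(itemset)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Share of baskets containing the antecedent that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


# Clustering-style lookup: hypothetical centroids (amount, hour of day)
# assumed to have been derived from historical transactions beforehand.
centroids = {"small_daytime": (20.0, 13.0), "large_night": (900.0, 3.0)}


def nearest_cluster(point):
    """Name of the cluster whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))
```

For example, `confidence({"bread"}, {"butter"})` is the share of bread baskets that also contain butter. In practice the features would usually be scaled before computing distances, since the amount otherwise dominates the hour of day.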
Analytical Process Model
- Identifying Business Problem
- Identifying Data Sources
- Data Selection
- Data Preprocessing (Cleaning, Transformation)
- Optional: Feature Generation
- Optional: Hyper-parameter Optimization
- Analyze the data with models
- Interpret, Evaluate & Tune
- Deploy the model
The objective of applying analytics needs to be unambiguously defined. Defining the perimeter of the analytical modeling exercise requires close collaboration between data scientists and business experts. Next, all source data that could be of potential interest needs to be identified. The golden rule is: the more data, the better!
Basic exploratory analysis can then be considered using OLAP facilities for multidimensional analysis, followed by a data-cleaning step to get rid of all inconsistencies and redundancies. Additional transformations may also be considered, such as binning, alphanumeric-to-numeric coding and geographical aggregation, as well as deriving additional characteristics that are typically called features. These steps are the most time-consuming and can take up to 80% of the work.
In the analytics step, an analytical model will be estimated or trained. Machine learning models often need hyper-parameter tuning to increase their performance. Once the results are obtained, they will be interpreted and evaluated by business experts. The key is to find unknown yet interesting and actionable patterns that provide new insights into your data and that can then be translated into new profit opportunities.
Once the model has been validated and approved, it can be put into production as an analytics application, Decision Support System or Scoring Engine. The process model is iterative in nature, in the sense that one may have to return to previous steps during the exercise.
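Two of the transformations mentioned above, binning and alphanumeric-to-numeric coding, can be sketched as follows. The bin edges, labels and category values are arbitrary choices made for the example.

```python
import bisect

# Hypothetical age bins: edges are the lower bounds of the 2nd, 3rd and 4th bin.
AGE_BIN_EDGES = [25, 40, 60]
AGE_BIN_LABELS = ["<25", "25-39", "40-59", "60+"]


def bin_age(age):
    """Binning: map a numeric age onto one of the predefined intervals."""
    return AGE_BIN_LABELS[bisect.bisect_right(AGE_BIN_EDGES, age)]


def encode_categories(values):
    """Alphanumeric-to-numeric coding: assign each distinct value an integer code.

    Codes are handed out in order of first appearance.
    Returns the encoded list and the mapping that was used.
    """
    codes = {}
    for value in values:
        codes.setdefault(value, len(codes))
    return [codes[value] for value in values], codes


binned = [bin_age(age) for age in [18, 25, 47, 60]]
encoded, mapping = encode_categories(["gold", "silver", "gold", "bronze"])
```

Integer codes like these imply an ordering that may not exist in the data, so for some models a one-hot representation is the safer choice.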
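Hyper-parameter tuning can be illustrated with a toy nearest-neighbour classifier, where the number of neighbours `k` is the hyper-parameter. The training and validation points are made up, and the approach shown (trying a few candidate values and keeping the one that scores best on a held-out validation set) is only one of several possible tuning strategies.

```python
from collections import Counter

# Hypothetical one-dimensional data: (feature value, class label).
# The training point 4.8 sits closer to the "low" group than the other
# "high" points, to show how k = 1 can follow single points too closely.
train = [
    (1.0, "low"), (2.0, "low"), (3.0, "low"),
    (7.0, "high"), (8.0, "high"), (9.0, "high"),
    (4.8, "high"),
]
validation = [(2.5, "low"), (4.0, "low"), (7.5, "high"), (6.0, "high")]


def knn_predict(x, k):
    """Majority label among the k training points closest to x."""
    neighbours = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def validation_accuracy(k):
    """Share of validation points classified correctly for a given k."""
    hits = sum(knn_predict(x, k) == label for x, label in validation)
    return hits / len(validation)


# Try a few candidate values and keep the best one (first one wins on ties).
candidate_ks = [1, 3, 5]
best_k = max(candidate_ks, key=validation_accuracy)
```

The important design choice is that `k` is selected on data the model was not trained on. The final performance should then be reported on a third, untouched test set.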
Analytical Model Evaluation
Before adopting an analytical model and making operational decisions, the model needs to be thoroughly evaluated. Depending on the exact type of output, the setting or business environment and the particular usage characteristics, different aspects may need to be assessed during evaluation in order to ensure the model is acceptable for implementation.
A number of key characteristics of successful business analytical models are defined and explained in the following table:
Characteristic | Description |
---|---|
Accuracy | Refers to the predictive power or the correctness of the model. Several evaluation criteria such as hit rate, lift and AUC may be applied to assess the model. It may also refer to statistical significance: the patterns found in the data need to be robust and not a consequence of coincidence. We need to make sure the model generalizes well and is not overfitted to the historical dataset. |
Interpretability | This aspect involves a certain degree of subjectivity, since interpretability may depend on the user's knowledge and skills. Interpretability depends highly on the model's format. Models that give the user insight into how they obtained certain results are called white-box models, for example decision trees and linear regression. Other models, such as random forests and neural networks, are called black-box models. |
Operational Efficiency | Refers to the time it takes to make a business decision based on the model's outcome. Crucial for certain business applications such as fraud detection and other banking systems. Operational efficiency also covers the effort needed to construct the complete analytical process model. |
Regulatory Compliance | A model should comply with the applicable regulatory standards. |
Economic Cost | Developing, implementing, deploying and maintaining the model involves significant costs to an organization. For example, external data may need to be purchased and cloud computing resources may incur large costs. |
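The accuracy criteria named in the table (hit rate, lift and AUC) can be computed by hand on a small example. The labels and scores below are invented, and the function definitions are one common way of formulating these measures, not the only one.

```python
# Hypothetical model output: a score per customer and the observed outcome.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]


def hit_rate(labels, scores, threshold=0.5):
    """Share of observations classified correctly at the given cut-off."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def lift(labels, scores, top_fraction):
    """Response rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, labels), reverse=True)
    top_n = max(1, int(len(ranked) * top_fraction))
    top_rate = sum(y for _, y in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate
```

To check for overfitting, these measures should be computed on data that was not used to train the model.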
We covered the basics of Data Science. In the next article I will shed some light on the analytical techniques and how we data engineers put them into practice.