Introduction: In our previous blog, OAC Machine Learning – Let’s Get Started & Data Preparation, we learned how to identify, load and prepare data from two different data sources to meet the objective of our model. After loading the data, we also looked at the data enrichment feature called “Recommendations”, which makes data cleaning more efficient. In this blog we will gain deeper insights into our data through exploratory data analysis and, as a supplement, create a Data Visualization project with graphs generated by the Explain feature. In the third blog we will create a model using one of the machine learning algorithms and apply it to the test data to predict which customers are likely to open term deposits.
- Part 1: OAC Machine Learning – Let’s Get Started & Data Preparation
- Part 2: OAC Machine Learning – Gain Deeper Insights Through Visual Exploration
- Part 3: OAC Machine Learning – Knowing Your Model Performance, Simplified
Objective: We will gain deeper insights from our data through exploratory data analysis, which summarizes the main characteristics, correlations and anomalies of the data. As a supplement, we will walk through the various types of visualisations and create a Data Visualization (DV) project with the graphs generated by the Explain feature.
We will cover “Part 2: OAC Machine Learning – Gain Deeper Insights Through Visual Exploration” in this section.
Exploratory Data Analysis
Exploratory data analysis (EDA) is an approach to analysing data sets that summarizes their main characteristics, often with visual methods. EDA can reveal hidden relationships and attributes in our data even before we feed it to a Machine Learning (ML) model. To discover quick visual insights in our data, we can start with the built-in Explain feature of OAC.
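For readers who would like to see what “summarizing the main characteristics” means outside the OAC interface, here is a minimal Python sketch. It is not what OAC runs internally. The rows are toy data invented for illustration, and the column names `marital` and `outcome` are hypothetical stand-ins for attributes of the uploaded file.

```python
from collections import Counter

# Toy rows invented purely for illustration; they are not taken from the
# blog's dataset, and the column names are hypothetical.
rows = [
    {"marital": "married", "outcome": "no"},
    {"marital": "married", "outcome": "no"},
    {"marital": "married", "outcome": "yes"},
    {"marital": "single", "outcome": "no"},
    {"marital": "single", "outcome": "yes"},
    {"marital": "divorced", "outcome": "no"},
]


def value_shares(rows, column):
    """Return {value: fraction of rows} for one column."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def rate_by_group(rows, group_column, target_column, positive):
    """Return {group: fraction of rows in that group where target == positive}."""
    totals = Counter(row[group_column] for row in rows)
    hits = Counter(
        row[group_column] for row in rows if row[target_column] == positive
    )
    return {group: hits[group] / totals[group] for group in totals}


outcome_shares = value_shares(rows, "outcome")
yes_rate_by_marital = rate_by_group(rows, "marital", "outcome", "yes")

for value, share in sorted(outcome_shares.items()):
    print(f"outcome={value}: {share:.0%} of rows")
for group, rate in sorted(yes_rate_by_marital.items()):
    print(f"marital={group}: {rate:.0%} subscribed")
```

The first function gives the overall distribution of a column, and the second shows how the target varies across the groups of another column. A visual EDA tool turns this same kind of summary into charts.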
We will continue to build on the CSV file that we loaded in our previous blog and create a project on top of it.
- Click on ‘Create Project’ in the top right corner for the uploaded CSV file.
- You will be brought to the project canvas where you can drop the attributes to create visualisations.
- Save the project by giving it a name.
- Right-clicking on an attribute brings up the Explain feature. Let’s use this feature on the ‘Outcome of the marketing event’ attribute of our dataset.
- The Explain feature provides the following information on the column ‘Outcome of the marketing event’:
  - Basic Facts: Provides the values of the column and how they relate to each other
  - Key Drivers: Shows the columns that are most strongly correlated with this column. Here there are four: previous outcome (previous event’s outcome), month (last contacted month), existing loan (whether the client has a personal loan or not) and owns house (whether the client owns a house or not).
  - Segments: Shows the hidden groups in the data that can predict the value of the column
  - Anomalies: Shows groups in the data that exhibit unexpected results for the column
- We can add the graphs from Key Drivers to our project by clicking on them and then choosing ‘Add Selected’ in the top right corner.
- On the bar graphs that have been added to the project, green indicates ‘yes’ and blue indicates ‘no’. The graphs suggest that the client’s decision to subscribe to the term deposit is related to their choice in the previous campaign and to whether they have a loan and own a house. That is a lot of valuable analysis in a short time.
- This analysis can be used as a starting point, and further visualizations can be built using the available chart types, such as Line chart, Bar chart, Scatterplot, Pie chart, Bubble scatterplot, Heat chart, Area chart and Sankey.
- Let’s look at an example of a visualisation. A Sankey visualization of our data shows that most of the users who did not subscribe are married and have secondary education, which is useful information for the financial institution.
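To build some intuition for the steps above, the sketch below ranks columns by how strongly they are associated with the outcome and counts the flows that a Sankey chart would draw. It rests on several assumptions. Cramér's V is used here as one common measure of association between categorical columns, and the blog does not say which measure the Explain feature uses, so treat it as a stand-in. The rows are toy data invented for illustration, and the column names (`previous_outcome`, `owns_house`, `marital`, `outcome`) are hypothetical.

```python
from collections import Counter
from math import sqrt

# Toy rows invented for illustration only; column names are hypothetical.
rows = [
    {"previous_outcome": "success", "owns_house": "yes", "marital": "married", "outcome": "yes"},
    {"previous_outcome": "success", "owns_house": "no", "marital": "single", "outcome": "yes"},
    {"previous_outcome": "failure", "owns_house": "yes", "marital": "married", "outcome": "no"},
    {"previous_outcome": "failure", "owns_house": "no", "marital": "married", "outcome": "no"},
]


def cramers_v(rows, col_a, col_b):
    """Cramér's V between two categorical columns (0 = no association, 1 = perfect)."""
    n = len(rows)
    joint = Counter((row[col_a], row[col_b]) for row in rows)
    a_counts = Counter(row[col_a] for row in rows)
    b_counts = Counter(row[col_b] for row in rows)
    k = min(len(a_counts), len(b_counts)) - 1
    if k <= 0:
        # Empty data or a constant column: association is undefined, report 0.
        return 0.0
    chi2 = 0.0
    for a, count_a in a_counts.items():
        for b, count_b in b_counts.items():
            expected = count_a * count_b / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def sankey_links(rows, columns):
    """Count rows flowing between each pair of consecutive columns."""
    links = Counter()
    for left, right in zip(columns, columns[1:]):
        for row in rows:
            # Prefix values with the column name so "yes" in two columns stays distinct.
            links[(f"{left}={row[left]}", f"{right}={row[right]}")] += 1
    return links


candidates = ["previous_outcome", "owns_house", "marital"]
ranked = sorted(candidates, key=lambda c: cramers_v(rows, c, "outcome"), reverse=True)
links = sankey_links(rows, ["marital", "outcome"])

for column in ranked:
    print(f"{column}: V = {cramers_v(rows, column, 'outcome'):.2f}")
for (source, target), count in sorted(links.items()):
    print(f"{source} -> {target}: {count}")
```

The width of each band in a Sankey chart corresponds to one of these link counts, so a wide band between two values simply means many rows share both of them.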
In this blog we learned to analyse data sets and summarize their main characteristics using the built-in ‘Explain’ feature of Oracle Analytics Cloud (OAC). We also created a project from the uploaded CSV file, added the graphs that resulted from this feature, and developed a Sankey visualisation that provided valuable analysis in a short time.
In my next blog we will create a model using one of the Binary Classifiers, apply it to the testing dataset of new customers to predict who is likely to subscribe to the term deposit, and evaluate the model. Stay tuned for Modelling, Evaluation and Prediction to arrive at a solution for our business problem.
#OACS #Analytics #MachineLearning #Oracle #SmartFeatures #DataScience #ExploratoryDataAnalysis #DataVisualisations
For more information about how Apps Associates can help on your Oracle projects go to www.appsassociates.com
Tejaswini Uppu is one of our Analytics Associates based in Hyderabad, India, and has been with Apps Associates for more than 2 years, working on multiple Analytics technologies spanning Oracle, Informatica, Snowflake and Python.