Add a Reference Line

I have just enrolled in a Data Science course on Udemy and I have learned some good stuff.

In the previous article we learned how to work with aliases. Now we will learn how to add a reference line to our bar chart.

Before I start, I’ll show you a trick in Tableau.

In our bar chart, the labels appear in this order: the percentage first and, below it, « Stayed » or « Exited ».

We will reverse this order. Go to the rectangle that lists the label fields (the Marks card).

Then place the label « Exited » above the label « SUM(Number of Records) ».

Look, the label « Stayed » is now above the percentage.

With that, we can understand the bar chart more easily.

Now let’s add a reference line. But first, I think you’d like to know why I’m talking to you about a reference line.

A reference line helps us to compare bar chart results with a benchmark. This benchmark is represented by this reference line.

In our case, the benchmark is the percentage of clients who left the bank in our sample of 10,000 people.

The first thing to do is find this percentage in our bar chart. To be able to do that, remove « Gender » from « Columns ».

Boom, we have a new bar chart.

Look, we only have the percentage of clients who left the bank and the percentage of clients who stayed in the bank.

We see that in our sample of 10,000 people, 20% of the clients left the bank and 80% stayed. This means that the churn rate (client departure rate) is 20%.
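Outside Tableau, the same percentage can be checked with a few lines of Python. This is only a minimal sketch: it assumes each client has a flag that is 1 if the client exited and 0 if the client stayed, and the helper name `churn_rate` and the toy list are made up for illustration, not taken from the course.

```python
def churn_rate(exited_flags):
    """Return the share of clients whose exit flag is 1."""
    flags = list(exited_flags)
    if not flags:
        raise ValueError("need at least one client")
    return sum(flags) / len(flags)


# Toy sample (made up for illustration): 2 of 10 clients left the bank.
toy_exited = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
rate = churn_rate(toy_exited)
print(f"Churn rate: {rate:.0%}")  # prints "Churn rate: 20%"
```

With the real data you would pass the whole exit column instead of the toy list.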

Next, we will add this churn rate to our A/B test. To return to the A/B test, press Ctrl+Z (or Command+Z) twice, or click twice on the « Back » button in the menu bar.

Now we know that, on average, 20% of the clients left the bank.

We will add a horizontal line at Y = 20% on the Y axis to compare the 20% churn rate with the two categories, male and female.

Let’s go. Right-click on the vertical axis (Y axis) and select « Add Reference Line ».

A window appears with several options.

You have the choice to add a line, a band, a distribution or a box plot.

We will use the line for the entire table.

Click on the « Line » button and select the « Entire Table » option. In « Value », select « Constant ».

The constant is 20%, so you need to enter 0.20 in « Value ».

It’s possible to put a label on this reference line. For example, if the reference line corresponds to a formula, the label displays the formula. But in our case, the constant is 20% and it’s already displayed on the vertical axis, so we will select « None ».

For the format of the line, select the continuous line and click on the « OK » button.

Our reference line has been added to our chart.

Here is what we can see. Female clients are more likely to leave the bank than the average client. Male clients are less likely to leave the bank than the average client.

In our case, that is easy to see because there are only two categories, men and women.
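The reading of the chart above boils down to comparing each bar with the benchmark. Here is a rough Python sketch of that comparison. The function name `compare_to_benchmark` is hypothetical, and the two rates per gender are placeholder values chosen only for illustration; read the real ones off your own chart.

```python
def compare_to_benchmark(group_rates, benchmark):
    """Label each group as above, below, or equal to the benchmark rate."""
    result = {}
    for group, rate in group_rates.items():
        if rate > benchmark:
            result[group] = "above"
        elif rate < benchmark:
            result[group] = "below"
        else:
            result[group] = "equal"
    return result


# Placeholder churn rates per gender (hypothetical, not read from the dataset).
example_rates = {"Female": 0.25, "Male": 0.16}
comparison = compare_to_benchmark(example_rates, benchmark=0.20)
print(comparison)
```

A group labeled "above" has a bar that crosses the reference line; a group labeled "below" has a bar that stops under it.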

Now you know how to add a reference line in a bar chart.

Share this article if you think it can help someone you know. Thank you.

-Steph

Dataset For Data Mining

I have just enrolled in a Data Science course on Udemy and I have learned some good stuff.

To get the dataset for data mining, go to the superdatascience website. In « Part.1 Visualization », you will see the section « How to use Tableau for Data Mining ». Click on « Churn Modeling » to download the file.

Once you have downloaded the file, move it to the directory you created for the course. In this directory, create a new directory (unless you have already done so) named « 2.Chunk investigation ».

Open this file with Excel or with other spreadsheet software.

Note that we use this dataset for the visualization part, and we will also use it for the modeling part.

Let’s analyze the data of this dataset.

This dataset is quite large: it contains 10,000 rows and a few columns. It is a list of a bank’s clients. The information for each client is:

  • Customer id (login)

  • Surname (last name)

  • Credit score (the measure that indicates the client’s ability to borrow)

  • Geography (client’s country)

  • Gender (male or female)

  • Age

  • Tenure (the number of years the client has been with the bank)

  • Balance (balance of the client’s bank account)

  • NumOfProduct (number of products that the client has with the bank: credit card, contract, account)

  • HasCrCard (does the client have a credit card?)

  • IsActiveMember (did the client use his/her credit card during the last month?)

  • EstimatedSalary (the bank’s estimate of the client’s annual salary)

  • Exited (did the client leave the bank?)
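If you prefer to look at the rows in code rather than in a spreadsheet, a short Python sketch could look like the one below. It assumes the data is available as CSV and that the header spellings match the ones shown here, which you should check against your own file. The two sample rows are made up (not real client data) and only a few of the columns are included to keep it short.

```python
import csv
import io

# Two made-up rows with a subset of the columns (not real client data).
SAMPLE_CSV = """Geography,Gender,Age,Exited
France,Female,40,1
Spain,Male,35,0
"""


def load_clients(csv_file):
    """Read client rows and convert the numeric columns used here to int."""
    clients = []
    for row in csv.DictReader(csv_file):
        row["Age"] = int(row["Age"])
        row["Exited"] = int(row["Exited"])
        clients.append(row)
    return clients


clients = load_clients(io.StringIO(SAMPLE_CSV))
print(len(clients), clients[0]["Gender"], clients[0]["Exited"])  # prints "2 Female 1"
```

For a real CSV file you would pass `open(path, newline="")` instead of the in-memory string, where `path` is wherever you saved the download.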

Now, I will explain the context of this dataset. The bank has branches in several countries, such as Germany, Spain and France. It noticed that many clients had left recently. The bank tracks a figure called the « churn rate », which is the rate of customers who leave the bank, and for a few months this « churn rate » has been much higher than usual. That is why the bank needs a data scientist (you) to find the problem and propose solutions.

This dataset is a small sample of the bank’s clients: 10,000 randomly selected clients.

The column « Exited » did not exist before. It was created when the bank realized that an abnormal number of clients were leaving.

Then the bank observed these clients for 6 months to see which clients left the bank.

In the « Exited » column, the number « 1 » means that the client left the bank and the number « 0 » means that the client stayed in the bank.

To analyze this dataset, you’ll need to do A/B tests. For example, a classic A/B test is to see whether women are more likely to leave the bank than men. That means counting the men who left the bank, counting the women who left the bank, and then normalizing each count by the total number of clients of that gender. It’s important to normalize because the sample does not contain the same proportion of women as men. Then, based on the last column, « Exited », you’ll find out whether men or women are more likely to leave the bank.
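To see why the normalization matters, here is a small Python sketch. It assumes rows with a « Gender » field and an « Exited » field equal to 1 or 0; the function name `exit_rate_by_group` and the six toy rows are invented for illustration. In the toy data one woman and one man left, so the raw counts are equal, but the groups have different sizes.

```python
from collections import Counter


def exit_rate_by_group(rows, group_key="Gender", exit_key="Exited"):
    """Return exits divided by group size for each value of group_key."""
    totals = Counter()
    exits = Counter()
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        exits[group] += int(row[exit_key])
    return {group: exits[group] / totals[group] for group in totals}


# Toy rows (made up): 1 of 2 women left, 1 of 4 men left.
toy_rows = [
    {"Gender": "Female", "Exited": 1},
    {"Gender": "Female", "Exited": 0},
    {"Gender": "Male", "Exited": 1},
    {"Gender": "Male", "Exited": 0},
    {"Gender": "Male", "Exited": 0},
    {"Gender": "Male", "Exited": 0},
]
rates = exit_rate_by_group(toy_rows)
print(rates)  # prints {'Female': 0.5, 'Male': 0.25}
```

The raw counts alone would suggest no difference, while the normalized rates show that, in this toy sample, women left twice as often as men.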

Once you have relevant results, you can show your report to the bank. With this report, you should be able to propose solutions. For example, if the report says that women are leaving the bank in large numbers, there is probably a problem, and you need to check whether what the bank offers suits women. It is also possible that another bank has a much more attractive offer for women, or that something else is going on.

You will learn how to investigate the dataset and find answers in the client information using A/B tests.

Share this article if you think it can help someone you know. Thank you.

-Steph
