Create Bins and View Distributions

I have just enrolled in a Data Science course on Udemy and I'm learning some good stuff.

Great, you've finished the first part. Now we'll do a deeper data mining analysis of this bank's dataset.

To take the analysis further, we'll use a more statistical approach.

To do that, we'll create a new tab.

In this new tab, we want to understand how clients are distributed by age. Are most of them young or old?

Move the variable « Age » to « Columns ».

Since we want to see the distribution of client ages, we need the variable « Number of Records », which gives the number of observations. Move « Number of Records » to « Rows ».

Boom, we have a chart, but there is only one point in the top right. What happened is that Tableau took the sum of the ages of all the bank's clients and the sum of « Number of Records », that is, the total number of clients: 10 000.

We'll fix that, but first let's change the format to make the chart easier to read. Right-click in the middle of the chart and select « Format ».

For the font size, select « 12 ».

Here you can see that the total age is 389 218 (about 39 years per client on average across the 10 000 clients), but that's not what we're looking for. What we want to see is the number of clients for each age.

Here's what is going on. We took the aggregated sums of our variables. Aggregated means that Tableau combined all the values of each variable into a single total. We added up the ages, but what we actually want is the number of observations for each age separately.

To get that, click on the arrow on « SUM(Age) » in « Columns ».

Then select « Dimension ».

You see, Tableau no longer takes the aggregated sum of the ages; it treats each age separately. We get a curve showing the continuous distribution of our clients' ages. That is to say, for each age, the curve gives us the number of clients of that age.
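If it helps to see the same idea outside Tableau, here is a minimal Python sketch of the difference between the aggregated sum and the per-age count. The list of ages is a made-up sample, not data from the bank's file.

```python
from collections import Counter

# Hypothetical sample of client ages (not taken from the dataset).
ages = [26, 29, 29, 30, 31, 31, 31]

# What SUM(Age) gives: a single total for the whole table.
total_age = sum(ages)

# What we want for a distribution: the number of clients for each age.
clients_per_age = Counter(ages)

print(total_age)
for age in sorted(clients_per_age):
    print(age, clients_per_age[age])
```

The single total says nothing about how ages are spread out, while the per-age counts are exactly the points plotted on the curve.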

Let's look at the dataset. Right-click on « Churn Modelling » and select « View Data… ».

A window appears showing the data in detail. If you scroll to the right, you will find the « Age » column.

We see that the ages are rounded to whole years. Because all the ages are rounded, Tableau is able to group clients by age. Hovering over the curve, we can see that there are 200 clients who are 26 years old.

If the ages in the dataset weren't rounded, you would see clients aged 26.5 or 26.3. That would create a lot of irregularity: plenty of spikes with lots of variation.

Oooooh look, there is a variation that doesn't look normal.

Let's analyze it in detail. Around this peak, we see that there are 348 clients who are 29 years old.

Here, there are 404 clients who are 31 years old.

And this dip shows us that there are 327 clients who are 30 years old.

How can we explain this irregularity? It's possible that many of the 29-year-olds are about to turn 30 and that many of the 31-year-olds have only just turned 31. It's chance that produces this kind of irregularity. You may get others if your data isn't precise or rounded. In our case, the ages are rounded, but we still want to get rid of the small irregularity we see on our curve.

There is a way to see our distribution without these irregularities: « bins ». Binning consists of grouping the values into categories. In other words, we're going to group our clients into age groups.

Right-click on « Age » in « Measures ». Select « Create », then « Bins… ».

A window appears. We'll group our clients in 5-year increments. In « Size of bins », enter « 5 » and click on the « OK » button.

As you can see, the variable « Age » has stayed in « Measures », but there is a new variable in « Dimensions ». This is the variable we created, « Age(bins) ».

Our « Age(bins) » variable was correctly placed in « Dimensions » because it is a categorical variable: each category corresponds to a 5-year age group.

For example, one category is the 20 to 24 age group. Now we'll create a new distribution based on the bins.

To do that, remove the variable « Age » from « Columns » by dragging it out of the shelf.

Then move the variable « Age(bins) » from « Dimensions » to « Columns ».

Note

In this case, it's not possible to replace « Age » directly by dropping « Age(bins) » on top of « Age » in « Columns ». This is because « Age » is a measure and « Age(bins) » is a dimension.

That's a nice distribution, the kind of chart we usually see in economics or mathematics. The difference from the previous chart is that this one is discrete: the clients are grouped by age group, whereas the previous chart was continuous.

On this chart, each bar corresponds to an age range. For example, this bar corresponds to the 25-29 age group.
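To make the idea of bins concrete, here is a small Python sketch of the same grouping. It assumes each bin starts at a multiple of the bin size and includes its lower bound, which matches the 20-24 and 25-29 groups above; the ages and the `bin_start` helper are made up for illustration.

```python
from collections import Counter


def bin_start(age: int, size: int = 5) -> int:
    """Return the lower bound of the bin that contains `age`."""
    return (age // size) * size


# Hypothetical sample of client ages (not taken from the dataset).
ages = [22, 24, 25, 29, 30, 31, 34, 38]
size = 5

# Count how many clients fall into each bin.
clients_per_bin = Counter(bin_start(age, size) for age in ages)

for start in sorted(clients_per_bin):
    print(f"{start}-{start + size - 1}: {clients_per_bin[start]}")
```

Calling `bin_start(age, 10)` instead would give 10-year groups such as 30-39.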

Now, we’ll change the colors.

From « Rows », drag « SUM(Number of Records) » to « Colors » while holding down the « Ctrl » (or « Command ») key on your keyboard.

Our distribution is blue, but we'll change the color to red. Click on « Colors », then on « Edit Colors ».

In the window that appears, click on the blue square on the right to display the color palette.

Select the red color and click on the « OK » button.

Click on the « OK » button of the « Edit Colors » window.

To make the bar chart easier to read, we'll add the number of clients in each age group. From « Rows », drag « SUM(Number of Records) » to « Label » while holding down the « Ctrl » (or « Command ») key on your keyboard.

That’s it, we can see how many clients there are in each age group.

We see that the tallest bar is the 35-39 age bracket and the second tallest is the 30-34 age bracket. Overall, most clients are between 25 and 40 years old, which seems plausible.

Our bar chart shows absolute values. We'll replace them with percentages. Click on the little arrow on « SUM(Number of Records) » in « Label ». You could select « Add Table Calculation… », but I'll show you another way to do it.

Instead of clicking « Add Table Calculation… », click on « Quick Table Calculation » and select « Percent of Total ».

Nice, we have the exact percentage of clients in each age bracket. Now we can see that the clients between 25 and 40 years old represent 20 + 23 + 17 = 60% of all clients.
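The percent-of-total arithmetic can be sketched in Python as follows. The counts per bin are invented so that the three bins 25-29, 30-34 and 35-39 come out at 17%, 20% and 23%; they are not the real counts from the dataset.

```python
# Hypothetical client counts per 5-year bin, keyed by the bin's lower bound.
clients_per_bin = {20: 50, 25: 170, 30: 200, 35: 230, 40: 150, 45: 100, 50: 100}

total = sum(clients_per_bin.values())

# Percent of total: each bin's count divided by the grand total.
percent_of_total = {
    start: 100 * count / total for start, count in clients_per_bin.items()
}

# Share of clients in the bins 25-29, 30-34 and 35-39.
share_25_to_39 = sum(percent_of_total[start] for start in (25, 30, 35))

print(percent_of_total)
print(share_25_to_39)
```

Adding up the percentages of neighboring bins works because every percentage uses the same denominator, the total number of clients.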

I'll show you one last thing. You can easily change the size of the slices: click on « Age(bins) » and select « Edit ».

In the window, you can change the size of the slices (bins). Enter « 10 » instead of « 5 » to get 10-year slices. Click on the « OK » button.

Now we have a distribution with fewer slices, and the dominant slice is 30 to 39 years old.

Well, that was just to show you how to change the size of the bins. To go back to the previous distribution with 5-year slices, click on the « Back » button.

As you can see, the values on the bars are percentages, but the values on the axis are absolute. Here is an exercise for you: « Put the axis values in percentages ». I'll give you the answer in the next article.

Share this article if you think it can help someone you know. Thank you.

-Steph

Connect Tableau to An Excel File

I have just enrolled in a Data Science course on Udemy and I'm learning some good stuff.

Now that you have downloaded the dataset as an Excel file, we'll use Tableau to analyze it.

We’ll connect to the dataset using the « Excel » option.

Select the Excel file you downloaded and click on the « Open » button.

And as you can see, there is only one tab.

There is only one tab because the Excel file contains only one sheet. If the Excel file had several sheets, they would all be listed here.

We need to check that all the data is « OK ». Scroll through the rows and columns to check. Everything looks good: there are 10 000 rows, as in the Excel file.

Excellent, we've connected our Excel source file to Tableau.

Now, click on the « Sheet1 » tab to access the Worksheet.

We’ll have a little fun.

For example, let's see what we can do with « Geography ».

« Geography » is the dimension that gives us the country, so we'll make a map to see where the bank's clients come from.

Drag « Geography » onto this area.

Ah, that's odd, nothing happens. Why? If you look at « Geography », you'll see that Tableau doesn't recognize it as a geographic dimension. Here, you can see that Tableau treats « Geography » as a text dimension, marked with the « Abc » icon.

Don't worry, we can fix it quickly. Click on the arrow next to « Geography ».

Select « Geographic Role », then « Country/Region », so that « Geography » becomes a geographic dimension.

Now remove « Geography », which only produced a table, from the view by dragging it out.

Look, there is now a globe next to « Geography ». This means that Tableau recognizes « Geography » as a geographic dimension.

Now that « Geography » is a geographic dimension, two new measures have appeared: Latitude (generated) and Longitude (generated).

Drag « Geography » into this space.

Look, this time there is a map.

You can zoom with these buttons.

The map is fine, but we'll remove the blue dots and modify the map so that it's easier to read.

We'll color the countries and display the number of clients in each country.

We know that each row in the dataset corresponds to a client. So we can use « Number of Records », which is the total number of observations. That way, we can visualize the number of rows attached to each country, and that number is the number of clients per country.

So take « Number of Records » and move it to « Colors ».

Boom ! Each country has a color.

Look at the color contrast. France has a darker color, which indicates that it is the country with the most clients. Germany and Spain have almost the same color, which indicates that they have almost the same number of clients.

But we want to know the number of clients per country without having to hover over the country.

To do this, we'll add a label. Take « Number of Records » and move it to « Label ».

We'll increase the text size and make it bold. Click on « Label », then on « Font », and select « 12 » and bold.

Nice, we can see the number of clients per country. You can also zoom in on a region. Click on « Zoom area », then click and drag to select the region on the map.

Now we can see that the largest group of clients is in France, which accounts for almost half of all the clients in the dataset. Germany and Spain have almost the same number of clients.
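For readers who like to check this kind of result in code, here is a minimal Python sketch of what « Number of Records » per country amounts to: counting rows grouped by the « Geography » value. The four rows are invented placeholders, not rows from the real file.

```python
from collections import Counter

# Hypothetical rows; in the real file, each row is one client.
rows = [
    {"Surname": "Client A", "Geography": "France"},
    {"Surname": "Client B", "Geography": "Spain"},
    {"Surname": "Client C", "Geography": "France"},
    {"Surname": "Client D", "Geography": "Germany"},
]

# One row per client, so counting rows per country counts clients per country.
clients_per_country = Counter(row["Geography"] for row in rows)

for country, count in clients_per_country.most_common():
    print(country, count)
```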

Share this article if you think it can help someone you know. Thank you.

-Steph

Dataset For Data Mining

I have just enrolled in a Data Science course on Udemy and I'm learning some good stuff.

To get the dataset for data mining, go to the superdatascience website. In « Part.1 Visualization », you'll see the section « How to use Tableau for Data Mining ». Click on « Churn Modeling » to download the file.

Once you have downloaded the file, move it to the directory you created for the course. In this directory, create a new directory (unless you have already done so) named « 2.Chunk investigation ».

Open this file with Excel or another spreadsheet program.

Note that we use this dataset for the visualization part, but we will also use it for the modeling part.

Let's analyze the data in this dataset.

This dataset is quite large: it contains 10 000 rows and a few columns. It is a list of a bank's clients. The information for each client is:

  • Customer id (login)

  • Surname (last name)

  • Credit score (a measure of the client's ability to borrow)

  • Geography (client’s country)

  • Gender (male or female)

  • Age

  • Tenure (the number of years the client has been with the bank)

  • Balance (balance of the client's bank account)

  • NumOfProduct (number of products the client has with the bank: credit card, contract, account)

  • HasCrCard (does the client have a credit card ?)

  • IsActiveMember (did the client use his/her credit card during the last month ?)

  • EstimatedSalary (the bank’s estimate of the client’s annual salary)

  • Exited (did the client leave the bank ?)

Now I'll explain the context of this dataset. This bank has branches in several countries: Germany, Spain and France. The bank noticed that many clients have been leaving lately. It tracks a metric called « churn rate », which is the rate of customers who leave the bank, and for a few months the churn rate has been much higher than usual. That's why the bank needs a data scientist (you) to find the problem and propose solutions.

This dataset is a small sample of the bank's clients: 10 000 randomly selected clients.

The « Exited » column didn't exist before. It was created when the bank realized that an abnormal number of clients were leaving.

The bank then observed these clients for 6 months to see which of them left.

In the « Exited » column, « 1 » means that the client left the bank and « 0 » means that the client stayed.

To analyze this dataset, you'll need to do A/B tests. For example, a classic A/B test is to see whether women are more likely to leave the bank than men. That means counting the men who left, counting the women who left, and then normalizing each count by the total number of clients of that gender. It's important to normalize because the dataset does not contain the same proportion of women and men. Then, based on the last column, « Exited », you'll find out whether men or women are more likely to leave the bank.
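To show why normalizing matters, here is a small Python sketch under simple assumptions: the sample below is invented (4 women and 6 men, with 2 exits in each group), and `exit_rate_by` is a hypothetical helper, not something from the course. It only uses the « Gender » and « Exited » columns described above.

```python
from collections import defaultdict

# Hypothetical sample: 4 women and 6 men, with 2 exits in each group.
clients = (
    [{"Gender": "Female", "Exited": 1}] * 2
    + [{"Gender": "Female", "Exited": 0}] * 2
    + [{"Gender": "Male", "Exited": 1}] * 2
    + [{"Gender": "Male", "Exited": 0}] * 4
)


def exit_rate_by(rows, column):
    """Return, for each value of `column`, the share of clients who exited."""
    totals = defaultdict(int)
    exits = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        exits[row[column]] += row["Exited"]  # Exited is 1 (left) or 0 (stayed)
    # Normalize each group's exits by that group's own size.
    return {group: exits[group] / totals[group] for group in totals}


rates = exit_rate_by(clients, "Gender")
print(rates)
```

In this made-up sample the raw counts are identical (2 exits in each group), but the rates differ because the groups are not the same size. That is exactly what the normalization step is there to reveal.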

Once you have relevant results, you can show your report to the bank, and with this report you should be able to propose solutions. For example, if the report says that women are leaving the bank in large numbers, there is a problem, and you need to check whether what the bank offers women is right for them. Or it's possible that another bank has a much more attractive offer for women, or something else.

You will learn how to investigate the dataset and find answers from client information with A/B tests.

Share this article if you think it can help someone you know. Thank you.

-Steph
