Chi-Square Test With More Than 2 Categories

tableau chi square test

I have just enrolled in a Data Science course on Udemy  and I learned good stuff.

In this article, we will do a Chi-square test with more than 2 categories. We will use the A/B test « Country » which has 3 categories which corresponds to 3 countries : German, Spain and France. Select « Gender Actual » tab, make a copy with a right-click and select « Duplicate ».

tableau chi square test

Name the tab « Gender Actual (2) » by « Country Actual ».

tableau chi square test

In « Dimensions », move the variable « Geography » over « Gender » in « Columns » to replace « Gender » with « Geography ».

tableau chi square test

tableau chi square test

Here’s how to do an A/B statistical test when there are 3 categories. We’ll start with the classic method and then I’ll show you another way to do Chi-square test with any number of categories.

Let’s start with the classical method. In this case, there are 3 categories so we can’t use the online tool of the previous article. In the previous article we used an online tool with only 2 categories « Sample1 » and « Sample2 ». That’s why we’re going to use another online tool, click here  .

tableau chi square test

In this online tool, we can enter the values without using the total values. That is, we enter only the number of observations in each category. We simply need to enter the values that are on our A/B test. And I’m going to show you how to turn our A/B test into a table. In this way, it will be easier to enter the values in the online tool without making any mistakes.

Go to the « Show me » tool at the top right.

tableau chi square test

Click on « text tables »

tableau chi square test

tableau chi square test

Click on « Swap Rows ans Columns » button.

tableau chi square test

tableau chi square test

Cool, now you have a table arranged in exactly the same way as the online tool.

In the online tool, we will select 2 rows and 3 columns.

tableau chi square test

As we have 3 categories and 2 possible results, we enter our values exactly as in the table we just created on Tableau.

tableau chi square test

Perfect, our table is ready. You can click on the « Calculate » button.

tableau chi square test

tableau chi square test

As you can see, we observe the same thing as the other online tool. There is our indicator « p » value which is less than 5%. Which means there is a meaning.

tableau chi square test

This statistical significance means that these results are valid for the total number of the bank’s clients and not just for the sample of 10 000 clients. We observe similar differences with A/B test « Country » whose results are based solely on the sample of 10 000 clients. We can conclude that in the total number of the bank’s clients, it’s the clients in Germany who are more likely to leave the bank. This is how we do things cleanly.

You saw, this online tool limited by 5 by 5 tables so you can’t use this tool when you have 6 categories or more. But fortunately it’s possible to do Chi-square test with any number of categories. It’s a special method and for you to understand that, I’ll give you a theoretical explanation.

Here we have 3 countries : German, Spain and France.

tableau chi square test

What we’re trying to compare is the clients number leaving the bank in each of these countries.

tableau chi square test

With our basic A/B test based on a sample of 10 000 clients, we obtained 16% for France, 32% for Germany and 17% for Spain. Now the question is : « Do we observe the same results on the total clients number of the bank ? », it means : « In general, does the country have a significant effect on the clients number leaving bank ? ». Germany has the largest number of clients leaving the bank so the idea is : « Why would we need to compare the 3 countries at the same time ? ».

tableau chi square test

If we do an A/B test statistical test with Germany and France and we get a significant difference in the clients number leaving the bank between these 2 countries, then that would mean that in general, the country has a significant effect on the clients number who bank. Indeed, if we find by comparing Germany and France that the Germans are more likely to leave the bank than the French, we can consider that Spain will not change anything. Germans will always be more likely to leave the bank than the French. Maybe there will be a different relationship between Germany and Spain but there will always be a statistically significant difference between France and Germany with a larger number of clients leaving the bank in Germany than France.

Here is a way to confirm that this logic is true. There is a test and the participants of this test are German, Spanish and French. Imagine that this test was done without looking at what is happening in Spain. Now you get the result and you ask yourself the question : « Would the results changed if you added Spain ? ». The answer is « no » because there is no interdependence between Germany, Spain and France. That is, the decision to leave the bank in France and Germany doesn’t depend on Spain. And therefore, it’s quite correct to separate the categories by putting 1 aside to compare the 2 others. And as now we have 2 categories, we can do a Chi-square test with the online tool that we used in the previous article.

So let’s go back to our worksheet and put a country aside to compare only 2 countries. Select « Country » tab.

tableau chi square test

What we observe is that the difference between Spain and France is very small, so it wouldn’t be interesting to do a Chi-square test between Spain and France. It’s more interesting to do a Chi-square test between Germany and France and to prove that there is a statistically significant difference between these 2 countries. This will be enough to conclude that the country has a statistically significant impact on the clients number who leave the bank.

Selects « Country Actual » tab.

tableau chi square test

We will use the online tool of the previous article, click here  .

We will make a copy of « Country Actual » to have a bar chart with absolute values. Select « Country Actual », right-click and select « Duplicate ».

tableau chi square test

In « Show Me », select « horizontal bars ».

tableau chi square test

tableau chi square test

Removes « SUM (Number of Records )» from « Columns » and removes « Exited » and « Geography » from « Rows ».

tableau chi square test

tableau chi square test

In « Dimensions », move « Geography » in « Columns ».

tableau chi square test

tableau chi square test

In « Measures », move « Number of Records » to « Rows ».

tableau chi square test

tableau chi square test

In « Measures », move « SUM(Number of Records) » in « Label ».

tableau chi square test

tableau chi square test

In « Dimensions », move « Exited » in « Label ».

tableau chi square test

tableau chi square test

In « Dimensions », move « Exited » in « Colors ».

tableau chi square test

tableau chi square test

We also need total absolute values, which means the total number of men and women. There is a very fast way to get that. Right-click on the vertical axis and select « Add Reference Line ».

tableau chi square test

Then in « Value », click on the drop-down on the right and select « Sum » to have the total sum of the observations.

tableau chi square test

And in « Scope », you select « Per Cell » option to specify that you want the total sums for each category, male and female.

tableau chi square test

Now, we have the total sum at the top of the bars. We will modify labels to have the absolute values. In « Label », we will change « Computation » to « Value » and click on the « OK » button.

tableau chi square test

tableau chi square test

tableau chi square test

Here’s how to enter the data :

For « Sample1 » in #success, you enter 810 because there are 810 people who left the bank. For « Sample1 » in #trials, you enter 5014 because there are 5014 people in total.

For « Sample2 » in #success, you enter 814 because there are 814 people who left the bank. For « Sample2 » in #trials, you enter 2509 because there are 2509 people in total.

tableau chi square test

Here is the verdict : « Sample2 is more successful ». « Sample2 » corresponds to German’s clients and #success is :« yes, the client left the bank ». This verdict means that of all the clients from German are more likely to leave the bank than clients from France. And look, there is something important, it’s « p<0.001 ». This means that the « p » is strictly less than 0.001. As you can see, « p » value is very small, which concludes that the tests are statistically significant.

Ooh, there’s another thing I wanted to show you with the tab « age » with the 2 bar charts in parallel.

tableau chi square test

As you can see, there are many categories (more than 5) because each category corresponds to a 5-year ago group with clients of the bank aged from 15 to 90 years old. This is a lot of comparison but it would be a good exercise for you to find what are the 2 categories to compare that shows that there is a significant statistic difference.

I give you a hint, compare slices from 50 to 54 years old or from 35 to 39 years olds. In fact, you should compare all peer categories where you observe difference on this basic A/B test. Do a basic A/B test with absolutes values. Then do a Chi-square test to check if the difference is statistically significant, I mean, if the result is valid for the total number of bank’s clients.

This is a way to statistically validate the insights we see onTableau. You see, it’s not very difficult and it’s effective. Here is a way to find insights on Tableau and validate them.

Subscribe to my newsletter and share this article if you think it can help someone you know. Thank you.

-Steph

Validate Data Mining In Tableau With A Chi-Square Test

validate validation

I have just enrolled in a Data Science course on Udemy  and I learned good stuff.

In this article we will start using statistics. Don’t worry we’ll do something simple, we’ll use the Chi-square test in a basic way. There is a special section to learn how to do statistics at an advanced level.

I’ll explain why we’re going to learn how to use the Chi-square test. The results we have with theses 2 bar charts are good. We see on theses 2 bar charts that age has a significant impact on the rate of client leaving the bank. We also see in which age groups the clients leaves the bank the most and which age groups the clients leave the bank the least. With that we have good insights.

In the A/B test « Gender », we can see that there is a correlation between the male and female sex and the choice to leave the bank. But as I said before, this A/B test is basic. The results of a basic A/B test visually shows us what is probably happenning in reality but we aren’t 100% sure of these results. To validate these results, we need do to use statistical tests like Chi-square test.

Doing a report based on basic A/B test is very risky and you can have completely false insights. I don’t advise you to do it (unless you want to leave your job). It’s for this reason that using Chi-square will help us to have strong insights.

Chi-square will allow us to know if our results are statistically significant. Our results are based on a sample of 10 000 clients and Chi-square test will tell us if these results are due to chance effects or if these results can represent all the client of the bank.

For example in our A/B test « Gender », we observed that in our sample of 10 000 clients, women are more likely to leave the bank compared to men.

tableau data mining science chi square test a/b test

Now, we aren’t sure if the results of this sample represent the behavior of all the bank’s clients.

To use basic Chi-square test, we use an online tool. Click here  .

tableau data mining science chi square test a/b test

On internet, there are plenty of websites to do a Chi-square test but we’ll use this one so that you can understand how it works. To do a Chi-square test, we need to use absolute values and in our A/B test we have percentage.

Let’s go back to Tableau. We’ll create a new tab with a version of A/B test with absolute values. In this way, we keep the A/B test with the percentages. Do a right-click on the « Gender » tab and select « Duplicate ».

tableau data mining science chi square test a/b test

Name the new tab « Gender Actual » to specify that it’s absolute values.

tableau data mining science chi square test a/b test

To have the absolute values, move « Number of Records » in « Measures » to the « Marks » area and put it over top of « SUM(Number of Records ».

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Move « Number of Records » in « Measures » to « Rows » over « SUM(Number of Records ».

tableau data mining science chi square test a/b test

Cool, we have our absolute values.

tableau data mining science chi square test a/b test

We also need total absolute values, which means the total number of men and women. There is a very fast way to get that. Right-click on the vertical axis and select « Add Reference Line ».

tableau data mining science chi square test a/b test

Then in « Value », click on the drop-down on the right and select « Sum » to have the total sum of the observations.

tableau data mining science chi square test a/b test

And in « Scope », you select « Per Cell » option to specify that you want the total sums for each category, male and female.

tableau data mining science chi square test a/b test

Now, we have the total sum at the top of the bars. We will modify labels to have the absolute values. In « Label », we will change « Computation » to « Value » and click on the « OK » button.

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Perfect, we have the total amount of observation at the top of each bar : 4543 women and 5457 men. We have what we need to use our online tool.

tableau data mining science chi square test a/b test

OK, I’ll explain how this tool works. « Sample1 » and « Sample2 » correspond to the independent variable « Gender ». You choose in which order you enter the data, « Sample1 » for men or the opposite. In our case, we use « Sample1 » for women and « Sample2 » for men.

« #success » corresponds to the result Y=1, which means in our case « yes, the client left the bank ».

« #trials » is the total number of observations, which means the total number of women in « Sample1 » and the total number of men « Sample2 ».

That’s how you enter the data :

  • For « Sample1 » in #success, you enter 1139 because there are 1139 women who left the bank. For « Sample1 » in #trials, you enter 4543 because there are 4543 women in total.

 

  • For « Sample2 » in #success, you enter 898 because there are 898 men who left the bank. For « Sample2 » in #trials, you enter 5457 because there are 5457 men in total.

tableau data mining science chi square test a/b test

Here is the verdict : « Sample1 is more successful ». « Sample1 » corresponds to women and #success is :« yes, the client left the bank ». This verdict means that of all the bank’s client, women are more likely to leave the bank than men. And look, there is something important, it’s « p<0.001 ». This means that the « p » is strictly less than 0.001.

tableau data mining science chi square test a/b test

« p » is the value that indicates whether an independent variable has a statistically significant effect on a dependent variable. In our case, the independent variable is « Gender » and the dependent variable is « Exited », which is : « yes, the client left the bank ». So « p » is strictly less than 0.001, which means that the independent variable « Gender » has a statistically significant effect on the dependent variable « Exited ». This shows us that out of the total number of bank’s clients, women are more likely to leave the bank than men.

This is how we use Chi-square test with this online tool. This is the same principle on all online tools that you can find on Google or DuckDuckGo . You can repeat these instructions that I gave you with other tools, you will get the same results.

It’s cool with the Chi-square we validated the A/B test and to specify that this A/B test is validated, we’ll color the tab in green.

Right-click on the tab, select « Color » and select « Green ».

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Perfect, now we’ll validate another A/B test. Selects « HasCreditCard » tab.

tableau data mining science chi square test a/b test

We’re going to create an A/B test « HasCreditCard » only with absolute values. To save time, right-click on « Gender Actual » tab and select « Duplicate ».

tableau data mining science chi square test a/b test

We’ll remove the green color on the tab « Gender Actual (2) ». Right-click on the tab and select « Color » and « None ».

tableau data mining science chi square test a/b test

You rename the tab « HasCreditCard Actual ».

tableau data mining science chi square test a/b test

Move the variable « HasCrCard » over « Gender » in « Columns ».

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Excellent, everything is ready to do a Chi-square test. We’ll remove « Exited » labels to better see the absolutes values. Make a click and drag out.

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Perfect, let’s go back to our online tool. In this case, « Sample1 » is « no », which means client who don’t have credit card and « Sample2 » for « yes », which means clients who have a credit card.

That’s how you enter the data :

  • For « Sample1 » in #success, you enter 613 because there are 613 clients who left the bank. For « Sample1 » in #trials, you enter 2945 because there are 2945 clients who don’t have a credit card.
  • For « Sample2 » in #success, you enter 1424 because there are 1424 clients who left the bank. For « Sample2 » in #trials, you enter 7055 because there are 7055 clients who have a credit card.

tableau data mining science chi square test a/b test

Let’s look at the verdict, it’s « No significant difference ». « p » value is very high, it’s above 5%. This confirms that the independent variable « HasCrCard » has no statistically significant effect on the dependent variable « Exited ». That was the conclusion we had made when we had done the A/B test with percentages.

We had seen that there was 21% of « Exited » (clients who left the bank) in the category « no » and 20% in the category « yes ». With these results we concluded that most likely the variable « HasCrCard » had no impact on the rate of clients who left the bank. Chi-square test confirms our conclusion and we can put the tab « HasCrCard » in green to say that it’s OK.

Right-click on the tab « HasCreditCard » => « Color » => « Green ».

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Excellent, now, you can do a statistical A/B test with 2 categories. Soon, we will do statistical A/B tests with more than 2 categories.

Share this article if you think it can help someone you know. Thank you.

-Steph

Non Constructive Criticism

non constructive criticism negative comment

I watched an Olivier Roland’s video  and I learned good stuff.

These are comments that published regularly on social media. What is the difference between non-constructive criticism and constructive criticism ? Ask yourself this question : « Does this criticism have the will to help me ? ». If you don’t feel the will to help you, it’s like 95% of critics, it’s someone who criticizes just to criticize. These non-constructive criticisms based on an extremely superficial analysis of your content.

These people read the first 5 lines of an article, listen to the first 2 minutes of a podcast, watch the 2 first minutes of a video or read only the title.

When you publish content on internet, you’ll have people who will afford advice on what you do when they haven’t taken time to really understand, to analyze what you mean. It’s known that in communication there is an huge loss of the message’s meaning between what you mean, what you say, what people hear, what people understand and what people remember.

There is also an important factor is that in all areas, beginners have an opinion on everything and they say it. Does this give them a false sense of competence on topics they don’t master ? Yes, it’s ego on my opinion.

To be honest, it has happend to me to comment on something that I didn’t take time to analyze deeply. I gave my opinion based on an extremely superficial analysis and we all have already do it. Just understand that this is a normal human behavior and that it’s not a problem.

Attention, it’s not necesssary to remove these comments because they will never disappear. The goal is to reach enough people who understand your message and practice your advice.

1000 real fans

1000 fans

To succeed on internet, you only need to have 1000 real fans. It means people who understand what you say, who agree with what you say, who like your content, who share and who buy your products. You can have 1 million people who hate you, if you have 1000 real fans, that’s enough.

Many people are afraid of negative comments and that’s a shame. Imagine yourself in 10 years. You talk to a person and you explain to him/her that 10 years ago, you had a blog, a podcast or a Youtube channel and that you stopped it because of a negative comment that touched you. Think about that, do you realize how ridiculous it is ? So continue to create content on internet.

Share this article if you think it can help someone you know.Thank you.

-Steph

Add a Reference Line

reference line tableau data science

I have just enrolled in a Data Science course on Udemy  and I learned good stuff.

In the previous article we learned how to work with aliases. We will learn how to add a reference line in our bar chart.

Before I start, I’ll show you a trick in Tableau.

In our bar chart we can see the labels in this order : percentage and below : « Stayed » or « Exited ».

We will reverse this order. You go in this rectangle.

reference line tableau data science

And you place the label « Exited » above the label « SUM(Number of Records ».

reference line tableau data science

Look, the label « Stayed » is above percentage.

reference line tableau data science

With that, we can understand the bar chart more easily.

Let’s add a reference line, let’s go . But before, I think you’d like to know why I’m talking to you about a reference line.

A reference line helps us to compare bar chart results with a benchmark. This benchmark is represented by this reference line.

In our case, the benchmark is the percentage of clients who left the bank in our sample of 10 000 people.

The first thing to do is find this percentage in our bar chart. To be able to do that, remove « Gender » from « Columns ».

reference line tableau data science

Boom, we have a new bar chart.

reference line tableau data science

Look, we only have the percentage of clients who left the bank and the percentage of clients who stayed in the bank.

We see that on our sample of 10 000 people, there are 20% of the clients who left the bank and 80% of the clients stayed in the bank. This means that the churn rate (client departure rate) is 20%.

What we’re going to do is we will add this churn rate in our A/B test. To return to our A/B test, press 2 times on Ctrl+Z or Command+Z or you can click 2 times on the « Back » button in the menu bar.

reference line tableau data science

Now we know that the average clients who left the bank is 20%.

We will add a horizontal line in the Y axis (Y = 20%) to compare the 20% of the churn rate and the 2 categories male and female.

Let’s go. Right-click on the vertical axis (Y axis) and select « Add Reference Line ».

reference line tableau data science

A window appears with several options.

reference line tableau data science

You have the choice to add a line, a band, a distribution or a box plot.

We will use the line for the entire table.

Click on the « Line » button and activate the « Entire Table » checkbox. In « Value » selects « Constant ».

reference line tableau data science

The constant is 20%, so it’s necessary that you put 0.20 in « Value ».

reference line tableau data science

It’s possible to put a label on this reference line. For example, if the line reference corresponds to a formula, the label displays the formula. But for our case, our constant is 20% and it’s already displayed on the vertical axis so we will select « None ».

reference line tableau data science

For the format of the line, select the continuous line and click on the « OK » button.

reference line tableau data science

We have our reference line is added to our chart.

reference line tableau data science

Here is what we can see. Female clients are more likely to leave the bank than average clients. Male clients are less likely to leave the bank than average clients. 

In our case, it’s obvious to see that because there is only 2 categories, men and women.

Now you know how to add a reference line in a bar chart.

Share this article if you think it can help someone you know. Thank you.

-Steph

Connect Tableau to An Excel File

tableau connect excel file geographic map

I have just enrolled in a Data Science course on Udemy  and I learned good stuff.

Now that you downloaded the dataset in Excel file format, we’ll use Tableau to analyze this.

We’ll connect to the dataset using the « Excel » option.

Now that you downloaded the dataset which is in Excel format, we will use Tableau to analyze this.

We will connect the the dataset using the « Excel » option.

tableau connect excel file geographic map

Select the dataset in Excel file you downloaded and click on the « Open » button.

tableau connect excel file geographic map

And as you can see, there is only one tab.

tableau connect excel file geographic map

There is only one tab because in the Excel file there is only one tab. If in the Excel file there were several tabs, they would all have been listed here.

tableau connect excel file geographic map

It’s necessary to check that all data is « OK ». Scroll the lines and columns to see that. Everything is good, there are 10 000 lines as in the Excel file.

tableau connect excel file geographic map

Excellent, we connected our Excel source file to Tableau.

Now, click on the « Sheet1 » tab to access the Worksheet.

tableau connect excel file geographic map

tableau connect excel file geographic map

We’ll have a little fun.

For example, let’s look at what we have with « Geography »

tableau connect excel file geographic map

« Geography » is the dimension that gives us the country, so we’ll make a map to see where the clients from the bank come from.

Move « Geography » on this area.

tableau connect excel file geographic map

tableau connect excel file geographic map

Ah, it’s odd, nothing happens ?!? Why ? Look, when you look at « Geography », it’s not recognized by Tableau as a geographic dimension. Here,, you can see that Tableau recognized « Geography » as a dimension of type text with the label « ABC »

tableau connect excel file geographic map

Don’t worry, we can fix it quickly. Click on the arrow of « Geography ».

tableau connect excel file geographic map

Selects « Geography Roles » and « Country Region » so that the « Geography » dimension become geography’s type.

tableau connect excel file geographic map

Now you remove « Geography » made a table with a click-and-drag.

tableau connect excel file geographic map

tableau connect excel file geographic map

Look, we have a globe next to « Geography ». This means that Tableau recognize that « Geography » is a geographic dimension.

tableau connect excel file geographic map

Since « Geography » is a dimension of geography type, there are 2 new measures that have appeared : Latitude (generated) and Longitude (generated).

tableau connect excel file geographic map

Put « Geography » in this space with a click and drag.

tableau connect excel file geographic map

Look, this time there is a map.

tableau connect excel file geographic map

You have the possibility of zooming with these buttons.

tableau connect excel file geographic map

The map is fine but we’ll remove the blue dots and modify the map so that it’s easier to read.

We’ll color the countries and display the clients number that has in each country.

We know that in the dataset each line corresponds to a client. What we can do is use the « number Of Record », it means the total of number of observations. In this way, we can visualize the number of lines attended to each country and the number of lines attended to each country is the number of client per country.

Then, take the « number Of Record » and move it to « Colors ».

tableau connect excel file geographic map

Boom ! Each country has a color.

tableau connect excel file geographic map

Look at the color contrasts. France has a darker color which indicates that it is the country with the most clients. Germany and Spain have almost the same colors which indicates that they have almost the same clients number.

But we want to know the clients number per country without have the cursor on the country.

To do this we’ll add a label. Take « number Of Record » and moves it to « Label ».

tableau connect excel file geographic map

tableau connect excel file geographic map

We’ll increase the text’s size and put in bold. Click on « Label », click on « Font » and select « 12 » and bold.

tableau connect excel file geographic map

It’s cool, we can see the clients number per country. You have the possibility to zoom on a region. Click on « Zoom area » and drag and drag to select the region on the map.

tableau connect excel file geographic map

tableau connect excel file geographic map

Now we can see that the majority of clients are in France, this represents almost half of the total clients number of the dataset. Germany and Spain have almost the same number of clients.

Share this article if you think it can help someone you know. Thank you.

-Steph

Dataset For Data Mining

dataset data mining

I have just enrolled in a Data Science course on Udemy  and I learned good stuff.

To have the dataset to do Data Mining, you need to go to the superdatascience website . In « Part.1 Visualization », you see the section « How to use Tableau for Data Mining ». Click on « Churn Modeling » to download the file.

dataset data mining

Once you have downloaded the file, move the file to the directory you created for the course. In this directory, create a new directory (unless you already do it) named « 2.Chunk investigation ».

dataset data mining

dataset data mining

Open this fiel with Excel or with other spreadsheet software.

dataset data mining

Know that we use this dataset for the visualization part but we will also use this dataset for the modeling part.

Let’s analyze the data of this dataset.

This dataset is quite large because it contains 10 000 lines and a few columns. This is the list of a bank’s client. The client information is :

  • Customer id (login)

  • Surname (last name)

  • Credit score ( is the measure that indicates the client’s ability to borrow)

  • Geography (client’s country)

  • Gender (male or female)

  • Age

  • Tenure -(the number of years the client is in the bank)

  • Balance (balance of the client’s bank account)

  • NumOfProduct (number of product that the client has in the bank – credit card, contract, account)

  • HasCrCard (does the client have a credit card ?)

  • IsActiveMember (did the client use his/her credit card during the last month ?)

  • EstimatedSalary (the bank’s estimate of the client’s annual salary)

  • Exited (did the client leave the bank ?)

Now, I will explain the context related to this dataset. This bank has branches in several countries like Germany, Spain and France. This bank noticed that lately there were many clients who left the bank. The bank has a report called « churn rate » which is the customers rate who leave the bank and for a few months the « churn rate » is really higher than usual. It’s for this reason that the bank needs a data scientist (you) to find the problem and propose solutions.

This dataset is a small sample of clients bank. These are 10 000 randomly selected client.

The column « Exited » is a column that didn’t exist before. This column has created when the bank realized that there was an abnormal number of client who were leaving the bank.

dataset data mining

Then the bank observed these clients for 6 months to see which client left the bank.

dataset data mining

In the « Exited » column, the number « 1 » means that the client left the bank and the number « 0 » means that the client stayed in the bank.

To analyze this dataset, you’ll need to do A/B Tests. For exemple, a classic A/B Test is to see if women are more likely to left the bank than men. That’s means, see the number of men who left the bank, see the number of women who left the bank and then normalize by the total number of clients. It’s important to normalize the number of clients because there are not the same proportions of women as men. Next, based on the last column « Exited », you’ll find out if it’s the men or women who are likely to left the bank.

Once you have relevant results, you can show your report to the bank. And with this report you should be able to propose solutions to the bank. For example, if the report says that women leave the bank in bulk, it’s because there is a problem and it’s necessary to see whether the bank is offering women something right. Or it’s possible that another bank offers a much more attractive offer for women or something else.

You will learn how to investigate in the dataset and find answer through client information with A/B tests.

Share this article if you think it can help someone you know. Thank you.

-Steph

How To Do Trap Bar Shrugs

trap bar shrugs

I read a Frederic Delavier’s book « Strength Training Anatomy » and I learned good stuff.

Standing with your legs slightly apart. You’re in front of the trap bar on the floor or on a support :

  • Take the trap bar, paying attention to centering your grip (attention : with heavy weight, a poorly adjusted grip will rock the bar backwards or forwards).

  • You have your head straight or a little bent forwards and your abs squeezed. Do shoulder shrugs.

This exercise works the upper trapezius which is inserted on the clavicle, the acromion and the scapular spine and which goes up to the superior nuchal line the the skull.

The rhomboid major / minor muscles and levator scapulae muscle work a little bit.

The trap bar created to work trapezius by lifting heavy weights without rubbing against the thighs like dumbbells or barbell.

la barbell.

Note

People with long clavicles will always have more difficulty doing shoulder shrugs than people with short clavicles.

Share this article if you think it can help someone you know. Thank you.

-Steph