Chi-Square Test With More Than 2 Categories

I have just enrolled in a Data Science course on Udemy and I have learned some good stuff.

In this article, we will do a Chi-square test with more than 2 categories. We will use the A/B test « Country », which has 3 categories corresponding to 3 countries: Germany, Spain and France. Select the « Gender Actual » tab, right-click it and select « Duplicate » to make a copy.

Rename the « Gender Actual (2) » tab to « Country Actual ».

From « Dimensions », drag the variable « Geography » onto « Gender » in « Columns » so that « Geography » replaces « Gender ».

Here’s how to do a statistical A/B test when there are 3 categories. We’ll start with the classic method and then I’ll show you another way to do a Chi-square test with any number of categories.

Let’s start with the classic method. In the previous article we used an online tool that only handles 2 categories, « Sample1 » and « Sample2 ». Here there are 3 categories, so we can’t use it. That’s why we’re going to use another online tool, click here.

In this online tool, we can enter the values without the totals. That is, we enter only the number of observations in each category. We simply need to enter the values shown in our A/B test. I’m going to show you how to turn our A/B test into a table, which makes it easier to enter the values in the online tool without making any mistakes.

Go to the « Show Me » tool at the top right.

Click on « text tables ».

Click on the « Swap Rows and Columns » button.

Cool, now you have a table arranged in exactly the same way as the online tool.

In the online tool, we will select 2 rows and 3 columns.

As we have 3 categories and 2 possible results, we enter our values exactly as in the table we just created in Tableau.

Perfect, our table is ready. You can click on the « Calculate » button.

As you can see, we observe the same thing as with the other online tool: our « p » value is less than 5%, which means the result is statistically significant.

This statistical significance means that the differences are unlikely to be due to the luck of the sample, so we can expect them to hold for all of the bank’s clients and not just for the sample of 10 000 clients. They match the differences seen in the A/B test « Country », whose results are based solely on that sample. We can conclude that, across all of the bank’s clients, it’s the clients in Germany who are more likely to leave the bank. This is how we do things cleanly.
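If you would like to check the online tool’s result in code, here is a minimal Python sketch of the same Chi-square test on a 2 by 3 table. It is only an illustration under assumptions. The France and Germany counts are the ones quoted later in this article (810 of 5014 and 814 of 2509), but the Spain counts are not given here: 2477 is simply what is left of the 10 000 clients, and 421 is an assumed figure, roughly 17% of 2477. The function names are made up for this example, and the online tool may compute things slightly differently.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability (p-value) of a chi-square statistic x."""
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum of (x/2)^i / i! for i < dof/2
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: start from the 1-dof case and add one term per extra 2 dof
    total = math.erfc(math.sqrt(half))
    for j in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def chi_square_independence(table):
    """Pearson chi-square test on a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, chi2_sf(statistic, dof)


# Columns: France, Germany, Spain. Rows: exited, stayed.
# ASSUMPTION: the Spain column (421 of 2477) is approximated, see the text above.
observed = [
    [810, 814, 421],
    [5014 - 810, 2509 - 814, 2477 - 421],
]

statistic, dof, p_value = chi_square_independence(observed)
print(f"chi-square = {statistic:.1f}, dof = {dof}, p = {p_value:.3g}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

The function accepts any number of rows and columns, so it is not restricted to the 5 by 5 tables of the online tool.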

As you saw, this online tool is limited to 5 by 5 tables, so you can’t use it when you have 6 categories or more. Fortunately, it’s possible to do a Chi-square test with any number of categories. It’s a special method, and to help you understand it, I’ll give you a theoretical explanation.

Here we have 3 countries: Germany, Spain and France.

What we’re trying to compare is the number of clients leaving the bank in each of these countries.

With our basic A/B test based on a sample of 10 000 clients, we obtained 16% for France, 32% for Germany and 17% for Spain. Now the question is: « Do we observe the same results across all of the bank’s clients? », that is: « In general, does the country have a significant effect on the number of clients leaving the bank? ». Germany has the largest proportion of clients leaving the bank, so the idea is: « Why would we need to compare the 3 countries at the same time? ».

If we do a statistical A/B test with Germany and France and we get a significant difference in the number of clients leaving the bank between these 2 countries, then that would mean that, in general, the country has a significant effect on the number of clients who leave the bank. Indeed, if we find by comparing Germany and France that German clients are more likely to leave the bank than French clients, we can consider that Spain will not change anything. German clients will always be more likely to leave the bank than French clients. Maybe there will be a different relationship between Germany and Spain, but there will always be a statistically significant difference between France and Germany, with a larger share of clients leaving the bank in Germany than in France.

Here is a way to confirm that this logic holds. There is a test and the participants of this test are German, Spanish and French. Imagine that this test was done without looking at what is happening in Spain. Now you get the result and you ask yourself the question: « Would the results change if you added Spain? ». The answer is « no » because there is no interdependence between Germany, Spain and France. That is, the decision to leave the bank in France and Germany doesn’t depend on Spain. Therefore, it’s quite correct to separate the categories by putting 1 aside to compare the 2 others. And as we now have 2 categories, we can do a Chi-square test with the online tool that we used in the previous article.

So let’s go back to our worksheet and put a country aside to compare only 2 countries. Select the « Country » tab.

What we observe is that the difference between Spain and France is very small, so it wouldn’t be interesting to do a Chi-square test between Spain and France. It’s more interesting to do a Chi-square test between Germany and France and to prove that there is a statistically significant difference between these 2 countries. This will be enough to conclude that the country has a statistically significant impact on the number of clients who leave the bank.

Select the « Country Actual » tab.

We will use the online tool of the previous article, click here.

We will make a copy of « Country Actual » to have a bar chart with absolute values. Select « Country Actual », right-click and select « Duplicate ».

In « Show Me », select « horizontal bars ».

Remove « SUM(Number of Records) » from « Columns » and remove « Exited » and « Geography » from « Rows ».

In « Dimensions », move « Geography » to « Columns ».

In « Measures », move « Number of Records » to « Rows ».

In « Measures », move « SUM(Number of Records) » to « Label ».

In « Dimensions », move « Exited » to « Label ».

In « Dimensions », move « Exited » to « Colors ».

We also need the total absolute values, which means the total number of clients in each country. There is a very fast way to get that. Right-click on the vertical axis and select « Add Reference Line ».

Then in « Value », click on the drop-down on the right and select « Sum » to have the total sum of the observations.

And in « Scope », select the « Per Cell » option to specify that you want the total sum for each category, that is, for each country.

Now we have the total sum at the top of the bars. We will modify the labels to show the absolute values. In « Label », change « Computation » to « Value » and click on the « OK » button.

Here’s how to enter the data:

For « Sample1 » (France), in #success, enter 810 because 810 clients left the bank. In #trials, enter 5014 because there are 5014 clients in total.

For « Sample2 » (Germany), in #success, enter 814 because 814 clients left the bank. In #trials, enter 2509 because there are 2509 clients in total.

Here is the verdict: « Sample2 is more successful ». « Sample2 » corresponds to the clients from Germany and #success means « yes, the client left the bank ». This verdict means that clients from Germany are more likely to leave the bank than clients from France. And look, there is something important: « p<0.001 ». This means that the « p » value is strictly less than 0.001. As you can see, the « p » value is very small, so we conclude that the difference is statistically significant.
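The same comparison can be sketched in a few lines of Python. This is a minimal illustration, assuming a standard two-proportion test (equivalent to a Chi-square test on a 2 by 2 table without continuity correction). The online tool does not necessarily use this exact formula, and the function name is made up for the example. The counts are the ones entered above.

```python
import math


def two_proportion_test(success1, trials1, success2, trials2):
    """Compare two rates; returns (rate1, rate2, chi_square, p_value)."""
    rate1 = success1 / trials1
    rate2 = success2 / trials2
    # Rate we would expect in both samples if there were no real difference
    pooled = (success1 + success2) / (trials1 + trials2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials1 + 1 / trials2))
    z = (rate2 - rate1) / std_err
    chi_square = z ** 2  # Pearson chi-square of the 2x2 table (1 degree of freedom)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return rate1, rate2, chi_square, p_value


# Sample1 = France, Sample2 = Germany; "success" means the client left the bank
rate_france, rate_germany, chi_square, p_value = two_proportion_test(810, 5014, 814, 2509)

print(f"France:  {rate_france:.0%} left the bank")
print(f"Germany: {rate_germany:.0%} left the bank")
print(f"chi-square = {chi_square:.1f}, p < 0.001: {p_value < 0.001}")
```

The two rates printed by the sketch are the 16% and 32% seen in the basic A/B test.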

Ooh, there’s another thing I wanted to show you on the « age » tab, with its 2 bar charts side by side.

As you can see, there are many categories (more than 5) because each category corresponds to a 5-year age group, with clients of the bank aged from 15 to 90 years old. That is a lot of comparisons, but it would be a good exercise for you to find 2 categories whose comparison shows a statistically significant difference.

Here’s a hint: compare the groups from 50 to 54 years old or from 35 to 39 years old. In fact, you should compare every pair of categories where you observe a difference in this basic A/B test. Do a basic A/B test with absolute values. Then do a Chi-square test to check whether the difference is statistically significant, I mean, whether the result is valid for all of the bank’s clients.
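If you would rather not enter every pair by hand, the pair-by-pair comparison can be sketched in Python. This is only an illustration: the counts below are HYPOTHETICAL placeholders, not values from the bank dataset, and you would replace them with the absolute values read from your own chart. The helper name is made up for the example, and it assumes the same two-proportion test as a 2 by 2 Chi-square test.

```python
import math
from itertools import combinations

# HYPOTHETICAL counts, for illustration only:
# age group -> (clients who left the bank, total clients in the group)
age_groups = {
    "35-39": (120, 1500),
    "40-44": (180, 1200),
    "50-54": (200, 450),
}


def pair_p_value(left_a, total_a, left_b, total_b):
    """Two-sided p-value for the difference between two leaving rates."""
    pooled = (left_a + left_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (left_a / total_a - left_b / total_b) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


# Test every pair of age groups and list the strongest differences first
results = []
for (name_a, (left_a, total_a)), (name_b, (left_b, total_b)) in combinations(age_groups.items(), 2):
    results.append((name_a, name_b, pair_p_value(left_a, total_a, left_b, total_b)))
results.sort(key=lambda item: item[2])

for name_a, name_b, p in results:
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name_a} vs {name_b}: p = {p:.3g} ({verdict})")
```

Keep in mind that when many pairs are tested, a few can fall below 5% by chance alone, so a stricter threshold is commonly used in that situation.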

This is a way to statistically validate the insights we see in Tableau. You see, it’s not very difficult and it’s effective. Here is a way to find insights in Tableau and validate them.

Subscribe to my newsletter and share this article if you think it can help someone you know. Thank you.

-Steph

Visualize An A/B Test in Tableau

I have just enrolled in a Data Science course on Udemy and I have learned some good stuff.

We are going to do a simple and very visual A/B test in Tableau.

The first thing to do is to save this worksheet under the name « Map ».

Right-click on « Sheet1 » at the bottom of the screen and select « Rename Sheet ».

The second thing to do is to save this workbook. For that we go to « File » and select « Save to Tableau Public As… »

The « Tableau Public Sign In » window appears so that you can connect to your Tableau account.

Save the workbook with the name « DataMining » and click on the « Save » button.

We will create a new worksheet specifically for the A/B test. Click on the « New Worksheet » icon at the bottom of the screen.

Look, you created the new worksheet.

To start, we need the dependent variable we are studying. This dependent variable is « Exited », which is « 1 » if the client left the bank or « 0 » if the client stayed with the bank. Now look, this variable is in « Measures », so Tableau has recognized it as a numeric variable.

In our case, the dependent variable « Exited » is actually a category. Our logic in this situation is: « Did the client leave or did the client stay? ». For this reason, we need to move the variable « Exited » to « Dimensions ».

Now, the variable « Exited » is in « Dimensions ».

Let’s do a classic A/B test, the A/B test for gender (male or female). Here is what we’re going to test: if we keep everything else constant and take a male client and a female client, which of the two is more likely to leave the bank?

Let’s go, move « Gender » to « Columns ».

We have 2 columns : « Female » and « Male ».

Move « Exited » to « Colors ».

Look, we have 2 colors. Blue is for « 0 », the clients who stayed with the bank. Orange is for « 1 », the clients who left the bank.

To be more specific, we want to know how many clients stayed and how many clients left the bank.

Move the variable « Number of Records » to « Rows ».

What we can notice is that the total number of men is higher than the total number of women. We can also see that a larger proportion of the women left than of the men. However, this is not enough to allow us to understand what is happening.

We will add « Number of Records » as a label. Move « Number of Records » to « Label ».

Set the label’s font size to « 12 » and make it bold.

Now we know the number of people in each category.

To better visualize this, we will replace the numbers with percentages. We want to see what percentage of female clients left and what percentage of male clients left. With percentages we can easily make the comparison, which isn’t possible with the absolute numbers because the total number of women is different from the total number of men.

To convert an absolute number to a percentage, click on the arrow next to « SUM(Number of Records) ».

And click on « Add Table Calculation… »

In « Calculation Type », select « Percent of Total ».

Be careful, there is one important thing to do here. Change « Table (across) » to « Table (down) ».

« Table (down) » computes the percentages within each column, so each column adds up to 100%. Now you can close the window, the changes have been made.
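To make the idea of a percentage computed down each column concrete, here is a small Python sketch of the same calculation outside Tableau. The counts are HYPOTHETICAL, chosen only so that the result matches the 25% and 16% shown in this article. They are not the real counts of the dataset, and the function name is made up for the example.

```python
# HYPOTHETICAL counts per column (gender) and per value of « Exited »:
# 0 = the client stayed, 1 = the client left the bank
counts = {
    "Female": {0: 750, 1: 250},
    "Male": {0: 1008, 1: 192},
}


def percent_of_column(table):
    """Divide each cell by the total of its own column."""
    result = {}
    for column, cells in table.items():
        column_total = sum(cells.values())
        result[column] = {key: value / column_total for key, value in cells.items()}
    return result


percentages = percent_of_column(counts)
for gender, cells in percentages.items():
    print(gender, {exited: f"{share:.0%}" for exited, share in cells.items()})
```

Because each cell is divided by the total of its own column, the two groups can be compared even though they don’t contain the same number of clients.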

Boom, we have the percentages.

We will format the labels to make them easier to read. Click on the small arrow next to « SUM(Number of Records) » and select « Format… ».

The « Pane » tab appears. In the « Pane » tab, under « Numbers », choose « Percentage » and set the decimal places to « 0 ».

We will make it even more consistent. While holding the « Ctrl » key (or « Command »), drag « SUM(Number of Records) » onto « Rows » to replace the old « SUM(Number of Records) ».

As you can see, the vertical axis is in percentage.

Let’s analyze what we see. The percentage of female clients who left the bank is 25%. The percentage of male clients who left the bank is 16%. So female clients are more likely to leave the bank than male clients, all else being equal.

This A/B test is not complete because we have not done any test of statistical significance, but this approach is effective for quickly obtaining results.

We’re going to do a full A/B test later, but today you learned how to do an effective A/B test by focusing on the relevant things. With a test of statistical significance, there are irrelevant variables to deal with and it takes a little more time, but we’ll do that later.

Share this article if you think it can help someone you know. Thank you.

-Steph