## Chi-Square Test With More Than 2 Categories

I have just enrolled in a Data Science course on Udemy and I learned good stuff.

In this article, we will do a Chi-square test with more than 2 categories. We will use the A/B test « Country », which has 3 categories corresponding to 3 countries: Germany, Spain and France. Select the « Gender Actual » tab, right-click it and select « Duplicate » to make a copy.

Rename the tab « Gender Actual (2) » to « Country Actual ».

In « Dimensions », drag the variable « Geography » onto « Gender » in « Columns » to replace « Gender » with « Geography ».

Here’s how to do a statistical A/B test when there are 3 categories. We’ll start with the classic method and then I’ll show you another way to do a Chi-square test with any number of categories.

Let’s start with the classic method. In the previous article, we used an online tool that accepts only 2 categories, « Sample1 » and « Sample2 ». Here there are 3 categories, so we can’t use it. That’s why we’re going to use another online tool, click here.

In this online tool, we can enter the values without the totals. That is, we enter only the number of observations in each category. We simply need to enter the values shown in our A/B test. I’m going to show you how to turn our A/B test into a table. This way, it will be easier to enter the values in the online tool without making mistakes.

Go to the « Show me » tool at the top right.

Click on « text tables ».

Click on the « Swap Rows and Columns » button.

Cool, now you have a table arranged in exactly the same way as the online tool.

In the online tool, we will select 2 rows and 3 columns.

As we have 3 categories and 2 possible results, we enter our values exactly as in the table we just created in Tableau.

Perfect, our table is ready. You can click on the « Calculate » button.

As you can see, we observe the same thing as with the other online tool. Our indicator, the « p » value, is less than 5%, which means the result is statistically significant.

This statistical significance means that these results are valid for all of the bank’s clients and not just for the sample of 10 000 clients. The differences we observed in the A/B test « Country » were based solely on that sample of 10 000 clients. We can now conclude that, among all of the bank’s clients, it’s the clients in Germany who are more likely to leave the bank. This is how we do things cleanly.
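If you’d rather check the calculator’s answer with code, here is a minimal Python sketch of a Chi-square test on a 2-by-3 table. It’s only an illustration under a few assumptions: the function name `chi_square_table` is mine, the France and Germany counts (810 of 5014 and 814 of 2509) are the ones used in the two-country comparison of this article, and the Spain counts (413 of 2477) aren’t quoted anywhere. I derived them by subtraction from the 10 000 clients and the 1139 + 898 = 2037 exits reported in the previous article. The shortcut `exp(-x / 2)` for the « p » value only holds for 2 degrees of freedom, which is what a 2-by-3 table has.

```python
import math


def chi_square_table(observed):
    """Return the Pearson chi-square statistic and degrees of freedom
    for a table of counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count we would expect in this cell if country had no effect.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Columns: France, Germany, Spain.
# ASSUMPTION: the Spain figures (413 of 2477) are derived by subtraction,
# they are not quoted in the article.
exited = [810, 814, 413]
stayed = [5014 - 810, 2509 - 814, 2477 - 413]

statistic, dof = chi_square_table([exited, stayed])

# For exactly 2 degrees of freedom, the upper tail of the chi-square
# distribution has the closed form exp(-x / 2).
p_value = math.exp(-statistic / 2)

print(f"chi2 = {statistic:.1f}, dof = {dof}, p = {p_value:.3g}")
```

With these counts the « p » value comes out far below 5%, which agrees with what the online tool shows. For tables with other shapes the « p » value needs the general chi-square distribution, so a statistics library is a better choice there.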

As you saw, this online tool is limited to 5-by-5 tables, so you can’t use it when you have 6 categories or more. Fortunately, it’s possible to do a Chi-square test with any number of categories. It’s a special method, and to help you understand it, I’ll give you a theoretical explanation.

Here we have 3 countries: Germany, Spain and France.

What we’re trying to compare is the number of clients leaving the bank in each of these countries.

With our basic A/B test based on a sample of 10 000 clients, we obtained 16% for France, 32% for Germany and 17% for Spain. Now the question is: « Do we observe the same results across all of the bank’s clients? », which means: « In general, does the country have a significant effect on the number of clients leaving the bank? ». Germany has the highest share of clients leaving the bank, so the idea is: « Why would we need to compare the 3 countries at the same time? ».

If we do a statistical A/B test with Germany and France and we get a significant difference in the number of clients leaving the bank between these 2 countries, then that means that, in general, the country has a significant effect on the number of clients who leave the bank. Indeed, if we find by comparing Germany and France that the Germans are more likely to leave the bank than the French, we can consider that Spain will not change anything. Germans will always be more likely to leave the bank than the French. Maybe there will be a different relationship between Germany and Spain, but there will always be a statistically significant difference between France and Germany, with a larger share of clients leaving the bank in Germany than in France.

Here is a way to confirm that this logic holds. Imagine a test whose participants are German, Spanish and French, and imagine that it was done without looking at what is happening in Spain. Now you get the result and you ask yourself: « Would the results change if you added Spain? ». The answer is « no » because there is no interdependence between Germany, Spain and France. That is, the decision to leave the bank in France and Germany doesn’t depend on Spain. Therefore, it’s quite correct to separate the categories by putting 1 aside to compare the 2 others. And as we now have 2 categories, we can do a Chi-square test with the online tool that we used in the previous article.

So let’s go back to our worksheet and put a country aside to compare only 2 countries. Select the « Country » tab.

What we observe is that the difference between Spain and France is very small, so it wouldn’t be interesting to do a Chi-square test between Spain and France. It’s more interesting to do a Chi-square test between Germany and France and to show that there is a statistically significant difference between these 2 countries. This will be enough to conclude that the country has a statistically significant impact on the number of clients who leave the bank.

Select the « Country Actual » tab.

We will use the online tool from the previous article, click here.

We will make a copy of « Country Actual » to have a bar chart with absolute values. Select « Country Actual », right-click and select « Duplicate ».

In « Show Me », select « horizontal bars ».

Remove « SUM(Number of Records) » from « Columns » and remove « Exited » and « Geography » from « Rows ».

In « Dimensions », move « Geography » to « Columns ».

In « Measures », move « Number of Records » to « Rows ».

In « Measures », move « SUM(Number of Records) » to « Label ».

In « Dimensions », move « Exited » to « Label ».

In « Dimensions », move « Exited » to « Colors ».

We also need total absolute values, which means the total number of clients in each country. There is a very fast way to get that. Right-click on the vertical axis and select « Add Reference Line ».

Then in « Value », click on the drop-down on the right and select « Sum » to have the total sum of the observations.

And in « Scope », you select the « Per Cell » option to specify that you want the total sum for each category, that is, for each country.

Now, we have the total sum at the top of the bars. We will modify labels to have the absolute values. In « Label », we will change « Computation » to « Value » and click on the « OK » button.

Here’s how to enter the data:

For « Sample1 » (France) in #success, you enter 810 because there are 810 clients who left the bank. For « Sample1 » in #trials, you enter 5014 because there are 5014 clients in total.

For « Sample2 » (Germany) in #success, you enter 814 because there are 814 clients who left the bank. For « Sample2 » in #trials, you enter 2509 because there are 2509 clients in total.

Here is the verdict: « Sample2 is more successful ». « Sample2 » corresponds to the clients in Germany and #success is: « yes, the client left the bank ». This verdict means that, among all clients, those from Germany are more likely to leave the bank than those from France. And look, there is something important: « p<0.001 ». This means that « p » is strictly less than 0.001. As you can see, the « p » value is very small, so we conclude that the difference is statistically significant.
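Here is a small Python sketch that reruns the France and Germany comparison and, as a sanity check, the two other possible pairs. Take it as an illustration and not as what the online tool does internally: the helper `chi_square_2x2` is my own name, it uses the plain Pearson Chi-square without continuity correction, and the Spain counts (413 of 2477) are an assumption derived by subtraction from the 10 000 clients and 2037 exits of the sample.

```python
import math
from itertools import combinations


def chi_square_2x2(success1, trials1, success2, trials2):
    """Pearson chi-square (no continuity correction) comparing two rates.
    Returns (statistic, p_value)."""
    pooled = (success1 + success2) / (trials1 + trials2)
    diff = success1 / trials1 - success2 / trials2
    variance = pooled * (1 - pooled) * (1 / trials1 + 1 / trials2)
    statistic = diff ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# (clients who left the bank, total clients) per country.
# ASSUMPTION: the Spain figures are derived by subtraction, not quoted.
countries = {
    "France": (810, 5014),
    "Germany": (814, 2509),
    "Spain": (413, 2477),
}

results = {}
for (name1, (s1, t1)), (name2, (s2, t2)) in combinations(countries.items(), 2):
    results[(name1, name2)] = chi_square_2x2(s1, t1, s2, t2)

for (name1, name2), (statistic, p_value) in results.items():
    label = "significant" if p_value < 0.05 else "not significant"
    print(f"{name1} vs {name2}: chi2 = {statistic:.2f}, p = {p_value:.3g} ({label})")
```

With these counts, the pairs that include Germany come out significant while France and Spain do not, which matches the very small difference we saw between those two countries on the chart.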

Ooh, there’s another thing I wanted to show you in the « age » tab, with its 2 bar charts side by side.

As you can see, there are many categories (more than 5) because each category corresponds to a 5-year age group, with the bank’s clients aged from 15 to 90 years old. That’s a lot of comparisons, but it would be a good exercise for you to find 2 categories whose comparison shows a statistically significant difference.

Here’s a hint: look at the 50 to 54 age group or the 35 to 39 age group. In fact, you should compare all the pairs of categories where you observe a difference in the basic A/B test. Do a basic A/B test with absolute values. Then do a Chi-square test to check whether the difference is statistically significant, I mean, whether the result is valid for all of the bank’s clients.

This is a way to statistically validate the insights we see in Tableau. You see, it’s not very difficult and it’s effective. That’s how you find insights in Tableau and validate them.

-Steph

## Validate Data Mining In Tableau With A Chi-Square Test

I have just enrolled in a Data Science course on Udemy and I learned good stuff.

In this article we will start using statistics. Don’t worry, we’ll do something simple: we’ll use the Chi-square test in a basic way. There is a special section to learn how to do statistics at an advanced level.

I’ll explain why we’re going to learn how to use the Chi-square test. The results we have with these 2 bar charts are good. We see on these 2 bar charts that age has a visible impact on the rate of clients leaving the bank. We also see in which age groups the clients leave the bank the most and in which age groups they leave the least. With that we have good insights.

In the A/B test « Gender », we can see that there is a correlation between gender and the choice to leave the bank. But as I said before, this A/B test is basic. The results of a basic A/B test visually show us what is probably happening in reality, but we aren’t 100% sure of these results. To validate them, we need to use statistical tests like the Chi-square test.

Writing a report based on a basic A/B test is very risky and you can end up with completely false insights. I don’t advise you to do it (unless you want to leave your job). That’s why using Chi-square will help us to have strong insights.

Chi-square will allow us to know if our results are statistically significant. Our results are based on a sample of 10 000 clients, and the Chi-square test will tell us whether these results could be due to chance or whether they can represent all the clients of the bank.

For example in our A/B test « Gender », we observed that in our sample of 10 000 clients, women are more likely to leave the bank compared to men.

Now, we aren’t sure if the results of this sample represent the behavior of all the bank’s clients.

To run a basic Chi-square test, we use an online tool. Click here.

On the internet, there are plenty of websites to do a Chi-square test, but we’ll use this one so that you can understand how it works. To do a Chi-square test, we need to use absolute values, and in our A/B test we have percentages.
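To see why percentages alone aren’t enough, here is a small Python sketch. The counts are hypothetical and chosen only for illustration (they are not the bank’s data): both scenarios show 25% against 16%, but one has 100 clients per group and the other 10 000. The helper name `chi_square_2x2` is mine, and it computes the plain Pearson Chi-square without continuity correction.

```python
import math


def chi_square_2x2(success1, trials1, success2, trials2):
    """Pearson chi-square (no continuity correction) comparing two rates.
    Returns (statistic, p_value)."""
    pooled = (success1 + success2) / (trials1 + trials2)
    diff = success1 / trials1 - success2 / trials2
    variance = pooled * (1 - pooled) * (1 / trials1 + 1 / trials2)
    statistic = diff ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts: the same 25% vs 16%, with two different sample sizes.
small_stat, small_p = chi_square_2x2(25, 100, 16, 100)
large_stat, large_p = chi_square_2x2(2500, 10_000, 1600, 10_000)

print(f"100 per group:    chi2 = {small_stat:.2f}, p = {small_p:.3f}")
print(f"10 000 per group: chi2 = {large_stat:.2f}, p = {large_p:.3g}")
```

The percentages are identical in both cases, yet only the larger sample gives a « p » value below 5%. The test needs to know how many observations stand behind each percentage, and that is exactly what the absolute values provide.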

Let’s go back to Tableau. We’ll create a new tab with a version of the A/B test with absolute values. This way, we keep the A/B test with the percentages. Right-click on the « Gender » tab and select « Duplicate ».

Name the new tab « Gender Actual » to indicate that it contains absolute values.

To get the absolute values, move « Number of Records » from « Measures » to the « Marks » area and drop it on top of « SUM(Number of Records) ».

Move « Number of Records » from « Measures » to « Rows », on top of « SUM(Number of Records) ».

Cool, we have our absolute values.

We also need total absolute values, which means the total number of men and women. There is a very fast way to get that. Right-click on the vertical axis and select « Add Reference Line ».

Then in « Value », click on the drop-down on the right and select « Sum » to have the total sum of the observations.

And in « Scope », you select « Per Cell » option to specify that you want the total sums for each category, male and female.

Now, we have the total sum at the top of the bars. We will modify labels to have the absolute values. In « Label », we will change « Computation » to « Value » and click on the « OK » button.

Perfect, we have the total number of observations at the top of each bar: 4543 women and 5457 men. We have what we need to use our online tool.

OK, I’ll explain how this tool works. « Sample1 » and « Sample2 » correspond to the 2 values of the independent variable « Gender ». You choose the order in which you enter the data: « Sample1 » for men or the opposite. In our case, we use « Sample1 » for women and « Sample2 » for men.

« #success » corresponds to the result Y=1, which means in our case « yes, the client left the bank ».

« #trials » is the total number of observations, which means the total number of women in « Sample1 » and the total number of men in « Sample2 ».

That’s how you enter the data :

• For « Sample1 » in #success, you enter 1139 because there are 1139 women who left the bank. For « Sample1 » in #trials, you enter 4543 because there are 4543 women in total.

• For « Sample2 » in #success, you enter 898 because there are 898 men who left the bank. For « Sample2 » in #trials, you enter 5457 because there are 5457 men in total.

Here is the verdict: « Sample1 is more successful ». « Sample1 » corresponds to women and #success is: « yes, the client left the bank ». This verdict means that, among all of the bank’s clients, women are more likely to leave the bank than men. And look, there is something important: « p<0.001 ». This means that « p » is strictly less than 0.001.

« p » is the value that indicates whether an independent variable has a statistically significant effect on a dependent variable. In our case, the independent variable is « Gender » and the dependent variable is « Exited », which is: « yes, the client left the bank ». Here « p » is strictly less than 0.001, which means that the independent variable « Gender » has a statistically significant effect on the dependent variable « Exited ». This shows us that, among all of the bank’s clients, women are more likely to leave the bank than men.
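For the curious, here is a minimal Python sketch of the calculation behind this kind of tool, using the « Gender » numbers above. It shows how #success and #trials turn into a 2-by-2 table of observed counts, which is then compared with the counts we would expect if gender had no effect. The function name `two_by_two_chi_square` is mine, and I’m assuming the plain Pearson Chi-square without continuity correction. I don’t know exactly which variant the online tool applies.

```python
import math


def two_by_two_chi_square(success1, trials1, success2, trials2):
    """Build the 2x2 table from (#success, #trials) pairs and return
    (chi-square statistic, p_value)."""
    # One row per sample, columns = (left the bank, stayed).
    observed = [
        [success1, trials1 - success1],
        [success2, trials2 - success2],
    ]
    row_totals = [trials1, trials2]
    col_totals = [observed[0][0] + observed[1][0],
                  observed[0][1] + observed[1][1]]
    total = trials1 + trials2

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the two samples behaved identically.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Sample1 = women (1139 left out of 4543), Sample2 = men (898 out of 5457).
statistic, p_value = two_by_two_chi_square(1139, 4543, 898, 5457)
print(f"chi2 = {statistic:.1f}, p = {p_value:.3g}")
print("p < 0.001:", p_value < 0.001)
```

The « p » value lands far below 0.001, in line with the « p<0.001 » displayed by the tool.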

This is how we use the Chi-square test with this online tool. The principle is the same for all the online tools that you can find on Google or DuckDuckGo. You can repeat the instructions I gave you with other tools and you should get the same results.

It’s cool: with the Chi-square test we validated the A/B test, and to show that this A/B test is validated, we’ll color the tab green.

Right-click on the tab, select « Color » and select « Green ».

Perfect, now we’ll validate another A/B test. Select the « HasCreditCard » tab.

We’re going to create an A/B test « HasCreditCard » only with absolute values. To save time, right-click on « Gender Actual » tab and select « Duplicate ».

We’ll remove the green color on the tab « Gender Actual (2) ». Right-click on the tab and select « Color » and « None ».

Rename the tab « HasCreditCard Actual ».

Move the variable « HasCrCard » over « Gender » in « Columns ».

Excellent, everything is ready to do a Chi-square test. We’ll remove the « Exited » labels to better see the absolute values. Click the label and drag it out.

Perfect, let’s go back to our online tool. In this case, « Sample1 » is « no », which means clients who don’t have a credit card, and « Sample2 » is « yes », which means clients who have a credit card.

That’s how you enter the data :

• For « Sample1 » in #success, you enter 613 because there are 613 clients who left the bank. For « Sample1 » in #trials, you enter 2945 because there are 2945 clients who don’t have a credit card.
• For « Sample2 » in #success, you enter 1424 because there are 1424 clients who left the bank. For « Sample2 » in #trials, you enter 7055 because there are 7055 clients who have a credit card.

Let’s look at the verdict: it’s « No significant difference ». The « p » value is very high, above 5%. This means we found no statistically significant effect of the independent variable « HasCrCard » on the dependent variable « Exited ». That was the conclusion we had reached when we did the A/B test with percentages.
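Here is a rough Python sketch of how a verdict like this could be produced from the « p » value. It’s my own reconstruction for illustration, not the tool’s actual code: the function name `verdict`, the 5% threshold as a default and the plain Pearson Chi-square without continuity correction are all assumptions. Only the three verdict strings come from what the tool displays.

```python
import math


def verdict(success1, trials1, success2, trials2, alpha=0.05):
    """Return a verdict string for two samples (ASSUMED decision rule)."""
    rate1 = success1 / trials1
    rate2 = success2 / trials2
    pooled = (success1 + success2) / (trials1 + trials2)
    variance = pooled * (1 - pooled) * (1 / trials1 + 1 / trials2)
    statistic = (rate1 - rate2) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))

    if p_value >= alpha:
        return "No significant difference"
    if rate1 > rate2:
        return "Sample1 is more successful"
    return "Sample2 is more successful"


# HasCrCard: Sample1 = no credit card, Sample2 = has a credit card.
has_card_verdict = verdict(613, 2945, 1424, 7055)
# Gender: Sample1 = women, Sample2 = men.
gender_verdict = verdict(1139, 4543, 898, 5457)

print("HasCrCard:", has_card_verdict)
print("Gender:   ", gender_verdict)
```

With the numbers from this article, the sketch gives the same two verdicts as the online tool: no significant difference for « HasCrCard » and « Sample1 is more successful » for « Gender ».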

We had seen that there was 21% of « Exited » (clients who left the bank) in the category « no » and 20% in the category « yes ». With these results we concluded that most likely the variable « HasCrCard » had no impact on the rate of clients who left the bank. Chi-square test confirms our conclusion and we can put the tab « HasCrCard » in green to say that it’s OK.

Right-click on the tab « HasCreditCard » => « Color » => « Green ».

Excellent, now, you can do a statistical A/B test with 2 categories. Soon, we will do statistical A/B tests with more than 2 categories.

-Steph

## Good Use Of Book Information

I watched a video by Olivier Roland and I learned good stuff.

I bought a lot of books to learn new skills but, honestly, there is little that I have concretely put in place in my life. I took notes and made summaries, but it was always theoretical.

Here is another way to take notes. It isn’t about extracting the maximum amount of information from the book but rather about taking the information we can use effectively in our lives as quickly as possible.

To be clear, I’m not talking about science fiction books but about practical books that teach us a new skill.

A practical book allows us to take action more effectively than if we hadn’t read the book.

Of course, we’ll take notes while reading, but we won’t take notes like at school. We’ll highlight actions we can put in place in our lives today.

# Study Case

At the moment I’m reading a book about Tai-Chi. I’m learning to meditate and I have been practicing martial arts for years, so Tai-Chi is perfect for me. Each time I see an action that I can use in my life today, I highlight the information. I put it in uppercase, bold, underlined or in a different color. This way, when I read my notes later, this information will jump out at me.

We can still take notes to build theoretical knowledge, there is no problem with that.

Once we finish the book and read back our notes, we can immediately see the actions we can use in our lives. The concept is to choose the actions that are the simplest and the fastest to set up in our lives.

# Baby step

The best strategy to start is to take only one action: the most interesting and the easiest one to set up in your life.

The problem we have when using a book’s information is that we are too ambitious in the first step. We want to achieve great things right away, but we don’t have the experience yet.

To avoid falling into this trap and being disappointed, imagine that your evolution is like baby steps. Choose an interesting and easy action to set up in your life today. It’s today that you work on it, not tomorrow or the day after tomorrow. This way, you can immediately adjust this action to improve it step by step.

What is the most ambitious action you wanted to do right after reading a book?

-Steph

Smartphone apps :

To track my calories, I use MyFitnessPal

To track my training program, I use Jefit

## From Exercise Machines To Free Weights

I read a Nerd Fitness article and there is good stuff.

Very often in gym advertisements, we see a lot of high-tech exercise machines, while the free weight zone is a small hidden room.

What I see in my gym is people starting their training with a little bit of cardio and then using machines. They do this because it’s easier and it looks safer. Unfortunately, this is not the reality. The Matrix traps us with these machines.

Attention: If you have a medical prescription or you need to use machines for a special reason, continue to follow the training program prescribed by your doctor or physiotherapist.

# Why

The truth is that machines force muscles and joints to make movements that aren’t natural.

Machines force your body to move weights in a single pattern (up – down or left – right). Our bodies don’t naturally move like that; this is the problem. The movements of our bodies aren’t straight lines, they’re more like an « S ». The result is that machines develop our muscles in a way that isn’t balanced and they endanger our joints and spine.

As you can see, machines don’t keep your body safe and, in addition, they don’t work your stabilizing muscles. Not working your stabilizing muscles is very bad for everyday activities.

When you bend forward to pick something up from the floor, your body uses dozens of muscles at the same time. But since you no longer work your stabilizing muscles because of machines, your muscles don’t know how to work together, so your body relies on a single muscle.

Study case

The « Smith Machine » is famous for letting you squat safely and protecting your back. That’s a lie! This machine is perfect for destroying your back and compressing your spine because it forces your body to move only up and down. The real squat lets you do the fundamental movement, and when you do it, you notice that your body doesn’t follow a straight line from top to bottom. With the real squat you do a natural movement.

Exercising with weights or with bodyweight is more efficient for burning calories than using machines.

# Full body

All exercises are classified into 2 categories: pull and push.

I remember doing a full body training program for 1 year before lifting weights. This is what my Taekwondo teacher advised me and he was right (wow, Korean wisdom). For beginners, a full body training program 3 times per week is more efficient than a machine-based training program 5-6 times per week. If you want to train more, you can do Tai-Chi or go walking on rest days.

Full body is important

With full body, you teach all of your body’s muscles to work together; you synchronize your body. This allows you to be healthy, avoid overusing one muscle and avoid injuries due to a weak stabilizing muscle. To get the most out of your body, eat healthy and you’ll burn more fat, build muscle and build a body you’re proud of.

What is great with a full body training program is that when you skip a training day, it doesn’t matter. You won’t have certain muscles more developed than others because each training session works all of your body’s muscles.

# Bodyweight (full body)

Mastering basic movements with bodyweight is perfect to prepare you for training with weights.

These are primitive, and therefore natural, movements that allow you to become stronger without equipment.

The best example is gymnasts. Look at a gymnast’s body: it’s amazing and impressive, and the majority of their training is bodyweight exercises.

Basic exercises

• Push = push up, dips, handstands

• Pull = pull up, body rows

• Legs = bodyweight squat, pistols squats, lunges, box jumps

For each exercise, you do 2-4 sets of 8-10 repetitions.

No it’s too simple! => Do this training program and if it’s too easy, increase the difficulty. You can do one-handed push ups or one-handed pull ups (like my friend Inti). You can also add more sets or repetitions.

No it’s too difficult => Do this training program and if it’s too difficult, decrease the difficulty. You can do sets of 3 repetitions. The goal is to progress until you can do 4 sets of 10 repetitions. Take your time but be consistent.

No I want abs => When you do squats, push ups and pull ups, you keep your core tight and this way you work your abs. But if you want to do an exercise specifically for abs, you can do planks and side planks.

# Strength training (full body)

It doesn’t matter whether you’re a man or a woman, dumbbells and barbells are your friends.

Basic exercise

• Push = Bench press, Overhead Press

• Pull = Deadlift, Bent Over Row

For each exercise, you do 2-4 sets of 8-10 repetitions.

No I don’t like squat => Do you think the squat is a useless exercise? I advise you to read Mark Rippetoe’s book. If you seriously want to train, this book is for you!

No I’m afraid of weights => Don’t panic. Most people in the free weights section are too busy looking at themselves in the mirror to look at you. You can use a barbell or dumbbells for each exercise to improve your pure strength. I advise you to use dumbbells.

No I’m afraid of being ridiculous => Nobody cares whether you lift a 4kg (8.8lbs) dumbbell or squat 181kg (400lbs) because everybody is struggling through their own training program. Forget other people, stay focused and do your training program until the end.

But if I do bad => At the beginning, use light weights to do the movements with good form and good technique. When you’re stronger, you can add weight slowly each week to reach your limit. You can hire a personal trainer for 1-2 sessions to learn and improve the technique or form of the basic exercises.

No I want only lose weight => It’s simple, you do the basic exercises and you eat less. The training is the same for losing weight or gaining muscle. It’s diet that makes the difference! It’s a science:

• Eat more calories than you burn = gain weight

• Eat fewer calories than you burn = lose weight

Your physical condition is 80% diet and 20% training. If you want to bulk, you need a specific diet for it.