Chi-Square Test With More Than 2 Categories

tableau chi square test

I have just enrolled in a Data Science course on Udemy and I learned some good stuff.

In this article, we will do a Chi-square test with more than 2 categories. We will use the A/B test « Country », which has 3 categories corresponding to 3 countries : Germany, Spain and France. Select the « Gender Actual » tab, right-click on it and select « Duplicate » to make a copy.

tableau chi square test

Rename the tab « Gender Actual (2) » to « Country Actual ».

tableau chi square test

In « Dimensions », move the variable « Geography » over « Gender » in « Columns » to replace « Gender » with « Geography ».

tableau chi square test

tableau chi square test

Here’s how to do a statistical A/B test when there are 3 categories. We’ll start with the classic method, and then I’ll show you another way to do a Chi-square test with any number of categories.

Let’s start with the classic method. Here there are 3 categories, so we can’t use the online tool from the previous article, which only handles 2 categories, « Sample1 » and « Sample2 ». That’s why we’re going to use another online tool : click here.

tableau chi square test

In this online tool, we don’t need the totals : we enter only the number of observations in each category, i.e. the values shown in our A/B test. I’m going to show you how to turn our A/B test into a table, which makes it easier to enter the values into the online tool without making any mistakes.

Go to the « Show me » tool at the top right.

tableau chi square test

Click on « text tables ».

tableau chi square test

tableau chi square test

Click on the « Swap Rows and Columns » button.

tableau chi square test

tableau chi square test

Cool, now you have a table arranged in exactly the same way as the online tool.

In the online tool, we will select 2 rows and 3 columns.

tableau chi square test

Since we have 3 categories and 2 possible outcomes, we enter our values exactly as they appear in the table we just created in Tableau.
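If you want to check what the online tool computes, here is a minimal Python sketch of a Pearson chi-square test of independence on the same 2 × 3 table. It uses only the standard library. The France counts (810 of 5014) and the Germany counts (814 of 2509) are the ones used later in this article. Spain's counts (413 of 2477) are an assumption: they are derived by subtraction from the 10 000 clients and the 2037 departures (1139 women + 898 men) reported in the « Gender » article. The online tool may round or display its result differently.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if country and churn were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Columns: France, Germany, Spain.
# Row 1: clients who left the bank; row 2: clients who stayed.
# Spain's column is derived by subtraction (an assumption, see above).
table = [
    [810, 814, 413],
    [5014 - 810, 2509 - 814, 2477 - 413],
]

statistic, dof = chi_square_independence(table)

# With exactly 2 degrees of freedom, P(chi2 > x) = exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi2 = {statistic:.1f}, dof = {dof}, p = {p_value:.1e}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

The closed form `exp(-x / 2)` only holds for 2 degrees of freedom. Larger tables would need a general chi-square distribution function, such as the one in a statistics library.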

tableau chi square test

Perfect, our table is ready. You can click on the « Calculate » button.

tableau chi square test

tableau chi square test

As you can see, we observe the same thing as with the other online tool. Our indicator, the « p » value, is less than 5%, which means the result is statistically significant.

tableau chi square test

This statistical significance means that these results are valid for all of the bank’s clients and not just for the sample of 10 000 clients. The differences we saw in the A/B test « Country », whose results were based solely on the sample of 10 000 clients, therefore also hold for the bank’s whole client base : among all of the bank’s clients, those in Germany are more likely to leave the bank. This is how we do things cleanly.

As you saw, this online tool is limited to 5 by 5 tables, so you can’t use it when you have 6 or more categories. Fortunately, it’s possible to do a Chi-square test with any number of categories. It’s a special method, and to help you understand it, I’ll start with a theoretical explanation.

Here we have 3 countries : Germany, Spain and France.

tableau chi square test

What we’re trying to compare is the number of clients leaving the bank in each of these countries.

tableau chi square test

With our basic A/B test based on a sample of 10 000 clients, we obtained a churn rate of 16% for France, 32% for Germany and 17% for Spain. Now the question is : « Do we observe the same results across all of the bank’s clients ? », in other words : « In general, does the country have a significant effect on the number of clients leaving the bank ? ». Germany has by far the highest proportion of clients leaving the bank, so the idea is : « Why would we need to compare the 3 countries at the same time ? ».

tableau chi square test

If we run a statistical A/B test on Germany and France and find a significant difference in the number of clients leaving the bank between these 2 countries, then, in general, the country has a significant effect on the number of clients who leave the bank. Indeed, if comparing Germany and France shows that Germans are more likely to leave the bank than the French, adding Spain won’t change that : Germans will still be more likely to leave the bank than the French. There may be a different relationship between Germany and Spain, but there will still be a statistically significant difference between France and Germany, with a higher proportion of clients leaving the bank in Germany than in France.

Here is a way to confirm that this logic holds. Imagine a test whose participants are German, Spanish and French, and imagine that this test was analyzed without looking at what happens in Spain. You get the result and ask yourself : « Would the results have changed if I had added Spain ? ». The answer is « no », because there is no interdependence between Germany, Spain and France : the decision to leave the bank in France or Germany doesn’t depend on Spain. It’s therefore perfectly correct to set one category aside and compare the other 2. And since we now have 2 categories, we can do a Chi-square test with the online tool that we used in the previous article.

So let’s go back to our worksheet and set one country aside so that we compare only 2 countries. Select the « Country » tab.

tableau chi square test

What we observe is that the difference between Spain and France is very small, so a Chi-square test between Spain and France wouldn’t be very interesting. It’s more interesting to do a Chi-square test between Germany and France and show that there is a statistically significant difference between these 2 countries. That will be enough to conclude that the country has a statistically significant impact on the number of clients who leave the bank.

Select the « Country Actual » tab.

tableau chi square test

We will use the online tool from the previous article : click here.

We will make a copy of « Country Actual » to have a bar chart with absolute values. Select « Country Actual », right-click and select « Duplicate ».

tableau chi square test

In « Show Me », select « horizontal bars ».

tableau chi square test

tableau chi square test

Remove « SUM(Number of Records) » from « Columns », and remove « Exited » and « Geography » from « Rows ».

tableau chi square test

tableau chi square test

In « Dimensions », move « Geography » to « Columns ».

tableau chi square test

tableau chi square test

In « Measures », move « Number of Records » to « Rows ».

tableau chi square test

tableau chi square test

In « Measures », move « SUM(Number of Records) » to « Label ».

tableau chi square test

tableau chi square test

In « Dimensions », move « Exited » to « Label ».

tableau chi square test

tableau chi square test

In « Dimensions », move « Exited » to « Colors ».

tableau chi square test

tableau chi square test

We also need the total absolute values, i.e. the total number of clients in each country. There is a very fast way to get them. Right-click on the vertical axis and select « Add Reference Line ».

tableau chi square test

Then in « Value », click on the drop-down on the right and select « Sum » to have the total sum of the observations.

tableau chi square test

And in « Scope », select the « Per Cell » option to specify that you want the total sum for each category, i.e. for each country.

tableau chi square test

Now, we have the total sum at the top of the bars. We will modify labels to have the absolute values. In « Label », we will change « Computation » to « Value » and click on the « OK » button.

tableau chi square test

tableau chi square test

tableau chi square test

Here’s how to enter the data :

For « Sample1 » (France), in #success you enter 810 because 810 French clients left the bank. In #trials, you enter 5014 because there are 5014 French clients in total.

For « Sample2 » (Germany), in #success you enter 814 because 814 German clients left the bank. In #trials, you enter 2509 because there are 2509 German clients in total.

tableau chi square test

Here is the verdict : « Sample2 is more successful ». « Sample2 » corresponds to German clients and #success means « yes, the client left the bank ». This verdict means that, across all of the bank’s clients, clients from Germany are more likely to leave the bank than clients from France. And look, there is something important : « p<0.001 ». This means that the « p » value is strictly less than 0.001. Such a small « p » value tells us that the difference is statistically significant.
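The France versus Germany comparison above can be reproduced with a minimal, standard-library Python sketch. It runs a Pearson chi-square test on the 2 × 2 table built from #success and #trials (Sample1 = France, Sample2 = Germany). The function name `two_sample_chi_square` is hypothetical, and the sketch applies no continuity correction. This article does not say whether the online tool applies one, so the tool's exact p-value may differ slightly, although the conclusion should be the same.

```python
import math


def two_sample_chi_square(success1, trials1, success2, trials2):
    """Pearson chi-square test (no continuity correction) on the 2 x 2 table
    built from the successes and trials of two samples.

    Returns (statistic, p_value).
    """
    table = [
        [success1, trials1 - success1],  # Sample1: left / stayed
        [success2, trials2 - success2],  # Sample2: left / stayed
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = trials1 + trials2
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Sample1 = France, Sample2 = Germany; success = "the client left the bank".
statistic, p_value = two_sample_chi_square(810, 5014, 814, 2509)
print(f"chi2 = {statistic:.1f}, p = {p_value:.1e}")
```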

Ooh, there’s one more thing I wanted to show you, in the « age » tab with the 2 bar charts side by side.

tableau chi square test

As you can see, there are many categories (more than 5), because each category corresponds to a 5-year age group, with the bank’s clients aged from 15 to 90. That’s a lot of comparisons, but it would be a good exercise for you to find which 2 categories to compare to show a statistically significant difference.

Here’s a hint : look at the 50-54 and 35-39 slices. In fact, you should compare every pair of categories where you observe a difference in this basic A/B test. Build a basic A/B test with absolute values, then do a Chi-square test to check whether the difference is statistically significant, i.e. whether the result is valid for all of the bank’s clients.

This is how to statistically validate the insights we see in Tableau. As you can see, it’s not very difficult and it’s effective : you find insights in Tableau, then you validate them.

Subscribe to my newsletter and share this article if you think it can help someone you know. Thank you.

-Steph

Validate Data Mining In Tableau With A Chi-Square Test

validate validation

I have just enrolled in a Data Science course on Udemy and I learned some good stuff.

In this article, we will start using statistics. Don’t worry, we’ll keep it simple and use the Chi-square test in a basic way. There is a dedicated section for learning statistics at an advanced level.

Let me explain why we’re going to learn how to use the Chi-square test. The results we get from these 2 bar charts are good. They show that age has a clear impact on the rate of clients leaving the bank, and in which age groups clients leave the bank the most and the least. That gives us good insights.

In the A/B test « Gender », we can see that there is a correlation between gender (male or female) and the choice to leave the bank. But as I said before, this A/B test is basic. The results of a basic A/B test show us visually what is probably happening in reality, but we aren’t 100% sure of them. To validate these results, we need to use statistical tests such as the Chi-square test.

Basing a report on a basic A/B test is very risky, and you can end up with completely false insights. I don’t advise you to do it (unless you want to leave your job). That’s why the Chi-square test will help us build strong insights.

The Chi-square test tells us whether our results are statistically significant. Our results are based on a sample of 10 000 clients, and the Chi-square test tells us whether these results could be due to chance or whether they represent all of the bank’s clients.
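For reference, the Pearson chi-square statistic compares the observed count O in each cell of the table (row i, column j) with the count E expected if the two variables were independent. Here R_i and C_j are the row and column totals, N is the number of observations, and r and c are the numbers of rows and columns :

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \, C_j}{N},
\qquad \mathrm{df} = (r - 1)(c - 1)
```

For example, with the « Gender » counts used below, independence would predict 4543 × 2037 / 10 000 ≈ 925 women leaving the bank, versus 1139 observed. The further the observed counts are from the expected ones, the larger χ² and the smaller the « p » value.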

For example, in our A/B test « Gender », we observed that in our sample of 10 000 clients, women are more likely to leave the bank than men.

tableau data mining science chi square test a/b test

However, we aren’t sure whether the results of this sample represent the behavior of all of the bank’s clients.

To run a basic Chi-square test, we’ll use an online tool. Click here.

tableau data mining science chi square test a/b test

There are plenty of websites on the internet for doing a Chi-square test, but we’ll use this one so that you can understand how it works. To do a Chi-square test, we need absolute values, and our A/B test shows percentages.

Let’s go back to Tableau. We’ll create a new tab with a version of the A/B test that uses absolute values. This way, we keep the A/B test with the percentages. Right-click on the « Gender » tab and select « Duplicate ».

tableau data mining science chi square test a/b test

Name the new tab « Gender Actual » to specify that it’s absolute values.

tableau data mining science chi square test a/b test

To get the absolute values, move « Number of Records » from « Measures » to the « Marks » area and drop it on top of « SUM(Number of Records) ».

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Move « Number of Records » from « Measures » to « Rows », on top of « SUM(Number of Records) ».

tableau data mining science chi square test a/b test

Cool, we have our absolute values.

tableau data mining science chi square test a/b test

We also need total absolute values, which means the total number of men and women. There is a very fast way to get that. Right-click on the vertical axis and select « Add Reference Line ».

tableau data mining science chi square test a/b test

Then in « Value », click on the drop-down on the right and select « Sum » to have the total sum of the observations.

tableau data mining science chi square test a/b test

And in « Scope », select the « Per Cell » option to specify that you want the total sums for each category, male and female.

tableau data mining science chi square test a/b test

Now, we have the total sum at the top of the bars. We will modify labels to have the absolute values. In « Label », we will change « Computation » to « Value » and click on the « OK » button.

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Perfect, we have the total number of observations at the top of each bar : 4543 women and 5457 men. We have what we need to use our online tool.

tableau data mining science chi square test a/b test

OK, I’ll explain how this tool works. « Sample1 » and « Sample2 » correspond to the 2 categories of the independent variable « Gender ». The order is up to you : « Sample1 » can be men or women. In our case, we use « Sample1 » for women and « Sample2 » for men.

« #success » corresponds to the result Y=1, which means in our case « yes, the client left the bank ».

« #trials » is the total number of observations, i.e. the total number of women in « Sample1 » and the total number of men in « Sample2 ».

Here’s how you enter the data :

  • For « Sample1 » in #success, you enter 1139 because there are 1139 women who left the bank. For « Sample1 » in #trials, you enter 4543 because there are 4543 women in total.
  • For « Sample2 » in #success, you enter 898 because there are 898 men who left the bank. For « Sample2 » in #trials, you enter 5457 because there are 5457 men in total.
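A 2 × 2 comparison like this one is equivalent to a two-proportion z-test : the squared z statistic equals the Pearson chi-square statistic without continuity correction. Below is a minimal, standard-library Python sketch using the counts above (Sample1 = women, Sample2 = men). `two_proportion_z_test` is a hypothetical helper, not the online tool's code.

```python
import math


def two_proportion_z_test(success1, trials1, success2, trials2):
    """Two-sided two-proportion z-test with a pooled standard error.

    z ** 2 equals the Pearson chi-square statistic (no continuity
    correction) of the corresponding 2 x 2 table.
    Returns (z, p_value).
    """
    rate1 = success1 / trials1
    rate2 = success2 / trials2
    pooled = (success1 + success2) / (trials1 + trials2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / trials1 + 1 / trials2))
    z = (rate1 - rate2) / standard_error
    # Two-sided p-value: P(|Z| > |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Sample1 = women (1139 of 4543 left), Sample2 = men (898 of 5457 left).
z, p_value = two_proportion_z_test(1139, 4543, 898, 5457)
print(f"z = {z:.2f}, chi2 = z**2 = {z * z:.1f}, p = {p_value:.1e}")
```

A positive z means Sample1 (women) has the higher rate of clients leaving the bank. That matches the tool's « Sample1 is more successful » verdict.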

tableau data mining science chi square test a/b test

Here is the verdict : « Sample1 is more successful ». « Sample1 » corresponds to women and #success means « yes, the client left the bank ». This verdict means that, across all of the bank’s clients, women are more likely to leave the bank than men. And look, there is something important : « p<0.001 ». This means that the « p » value is strictly less than 0.001.

tableau data mining science chi square test a/b test

The « p » value is the probability of observing a difference at least this large if the independent variable had no effect on the dependent variable. A very small « p » therefore indicates a statistically significant effect. In our case, the independent variable is « Gender » and the dependent variable is « Exited », which is : « yes, the client left the bank ». Since « p » is strictly less than 0.001, the independent variable « Gender » has a statistically significant effect on the dependent variable « Exited ». This shows us that, across all of the bank’s clients, women are more likely to leave the bank than men.

This is how we use the Chi-square test with this online tool. The principle is the same for all the online tools you can find on Google or DuckDuckGo. If you repeat these instructions with other tools, you should reach the same conclusion.

Cool, with the Chi-square test we validated the A/B test. To show that this A/B test is validated, we’ll color the tab green.

Right-click on the tab, select « Color » and select « Green ».

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Perfect, now let’s validate another A/B test. Select the « HasCreditCard » tab.

tableau data mining science chi square test a/b test

We’re going to create an A/B test « HasCreditCard » only with absolute values. To save time, right-click on « Gender Actual » tab and select « Duplicate ».

tableau data mining science chi square test a/b test

We’ll remove the green color on the tab « Gender Actual (2) ». Right-click on the tab and select « Color » and « None ».

tableau data mining science chi square test a/b test

Rename the tab « HasCreditCard Actual ».

tableau data mining science chi square test a/b test

Move the variable « HasCrCard » over « Gender » in « Columns ».

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Excellent, everything is ready for a Chi-square test. Let’s remove the « Exited » labels to see the absolute values more clearly : click on them and drag them out.

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Perfect, let’s go back to our online tool. In this case, « Sample1 » is « no », i.e. clients who don’t have a credit card, and « Sample2 » is « yes », i.e. clients who have a credit card.

Here’s how you enter the data :

  • For « Sample1 » in #success, you enter 613 because 613 clients without a credit card left the bank. For « Sample1 » in #trials, you enter 2945 because there are 2945 clients who don’t have a credit card.
  • For « Sample2 » in #success, you enter 1424 because 1424 clients with a credit card left the bank. For « Sample2 » in #trials, you enter 7055 because there are 7055 clients who have a credit card.
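The tool's verdict logic can be approximated in Python. The `tool_style_verdict` helper below is hypothetical and is not the tool's actual code. It runs a two-proportion z-test (equivalent to a 2 × 2 chi-square test without continuity correction) with a 5% threshold, then words the result like the verdicts quoted in this article. With the credit card counts above, it should report no significant difference.

```python
import math


def tool_style_verdict(success1, trials1, success2, trials2, alpha=0.05):
    """Return (verdict, p_value), with the verdict worded like the online tool.

    Hypothetical helper: two-proportion z-test with a pooled standard error
    and a significance threshold of `alpha` (5% by default).
    """
    rate1 = success1 / trials1
    rate2 = success2 / trials2
    pooled = (success1 + success2) / (trials1 + trials2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / trials1 + 1 / trials2))
    z = (rate1 - rate2) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    if p_value >= alpha:
        return "No significant difference", p_value
    winner = "Sample1" if rate1 > rate2 else "Sample2"
    return f"{winner} is more successful", p_value


# Sample1 = no credit card (613 of 2945 left), Sample2 = credit card (1424 of 7055 left).
message, p_value = tool_style_verdict(613, 2945, 1424, 7055)
print(f"{message} (p = {p_value:.2f})")
```

The same helper applied to the « Gender » counts (1139 of 4543 vs 898 of 5457) returns « Sample1 is more successful ».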

tableau data mining science chi square test a/b test

Let’s look at the verdict : « No significant difference ». The « p » value is high, well above 5%. This indicates that the independent variable « HasCrCard » has no statistically significant effect on the dependent variable « Exited », which is the conclusion we had reached with the percentage A/B test.

We had seen that 21% of the clients in the « no » category had « Exited » (left the bank), compared with 20% in the « yes » category. From these results, we concluded that the variable « HasCrCard » most likely had no impact on the rate of clients leaving the bank. The Chi-square test confirms our conclusion, so we can color the tab « HasCrCard » green to show that it’s OK.

Right-click on the tab « HasCreditCard » => « Color » => « Green ».

tableau data mining science chi square test a/b test

tableau data mining science chi square test a/b test

Excellent, now, you can do a statistical A/B test with 2 categories. Soon, we will do statistical A/B tests with more than 2 categories.

Share this article if you think it can help someone you know. Thank you.

-Steph

Create Bins and View Distributions

tableau, bins, bar, chart, distribution, age, data, science

I have just enrolled in a Data Science course on Udemy and I learned some good stuff.

It’s cool, you finished the 1st part. Now we’re going to do deeper Data Mining analysis with this bank’s dataset.

tableau, bins, bar, chart, distribution, age, data, science

To make these analyses deeper, we’ll take a more statistical approach.

To do that, we will create a new tab.

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

For this new tab, we want to understand how clients are distributed according to their age. Is there a majority of young or old people ?

tableau, bins, bar, chart, distribution, age, data, science

Move the variable « Age » to « Columns ».

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

As we want to see the distribution of client ages, we need the variable « Number of Records » to count the observations. Move the variable « Number of Records » to « Rows ».

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

Boom, we have a chart, but there is only one point, at the top right. What happened is that Tableau took the sum of the ages of all the bank’s clients and the sum of all the « Number of Records », i.e. the total number of clients : 10 000.

We’ll find a solution, but first we’ll change the format to see the chart better. Right-click in the middle of the chart and select « Format ».

tableau, bins, bar, chart, distribution, age, data, science

For the font size, select « 12 ».

tableau, bins, bar, chart, distribution, age, data, science

Here you can see that the total age is 389 218, but that’s not what we’re looking for. What we want to see is the number of clients for each age.

I’ll explain what’s going on. We took the aggregated sums of our variables. Aggregate means that we took the total sum of the variable for each category. We added the ages but in fact we want to see the total number of observations for each age separately.

To have that, just click on the arrow in « SUM(Age) » in « Columns ».

tableau, bins, bar, chart, distribution, age, data, science

Then select « Dimension ».

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

You see, Tableau no longer takes the aggregated sum of the ages; it takes each age separately. We have a curve that shows the continuous distribution of our clients’ ages : for each age, the curve gives us the number of clients of that age.

Let’s look at the dataset. Right-click on « Churn Modelling » and select « View Data… ».
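Here is a small Python sketch of the difference between an aggregated measure and a dimension, outside Tableau. It uses hypothetical toy ages, not the bank's dataset. Summing the ages gives a single number, like « SUM(Age) ». Grouping by age and counting records gives one value per age, like the curve above.

```python
from collections import Counter

# Hypothetical ages for 8 clients (toy data, not the bank's dataset).
ages = [26, 29, 29, 30, 31, 31, 31, 45]

# « Age » as an aggregated measure: everything collapses into one number,
# like SUM(Age) in Tableau.
total_age = sum(ages)

# « Age » as a dimension: one group per distinct age, with the count of
# records (like « Number of Records ») in each group.
clients_per_age = Counter(ages)

print(total_age)
for age in sorted(clients_per_age):
    print(age, clients_per_age[age])
```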

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

A window appears that shows the data in detail. If you scroll to the right, you will find the « Age » column.

tableau, bins, bar, chart, distribution, age, data, science

We see that the ages are rounded (whole numbers). Because all ages are rounded, Tableau can group clients by age. By hovering the mouse over the curve, we can see that there are 200 clients who are 26 years old.

tableau, bins, bar, chart, distribution, age, data, science

If the ages in the dataset weren’t rounded, you would see clients aged 26.5 or 26.3. That would create a lot of irregularity, with plenty of spikes and lots of variation.

Oooooh look, there is an unusual variation.

tableau, bins, bar, chart, distribution, age, data, science

Let’s analyze it in detail. Around this peak, we see that there are 348 clients who are 29 years old.

tableau, bins, bar, chart, distribution, age, data, science

Here, there are 404 clients who are 31 years old.

tableau, bins, bar, chart, distribution, age, data, science

And this dip shows that there are 327 clients who are 30 years old.

tableau, bins, bar, chart, distribution, age, data, science

How can we explain this irregularity ? It’s possible that many 29-year-olds are about to turn 30 and many 31-year-olds have just turned 31. These irregularities are simply due to chance. You may see other irregularities if your data isn’t precise or isn’t rounded. In our case, the ages are rounded, but we still want to get rid of the small irregularity we see on our curve.

There is a way to see our distribution without these irregularities : « bins ». Binning consists of grouping the information into different categories; here, we’re going to group our clients into age groups.

Right-click on « Age » in « Measures ». Select « Create » and select « Bins… ».

tableau, bins, bar, chart, distribution, age, data, science

A window appears. We’ll group our clients in 5-year increments. In « Size of bins », enter « 5 » and click on the « OK » button.

tableau, bins, bar, chart, distribution, age, data, science

As you can see, the variable « Age » has remained in « Measures », but there is a new variable in « Dimensions » : the variable we created, « Age(bins) ».

tableau, bins, bar, chart, distribution, age, data, science

Our « Age(bins) » variable is correctly placed in « Dimensions » because it is a categorical variable : each category corresponds to a 5-year age group.

For example, one category is the 20-24 age group. Now we’ll create a new distribution based on the bins.
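Fixed-width binning can be sketched in a few lines of Python. The sketch assumes bins aligned on multiples of the bin size and labelled by their lower bound, so the bin labelled 20 covers ages 20 to 24. The ages are hypothetical toy data, and `bin_lower_bound` is a hypothetical helper, not a Tableau function. Passing `size=10` mimics the 10-year slices used later in this article.

```python
from collections import Counter


def bin_lower_bound(age, size=5):
    """Lower bound of the fixed-width bin that contains `age`.

    Assumption: bins are `size` wide and aligned on multiples of `size`,
    so with size=5 the ages 20 to 24 all fall in the bin labelled 20.
    """
    return (age // size) * size


# Hypothetical ages (toy data, not the bank's dataset).
ages = [22, 24, 26, 29, 29, 30, 31, 31, 38, 52]

five_year_bins = Counter(bin_lower_bound(age) for age in ages)
ten_year_bins = Counter(bin_lower_bound(age, size=10) for age in ages)

for lower in sorted(five_year_bins):
    print(f"{lower}-{lower + 4}: {five_year_bins[lower]}")
```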

To do that, we’ll remove the variable « Age » from « Columns » with a click and drag outside.

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

Move the variable « Age(bins) » from « Dimensions » to « Columns ».

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

Note

In this case, it’s not possible to replace « Age » directly by dropping « Age(bins) » on top of « Age » in « Columns ». This is because « Age » is a measure and « Age(bins) » is a dimension.

That’s a nice distribution; it’s the type of distribution (chart) we usually see in economics or mathematics. The difference from the old chart is that this chart is discrete, because the clients are grouped by age group, while the previous chart was continuous.

On this distribution (chart), each bar corresponds to an age range. For example, this bar corresponds to the 25-29 age group.

tableau, bins, bar, chart, distribution, age, data, science

Now, we’ll change the colors.

In « Rows », hold down the « Ctrl » key (« Command » on Mac) and drag « SUM(Number of Records) » to « Colors ».

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

We get our distribution in blue, but we’ll change the color to red. Click on « Colors », then click on « Edit Colors ».

tableau, bins, bar, chart, distribution, age, data, science

In the window that appears, click on the blue square on the right to display the color palette.

tableau, bins, bar, chart, distribution, age, data, science

Select the red color and click on the « OK » button.

tableau, bins, bar, chart, distribution, age, data, science

Click on the « OK » button of the « Edit Colors » window.

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

To make the bar chart easier to read, we’ll add the number of clients in each age group. In « Rows », hold down the « Ctrl » key (« Command » on Mac) and drag « SUM(Number of Records) » to « Label ».

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

That’s it, we can see how many clients there are in each age group.

We see that the dominant bar is the 35-39 age bracket and the second-largest bar is the 30-34 age bracket. Overall, we can see that most clients are between 25 and 40 years old, which seems consistent.

Our bar chart shows absolute values; we’ll replace them with percentages. You could click on the little arrow in « SUM(Number of Records) » in « Label » and select « Add Table Calculation… », but I’ll show you another way to do it.

tableau, bins, bar, chart, distribution, age, data, science

Instead of clicking « Add Table Calculation… », click on « Quick Table Calculation » and select « Percent of total ».

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

It’s cool, we have the exact percentage of clients in each age bracket. Now we can see that the 3 bars from 25 to 39 years old hold 20 + 23 + 17 = 60% of clients.

I’ll show you one last thing. You can easily change the size of the slices : just click on « Age(bins) » and select « Edit ».
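« Percent of Total » simply divides each bin's count by the sum of all bins. The minimal Python sketch below uses hypothetical toy counts, not the bank's figures. It shows the calculation and why adding the percentages of adjacent bins gives the share of a wider age range.

```python
# Hypothetical client counts per 5-year bin (toy data, not the bank's figures).
counts = {"20-24": 3, "25-29": 4, "30-34": 8, "35-39": 5}

total = sum(counts.values())
# « Percent of Total »: each bin's count divided by the total of all bins.
percent_of_total = {label: 100 * n / total for label, n in counts.items()}

for label, pct in percent_of_total.items():
    print(f"{label}: {pct:.0f}%")

# The share of a wider range is the sum of its bins' percentages.
share_25_to_39 = sum(percent_of_total[label] for label in ("25-29", "30-34", "35-39"))
print(f"25-39: {share_25_to_39:.0f}%")
```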

tableau, bins, bar, chart, distribution, age, data, science

In the window, you can change the size of the slices (bins). Enter « 10 » instead of « 5 » to get 10-year slices, then click on the « OK » button.

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

Now we have a distribution with fewer slices, and the dominant slice is 30 to 39 years old.

Well, that was just to show you how to change the size of the bins. To go back to the previous distribution with 5-year slices, click on the « Back » button.

tableau, bins, bar, chart, distribution, age, data, science

tableau, bins, bar, chart, distribution, age, data, science

As you can see, the values on the bars are percentages, but the values on the axis are absolute values. Here is an exercise for you : « Put the values of the axis in percentages ». I’ll give you the answer in the next article.

Share this article if you think it can help someone you know. Thank you.

-Steph

Add a Reference Line

reference line tableau data science

I have just enrolled in a Data Science course on Udemy and I learned some good stuff.

In the previous article, we learned how to work with aliases. In this article, we will learn how to add a reference line to our bar chart.

Before I start, I’ll show you a trick in Tableau.

In our bar chart, the labels appear in this order : the percentage, and below it « Stayed » or « Exited ».

Let’s reverse this order. Go to this rectangle.

reference line tableau data science

And place the label « Exited » above the label « SUM(Number of Records) ».

reference line tableau data science

Look, the label « Stayed » is now above the percentage.

reference line tableau data science

With that, we can understand the bar chart more easily.

Let’s add a reference line. But first, you probably want to know why I’m talking to you about a reference line.

A reference line helps us to compare bar chart results with a benchmark. This benchmark is represented by this reference line.

In our case, the benchmark is the percentage of clients who left the bank in our sample of 10 000 people.

The first thing to do is find this percentage in our bar chart. To be able to do that, remove « Gender » from « Columns ».

reference line tableau data science

Boom, we have a new bar chart.

reference line tableau data science

Look, we only have the percentage of clients who left the bank and the percentage of clients who stayed in the bank.

We see that in our sample of 10 000 people, 20% of the clients left the bank and 80% stayed. This means that the churn rate (client departure rate) is 20%.

We’re going to add this churn rate to our A/B test. To return to our A/B test, press Ctrl+Z (Command+Z on Mac) twice, or click twice on the « Back » button in the menu bar.

reference line tableau data science

Now we know that the average rate of clients who left the bank is 20%.

We will add a horizontal line on the Y axis (Y = 20%) to compare the 20% churn rate with the 2 categories, male and female.

Let’s go. Right-click on the vertical axis (Y axis) and select « Add Reference Line ».

reference line tableau data science

A window appears with several options.

reference line tableau data science

You have the choice to add a line, a band, a distribution or a box plot.

We will use the line for the entire table.

Click on the « Line » button and select the « Entire Table » option. In « Value », select « Constant ».

reference line tableau data science

The constant is 20%, so enter 0.20 in « Value ».

reference line tableau data science

You can put a label on this reference line. For example, if the reference line is based on a computation, the label can display that computation. In our case, the constant is 20% and it’s already displayed on the vertical axis, so select « None ».

reference line tableau data science

For the line format, select the solid line and click « OK ».

reference line tableau data science

The reference line is now added to our chart.

reference line tableau data science

Here is what we can see. Female clients are more likely to leave the bank than the average client. Male clients are less likely to leave the bank than the average client.

In our case, this is easy to see because there are only 2 categories, men and women.
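Reading the chart against the reference line boils down to comparing each category’s churn rate with the overall rate. Here is a minimal Python sketch of that comparison, assuming the per-category rates are already computed. `compare_to_benchmark` is a hypothetical helper, and the rates in the example are placeholders, not the real figures from this chart.

```python
def compare_to_benchmark(rates: dict[str, float], benchmark: float) -> dict[str, str]:
    """Label each category as above, below or equal to the benchmark rate."""
    verdicts = {}
    for category, rate in rates.items():
        if rate > benchmark:
            verdicts[category] = "more likely to leave than average"
        elif rate < benchmark:
            verdicts[category] = "less likely to leave than average"
        else:
            verdicts[category] = "at the average"
    return verdicts


# Placeholder rates for illustration only; read the real ones from your chart.
example = compare_to_benchmark({"Female": 0.25, "Male": 0.16}, benchmark=0.20)
for category, verdict in example.items():
    print(category, "->", verdict)
```

The same function works unchanged with 3 or more categories. Keep in mind that a bar sitting above or below the line is only a visual comparison, not a statistical test.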

Now you know how to add a reference line in a bar chart.

Share this article if you think it can help someone you know. Thank you.

-Steph

How To Do Sumo Deadlift

sumo deadlift

I read Frederic Delavier’s book « Strength Training Anatomy » and I learned good stuff.

Stand with the barbell on the floor in front of you, legs spread wide and feet turned out (always in line with your knees) :

  • Bend your legs until your thighs are horizontal. With your arms straight, take the barbell with a pronated grip, hands shoulder-width apart. You can also use a mixed grip, one hand supinated and the other pronated, to prevent the barbell from rolling; this grip lets you lift extremely heavy weights.

  • Inhale and hold your breath. Slightly arch your back, squeeze your abs, and extend your legs while straightening your torso until you are upright with your shoulders drawn back. Exhale at the end of the movement.

  • Lower the barbell back to the floor while holding your breath.

It’s important to keep your back straight throughout the movement to avoid injury.

This exercise mainly works the quadriceps and the adductors.

It works the back muscles less than the classic deadlift because the back is less bent in the starting position.

Note

It’s important to lift the barbell in front of your shins at the beginning of the movement.

Do this exercise with light weights and high-rep sets (maximum 10 sets) to strengthen the lumbar region while working the thighs and glutes.

If you do this exercise with heavy weights, be careful not to injure your hip joints, your adductors, and the lumbosacral junction.

The sumo deadlift is a common way to perform the deadlift, one of the 3 powerlifting movements.

Share this article if you think it can help someone you know. Thank you.

-Steph

Label And Format

 data science tableau label format

I have just enrolled in a Data Science course on Udemy  and I learned good stuff.

Our bar chart is colored by region, but imagine this bar chart displayed on a wall in an open-plan office or in a report.

Labels make this bar chart clearer and easier to understand.

This bar chart contains all the necessary information: the representatives’ names, the regions where they make sales, and the total sales of each representative in Swiss francs.

But there is a problem. For example, if you ask someone how much Bill sold, that person must find Bill and then read the value off the vertical axis on the left. Here, it’s 1750.

In James’s case, the value is somewhere between 1000 and 1500. James’s bar is far from the vertical axis, so it’s difficult to read the exact value.

This means every reader has to make an effort to extract the information from the bar chart.

That should not be the case: a Data Scientist always looks for the best way to communicate information, so that people can understand and extract it as easily as possible.

Start with labels.

The « Labels » button lets you add text to your bar chart.

data science tableau label format

First, add a label showing SUM(TotalSales).

To do this, click SUM(TotalSales), hold down the Ctrl key (Command on a Mac), and drag and drop SUM(TotalSales) onto « Label ».

data science tableau label format

Now you can see the total sales value at the top of each bar.

data science tableau label format

The bar chart is easier to read now that it shows the total sales of each representative.

It’s time to add more information using the labels.

Next, use the « Rep » field. Click « Rep », hold down the Ctrl key (Command on a Mac), and drag and drop « Rep » onto « Label ».

data science tableau label format

Now you can read the representatives’ names at the top of the bars.

data science tableau label format

You can also add the region. I’ll show you another way to add « Region » to « Labels »: click « Region » in « Dimensions » and drag and drop it onto « Labels ».

data science tableau label format

But this is redundant: the representatives’ names are already shown below the bars and the regions at the top of the bar chart.

Each region also has its own color. Because this information is redundant, remove « Rep » and « Region » from « Labels » by dragging them out.

data science tableau label format

That’s better: only SUM(TotalSales) remains.

data science tableau label format

Let’s go to the next level and edit our labels.

To do this, click « Labels » and click the « … » button.

data science tableau label format

This lets you write your own text. For example, type « Sales : » and click « OK ».

data science tableau label format

Now you can see that your text appears at the top of the bars.

data science tableau label format

Now click « Labels » and click the « … » button again.

data science tableau label format

Delete the text « Sales : » and click « OK ».

data science tableau label format

Now let’s see how to format your bar chart. This is the last step before your bar chart is ready for production.

First, change the label size. Click « Labels » and click « Font ».

data science tableau label format

Select « 12 » and bold.

data science tableau label format

Oh, you can also do the same thing by clicking the « … » button.

data science tableau label format

You can also change the color, but we will keep it black.

data science tableau label format

Now let’s change the number format of the labels. Right-click SUM(TotalSales) and click « Format… ».

data science tableau label format

In fact, the labels have their own formatting options, which you reach by clicking « Label ». For everything else in Tableau, you get the formatting options by right-clicking the element.

When you click « Format », you’ll see 2 tabs : « Axis » and « Pane ».

Select the « Pane » tab, because that’s where the labels of our bar chart are formatted.

data science tableau label format

By clicking « Alignment », you can change the direction of the label text.

data science tableau label format

But what you can’t do with the « Labels » button is change the number format.

data science tableau label format

Back in the « Pane » tab, let’s display the numbers as currency. Click « Numbers » and select « Currency (Custom) ». You can also change the currency symbol in « Prefix / Suffix ».

data science tableau label format

To simplify, remove the 2 decimals by setting « Decimal Places » to 0.

data science tableau label format

As you can see on my bar chart, the SUM(TotalSales) label is vertical at the top of each bar. To change the direction of the label text, click « Alignment » in the « Pane » tab.

data science tableau label format

But there is a problem: some bars don’t show SUM(TotalSales). To fix this, right-click each of these bars and select « Mark Label » => « Always Show ».

data science tableau label format

Now, the bar chart is more understandable.

Let’s display the values in thousands. Click « Numbers » => « Currency (Custom) » => « Display Units » => « Thousands (K) ».

data science tableau label format

Add one decimal back by setting « Decimal Places » to 1.

data science tableau label format

That’s better: we can now see the sales in Swiss francs for each sales representative.

There’s something you need to know: you can’t change the text size in the « Pane » tab.

If you click « Font » and change the size there, nothing changes on your bar chart.
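To see what the « Currency (Custom) » settings do to a raw number, here is a rough Python imitation of the displayed label: divide by 1 000, round to the chosen number of decimal places, then add an optional prefix and the « K » suffix. This is only a sketch of the idea under that assumption, not Tableau’s own formatting code, so Tableau’s exact rounding and spacing may differ; `format_thousands` is a hypothetical name.

```python
def format_thousands(value: float, decimals: int = 1, prefix: str = "") -> str:
    """Roughly mimic a 'Thousands (K)' display: scale by 1 000, round, add prefix and K."""
    scaled = value / 1_000
    # Nested format spec: thousands separator, then the requested number of decimals.
    return f"{prefix}{scaled:,.{decimals}f}K"


# Bill's total sales of 1750, as read from the chart in this article.
print(format_thousands(1750, decimals=1, prefix="CHF "))

# With 0 decimal places, nearby values collapse to the same whole thousand.
print(format_thousands(1250, decimals=0, prefix="CHF "))
print(format_thousands(1400, decimals=0, prefix="CHF "))
```

The last two lines show the trade-off of 0 decimal places (the setting used for the axis later in this article): 1250 and 1400 both display as « CHF 1K ».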

data science tableau label format

This is because the font size set with the « Label » button overrides the font set in the « Pane » tab.

data science tableau label format

Ok, we changed the labels format. Now, let’s format the axes.

To do this, right-click on the vertical axis and select « Format ».

data science tableau label format

Click on the « Axis » tab and change the text size with « Font » to 12.

data science tableau label format

Then right-click the horizontal axis and select « Format ».

data science tableau label format

In the « Header » tab, set the text size with « Font » to 12.

data science tableau label format

Oooh, do you see? Mathiew is cut off. To fix this, widen the bar chart by clicking and dragging its right edge.

data science tableau label format

Right-click on « Central » in the top axis and select « Format ».

data science tableau label format

Then set the text size with « Font » to 12 and make it bold.

data science tableau label format

Now, look at the top of the bar chart. The « Region/Rep » line is useless because we know that Central, East and West are the regions, and the representatives’ names are at the bottom of the bar chart.

data science tableau label format

To change it, right-click on « Region/Rep » and select « Hide Field Label for Columns ».

data science tableau label format

If you want to improve the title « TotalSales » by adding a space, right-click the vertical axis and select « Edit axis ».

data science tableau label format

In the « General » tab, add a space in the title and click « OK ».

data science tableau label format

Let’s do one more thing: display all the « Total Sales » axis values in Swiss francs. Right-click the vertical axis and select « Format ».

data science tableau label format

Click the « Axis » tab => « Numbers » => « Currency (Custom) ».

Set « Decimal Places » to « 0 » and « Display Units » to « Thousands (K) », and enter « CHF » as the prefix in « Prefix / Suffix ».

data science tableau label format

Well, you did a good job. Now you know how to change the format of the charts in Tableau.

Share this article if you think it can help someone you know. Thank you.

-Steph

How To Do Squat

squat

I read Frederic Delavier’s book « Strength Training Anatomy » and I learned good stuff.

The squat is the #1 bodybuilding exercise because it works a large part of the muscular system and is great for the cardiovascular system. It also helps develop good thoracic expansion and good respiratory capacity.

  • Stand facing the barbell resting on the rack. Step under the barbell and place it on your trapezius, a little higher than the posterior deltoids. Take the barbell with a pronated grip; the spacing of your hands varies with your morphology. Pull your elbows back.

  • Inhale deeply (to maintain an intrathoracic pressure that will prevent your torso from sagging forward). Arch your back slightly, squeeze your abs, look forward and lift the barbell off the rack.

  • Step back 1 or 2 steps and stop with your feet parallel (or slightly turned out), about shoulder-width apart. Squat down by leaning your torso forward (the flexion axis passes through the hip joint). Control the descent without rounding your back, to avoid injury.

  • When your femurs reach horizontal, extend your legs while straightening your torso to return to the starting position. Exhale at the end of the movement.

The squat mainly works the quadriceps, glutes, adductors, erector spinae, abs and hamstrings.

Note

The squat is one of the best exercises to develop the curve of the glutes.

2 ways to place the barbell

squat barbell position

  1. On the trapezius

  2. On the deltoids and trapezius, like powerlifters

Variants

  1. People with stiff ankles or long femurs can place a wedge under their heels to avoid leaning the torso too far forward. This shifts part of the effort onto the quadriceps.

  2. The barbell can be placed lower on the back (on the posterior deltoids). This reduces the lever arm and increases the lifting power of the back, which lets you lift heavier weights. This technique is used by powerlifters.

  3. You can also squat on a Smith machine, which avoids leaning the torso forward and concentrates the effort on the quadriceps.

How to place the feet

Foot position is important when performing the classic squat (feet about shoulder-width apart). The feet should be parallel or slightly turned out. Most important is to respect the person’s morphology and to place the feet in line with the physiological axis of the knees. For example, if you walk with your feet turned out, squat with your feet turned out.

Torso inclination according to morphology

squat morphology

  1. Short legs, long torso : slightly inclined torso, small lever arm

  2. Long legs, short torso : very inclined torso, large lever arm

Good position

squat good position

During the squat, the back should stay as straight as possible throughout the movement. Depending on morphology (long/short legs, stiff/flexible ankles) and on execution technique (foot position, use of a heel wedge, high or low barbell position), the torso may be strongly or slightly inclined, because the flexion happens at the hip joint.

Bad position

squat bad position

Never round your back while performing the squat: this can cause injuries in the lumbar region, including spinal disc herniation.

Note

To really feel the glutes working, you need to bring your thighs to horizontal.

1-2-3 : negative (lowering) phase

4 : full squat

squat full

You can go lower than horizontal to feel the glutes work even more, but only people with short femurs or flexible ankles should use this technique. Be very careful with the full squat because it is very easy to round your back.

Attention

For all exercises done with very heavy weights, you need to perform a « blocking » :

  1. Take a deep breath and hold it to fill the lungs like a balloon. This stiffens the rib cage and prevents the upper torso from tilting forward.

  2. Squeezing the abs stiffens the belly. This increases the intra-abdominal pressure and prevents the torso from sagging forward.

  3. Slightly arching the lower back by contracting the lumbar muscles keeps the bottom of the spine in extension.

These 3 simultaneous actions are what we call « blocking ». Blocking prevents the spine from rounding or bending, which can cause disc herniations with very heavy weights.

Subscribe to my newsletter and share this article if you think it can help someone you know. Thank you.

-Steph