Chi-Square Test With More Than 2 Categories

I have just enrolled in a Data Science course on Udemy and I learned good stuff.

In this article, we will do a Chi-square test with more than 2 categories. We will use the A/B test « Country », which has 3 categories corresponding to 3 countries: Germany, Spain and France. Select the « Gender Actual » tab, right-click it and select « Duplicate » to make a copy.

Rename the tab « Gender Actual (2) » to « Country Actual ».

In « Dimensions », drag the variable « Geography » onto « Gender » in « Columns » to replace « Gender » with « Geography ».

Here’s how to do a statistical A/B test when there are 3 categories. We’ll start with the classic method and then I’ll show you another way to do a Chi-square test with any number of categories.

Let’s start with the classic method. The online tool from the previous article only accepts 2 categories, « Sample1 » and « Sample2 », so it can’t handle our 3 categories. That’s why we’re going to use another online tool.

In this online tool, we enter only the number of observations in each category, without the totals. We simply need to enter the values from our A/B test. To make that easier and avoid mistakes, I’m going to show you how to turn our A/B test into a table.

Go to the « Show Me » tool at the top right.

Click on « text tables ».

Click on the « Swap Rows and Columns » button.

Cool, now you have a table arranged in exactly the same way as the online tool.

In the online tool, select 2 rows and 3 columns.

As we have 3 categories and 2 possible results, we enter our values exactly as in the table we just created in Tableau.

Perfect, our table is ready. You can click on the « Calculate » button.

As you can see, we get the same kind of result as with the other online tool. The « p » value is less than 5%, which means the difference is statistically significant.

This statistical significance means that these results are valid for all of the bank’s clients and not just for the sample of 10 000 clients. The test confirms the differences we observed in the A/B test « Country », whose results are based solely on the sample of 10 000 clients. We can conclude that, among all of the bank’s clients, it’s the clients in Germany who are more likely to leave the bank. This is how we do things cleanly.
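
If you want to check the online tool’s result yourself, here is a minimal Python sketch of a Chi-square test on the 2 by 3 table. The France and Germany counts are the ones used later in this article. The Spain counts are an assumption: I derived them by subtraction from the 10 000 clients and the 2037 clients who left in total. The function name is mine, and the p-value shortcut only works because this table has exactly 2 degrees of freedom.

```python
import math

def chi_square_statistic(table):
    """Return (chi2, degrees_of_freedom) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if country and "exited" were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Columns: France, Germany, Spain. Rows: exited, stayed.
# The Spain column is an ASSUMPTION derived by subtraction (see above).
observed = [
    [810, 814, 413],     # exited
    [4204, 1695, 2064],  # stayed
]

chi2, dof = chi_square_statistic(observed)
# With exactly 2 degrees of freedom, the chi-square p-value is exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p < 0.05: {p_value < 0.05}")
```

For tables with other dimensions the p-value needs the full chi-square distribution, which a library function such as SciPy’s `scipy.stats.chi2_contingency` provides.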

As you saw, this online tool is limited to 5 by 5 tables, so you can’t use it when you have 6 categories or more. Fortunately, it’s possible to do a Chi-square test with any number of categories. It’s a special method, and to help you understand it, I’ll give you a theoretical explanation.

Here we have 3 countries: Germany, Spain and France.

What we’re trying to compare is the number of clients leaving the bank in each of these countries.

With our basic A/B test based on a sample of 10 000 clients, we obtained 16% for France, 32% for Germany and 17% for Spain. Now the question is: « Do we observe the same results across all of the bank’s clients? », which means: « In general, does the country have a significant effect on the number of clients leaving the bank? ». Germany has the highest rate of clients leaving the bank, so the idea is: « Why would we need to compare the 3 countries at the same time? ».

If we do a statistical A/B test with Germany and France and we get a significant difference in the number of clients leaving the bank between these 2 countries, then that would mean that, in general, the country has a significant effect on the number of clients who leave the bank. Indeed, if we find by comparing Germany and France that the Germans are more likely to leave the bank than the French, we can consider that Spain will not change anything. Germans will always be more likely to leave the bank than the French. Maybe there will be a different relationship between Germany and Spain, but there will always be a statistically significant difference between France and Germany, with a larger share of clients leaving the bank in Germany than in France.

Here is a way to check this logic. Imagine a test whose participants are German, Spanish and French, and imagine that it was run without looking at what is happening in Spain. Now you get the result and you ask yourself: « Would the results change if you added Spain? ». The answer is « no » because there is no interdependence between Germany, Spain and France. That is, the decision to leave the bank in France and Germany doesn’t depend on Spain. Therefore, it’s quite correct to set 1 category aside and compare the 2 others. And now that we have 2 categories, we can do a Chi-square test with the online tool that we used in the previous article.

So let’s go back to our worksheet and put a country aside to compare only 2 countries. Select the « Country » tab.

What we observe is that the difference between Spain and France is very small, so it wouldn’t be interesting to do a Chi-square test between Spain and France. It’s more interesting to do a Chi-square test between Germany and France and to show that there is a statistically significant difference between these 2 countries. This will be enough to conclude that the country has a statistically significant impact on the number of clients who leave the bank.

Select the « Country Actual » tab.

We will use the online tool from the previous article.

We will make a copy of « Country Actual » to have a bar chart with absolute values. Select « Country Actual », right-click and select « Duplicate ».

In « Show Me », select « horizontal bars ».

Remove « SUM(Number of Records) » from « Columns » and remove « Exited » and « Geography » from « Rows ».

In « Dimensions », move « Geography » to « Columns ».

In « Measures », move « Number of Records » to « Rows ».

In « Measures », move « SUM(Number of Records) » to « Label ».

In « Dimensions », move « Exited » to « Label ».

In « Dimensions », move « Exited » to « Color ».

We also need total absolute values, which means the total number of clients in each country. There is a very fast way to get that. Right-click on the vertical axis and select « Add Reference Line ».

Then in « Value », click on the drop-down on the right and select « Sum » to have the total sum of the observations.

And in « Scope », select the « Per Cell » option to specify that you want the total for each category, that is, for each country.

Now we have the total at the top of each bar. We will modify the labels to show the absolute values. In « Label », change « Computation » to « Value » and click on the « OK » button.

Here’s how to enter the data:

For « Sample1 » in #success, enter 810 because 810 clients in France left the bank. For « Sample1 » in #trials, enter 5014 because there are 5014 clients in France in total.

For « Sample2 » in #success, enter 814 because 814 clients in Germany left the bank. For « Sample2 » in #trials, enter 2509 because there are 2509 clients in Germany in total.

Here is the verdict: « Sample2 is more successful ». « Sample2 » corresponds to the German clients and #success is: « yes, the client left the bank ». This verdict means that, among all of the bank’s clients, clients from Germany are more likely to leave the bank than clients from France. And look, there is something important: « p<0.001 ». This means that « p » is strictly less than 0.001. As you can see, the « p » value is very small, so we conclude that the difference is statistically significant.
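
To see where a verdict like this can come from, here is a small Python sketch of a two-proportion test on the France and Germany counts. I don’t know exactly which formula the online tool uses, so treat this as one common way to do it (a two-sided z-test, whose squared statistic equals the Chi-square statistic of the 2 by 2 table). The function name is hypothetical.

```python
import math

def two_proportion_test(success1, trials1, success2, trials2):
    """Two-sided z-test for the difference between two proportions."""
    rate1 = success1 / trials1
    rate2 = success2 / trials2
    # Pooled rate under the hypothesis that both samples share the same rate.
    pooled = (success1 + success2) / (trials1 + trials2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / trials1 + 1 / trials2))
    z = (rate2 - rate1) / std_error
    # Two-sided p-value of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate1, rate2, z, p_value

# Sample1 = France, Sample2 = Germany (counts from the article).
rate_fr, rate_de, z, p_value = two_proportion_test(810, 5014, 814, 2509)
print(f"France: {rate_fr:.1%}, Germany: {rate_de:.1%}")
print(f"z = {z:.1f}, p < 0.001: {p_value < 0.001}")
```

A positive z here means the second sample (Germany) has the higher departure rate, which matches « Sample2 is more successful ».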

Ooh, there’s another thing I wanted to show you with the tab « age » and its 2 parallel bar charts.

As you can see, there are many categories (more than 5) because each category corresponds to a 5-year age group, with bank clients aged from 15 to 90 years old. That is a lot of comparisons, but it would be a good exercise for you to find 2 categories to compare that show a statistically significant difference.

Here’s a hint: compare the 50 to 54 age group or the 35 to 39 age group. In fact, you should compare all pairs of categories where you observe a difference in this basic A/B test. Do a basic A/B test with absolute values, then do a Chi-square test to check whether the difference is statistically significant, I mean, whether the result is valid for all of the bank’s clients.
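
If you want to rebuild these 5-year groups outside Tableau, a minimal Python sketch could look like this. It assumes the groups start at multiples of 5 (which matches the 35 to 39 and 50 to 54 groups above). The function name and the sample ages are hypothetical.

```python
def age_group(age, width=5):
    """Return the label of the age group that contains `age`, e.g. 52 -> '50-54'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

# Hypothetical ages, only to illustrate the grouping.
ages = [18, 37, 39, 50, 54, 90]
groups = [age_group(a) for a in ages]
print(groups)
```

Once every client has a group label, you can count the clients who left and the total clients in each group, then compare 2 groups with the Chi-square test.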

This is a way to statistically validate the insights we see in Tableau. You see, it’s not very difficult and it’s effective.

Subscribe to my newsletter and share this article if you think it can help someone you know. Thank you.

-Steph

Validate Data Mining In Tableau With A Chi-Square Test

I have just enrolled in a Data Science course on Udemy and I learned good stuff.

In this article we will start using statistics. Don’t worry, we’ll do something simple: we’ll use the Chi-square test in a basic way. There is a special section to learn how to do statistics at an advanced level.

I’ll explain why we’re going to learn how to use the Chi-square test. The results we have with these 2 bar charts are good. We see on these 2 bar charts that age has a significant impact on the rate of clients leaving the bank. We also see in which age groups clients leave the bank the most and in which age groups they leave the least. With that we have good insights.

In the A/B test « Gender », we can see that there is a correlation between gender and the choice to leave the bank. But as I said before, this A/B test is basic. The results of a basic A/B test show us visually what is probably happening in reality, but we aren’t 100% sure of these results. To validate them, we need to use statistical tests like the Chi-square test.

Writing a report based on a basic A/B test is very risky and you can end up with completely false insights. I don’t advise you to do it (unless you want to leave your job). That’s why using the Chi-square test will help us get strong insights.

The Chi-square test will allow us to know if our results are statistically significant. Our results are based on a sample of 10 000 clients and the Chi-square test will tell us whether these results are due to chance or whether they can represent all the clients of the bank.

For example, in our A/B test « Gender », we observed that in our sample of 10 000 clients, women are more likely to leave the bank than men.

Now, we aren’t sure if the results of this sample represent the behavior of all the bank’s clients.

To do a basic Chi-square test, we use an online tool.

On the internet, there are plenty of websites to do a Chi-square test but we’ll use this one so that you can understand how it works. To do a Chi-square test, we need absolute values, and in our A/B test we have percentages.

Let’s go back to Tableau. We’ll create a new tab with a version of the A/B test with absolute values. In this way, we keep the A/B test with the percentages. Right-click on the « Gender » tab and select « Duplicate ».

Name the new tab « Gender Actual » to specify that it contains absolute values.

To have the absolute values, drag « Number of Records » from « Measures » to the « Marks » area and drop it on top of « SUM(Number of Records) ».

Drag « Number of Records » from « Measures » to « Rows » and drop it on top of « SUM(Number of Records) ».

Cool, we have our absolute values.

We also need total absolute values, which means the total number of men and women. There is a very fast way to get that. Right-click on the vertical axis and select « Add Reference Line ».

Then in « Value », click on the drop-down on the right and select « Sum » to have the total sum of the observations.

And in « Scope », select the « Per Cell » option to specify that you want the total for each category, male and female.

Now we have the total at the top of each bar. We will modify the labels to show the absolute values. In « Label », change « Computation » to « Value » and click on the « OK » button.

Perfect, we have the total number of observations at the top of each bar: 4543 women and 5457 men. We have what we need to use our online tool.

OK, I’ll explain how this tool works. « Sample1 » and « Sample2 » correspond to the 2 categories of the independent variable « Gender ». You choose the order in which you enter the data: « Sample1 » for men or the opposite. In our case, we use « Sample1 » for women and « Sample2 » for men.

« #success » corresponds to the result Y=1, which means in our case « yes, the client left the bank ».

« #trials » is the total number of observations, which means the total number of women in « Sample1 » and the total number of men in « Sample2 ».

Here’s how to enter the data:

  • For « Sample1 » in #success, enter 1139 because 1139 women left the bank. For « Sample1 » in #trials, enter 4543 because there are 4543 women in total.

  • For « Sample2 » in #success, enter 898 because 898 men left the bank. For « Sample2 » in #trials, enter 5457 because there are 5457 men in total.

Here is the verdict: « Sample1 is more successful ». « Sample1 » corresponds to women and #success is: « yes, the client left the bank ». This verdict means that, among all the bank’s clients, women are more likely to leave the bank than men. And look, there is something important: « p<0.001 ». This means that « p » is strictly less than 0.001.

« p » is the value that tells us whether an independent variable has a statistically significant effect on a dependent variable: it’s the probability of seeing a difference at least this large purely by chance if the variable had no effect, so the smaller it is, the stronger the evidence. In our case, the independent variable is « Gender » and the dependent variable is « Exited », which is: « yes, the client left the bank ». Here « p » is strictly less than 0.001, far below the usual 5% threshold, which means that the independent variable « Gender » has a statistically significant effect on the dependent variable « Exited ». This shows us that, among all of the bank’s clients, women are more likely to leave the bank than men.
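
Here is a minimal Python sketch of the calculation behind this kind of verdict, using the « Gender » counts above. It’s a plain Pearson Chi-square test without continuity correction. I’m assuming that is close to what the online tool does, but I can’t confirm it. The function name is hypothetical.

```python
import math

def chi_square_2x2(success1, trials1, success2, trials2):
    """Pearson chi-square test (no continuity correction) for two samples."""
    observed = [
        [success1, trials1 - success1],
        [success2, trials2 - success2],
    ]
    total = trials1 + trials2
    row_totals = [trials1, trials2]
    col_totals = [success1 + success2, total - success1 - success2]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if gender had no effect on leaving the bank.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # p-value of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Sample1 = women, Sample2 = men (counts from the article).
chi2, p_value = chi_square_2x2(1139, 4543, 898, 5457)
print(f"chi2 = {chi2:.1f}, p < 0.001: {p_value < 0.001}")
```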

This is how we use the Chi-square test with this online tool. The principle is the same with all the online tools that you can find on Google or DuckDuckGo. You can repeat these instructions with other tools and you will get the same results.

Now that the Chi-square test has validated the A/B test, we’ll color the tab green to mark it as validated.

Right-click on the tab, select « Color » and select « Green ».

Perfect, now we’ll validate another A/B test. Select the « HasCreditCard » tab.

We’re going to create an A/B test « HasCreditCard » with absolute values only. To save time, right-click on the « Gender Actual » tab and select « Duplicate ».

We’ll remove the green color from the tab « Gender Actual (2) ». Right-click on the tab and select « Color » and « None ».

Rename the tab « HasCreditCard Actual ».

Drag the variable « HasCrCard » onto « Gender » in « Columns ».

Excellent, everything is ready to do a Chi-square test. We’ll remove the « Exited » labels to see the absolute values better. Click on the label and drag it out.

Perfect, let’s go back to our online tool. In this case, « Sample1 » is « no », which means clients who don’t have a credit card, and « Sample2 » is « yes », which means clients who have a credit card.

Here’s how to enter the data:

  • For « Sample1 » in #success, enter 613 because 613 clients without a credit card left the bank. For « Sample1 » in #trials, enter 2945 because there are 2945 clients who don’t have a credit card.
  • For « Sample2 » in #success, enter 1424 because 1424 clients with a credit card left the bank. For « Sample2 » in #trials, enter 7055 because there are 7055 clients who have a credit card.

Let’s look at the verdict: it’s « No significant difference ». The « p » value is very high, above 5%. This means the test finds no statistically significant effect of the independent variable « HasCrCard » on the dependent variable « Exited ». That was the conclusion we had reached when we did the A/B test with percentages.

We had seen that 21% of clients in the category « no » had « Exited » (left the bank), against 20% in the category « yes ». With these results we concluded that the variable « HasCrCard » most likely had no impact on the rate of clients who left the bank. The Chi-square test confirms our conclusion and we can color the tab « HasCreditCard » green to say that it’s OK.
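
For comparison, here is a short Python sketch of the same check on the credit card counts, this time using the shortcut formula for a 2 by 2 table. As before, it’s a Pearson Chi-square without continuity correction, which I’m assuming is close to the online tool’s method. The variable names are mine.

```python
import math

# 2x2 table for « HasCrCard » (counts from the article).
a, b = 613, 2945 - 613      # no credit card: exited, stayed
c, d = 1424, 7055 - 1424    # credit card:    exited, stayed
n = a + b + c + d

# Shortcut form of the Pearson chi-square statistic for a 2x2 table.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# p-value of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"exit rate without card: {a / (a + b):.1%}, with card: {c / (c + d):.1%}")
print(f"chi2 = {chi2:.2f}, significant at 5%: {p_value < 0.05}")
```

The statistic is tiny compared with the one for « Gender », which is why the p-value stays above 5%.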

Right-click on the tab « HasCreditCard » => « Color » => « Green ».

Excellent, now you can do a statistical A/B test with 2 categories. Soon, we will do statistical A/B tests with more than 2 categories.

Share this article if you think it can help someone you know. Thank you.

-Steph

Look For Anomalies

I have just enrolled in a Data Science course on Udemy and I learned good stuff.

We’ll learn how to duplicate a bar chart to create a new A/B test. We’ll create several A/B tests to look for anomalies.

But before that, we’ll name the sheet. Right-click on the tab and select « Rename Sheet ».

Rename the sheet « Gender ».

Now right-click on the « Gender » tab and select « Duplicate ».

Rename this new tab « Country ».

We’ll do an A/B test with the countries and, to save time, we’ll reuse everything we did with the A/B test « Gender ».

As you can see, « Gender » is in « Columns ».

To use this A/B test with a variable other than « Gender », drag the variable you want on top of « Gender » in « Columns ».

Go, go! There is « Geography » in « Dimensions »: take « Geography » and drop it on « Gender ».

Boom, with 1 click we have our A/B test for countries.

We have the percentage of clients who left and stayed in the bank for each country (Germany, Spain and France).

In this A/B test we can see that in Germany, many clients left the bank with a rate of 32%. For Spain and France, the rate of clients who left the bank is below the average departure rate (20%), 17% for Spain and 16% for France.
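
Here is a small Python sketch of the arithmetic behind these percentages. The France and Germany counts are the ones quoted in the Chi-square articles of this series. The Spain counts are an assumption that I derived by subtraction from the 10 000-client sample.

```python
# Clients who left and total clients per country.
# The Spain counts are an ASSUMPTION derived by subtraction (see above).
counts = {
    "France": (810, 5014),
    "Germany": (814, 2509),
    "Spain": (413, 2477),
}

total_exited = sum(exited for exited, _ in counts.values())
total_clients = sum(total for _, total in counts.values())
average_rate = total_exited / total_clients

rates = {country: exited / total for country, (exited, total) in counts.items()}
for country, rate in rates.items():
    position = "above" if rate > average_rate else "below"
    print(f"{country}: {rate:.0%} ({position} the average of {average_rate:.0%})")
```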

Already, we have interesting insights. We can find out whether there is a new aggressive competitor in Germany with more attractive offers, or whether a new law unfavorable to the bank’s offers has been passed. It’s necessary to do research in Germany to find the reason for this high departure rate.

As you have seen, an A/B test usually has 2 categories, but in our case there are 3. We could call it an A/B/C test but it’s a bit bizarre. When there are more than 2 categories, we call it a classification test.

In this article, I will continue to use the term A/B test, but remember the term classification test for next time.

Let’s do another A/B test quickly.

Duplicate this A/B test by right-clicking on the « Country » tab and selecting « Duplicate ».

This time we will study the variable « Has Cr Card ». This variable is « 1 » if the client has a credit card and « 0 » if the client doesn’t have a credit card.

Did you see? This variable is categorical because it is binary (« 1 » and « 0 »), but it is in « Measures ». Since this variable is categorical, it should be in « Dimensions », so we will drag the variable « Has Cr Card » from « Measures » to « Dimensions ».

Now that it’s done, drag « Has Cr Card » onto « Geography » in « Columns ».

It’s cool, we have a new A/B test for credit cards. What we can observe in this A/B test is that there is not a big difference between the departure rate of clients who don’t have a credit card (21%) and the departure rate of clients who have a credit card (20%).

It’s time to create aliases for this A/B test. Right-click on « Has Cr Card » and select « Aliases… ».

To start, « 0 » means that the client doesn’t have a credit card, so in « Value », write « No ». « 1 » means that the client has a credit card, so in « Value », write « Yes ». Then click on the « OK » button.
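
An alias is simply a display label attached to a raw value. Outside Tableau, the same idea can be sketched in Python with a dictionary. The sample values below are hypothetical.

```python
# Aliases for the binary variable, as entered in the « Aliases… » dialog.
ALIASES = {0: "No", 1: "Yes"}

# Hypothetical raw values, only to illustrate the mapping.
has_cr_card = [1, 0, 0, 1, 1]
labels = [ALIASES[value] for value in has_cr_card]
print(labels)
```

The underlying data keeps its 0 and 1 values and only the labels shown to the reader change.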

That’s it, the bar chart is easy to read now. We understand that among clients who don’t have a credit card, 21% left the bank and among clients who have a credit card, 20% left the bank. We can conclude that having or not having a credit card doesn’t have a significant impact on the decision to leave the bank.

It’s time to rename this tab. Right-click on the « Sheet4 » tab and select « Rename Sheet ». Name the sheet « HasCreditCard ».

Let’s go, let’s do another A/B test with another variable. Let’s look at « Measures » and study the variable « IsActiveMember ».

The variable « IsActiveMember » is « 1 » if the client is active and « 0 » if the client is inactive. We need to define what « active » means, and that depends on the bank’s criteria. For example, it could be: « Did the client log in to their bank account at least once last month? » or « Has the client made at least one banking transaction last month? », etc.

As you can see, the variable « IsActiveMember » is a categorical variable (binary 1 and 0), so it needs to move to « Dimensions ».

Here’s another way to move a variable from « Measures » to « Dimensions ». Right-click on « IsActiveMember » and select « Convert to Dimension ».

Perfect, the variable « IsActiveMember » is in « Dimensions ».

We will duplicate our « HasCreditCard » sheet. Right-click on the « HasCreditCard » tab and select « Duplicate ».

Rename this tab « IsActiveMember ».

Since we have duplicated what we did with « HasCreditCard », we simply need to take the variable « IsActiveMember » from « Dimensions » and drag it onto « HasCrCard » in « Columns ».

Let’s create aliases to make this bar chart easier to read. Right-click on « IsActiveMember » and select « Aliases… ».

For « 0 », we put « No » because the client is not active and for « 1 », we put « Yes » because the client is active. Click on the « OK » button.

Here is what we can see with this A/B test « IsActiveMember ». Among inactive clients, 27% left the bank. Among active clients, 14% left the bank. This shows us that clients who are not active are more likely to leave the bank than active clients.

Indeed, an active client uses his/her bank account and the bank’s products, so an active client is probably satisfied with the bank. It’s still possible that some clients leave the bank because of external factors such as a competitor, new regulations or events in the client’s private life.

It’s cool, we created 4 A/B tests in a few minutes.

  1. An A/B test « Gender » that allowed us to see that women were more likely to leave the bank.

  2. An A/B test « Country » that allowed us to see that it is in Germany that clients are most likely to leave the bank.

  3. An A/B test « HasCreditCard » that allowed us to see that having or not having a credit card didn’t have a significant impact on the decision to leave the bank.

  4. An A/B test « IsActiveMember » that allowed us to see that clients who aren’t active are more likely to leave the bank.

I will leave you some homework. Do an A/B test with the variable « Number Of Products », which is also a categorical variable. The variable « Number Of Products » indicates the number of products that the client has with the bank. Add aliases to make the bar chart easier to read.

I trust you. I’ll give you the answer in the next article.

Share this article if you think it can help someone you know. Thank you.

-Steph

Boost Your Marketing Based On Science (Part 1)

I watched a video by Olivier Roland and I learned good stuff.

Let’s see what neuroscience has found about the things that influence people to buy.

The book « Le neuro-consommateur » (in French) by Michel Badoc and Anne-Sophie Bayle-Tourtoulou helps us understand this better.

It’s a book that has been written for other researchers and academics. This is why this book is very interesting for entrepreneurs and consumers.

Here are the elements from this book to boost your marketing.

Until now, marketing and communication have been based on the idea that consumers make purchasing decisions and perceive advertising messages rationally. But neuroscience shows us that a huge part of our actions comes from the subconscious part of our brain.

For A.K. Pradeep and Martin Lindstrom, only 15% of purchasing decisions are rational. Current marketing studies are limited in how accurately they capture customer behavior. What customers say doesn’t always match what they do. Responses collected during a market study can be influenced by the context, which distorts them. With neuroscience, we can communicate directly with the brain to try to improve marketing.

Here are the elements found in neuroscience on the unconscious behavior of consumers.

  • Age and gender

  • Memory

  • Emotions and desire in the decision

  • The 5 senses

  • Cognitive ergonomics, pricing, distribution and sales

  • Subliminal relationships

  • Community and social networks

Let’s go, we’ll see that in detail. We’ll start with age and gender because these 2 create behaviors and attitudes that are sometimes difficult to understand for a person who doesn’t belong to the same category.

Age

Reptilian brain

It’s the center of instincts and the satisfaction of primary needs. This mainly affects young children. They respect the leader who is the mother or the father but also the strongest person who can protect them in case of external danger.

Limbic brain

It’s the center of stress, emotions, instinctive behavior and memory. This mainly influences teenagers. They are mainly attracted by new brands or products and by original fashions that set them apart from, or in opposition to, adult fashions.

Teenagers are often interested in causes or subjects that carry a lot of emotion: social, humanitarian, ecological, fair trade, etc. They prefer emotional communication over rational information.

Neocortex

It’s the center of anticipation and decisions. This mainly affects adults.

With the internet, we can see several big differences in the behavior of consumers across generations. There are Digital Natives and Digital Immigrants, and these 2 categories require different approaches.

Digital Natives

These are the people who grew up with computers, smartphones and the internet. They prefer choppy information, without verbs or objects. They can read information on several different media in parallel. They don’t need to structure their thoughts and they can read in a random order. They respond emotionally to colors and designs much more than to structured text. They want things to go fast.

Digital Immigrants

They prefer linear processing of information. They like the logic of a text. They wish to receive information slowly, with a consistent structure. They want to keep their privacy and are wary of how information is distributed on the internet. They sometimes want to work alone.

Gender

There is a distinct difference between the behavior of female and male consumers.

Female

The left hemisphere of the brain is more developed in women and they’re subject to the influence of hormones. We can see more of this phenomenon when a woman becomes a mother. A woman likes to communicate more than a man: she likes to talk and be listened to. She needs to share her ideas, feelings and emotions.

She’s very well oriented in time. A woman is less emotional than a man but she is more sensitive because her senses of smell, hearing and touch are more developed than a man’s.

Male

The right hemisphere of the brain is more developed in men and they’re subject to the influence of testosterone. A man is more emotional than a woman, but he expresses his emotions less. He likes action and competition. He’s very well oriented in space, which allows him to find shortcuts. A man’s sense of sight is highly developed and eroticized. This explains why a man is attracted by nudity, jewelry, makeup and clothes.

For these reasons, it’s easier to meet a man’s expectations than a woman’s.

Differences

difference

Male

As you can see, man primarily uses his view to select a product or service that he can use to show his strength and seductive power. He likes offers that give short-term profits. He prefers simple and direct communication. He prefers images rather than text. Price is more important for the man than for the woman.

Female

A woman is more complex in her expectations. She processes information in a way that is both rational and emotional. A woman is not attracted by nudity. She is attracted by a neat person with harmonious clothes and neat hands. In the case of a salesperson, a woman has no preference for a man or a woman. Her choice is influenced by several elements: the salesperson’s voice, smell, facial expression, capacity to listen and quality of answers.

A woman prefers written and documented communication. She likes social media because she can express her ideas and meet people who share her points of view. She filters rational messages through her emotions. She likes positive communications. Before selecting a product or service, she will compare it with competitors and get information from her friends, co-workers and other people with experience.

A woman is less impulsive than a man, even if a purchase can serve as stress relief. For a woman, the feel and the smell of a product such as clothes can influence a purchase.

This is the end of the 1st part.

Share this article if you think it can help someone you know. Thank you.

-Steph

Work With An Alias

data science alias bar chart tableau mining

I have just enrolled in a Data Science course on Udemy and I learned good stuff.

In the last article, I showed you how to do a simple A/B test. We will continue with the result we had with the A/B test.

data science alias bar chart tableau mining

Here is the result of the A/B test. The orange segments show the clients who left the bank: 16% of men and 25% of women. The blue segments show the clients who stayed.

With our bar chart we can quickly see that women are more likely to leave the bank than men, all else being equal in our sample.

I remind you that this is a basic A/B test. There are 2 types of A/B test: the basic A/B test and the statistical A/B test. The statistical A/B test is done with a statistical test like the chi-square test. In our case, the basic A/B test already gives us good insights.
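
To give an idea of what the statistical version involves, here is a minimal Python sketch of the chi-square statistic for a table of counts. The function name `chi_square_statistic` and the counts are invented for illustration; they are not taken from the bank dataset.

```python
def chi_square_statistic(observed):
    """Return the chi-square statistic for a table of observed counts.

    `observed` is a list of rows, e.g. one row per outcome (stayed, exited)
    and one column per group.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if outcome and group were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical counts, for illustration only (not the bank data).
example_counts = [
    [10, 20],  # stayed: group A, group B
    [20, 10],  # exited: group A, group B
]
example_statistic = chi_square_statistic(example_counts)
```

The larger the statistic, the harder it is to explain the difference between the groups by chance alone. Turning the statistic into a p-value requires the chi-square distribution, which is what an online tool or a statistics library takes care of.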

To make our bar chart even easier to read, we will work with aliases.

The first thing we will do is improve the format. Right-click on this space between « Gender » and the bars and select « Format… ».

data science alias bar chart tableau mining

The « Sheet » tab appears. In « Worksheet », change the text size to « 12 ».

data science alias bar chart tableau mining

The good thing about data mining is that we aren’t obliged to make a perfect chart, because we don’t have to present it in a report to managers or in a meeting.

For example, if I had to present this chart in a report, it would be necessary to change the vertical title. But we are only building a model, so this change isn’t necessary.

Now, look at this rectangle. We can see « Exited », « 0 » and « 1 ».

data science alias bar chart tableau mining

« 0 » means that the client stayed in the bank and « 1 » means that the client left the bank. We can also see that the clients who left the bank are in orange, so 25% for women and 16% for men. And the clients who stayed in the bank are in blue, so 75% for women and 84% for men.

We did an excellent basic A/B test but it would be much easier to read if we replaced « 0 » with « Stayed » and « 1 » with « Exited ».

With aliases we can do that. An alias is a display name that replaces a stored value: here, « Stayed » and « Exited » replace the binary values « 0 » and « 1 », whose meaning is not easy to remember.

There are 2 ways to do it : create a calculated field or use aliases.

We will use aliases. Note that aliases are not going to change the « 0 » and « 1 » in the dataset; the change exists only in Tableau.

In « Dimensions », right-click on « Exited » and select « Aliases… ».

data science alias bar chart tableau mining

data science alias bar chart tableau mining

A small window appears. In this small window, you can create an alias for each value contained in the « Exited » variable.

The variable « Exited » contains the values « 0 » and « 1 ». For the value « 0 », we will create the alias « Stayed » to say that the client stayed in the bank. For the value « 1 », we will create the alias « Exited » to say that the client left the bank. Then click on the « OK » button.

data science alias bar chart tableau mining

Look, we can see the new values in the rectangle.

data science alias bar chart tableau mining

The values « 0 » and « 1 » have been replaced by « Stayed » and « Exited ».
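
If it helps, the idea behind an alias can be sketched in a few lines of Python. This is only an illustration of the concept, not how Tableau implements it; the names `exited_values`, `aliases` and `display_label` are made up for the example.

```python
# Hypothetical raw values of the « Exited » column (illustration only).
exited_values = [0, 1, 0, 0, 1]

# The alias table: stored value -> display label.
aliases = {0: "Stayed", 1: "Exited"}


def display_label(value, alias_table):
    """Return the alias for a value, or the value itself as text if none exists."""
    return alias_table.get(value, str(value))


labels = [display_label(v, aliases) for v in exited_values]
# The underlying data is untouched; only the displayed labels differ.
```

The list `exited_values` still contains « 0 » and « 1 » afterwards, just as the dataset does when you use aliases.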

Now that the aliases are saved, we will take the variable « Exited » in « Dimensions » and move it to « Label ».

data science alias bar chart tableau mining

data science alias bar chart tableau mining

Look, we have our aliases « Stayed » and « Exited » on the bar chart.

This way, it’s easier for people to read the bar chart without asking what the values « 0 » and « 1 » mean. « Stayed » and « Exited » are clearer.

Now you know how to use aliases so that people can easily read the binary values of a chart.

-Steph

Add a Reference Line

reference line tableau data science

I have just enrolled in a Data Science course on Udemy and I learned good stuff.

In the previous article we learned how to work with aliases. Now we will learn how to add a reference line to our bar chart.

Before I start, I’ll show you a trick in Tableau.

In our bar chart we can see the labels in this order: the percentage and, below it, « Stayed » or « Exited ».

We will reverse this order. Go to this rectangle.

reference line tableau data science

And you place the label « Exited » above the label « SUM(Number of Records) ».

reference line tableau data science

Look, the label « Stayed » is above the percentage.

reference line tableau data science

With that, we can understand the bar chart more easily.

Now let’s add a reference line. But first, I think you’d like to know why I’m talking to you about a reference line.

A reference line helps us to compare bar chart results with a benchmark. This benchmark is represented by this reference line.

In our case, the benchmark is the percentage of clients who left the bank in our sample of 10 000 people.

The first thing to do is find this percentage in our bar chart. To be able to do that, remove « Gender » from « Columns ».

reference line tableau data science

Boom, we have a new bar chart.

reference line tableau data science

Look, we only have the percentage of clients who left the bank and the percentage of clients who stayed in the bank.

We see that in our sample of 10 000 people, 20% of the clients left the bank and 80% of the clients stayed in the bank. This means that the churn rate (client departure rate) is 20%.

Now we will add this churn rate to our A/B test. To return to our A/B test, press Ctrl+Z or Command+Z twice, or click twice on the « Back » button in the menu bar.

reference line tableau data science

Now we know that, on average, 20% of the clients left the bank.

We will add a horizontal line on the Y axis (Y = 20%) to compare the 20% churn rate with the 2 categories, male and female.

Let’s go. Right-click on the vertical axis (Y axis) and select « Add Reference Line ».

reference line tableau data science

A window appears with several options.

reference line tableau data science

You have the choice to add a line, a band, a distribution or a box plot.

We will use the line for the entire table.

Click on the « Line » button and activate the « Entire Table » checkbox. In « Value », select « Constant ».

reference line tableau data science

The constant is 20%, so you need to put 0.20 in « Value ».

reference line tableau data science

It’s possible to put a label on this reference line. For example, if the reference line corresponds to a formula, the label displays the formula. But in our case, our constant is 20% and it’s already displayed on the vertical axis, so we will select « None ».

reference line tableau data science

For the format of the line, select the continuous line and click on the « OK » button.

reference line tableau data science

Our reference line has been added to our chart.

reference line tableau data science

Here is what we can see. Female clients are more likely to leave the bank than the average client. Male clients are less likely to leave the bank than the average client.

In our case, it’s easy to see because there are only 2 categories, men and women.
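
The comparison that the reference line makes visible can also be written as a small Python sketch. The sample below is tiny and invented for the example (it is not the bank dataset), and the names `churn_rate`, `benchmark` and `comparison` are my own.

```python
def churn_rate(exited_flags):
    """Share of clients who left, given a list of 0/1 flags."""
    return sum(exited_flags) / len(exited_flags)


# Hypothetical tiny sample of (gender, exited) pairs, for illustration only.
clients = [
    ("Female", 1), ("Female", 0), ("Female", 0), ("Female", 1),
    ("Male", 0), ("Male", 0), ("Male", 1), ("Male", 0),
    ("Male", 0), ("Male", 0),
]

# The benchmark (the reference line): churn rate of the entire sample.
benchmark = churn_rate([exited for _, exited in clients])

# Churn rate per category, compared with the benchmark.
comparison = {}
for gender in sorted({g for g, _ in clients}):
    rate = churn_rate([e for g, e in clients if g == gender])
    comparison[gender] = "above" if rate > benchmark else "at or below"
```

The benchmark is computed on the whole sample, like the « Entire Table » option, and each category is then compared with it.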

Now you know how to add a reference line in a bar chart.

-Steph

Visualize An A/B Test in Tableau

A/B tes

I have just enrolled in a Data Science course on Udemy and I learned good stuff.

We are going to do a simple and very visual A/B test in Tableau.

The first thing to do is save this worksheet and name it « Map ».

Right-click on « Sheet1 » at the bottom of the screen and select « Rename Sheet ».

Tableau A/B test data science

The second thing to do is to save this workbook. For that we go to « File » and select « Save to Tableau Public As… »

Tableau A/B test data science

The « Tableau Public Sign In » window appears so that you can sign in to your Tableau account.

Tableau A/B test data science

Save the workbook with the name « DataMining » and click on the « Save » button.

Tableau A/B test data science

We will create a new worksheet specifically for the A/B test. Click on the « New Worksheet » icon at the bottom of the screen.

Tableau A/B test data science

Look, you created the new worksheet.

Tableau A/B test data science

To start, we need the dependent variable we are studying. This dependent variable is « Exited », which is « 1 » if the client left the bank or « 0 » if the client stayed in the bank. Now look, this dependent variable is in « Measures », so Tableau recognized it as a numeric variable.

In our case, the dependent variable « Exited » is actually a category. Our logic in this situation is: « Did the client leave or did the client stay? » For this reason, we need to move the variable « Exited » to « Dimensions ».

Tableau A/B test data science

Now, the variable « Exited » is in dimensions.

Tableau A/B test data science

Let’s do a classic A/B test, the A/B test for gender (male or female). Here is what we’re going to test: if we keep everything else constant and take a male client and a female client, which of the two is more likely to leave the bank?

Let’s go, move « Gender » to « Columns ».

Tableau A/B test data science

We have 2 columns : « Female » and « Male ».

Tableau A/B test data science

Move « Exited » to « Color ».

Tableau A/B test data science

Look, we have 2 colors. Blue for « 0 », meaning the clients who stayed in the bank. Orange for « 1 », meaning the clients who left the bank.

To be more specific, we want to know how many clients stayed and how many clients left the bank.

Move the variable « Number of Records » into « Rows ».

Tableau A/B test data science

Tableau A/B test data science

What we can notice is that the total number of men is higher than the total number of women. Then we can see that among the women, a large proportion left and that among the men a small proportion left. However, this is not enough to allow us to understand what is happening.

We will add « Number of Records » as a label. Move « Number of Records » to « Label ».

Tableau A/B test data science

Tableau A/B test data science

Change the label’s size to « 12 » and make it bold.

Tableau A/B test data science

Now we know the number of people in each category.

To better visualize this, we will replace the numbers with percentages. We want to see what percentage of female clients left and what percentage of male clients left. With percentages we can easily make the comparison; with absolute numbers it’s not possible, because the total number of women is different from the total number of men.

To convert an absolute number to a percentage, click on the arrow next to « SUM(Number of Records) ».

Tableau A/B test data science

And click on « Add Table Calculation… »

Tableau A/B test data science

Tableau A/B test data science

In « Calculation Type », select « Percent of Total ».

Tableau A/B test data science

Be careful, here there is one important thing to do. Change « Table(across) » and choose « Table(down) ».

Tableau A/B test data science

« Table(down) » computes each value as a percentage of the total of its column, so the percentages in each column add up to 100%. Now you can close the window, the changes have been made.

Boom, we have the percentages.
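
For readers who like to see the arithmetic, here is a rough Python sketch of a percent of total computed down each column. The counts are hypothetical (they are not the real numbers from the bank dataset) and the function name `percent_of_column_total` is just an illustration.

```python
# Hypothetical counts per (gender, exited) cell, for illustration only.
counts = {
    ("Female", "Stayed"): 30, ("Female", "Exited"): 10,
    ("Male", "Stayed"): 50, ("Male", "Exited"): 10,
}


def percent_of_column_total(cell_counts):
    """Divide each cell by the total of its column (here, its gender)."""
    column_totals = {}
    for (column, _), count in cell_counts.items():
        column_totals[column] = column_totals.get(column, 0) + count
    return {
        cell: count / column_totals[cell[0]]
        for cell, count in cell_counts.items()
    }


percentages = percent_of_column_total(counts)
```

Each value is divided by the total of its own gender, so the shares can be compared between the two columns even though the columns don’t contain the same number of clients.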

We will format the labels to make them easier to read. Click on the small arrow next to « SUM(Number of Records) » and select « Format… ».

Tableau A/B test data science

The « Pane » tab appears. In the « Pane » tab, under « Numbers », choose « Percentage » and select « 0 » decimal places.

Tableau A/B test data science

We will make it even more consistent. Hold the « Ctrl » (or « Command ») key and drag « SUM(Number of Records) » to « Rows » to replace the old « SUM(Number of Records) ». Holding the key copies the field instead of moving it.

Tableau A/B test data science

As you can see, the vertical axis is in percentage.

Tableau A/B test data science

Let’s analyze what we see. The percentage of female clients who left the bank is 25%. The percentage of male clients who left the bank is 16%. So female clients are more likely to leave the bank than male clients, all else being equal.

This A/B test is not complete because we have not done any test of statistical significance, but this approach is effective for quickly obtaining results.

We’re going to do a full A/B test later, but today you learned how to do an effective A/B test by focusing on the relevant things. A test of statistical significance brings in more details and takes a little more time, but we’ll do that later.

-Steph