Validate Data Mining In Tableau With A Chi-Square Test

I have just enrolled in a Data Science course on Udemy  and I learned good stuff.

In this article, we will start using statistics. Don’t worry, we’ll keep it simple: we’ll use the Chi-square test in a basic way. There is a dedicated section for learning statistics at a more advanced level.

Let me explain why we’re going to learn how to use the Chi-square test. The results from these 2 bar charts are good: they show that age has a strong impact on the rate of clients leaving the bank, and in which age groups clients leave the bank the most and the least. That gives us good insights.

In the A/B test « Gender », we can see a relationship between gender and the decision to leave the bank. But as I said before, this A/B test is basic. The results of a basic A/B test show us visually what is probably happening in reality, but we can’t be 100% sure of them. To validate these results, we need to use statistical tests such as the Chi-square test.

Writing a report based only on a basic A/B test is very risky: you can end up with completely false insights. I don’t advise you to do it (unless you want to leave your job). That is why we use the Chi-square test: it gives us solid insights.

The Chi-square test tells us whether our results are statistically significant. Our results come from a sample of 10 000 clients, and the test tells us whether the difference we see is likely to be a chance effect or whether it probably holds for all of the bank’s clients.

For example, in our A/B test « Gender », we observed that in our sample of 10 000 clients, women are more likely to leave the bank than men.

But we don’t yet know whether the results from this sample reflect the behavior of all the bank’s clients.

To run a basic Chi-square test, we’ll use an online tool. Click here.

There are plenty of websites that can run a Chi-square test, but we’ll use this one so that you can understand how it works. A Chi-square test needs absolute values (counts of clients), but our A/B test shows percentages.

Let’s go back to Tableau. We’ll create a new tab with a version of the A/B test that shows absolute values. This way, we keep the A/B test with the percentages. Right-click on the « Gender » tab and select « Duplicate ».

Name the new tab « Gender Actual » to indicate that it shows absolute values.

To get the absolute values, drag « Number of Records » from « Measures » to the « Marks » area and drop it on top of « SUM(Number of Records) ».

Drag « Number of Records » from « Measures » to « Rows » and drop it on top of « SUM(Number of Records) ».

Cool, we have our absolute values.

We also need the totals, that is, the total number of men and the total number of women. There is a very fast way to get them: right-click on the vertical axis and select « Add Reference Line ».

Then, in « Value », click on the drop-down on the right and select « Sum » to get the total of the observations.

In « Scope », select the « Per Cell » option to get a separate total for each category, male and female.

We now have a reference line with the total at the top of each bar. Let’s change its label so that it shows the number: in « Label », change « Computation » to « Value » and click on the « OK » button.

Perfect, we have the total number of observations at the top of each bar: 4543 women and 5457 men. We have what we need to use our online tool.

OK, let me explain how this tool works. « Sample1 » and « Sample2 » correspond to the two categories of the independent variable « Gender ». The order is up to you: « Sample1 » can be men or women. In our case, we use « Sample1 » for women and « Sample2 » for men.

« #success » corresponds to the result Y=1, which means in our case « yes, the client left the bank ».

« #trials » is the total number of observations, that is, the total number of women in « Sample1 » and the total number of men in « Sample2 ».

Here is how to enter the data:

  • For « Sample1 » in #success, you enter 1139 because there are 1139 women who left the bank. For « Sample1 » in #trials, you enter 4543 because there are 4543 women in total.
  • For « Sample2 » in #success, you enter 898 because there are 898 men who left the bank. For « Sample2 » in #trials, you enter 5457 because there are 5457 men in total.

Here is the verdict: « Sample1 is more successful ». « Sample1 » corresponds to women, and #success means « yes, the client left the bank ». This verdict means that, across all of the bank’s clients, women are more likely to leave the bank than men. And look, there is something important: « p<0.001 ». This means that the p-value is strictly less than 0.001.

« p » (the p-value) is the probability of observing a difference at least as large as ours if the independent variable had no effect, that is, if women and men left the bank at the same rate. In our case, the independent variable is « Gender » and the dependent variable is « Exited », which is: « yes, the client left the bank ». Here, p is strictly less than 0.001, far below the usual 5% threshold, so a difference this large is very unlikely to be a chance effect: the independent variable « Gender » has a statistically significant effect on the dependent variable « Exited ». This shows us that, among all of the bank’s clients, women are more likely to leave the bank than men.
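
For readers curious about what happens behind the verdict, here is a minimal Python sketch of a Pearson Chi-square test on the 2×2 table built from the counts above (the clients who stayed are #trials minus #success). It assumes no continuity correction, and the function name `chi_square_2x2` is our own, not the online tool’s; other tools may use slightly different variants, so their exact numbers can differ a little.

```python
import math

def chi_square_2x2(success1, trials1, success2, trials2):
    """Pearson Chi-square test (no continuity correction) for two samples.

    Returns (chi2, p_value). A 2x2 table has 1 degree of freedom,
    so the p-value is erfc(sqrt(chi2 / 2)).
    """
    # Observed table: rows = samples, columns = (success, failure)
    observed = [
        [success1, trials1 - success1],
        [success2, trials2 - success2],
    ]
    total = trials1 + trials2
    row_totals = [trials1, trials2]
    col_totals = [success1 + success2, total - success1 - success2]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the sample had no effect on the outcome
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Sample1 = women, Sample2 = men; success = "the client left the bank"
chi2, p = chi_square_2x2(1139, 4543, 898, 5457)
print(f"women: {1139 / 4543:.1%} left, men: {898 / 5457:.1%} left")
print(f"chi2 = {chi2:.1f}, p < 0.001: {p < 0.001}")
```

With these counts, the statistic comes out above 100, which puts the p-value far below 0.001, in line with the tool’s verdict.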

This is how we use the Chi-square test with this online tool. The principle is the same with the other online tools you can find on Google or DuckDuckGo: if you repeat these steps with another tool, you should get the same verdict.

Great, with the Chi-square test we validated the A/B test. To show that this A/B test is validated, we’ll color its tab green.

Right-click on the tab, select « Color » and select « Green ».

Perfect, now we’ll validate another A/B test. Select the « HasCreditCard » tab.

We’re going to create a version of the « HasCreditCard » A/B test with absolute values only. To save time, right-click on the « Gender Actual » tab and select « Duplicate ».

We’ll remove the green color from the « Gender Actual (2) » tab. Right-click on the tab, select « Color » and then « None ».

Rename the tab « HasCreditCard Actual ».

Drag the variable « HasCrCard » onto « Gender » in « Columns » to replace it.

Excellent, everything is ready for the Chi-square test. To see the absolute values more clearly, we’ll remove the « Exited » labels: click on them and drag them out.

Perfect, let’s go back to our online tool. In this case, « Sample1 » is « no », meaning clients who don’t have a credit card, and « Sample2 » is « yes », meaning clients who have a credit card.

Here is how to enter the data:

  • For « Sample1 » in #success, you enter 613 because 613 clients without a credit card left the bank. For « Sample1 » in #trials, you enter 2945 because there are 2945 clients who don’t have a credit card.
  • For « Sample2 » in #success, you enter 1424 because 1424 clients with a credit card left the bank. For « Sample2 » in #trials, you enter 7055 because there are 7055 clients who have a credit card.

Let’s look at the verdict: « No significant difference ». The p-value is very high, well above 5%. This means the test finds no statistically significant effect of the independent variable « HasCrCard » on the dependent variable « Exited ». That is the conclusion we had reached with the A/B test with percentages.
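
The same check can be sketched in Python for « HasCrCard », again as a Pearson Chi-square test without continuity correction. This version uses the shortcut formula for a 2×2 table, χ² = N(ad − bc)² / ((a+b)(c+d)(a+c)(b+d)), which gives the same value as summing over expected counts; the function name is hypothetical and the online tool may compute things slightly differently.

```python
import math

def chi_square_2x2_shortcut(a, b, c, d):
    """Pearson Chi-square (no continuity correction) for the table
        [[a, b],
         [c, d]]
    using the 2x2 shortcut formula. Returns (chi2, p_value)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # 1 degree of freedom: p-value = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: no credit card (Sample1), credit card (Sample2)
# Columns: left the bank (#success), stayed (#trials - #success)
chi2, p = chi_square_2x2_shortcut(613, 2945 - 613, 1424, 7055 - 1424)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}, significant at 5%: {p < 0.05}")
```

Here the statistic is only about 0.5 and the p-value is about 0.48, far above the 5% threshold.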

We had seen that 21% of clients « Exited » (left the bank) in the category « no » and 20% in the category « yes ». From these results, we concluded that the variable « HasCrCard » most likely had no impact on the rate of clients leaving the bank. The Chi-square test supports our conclusion, so we can color the « HasCreditCard » tab green to show that it’s OK.

Right-click on the tab « HasCreditCard » => « Color » => « Green ».

Excellent, now you can run a statistical A/B test with 2 categories. Soon, we will run statistical A/B tests with more than 2 categories.

Share this article if you think it can help someone you know. Thank you.

-Steph

Create Bins and View Distributions

Well done, you finished the 1st part. Now we’re going to do deeper Data Mining analysis on this bank’s dataset.

To take these analyses deeper, we’ll use a more statistical approach.

To do that, we will create a new tab.

In this new tab, we want to understand how clients are distributed by age. Is there a majority of young or older people?

Move the variable « Age » to « Columns ».

Since we want to see the distribution of client ages, we need the variable « Number of Records », which counts the observations. Move « Number of Records » to « Rows ».

Boom, we have a chart, but there is only one point, in the top right. What happened is that Tableau took the sum of the ages of all the bank’s clients and the sum of all the « Number of Records », that is, the total number of clients: 10 000.

We’ll fix that, but first we’ll change the format to see the chart better. Right-click in the middle of the chart and select « Format ».

For the font size, select « 12 ».

Here you can see that the total of all the ages is 389 218, but that’s not what we’re looking for. What we want to see is the number of clients for each age.

Here is what’s going on. Tableau aggregated our variables: aggregating means combining all the values of a variable into one number for each category, here a sum. Since our view has no categories yet, all the ages were simply added together. What we actually want is the number of observations for each age separately.
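
To make the difference concrete, here is a small Python sketch with a handful of made-up ages (not the real dataset): adding the values collapses everything into one number, the way SUM(Age) did, while counting observations per age is what treating « Age » as a dimension gives us.

```python
from collections import Counter

# Hypothetical ages of eight clients (not the real dataset)
ages = [26, 29, 29, 30, 31, 31, 31, 26]

# What happened first: both fields aggregated into single numbers
total_age = sum(ages)          # like SUM(Age)
number_of_records = len(ages)  # like SUM(Number of Records)

# What we want: Age as a dimension, one count of clients per age
clients_per_age = Counter(ages)

print(total_age, number_of_records)      # 233 8
print(sorted(clients_per_age.items()))   # [(26, 2), (29, 2), (30, 1), (31, 3)]
```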

To get that, click on the arrow in « SUM(Age) » in « Columns ».

Then select « Dimension ».

You see, Tableau no longer takes the aggregated sum of the ages; it treats each age separately. We now have a curve that shows the continuous distribution of our clients’ ages: for each age, the curve gives us the number of clients of that age.

Let’s look at the dataset. Right-click on « Churn Modelling » and select « View Data… ».

A window appears that shows the data in detail. If you scroll to the right, you will find the « Age » column.

We see that the ages are rounded to whole years. Since all the ages are rounded, Tableau can group clients by age. By hovering the mouse over the curve, we can see that there are 200 clients who are 26 years old.

If the ages in the dataset weren’t rounded, you would see clients aged 26.5 or 26.3. That would create a lot of irregularity: plenty of spikes and variations.

Oooooh look, there is a variation that isn’t normal.

Let’s analyze it in detail. Around this peak, we see that there are 348 clients who are 29 years old.

Here, there are 404 clients who are 31 years old.

And this dip shows us that there are 327 clients who are 30 years old.

How can we explain this irregularity? It’s possible that many of the 29-year-olds are about to turn 30 and many of the 31-year-olds have just turned 31. Chance creates small inaccuracies like this one, and you may get other inaccuracies if your data isn’t precise or isn’t rounded. In our case, the ages are rounded, but we still want to get rid of the small irregularity we see on our curve.

There is a way to see the distribution without these irregularities: « bins ». Binning means grouping the values into categories. Here, we’re going to group our clients into age groups.

Right-click on « Age » in « Measures ». Select « Create » and select « Bins… ».

A window appears. We’ll group our clients in 5-year increments. In « Size of bins », enter « 5 » and click on the « OK » button.
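
Here is a minimal Python sketch of what a bin of size 5 does, assuming each bin is labeled by its lower bound and bins start at 0 (so ages 25 to 29 fall into bin 25); Tableau’s exact labeling may differ. The function name `age_bin` is hypothetical, and the counts are the three ages mentioned above, with all other ages left out.

```python
import math
from collections import Counter

def age_bin(age, size=5):
    """Lower bound of the bin that contains `age`.

    Assumption: bins start at 0 and cover [lower, lower + size),
    so with size=5 the ages 25, 26, 27, 28 and 29 all land in bin 25.
    """
    return math.floor(age / size) * size

# Client counts for the three ages mentioned above (other ages omitted)
clients_per_age = {29: 348, 30: 327, 31: 404}

clients_per_bin = Counter()
for age, count in clients_per_age.items():
    clients_per_bin[age_bin(age)] += count

print(dict(clients_per_bin))             # {25: 348, 30: 731}
print(age_bin(29, 10), age_bin(31, 10))  # 20 30 with 10-year bins
```

Once ages 30 and 31 share a bin, the dip at 30 is simply absorbed into the 30-34 total.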

As you can see, the variable « Age » is still in « Measures », but there is a new variable in « Dimensions »: « Age(bins) », the variable we just created.

Our « Age(bins) » variable was placed in « Dimensions », which is correct because it is a categorical variable: each category corresponds to a 5-year age group.

For example, one category is the 20-24 age group. Now we’ll create a new distribution based on these bins.

To do that, we’ll remove the variable « Age » from « Columns » by clicking on it and dragging it out.

Move the variable « Age(bins) » from « Dimensions » to « Columns ».

Note

In this case, you can’t directly replace « Age » by dropping « Age(bins) » on top of it in « Columns ». This is because « Age » is a measure and « Age(bins) » is a dimension.

That’s a nice distribution, the type of chart we usually see in economics or mathematics. Unlike the previous chart, this one is discrete: the clients are grouped by age group, while the previous chart was continuous.

On this distribution (chart), each bar corresponds to an age range. For example, this bar corresponds to the 25-29 age group.

Now, we’ll change the colors.

In « Rows », hold down the « Ctrl » key (or « Command » on a Mac) and drag « SUM(Number of Records) » to « Colors ». Holding the key copies the field instead of moving it.

Our distribution is blue, but we’ll change the color to red. Click on « Colors » and then on « Edit Colors ».

In the window that appears, click on the blue square on the right to display the color palette.

Select the red color and click on the « OK » button.

Click on the « OK » button of the « Edit Colors » window.

To make the bar chart easier to read, we’ll add the number of clients in each age group. In « Rows », hold down the « Ctrl » key (or « Command » on a Mac) and drag « SUM(Number of Records) » to « Label ».

That’s it, we can see how many clients there are in each age group.

We see that the tallest bar is the 35-39 age bracket and the second tallest is the 30-34 age bracket. Overall, most clients are between 25 and 40 years old, which seems consistent.

Our bar chart shows absolute values; we’ll replace them with percentages. Click on the little arrow in « SUM(Number of Records) » in « Label ». You could select « Add Table Calculation… », but I’ll show you another way to do it.

Instead of clicking « Add Table Calculation… », click on « Quick Table Calculation » and select « Percent of total ».

Great, we have the exact percentage of clients in each age bracket. We can see that the three bars covering ages 25 to 39 hold 20 + 23 + 17 = 60% of the clients.
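
« Percent of total » divides each bar by the grand total. Here is a minimal Python sketch of that arithmetic, using made-up counts rather than the real dataset; the function name `percent_of_total` is hypothetical.

```python
def percent_of_total(counts):
    """Share of each category in the grand total, in percent."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}

# Hypothetical client counts per 5-year age bin (not the real dataset)
counts = {20: 100, 25: 200, 30: 300, 35: 400}
shares = percent_of_total(counts)

print(shares)  # {20: 10.0, 25: 20.0, 30: 30.0, 35: 40.0}
# Adding the shares of neighboring bins works like 20 + 23 + 17 above
print(shares[25] + shares[30] + shares[35])  # 90.0
```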

I’ll show you one last thing. You can easily change the size of the bins: just click on « Age(bins) » and select « Edit ».

In the window, you can change the size of the bins. Enter « 10 » instead of « 5 » to get 10-year bins, then click on the « OK » button.

Now we have a distribution with fewer bins, and the tallest bar is the 30-39 age group.

That was just to show you how to change the size of the bins. To go back to the previous distribution with 5-year bins, click on the « Back » button.

As you can see, the values on the bars are percentages, but the values on the axis are absolute values. Here is an exercise for you: « Put the values of the axis in percentages ». I’ll give you the answer in the next article.

A Practical Tip To Validate Your Approach

How did the « Number Of Products » A/B test go? Easy or difficult?

Here is the result I found.

I think you noticed something strange: there is an anomaly. We would expect that the more products clients have, the more satisfied they are with the bank, so these clients should stay with the bank.

In the first 2 bars, we can see that a client who has 1 product is more likely to leave the bank than a client who has 2 products. But when a client has 3 or 4 products, we see a very high rate of clients leaving the bank.

There is another odd little detail: in the 2nd bar, we can’t see the « Exited » label, because the orange part is too small to hold the text. To keep it simple, we’ll remove the « Exited » label: drag the « Exited » text label out of the view.

Perfect, we can read the percentages. On the 1st bar, we can see that among clients who have 1 product, 28% left the bank. On the 2nd bar, we can see that among clients who have 2 products, 8% left the bank. This shows us that clients who have 1 product are more likely to leave the bank than clients with 2 products.

For the next bars, we observe an anomaly. On the 3rd bar, we can see that among clients who have 3 products, 83% left the bank. On the 4th bar, we can see that among clients who have 4 products, 100% left the bank. There is clearly a problem, and we need a deeper analysis to understand what is going on.

As Data Scientists, we need to explain what happens in bars 3 and 4. Usually, when clients have 3 or 4 banking products, it means they are satisfied and loyal to the bank. But in our case it’s the opposite, because a high rate of these clients left the bank. This is the time for a deeper analysis.

The first thing to analyze is the quality of the data. There is a very big anomaly, and it may come from something in our data that looks insignificant but distorts the statistics. For example, it’s possible that when the bank selected the clients for this sample, there were very few clients with 4 products and all of them happened to leave the bank. Sometimes chance creates anomalies, and you have to pay attention to these chance effects: they don’t seem important, but they can lead to false interpretations.

To start, we will check the number of clients with 4 products.

From « Measures », drag « Number of Records » (which gives the number of observations) to « Label ».

In the first 2 bars, we observe that many clients with 1 or 2 products were selected for our sample. For clients with 3 or 4 products, far fewer clients were selected.

There are 220 clients with 3 products and 60 clients with 4 products. These small numbers of clients probably explain the anomalies we observe.
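
To see why small categories give unstable percentages, here is a minimal Python sketch of the standard error of a proportion, sqrt(p(1 − p)/n). Purely for illustration, it assumes a true exit rate of 20% in every category and uses a hypothetical large category of 5000 clients; it does not test whether the 3- and 4-product results are due to chance.

```python
import math

def standard_error(rate, n):
    """Standard error of a proportion estimated from n observations."""
    return math.sqrt(rate * (1 - rate) / n)

# Assumption for illustration: the true exit rate is 20% in every category
rate = 0.20
for n in (60, 220, 5000):  # 60 and 220 come from the text; 5000 is hypothetical
    print(f"n = {n}: one client weighs {100 / n:.2f} points, "
          f"standard error = {100 * standard_error(rate, n):.1f} points")
```

With 60 clients, a single client moves the rate by about 1.7 points and the standard error is about 5 points, against well under 1 point with 5000 clients.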

In this sample of randomly selected clients, there are very few clients with 4 products, and they all left the bank. With so few observations, we can’t tell whether this reflects real behavior or an effect of chance (or of the way the sample was drawn). When things like this happen, you have to be very careful not to draw conclusions too fast and misinterpret the data.

The conclusion is that many clients were selected for categories 1 and 2, but only a few for categories 3 and 4, so we can’t compute accurate statistics for them. We need a deeper analysis of the clients with 3 and 4 products.

Now, let’s put the percentages back on the bar chart. Click on the « Back » button.

Or click on « SUM(Number of Records) » and drag it out of the view.

We found an anomaly, so it’s useful to add a comment reminding us to analyze bars 3 and 4 in more depth.

Right-click between the bar chart’s title and the bars. Select « Annotate », then « Area… ».

A window appears. In this window, write « Low observation in last 2 categories » and click on the « OK » button.

Click on the comment and drag it over bars 3 and 4.

The next time you work on this bar chart, you will see this comment, which will remind you to analyze the clients who have 3 and 4 products seriously.

Validate our approach

It’s time to show you how to validate an approach and how to validate the data. For this, we will create a new A/B test.

Duplicate this worksheet: right-click on the « NumberOfProducts » tab and select « Duplicate ».

And rename the tab « Validation ».

In this tab, we will delete the comment. Select the comment and press the « Delete » key on your keyboard.

Everything is ready. The idea is to find a variable that doesn’t affect our results, that is, a variable that has no impact on a client’s decision to leave or stay with the bank.

Take, for example, the variable « Customer Id ». A client’s identification number has no influence on the client’s decision to stay with or leave the bank.

We’ll do an A/B test with the last digit of the « Customer Id », and we’ll check that the proportion of clients who leave the bank is the same in the 10 categories of the last digit. The 10 categories are the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.

Let’s go. To start, we will create the variable that contains the last digit of the « Customer Id ». To get this variable, we will create a « Calculated Field ».

Right-click on « Customer Id », select « Create » and click on « Calculated Field ».

Name the calculated field « LastDigitOfCustID ». In the formula box, we use the « RIGHT » function, which returns the rightmost characters of a string: with « Customer Id » and 1 as arguments, it returns the last character of the « Customer Id », which in our case is the last digit.

Here is the formula to write in the formula box: RIGHT([Customer Id], 1)

Oops, there is a small mistake: Tableau reports « The calculation contains errors ».

There is an error in the formula because « Customer Id » is a numeric variable, and the « RIGHT » function applies to a variable of type « STRING ».

To use the « RIGHT » function, we will first convert « Customer Id » into a string with the « STR » function.

Here is the corrected formula to write in the formula box: RIGHT(STR([Customer Id]), 1). Then click on the « OK » button.
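
The same idea can be sketched in Python: indexing from the right only works on text, so the number must be converted first, just as STR() does in Tableau. The function name `last_digit` and the ID below are hypothetical.

```python
def last_digit(customer_id):
    """Python version of the Tableau formula RIGHT(STR([Customer Id]), 1).

    Indexing only works on strings, so the number is converted first,
    just like STR() in Tableau.
    """
    return str(customer_id)[-1]

customer_id = 15634602  # hypothetical ID

try:
    customer_id[-1]  # like RIGHT([Customer Id], 1): fails on a number
except TypeError as err:
    print("Error:", err)

print(last_digit(customer_id))  # 2
```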

Now, you can see that our calculated field « LastDigitOfCustID » is in « Dimensions ».

Click on « LastDigitOfCustID » and drag it on top of « NumOfProducts » in « Columns ».

Now we have a new bar chart, and we see that for every last digit of the « Customer Id », about the same proportion of clients leave the bank. These proportions don’t exactly match the average of 20%, but these slight variations aren’t important.

Seeing this uniform distribution allows us to validate our data, because it shows that the data are homogeneous.

Conclusion

Here’s how you can check the homogeneity of your data. Take a variable that has no impact on whether a client leaves or stays with the bank. The example we did with the last digit of the « Customer Id » is an excellent one: we checked that each of the categories of this variable had about the same proportion of clients leaving the bank. Since that is the case, we can validate our data.
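
Here is a minimal Python sketch of this check on hypothetical records: compute the exit rate per last digit and flag any digit whose rate is far from the overall average. The function name, the records and the 5-point tolerance are all assumptions for illustration; the tolerance is not a statistical test.

```python
from collections import defaultdict

def exit_rate_by_last_digit(records):
    """Exit rate for each last digit of the customer ID.

    `records` is a list of (customer_id, exited) pairs, with exited = 0 or 1.
    """
    clients = defaultdict(int)
    exits = defaultdict(int)
    for customer_id, exited in records:
        digit = str(customer_id)[-1]
        clients[digit] += 1
        exits[digit] += exited
    return {digit: exits[digit] / clients[digit] for digit in sorted(clients)}

# Hypothetical records, not taken from the real dataset
records = [(101, 0), (111, 1), (121, 0), (131, 0),
           (102, 1), (112, 0), (122, 0), (132, 0)]

rates = exit_rate_by_last_digit(records)
overall = sum(exited for _, exited in records) / len(records)

# Arbitrary tolerance, for illustration only: flag digits far from the average
TOLERANCE = 0.05
suspicious = [d for d, rate in rates.items() if abs(rate - overall) > TOLERANCE]

print(rates, overall, suspicious)  # {'1': 0.25, '2': 0.25} 0.25 []
```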

Now imagine a different result: when we run the test with the last digit of the « Customer Id », we observe that for one of the digits, the rate of clients who left is much higher than the average. That anomaly would show us that there is a problem in our data.

You can find other ways to verify your data by using other « insignificant variables » to see if the distribution is homogeneous. But be careful when you select an « insignificant variable » because there may be traps.

Here is an example. If you create a variable that takes the first letter of the first name, the distribution will not be homogeneous. The reason is simple: many more people have a name that starts with the letter « M » than with the letter « Y ».

Work With An Alias

In the last article, I showed you how to do a simple A/B test. We will continue with the result of that A/B test.

Here is the result of the A/B test. The orange part of each bar is the percentage of clients who left the bank: 16% for men and 25% for women.
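
These percentages come straight from the counts used in the Chi-square article (1139 of 4543 women and 898 of 5457 men left the bank). A minimal Python sketch of the arithmetic behind each bar:

```python
# Counts of clients who left the bank, and of all clients, per gender
exited = {"Female": 1139, "Male": 898}
clients = {"Female": 4543, "Male": 5457}

for gender in exited:
    rate = exited[gender] / clients[gender]
    # Each bar splits 100% into "left" (orange) and "stayed" (blue)
    print(f"{gender}: {rate:.0%} left, {1 - rate:.0%} stayed")
# Female: 25% left, 75% stayed
# Male: 16% left, 84% stayed
```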

With our bar chart, we can quickly see that, in our sample, women are more likely to leave the bank than men.

Remember that this is a basic A/B test. There are 2 types of A/B test: the basic A/B test and the statistical A/B test. The statistical A/B test is done with a statistical test such as the Chi-square test. In our case, the basic A/B test already gives us good insights.

To make our bar chart even easier to read, we will work with aliases.

First, we will improve the format. Right-click in the space between « Gender » and the bars and select « Format… ».

The « Sheet » tab appears. In « Worksheet », change the text size to « 12 ».

The good thing about data mining is that we don’t need a perfect chart, because we don’t have to present it in a report to managers or in a meeting.

For example, if I had to present this chart in a report, I would need to change the vertical axis title. But we’re only building a working model, so this change isn’t necessary.

Now, look at this rectangle (the color legend). We can see « Exited », « 0 » and « 1 ».

« 0 » means that the client stayed with the bank and « 1 » means that the client left the bank. We can also see that clients who left the bank are in orange: 25% for women and 16% for men. Clients who stayed with the bank are in blue: 75% for women and 84% for men.

We did an excellent basic A/B test but it would be much easier to read if we replace « 0 » with « Stayed » and « 1 » with « Exited ».

With aliases, we can do that. An alias is a display name for a value: here, we use aliases to replace the binary values « 0 » and « 1 » with « Stayed » and « Exited », because it’s not easy to remember what « 0 » and « 1 » mean.

There are 2 ways to do it : create a calculated field or use aliases.

We will use aliases. Note that aliases don’t change the « 0 » and « 1 » in the dataset; the change only applies in Tableau.
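
As an analogy only (this is not how Tableau stores aliases), here is a minimal Python sketch of the idea: a lookup table maps the stored values to display names, and the raw data stays untouched. The name `EXITED_ALIASES` and the values are hypothetical.

```python
# Display names for the stored values; the data itself is left unchanged,
# just as Tableau aliases only affect what is shown.
EXITED_ALIASES = {0: "Stayed", 1: "Exited"}

exited_column = [0, 1, 0, 0, 1]  # hypothetical raw values from the dataset
labels = [EXITED_ALIASES[value] for value in exited_column]

print(labels)         # ['Stayed', 'Exited', 'Stayed', 'Stayed', 'Exited']
print(exited_column)  # [0, 1, 0, 0, 1]: the raw data is untouched
```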

In « Dimensions », right-click on « Exited » and select « Aliases… ».

A small window appears. In this small window, you can create an alias for each value contained in the « Exited » variable.

The variable « Exited » contains the values « 0 » and « 1 ». For the value « 0 », we will create the alias « Stayed » to say that the client stayed with the bank. For the value « 1 », we will create the alias « Exited » to say that the client left the bank. Then click on the « OK » button.

Look, we can see the new values in the rectangle.

The values « 0 » and « 1 » have been replaced by « Stayed » and « Exited ».

Now that the aliases are saved, we will take the variable « Exited » in « Dimensions » and move it to « Label ».

Look, we have our aliases « Stayed » and « Exited » on the bar chart.

This way, people can read the bar chart without wondering what the « 0 » and « 1 » values mean. « Stayed » and « Exited » are clearer.

Now you know how to use aliases so that people can easily read the binary values of a chart.

Share this article if you think it can help someone you know. Thank you.

-Steph

Visualize An A/B Test in Tableau

I have just enrolled in a Data Science course on Udemy, and I learned some good stuff.

We are going to do a simple and very visual A/B test in Tableau.

The first thing to do is rename this worksheet « Map ».

Right-click on « Sheet1 » at the bottom of the screen and select « Rename Sheet ».

The second thing to do is save this workbook. To do that, go to « File » and select « Save to Tableau Public As… ».

The « Tableau Public Sign In » window appears so you can sign in to your Tableau Public account.

Save the workbook with the name « DataMining » and click on the « Save » button.

We will create a new worksheet specifically for the A/B test. Click on the « New Worksheet » icon at the bottom of the screen.

Look, the new worksheet has been created.

To start, we need the dependent variable we are studying. This dependent variable is « Exited », which is « 1 » if the client left the bank and « 0 » if the client stayed in the bank. Notice that this variable is in « Measures », so Tableau treated it as a numeric variable.

In our case, the dependent variable « Exited » is actually a category. The question we are asking is: « Did the client leave, or did the client stay? » For this reason, we need to move the variable « Exited » into « Dimensions ».

Now, the variable « Exited » is in « Dimensions ».

Let’s do a classic A/B test: the A/B test for gender (male or female). Here is what we’re going to test: if we keep everything else constant and take a male client and a female client, which of the two is more likely to leave the bank?

Let’s go: move « Gender » to « Columns ».

We have 2 columns : « Female » and « Male ».

Move « Exited » to « Color ».

Look, we have 2 colors: blue for « 0 », the clients who stayed in the bank, and orange for « 1 », the clients who left the bank.

To be more specific, we want to know how many clients stayed and how many clients left the bank.

Move the variable « Number of Records » to « Rows ».

We notice that there are more men than women in total. We can also see that a larger proportion of the women left the bank than of the men. However, this is not enough to understand what is happening.

We will add « Number of Records » as a label. Move « Number of Records » to « Label ».

Set the label’s font size to « 12 » and make it bold.

Now we know the number of people in each category.
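What « Number of Records » does here amounts to counting rows for each combination of « Gender » and « Exited ». A minimal Python sketch, assuming each client is one row; the rows below are hypothetical, not the bank dataset.

```python
from collections import Counter

# Hypothetical rows: (Gender, Exited), one row per client.
rows = [
    ("Female", 1), ("Female", 0), ("Female", 0), ("Female", 0),
    ("Male", 0), ("Male", 1), ("Male", 0), ("Male", 0), ("Male", 0),
]

# Each row counts once, similar to summing Tableau's Number of Records per group.
counts = Counter(rows)
for (gender, exited), n in sorted(counts.items()):
    print(gender, exited, n)
```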

To make this easier to compare, we will replace the numbers with percentages. We want to see what percentage of female clients left and what percentage of male clients left. Percentages make this comparison easy, whereas absolute numbers do not, because the total number of women is different from the total number of men.

To convert an absolute number to a percentage, click on the arrow next to « SUM(Number of Records) ».

And click on « Add Table Calculation… ».

In « Calculation Type », select « Percent of Total ».

Be careful: there is one important thing to do here. Change « Table (across) » to « Table (down) ».

« Table (down) » computes the percentages down each column, so the percentages within each column add up to 100%. Now you can close the window; the changes have been applied.

Boom, we have the percentages.
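The « Percent of Total » calculation computed down each column divides every count by its column (gender) total. A minimal Python sketch with hypothetical counts, chosen only so the shares come out at 25% and 16% like the chart; `percent_of_column` is a hypothetical helper, not a Tableau function.

```python
# Hypothetical counts per gender (column) and outcome.
counts = {
    "Female": {"Stayed": 300, "Exited": 100},
    "Male": {"Stayed": 420, "Exited": 80},
}


def percent_of_column(table):
    """Divide each cell by its column (gender) total, so each column sums to 1."""
    result = {}
    for gender, cells in table.items():
        total = sum(cells.values())
        result[gender] = {outcome: n / total for outcome, n in cells.items()}
    return result


shares = percent_of_column(counts)
for gender, cells in shares.items():
    print(gender, {outcome: f"{share:.0%}" for outcome, share in cells.items()})
```

Dividing by the grand total of all clients instead would answer a different question (what share of all clients are women who left), which is why the per-column total matters when the two groups have different sizes.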

We will format the labels to make them easier to read. Click on the small arrow next to « SUM(Number of Records) » and select « Format… ».

The « Pane » tab appears. In the « Pane » tab, under « Numbers », choose « Percentage » and set the decimal places to « 0 ».

To make the chart even more consistent, hold down « Ctrl » (Windows) or « Command » (Mac) and drag « SUM(Number of Records) » to « Rows » to replace the old « SUM(Number of Records) ».

As you can see, the vertical axis is in percentage.

Let’s analyze what we see. The percentage of female clients who left the bank is 25%, while the percentage of male clients who left the bank is 16%. So female clients appear more likely to leave the bank than male clients, although this simple comparison does not actually hold the other variables constant.

This A/B test is not complete because we have not run any test of statistical significance, but this approach is effective for getting results quickly.
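One common way to run the missing significance test is Pearson’s chi-square test of independence on the 2×2 table of gender versus outcome. A minimal standard-library sketch; the counts are hypothetical (the same 25% vs 16% split on a small made-up sample), and it omits the Yates continuity correction that some tools, such as SciPy’s `chi2_contingency`, apply to 2×2 tables by default.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if gender and outcome were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts. Rows: Female, Male. Columns: Exited, Stayed.
stat, p = chi_square_2x2(100, 300, 80, 420)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

A small p-value (commonly below 0.05) suggests the difference between the groups is unlikely to be due to chance alone; the test is usually considered reliable when every expected count is at least about 5.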

We’re going to do a full A/B test later, but today you learned how to do an effective A/B test by focusing on the relevant things. A test of statistical significance brings in more variables and takes a little more time, but we’ll do that later.

Share this article if you think it can help someone you know. Thank you.

-Steph

Priorities of Dieting (Part 1)

You see, nutrition has a lot of variables. If changing your body composition is an equation, each of these variables has a place in it. Since every variable matters to one degree or another for getting the best results, you’ll want to adjust them in the right order.

Back to Math class

I know, I know. We told you, “There’d be no math”.

Don’t worry; you won’t really have to do any math. I just want to show you an analogy.

When you were just starting to learn math, you probably learned the correct order of operations: parentheses, exponents, multiplication and division, then addition and subtraction. You learned that this had to be done in that order.

This is important. If you didn’t know the order and were faced with a complex equation, you would have no idea how to solve it. Doing things out of order will not, in general, get you the right answer. Even if you could actually “solve” the equation, the answer would almost certainly be wrong. And you’d be extremely frustrated.

Which brings us back to dieting and my point of view.

A basic equation for changing the body

As I said, changing your body is like an equation. Unlike math, it isn’t an exact science, but fortunately it is a science, even if we don’t know enough about the body to hand down definitive verdicts.

The main variables of body composition are nutrition, exercise, recovery, lifestyle and supplements.

The main thing is that you need to focus on these things, and in this order.

Undoubtedly, nutrition is the most powerful lever. I think we can all agree on that. People always ask for percentages, so if you want a table, here it is:

  • Nutrition – 51%

  • Exercise – 29%

  • Recovery – 10%

  • Lifestyle – 9%

  • Supplements – 1%

That’s 100%, right? Someone check my calculations. Okay, good.

Anyway, most fitness coaches would agree with this table.

Obviously, it isn’t 1000% accurate, and EVERY variable influences the others. If your lifestyle is going out and partying 6 nights a week, your recovery will be rotten, etc.

Nutrition, Diet and how to make changes (If you want to change)

If nutrition is the most important factor in changing your body, then changing your diet is also important, right? Exactly.

Cool, we agree on that. We can also agree that changing your diet as part of the process of changing your appearance is called dieting. And since we’re not afraid of that word, no one will be traumatized.

So, dieting. On the surface, it seems simple enough: make (positive) changes to your food choices and see the results.

This is true on the surface and in depth, but it’s good to know what to change and when to change it, so you don’t get confused.

Like fitness itself, nutrition and dieting have a number of individual components, or variables. These variables include everything from the type of food you eat to the amount you eat, when you eat, the composition of the food and what other foods it’s eaten with.

And as in math, some of these variables carry a bit more weight (sorry, couldn’t resist) in the overall equation; not only will they have a greater overall impact, but focusing on them first is often the only way to see progress.

However, your body isn’t as cut and dried as a math problem; things change, as we will see later in the equation, and there are very few constants to balance the variables.

This means that these things will have different overall values for different people and different effects on the entire process.

It’s frustrating not to know exactly how, or what, will be the deciding factor in getting you where you want to be. BUT we know the general order. Many nutrition experts seem so lost in the wars over specific theories that they lose sight of the bigger picture.

Me? That’s what I want to show you: the big picture of dieting and which things must be changed, in the order that will lead to the largest overall changes.

I’ll explain the details in Part 2.

-Steph

Priorities of Dieting (Part 1)

You see, nutrition has a lot of variables. If changing your body composition is an equation, each of these variables has a place in it. Since every variable matters to some degree for getting the best possible results, you’ll want to adjust them in the right order.

Back to Math class

I know, I know. We told you, “There won’t be any math”.

Don’t worry; you won’t really have to do any math. I just want to make an analogy to show you.

When you were learning math, right from the start, you probably learned the correct order of operations: parentheses, exponents, multiplication and division, then addition and subtraction. You learned that this had to be done in that order.

This is important. If you didn’t know the order and were faced with a complex equation, you would have no idea how to solve it. Doing things out of order will not, in general, produce good work. Even assuming you could actually “solve” the equation, the answer would almost certainly be wrong. And you’d be extremely frustrated.

Which brings us back to dieting and my point of view.

A basic equation for changing the body

As I said, changing your body is like an equation. Unlike math, it isn’t an exact science, but fortunately it is a science, even if we don’t know enough about the body to hand down definitive verdicts.

The main variables of body composition would be nutrition, exercise, recovery, lifestyle and supplements.

The main thing is that you need to focus on these things, and in this order.

Without a doubt, nutrition is the most powerful lever. I think we can all agree on that. People always ask for percentages, so if you want a table, here it is:

  • Nutrition – 51%

  • Exercise – 29%

  • Recovery – 10%

  • Lifestyle – 9%

  • Supplements – 1%

That’s 100%, right? Someone check my calculations. Okay, good.

Anyway, most fitness coaches would agree with this table.

Obviously, it isn’t 1000% accurate, and every variable influences the others. If your lifestyle consists of going out and partying 6 nights a week, your recovery will be lousy, etc.

Nutrition, Diet and how to make changes (If you want to change)

If nutrition is the most important thing for changing your body, then changing your diet is also important, right? Exactly.

Cool, we agree on that. We can also accept that changing your diet as part of the process of changing your appearance is called dieting. And since we’re not afraid of that word, no one is going to be traumatized.

So, dieting. On the surface, it seems simple enough: make (positive) changes to your food choices and you’ll see the results.

This is true on the surface and in depth, but it’s good to know what to change and when to change it, so you don’t get confused.

Like fitness itself, nutrition and dieting have a number of individual components, or variables. These variables include everything from the type of food you eat to the amount of food you eat and when you eat it, the composition of the food and what other foods it is eaten with.

And as in math, some of these variables carry more weight (sorry, couldn’t resist) in the overall equation; not only will they have a greater overall impact, but focusing on them first is often the only way to see progress.

However, your body isn’t as cut and dried as a math problem; things change, as we will see further on in the equation, and there are very few constants to balance the variables.

This means that these things will have different overall values for different people and different effects on the process as a whole.

It’s frustrating not to know exactly how, or what, will be the deciding factor in getting you where you want to be. BUT we know the general order. Many nutrition specialists seem so lost in the wars over specific theories that they lose sight of the big picture.

Me? That’s what I want to show you: the broad outline of dieting and the things that must be changed, in the order that will lead to the biggest overall changes. The details will be explained in Part 2.

-Steph