## Validate Data Mining In Tableau With A Chi-Square Test

In this article, we will start using statistics. Don’t worry: we’ll keep it simple and use the Chi-square test in a basic way. There is a separate section for learning statistics at an advanced level.

First, I’ll explain why we’re going to learn the Chi-square test. The results we got from these 2 bar charts are good. They show that age has a strong impact on the rate of clients leaving the bank. They also show in which age groups clients leave the bank the most and the least. That gives us good insights.

In the « Gender » A/B test, we can see an association between the client’s gender and the choice to leave the bank. But as I said before, this A/B test is basic. A basic A/B test visually shows us what is probably happening in reality, but we can’t be sure of these results. To validate them, we need a statistical test such as the Chi-square test.

Writing a report based only on a basic A/B test is very risky: you can end up with completely false insights. I don’t advise you to do it (unless you want to leave your job). That’s why we’ll use the Chi-square test to get solid insights.

The Chi-square test tells us whether our results are statistically significant. Our results are based on a sample of 10 000 clients. The test tells us whether these results could be due to chance, or whether they likely hold for all the bank’s clients.

For example, in our « Gender » A/B test, we observed that in our sample of 10 000 clients, women are more likely to leave the bank than men.

But we aren’t yet sure whether the results of this sample represent the behavior of all the bank’s clients.

To run a basic Chi-square test, we’ll use an online tool. Click here.

There are plenty of websites that run a Chi-square test, but we’ll use this one so that you can see how it works. A Chi-square test needs absolute values (counts), whereas our A/B test shows percentages.

Let’s go back to Tableau. We’ll create a new tab with a version of the A/B test that uses absolute values, so that we also keep the A/B test with percentages. Right-click on the « Gender » tab and select « Duplicate ».

Name the new tab « Gender Actual » to show that it contains absolute values.

To get the absolute values, drag « Number of Records » from « Measures » to the « Marks » area and drop it on top of « SUM(Number of Records) ».

Then drag « Number of Records » from « Measures » to « Rows » and drop it on top of « SUM(Number of Records) ».

Cool, we have our absolute values.

We also need the totals, that is, the total number of men and of women. There is a very fast way to get them: right-click on the vertical axis and select « Add Reference Line ».

Then, under « Value », click the drop-down on the right and select « Sum » to get the total number of observations.

Under « Scope », select the « Per Cell » option to get a separate total for each category, male and female.

Now we have the total at the top of the bars. To display it as an absolute value, go to « Label », change « Computation » to « Value » and click « OK ».

Perfect, we have the total number of observations at the top of each bar: 4543 women and 5457 men. We now have what we need for our online tool.

Now let me explain how this tool works. « Sample1 » and « Sample2 » correspond to the two categories of the independent variable « Gender ». The order doesn’t matter: you can use « Sample1 » for men or for women. In our case, « Sample1 » is women and « Sample2 » is men.

« #success » corresponds to the result Y=1, which means in our case « yes, the client left the bank ».

« #trials » is the total number of observations, which means the total number of women in « Sample1 » and the total number of men in « Sample2 ».

Here is how you enter the data:

• For « Sample1 » in #success, you enter 1139 because there are 1139 women who left the bank. For « Sample1 » in #trials, you enter 4543 because there are 4543 women in total.

• For « Sample2 » in #success, you enter 898 because there are 898 men who left the bank. For « Sample2 » in #trials, you enter 5457 because there are 5457 men in total.

Here is the verdict: « Sample1 is more successful ». « Sample1 » corresponds to women, and a « success » means « yes, the client left the bank ». So this verdict means that, across all the bank’s clients, women are more likely to leave the bank than men. Notice the important part: « p<0.001 ». This means that the p-value is strictly less than 0.001.

« p » (the p-value) is the probability of observing a difference at least as large as the one in our sample if the independent variable actually had no effect on the dependent variable. The smaller it is, the less likely the difference is due to chance. The usual threshold is 5% (0.05). In our case, the independent variable is « Gender » and the dependent variable is « Exited », which is « yes, the client left the bank ». Since p is below 0.001, « Gender » has a statistically significant effect on « Exited ». Among all the bank’s clients, women are more likely to leave the bank than men.
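If you’re curious about what happens behind the tool, here is a minimal Python sketch of a standard Pearson Chi-square test on the same numbers. It is not the online tool’s own code. It assumes the plain test without Yates’ continuity correction. Some tools apply that correction, which changes the statistic slightly but not the conclusion here. The function name `chi_square_two_samples` is a hypothetical name for illustration.

```python
import math

def chi_square_two_samples(success1, trials1, success2, trials2):
    """Pearson Chi-square test (no continuity correction) on a 2x2 table.

    Each sample is given as (#success, #trials), like in the online tool.
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    # Observed counts: rows = samples, columns = (success, failure)
    observed = [
        [success1, trials1 - success1],
        [success2, trials2 - success2],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count we would expect in this cell if the sample had no effect
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, chi2 is the square of a standard normal Z,
    # so P(chi2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Sample1 = women, Sample2 = men (counts from the « Gender Actual » tab)
chi2, p = chi_square_two_samples(1139, 4543, 898, 5457)
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```

With the women/men counts, this gives a Chi-square statistic of about 113 and a p-value far below 0.001. That is the same verdict as the online tool.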

That’s how to run a Chi-square test with this online tool. Other online tools you can find on Google or DuckDuckGo work on the same principle. If you follow the same steps with them, you should reach the same conclusion.

Great: with the Chi-square test, we have validated this A/B test. To mark it as validated, we’ll color its tab green.

Right-click on the tab, select « Color » and select « Green ».

Perfect, now we’ll validate another A/B test. Select the « HasCreditCard » tab.

We’re going to create a « HasCreditCard » A/B test with absolute values only. To save time, right-click on the « Gender Actual » tab and select « Duplicate ».

Remove the green color from the new « Gender Actual (2) » tab: right-click on it and select « Color », then « None ».

Rename the tab « HasCreditCard Actual ».

Drag the « HasCrCard » variable onto « Gender » in « Columns » to replace it.

Excellent, everything is ready for a Chi-square test. To see the absolute values better, remove the « Exited » labels: click on them and drag them out.

Perfect, let’s go back to our online tool. In this case, « Sample1 » is « no », meaning clients who don’t have a credit card. « Sample2 » is « yes », meaning clients who have a credit card.

Here is how you enter the data:

• For « Sample1 » in #success, you enter 613 because 613 clients without a credit card left the bank. For « Sample1 » in #trials, you enter 2945 because there are 2945 clients who don’t have a credit card.
• For « Sample2 » in #success, you enter 1424 because 1424 clients with a credit card left the bank. For « Sample2 » in #trials, you enter 7055 because there are 7055 clients who have a credit card.

Let’s look at the verdict: « No significant difference ». The « p » value is high, well above 5% (0.05). This tells us that the independent variable « HasCrCard » has no statistically significant effect on the dependent variable « Exited ». That’s the conclusion we had reached with the percentage A/B test.

We had seen that 21% of clients in the « no » category had « Exited » (left the bank), versus 20% in the « yes » category. From these results, we concluded that « HasCrCard » most likely had no impact on the rate of clients leaving the bank. The Chi-square test confirms this conclusion, so we can color the « HasCreditCard » tab green to show that it’s OK.

Right-click on the tab « HasCreditCard » => « Color » => « Green ».
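If you want to double-check the « HasCrCard » verdict yourself, it can be sketched the same way in Python. This version uses the shortcut formula for a 2x2 table, N(ad - bc)² / ((a + b)(c + d)(a + c)(b + d)). It gives the same statistic as summing (observed - expected)² / expected over the four cells. It is an illustration, not the online tool’s code, and it assumes no continuity correction. The name `two_by_two_chi_square` is hypothetical, and the 5% threshold is the one used above.

```python
import math

def two_by_two_chi_square(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]] using the
    shortcut N * (ad - bc)^2 / (row1 * row2 * col1 * col2), no continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

ALPHA = 0.05  # the usual 5% significance threshold

# Sample1 = no credit card, Sample2 = credit card (counts from the Tableau tab)
left_no, total_no = 613, 2945
left_yes, total_yes = 1424, 7055

print(f"churn without card: {left_no / total_no:.1%}, with card: {left_yes / total_yes:.1%}")

# Table rows = (left, stayed) for each sample
chi2 = two_by_two_chi_square(left_no, total_no - left_no, left_yes, total_yes - left_yes)
p_value = math.erfc(math.sqrt(chi2 / 2))  # p-value for 1 degree of freedom
verdict = "significant" if p_value < ALPHA else "no significant difference"
print(f"chi2 = {chi2:.2f}, p = {p_value:.2f} -> {verdict}")
```

It prints churn rates of about 20.8% and 20.2%, which are the 21% and 20% from the percentage A/B test. The p-value is just under 0.5, far above 0.05, hence « no significant difference ».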

Excellent, you can now run a statistical A/B test with 2 categories. Soon, we’ll run statistical A/B tests with more than 2 categories.

## Work Effectively And Earn More (Part 2)

If you haven’t read Part 1, click here.

# 5 actions to be effective

Optimize your working time

Use Pareto’s Law: focus on the 20% of your actions that produce 80% of your results. Use Parkinson’s Law (work expands to fill the time available) to decide how much time to give yourself to complete a task.

Here are other ways to optimize your time:

• Don’t spread yourself too thin

• Stop multitasking – Research shows that it wastes time and hurts productivity. Read this scientific study.

• Cut out interruptions – smartphone notifications, emails, messages, etc.

• Batch similar tasks together.

• Remove unnecessary tasks – To find out whether a task is useless, ask yourself Peter Drucker’s question: « Why am I doing this? Is it necessary? » This question makes it easy to drop unnecessary tasks. As an exception to the no-notifications rule, you can set a smartphone notification that shows this question every 30 minutes. It works as a reminder throughout the day.

• Identify the 20% of things and people that cause 80% of your problems and cut them out. If it’s someone in your family, talk to that person 2-3 times a week instead of every day.

Automate everything you can

Many business tasks can be automated, for example posting messages on social media (I use Buffer). You can also automate online sales, where the customer does everything. The customer looks for a product and pays by credit card by filling out the website’s payment form. The invoice is then created automatically from the information the customer provided, etc.

It’s even possible to automate a whole business, as with drop shipping. Drop shipping means selling products that you don’t hold in stock: the supplier ships them directly to the customers. Amazon offers this type of service too. You list the products you sell in its catalog and let Amazon handle inventory, shipping and returns. I wrote an article on Amazon’s drop shipping here.

There are also the « muses » that Tim Ferriss describes in his book « The 4-Hour Workweek ».

Delegate

Focus on your strengths and delegate the rest. Write a list of the tasks you want to delegate, with instructions. Then hand these tasks to a team, assigning each type of task to a specialist.

Duplicate

There is no point in reinventing the wheel. You can copy your mentors’ recipes for success and apply them in your own business.

Recycle

Work you have already done can be reused in a different form. For example, blog articles can become a book, a podcast or a video.

# 4 actions to earn more

Determine your goal and strategy

Define your goal, the process to reach it and the strategy to put in place. Here are some example strategies for building wealth:

• Replace your salary with real estate income and start your own company.

• Keep your job as an employee and invest as much as possible in the stock market to create passive income.

• Create a side business for extra income, like a blog, a podcast or a YouTube channel.

• Buy a piece of land and build several apartments (condos).

• Etc.

Optimize your budget to spend less money

• Analyze your expenses to eliminate waste: unnecessary subscriptions, overpriced insurance, etc.

• Print your bank statement and analyze it

• Look for ways to achieve the same result while spending less: compare, buy cheaper and negotiate, so you save money for the things you really need.

• Optimize your taxes so that you pay less.

You can work on something once and get paid several times: run a seminar, babysit 3 children instead of 1, walk 5 dogs instead of 1, etc.

You can also use work you have already done to create extra income. For example, if you like taking pictures, you can sell them on stock photo websites.

Duplicate the processes known to create wealth

• Pay yourself first

• Make money work for you by saving at least 10% of your income and investing it.

• Invest in yourself with training to learn new skills

These are the levers you can use to build a business that serves your life (and not a life that serves your business). The internet makes these levers easier to use: you can create content through a blog, a podcast or a YouTube channel.

## Dataset For Data Mining

To get the dataset for Data Mining, go to the superdatascience website. In « Part.1 Visualization », find the section « How to use Tableau for Data Mining » and click on « Churn Modeling » to download the file.

Once the file is downloaded, move it to the directory you created for the course. In that directory, create a new directory named « 2.Chunk investigation » (unless you have already done so).

Open the file with Excel or another spreadsheet program.

Note that we use this dataset for the visualization part, and we will also use it for the modeling part.

Let’s analyze the data of this dataset.

This dataset is fairly large: it contains 10 000 rows and a few columns. It is a list of a bank’s clients, with the following information:

• Customer id (client identifier)

• Surname (last name)

• Credit score (a measure of the client’s ability to borrow)

• Geography (client’s country)

• Gender (male or female)

• Age

• Tenure (the number of years the client has been with the bank)

• Balance (balance of the client’s bank account)

• NumOfProduct (number of products the client has with the bank: credit card, contract, account)

• HasCrCard (does the client have a credit card ?)

• IsActiveMember (did the client use his/her credit card during the last month ?)

• EstimatedSalary (the bank’s estimate of the client’s annual salary)

• Exited (did the client leave the bank ?)

Now, let me explain the context of this dataset. The bank has branches in several countries, such as Germany, Spain and France. It noticed that many clients have left recently. The bank tracks a metric called the « churn rate », the rate at which customers leave the bank. For a few months, it has been much higher than usual. That’s why the bank needs a data scientist (you) to find the problem and propose solutions.

This dataset is a small sample of the bank’s clients: 10 000 randomly selected clients.

The « Exited » column didn’t exist before. It was created when the bank realized that an abnormal number of clients were leaving.

The bank then observed these clients for 6 months to see which ones left.

In the « Exited » column, the number « 1 » means that the client left the bank and the number « 0 » means that the client stayed in the bank.

To analyze this dataset, you’ll run A/B tests. For example, a classic A/B test checks whether women are more likely to leave the bank than men. You count the men who left the bank and the women who left the bank. Then you normalize each count by the total number of clients of that gender. Normalizing is important because the sample doesn’t contain the same number of women and men. Using the last column, « Exited », you then find out whether men or women are more likely to leave the bank.
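The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course. It assumes the rows are already loaded as dictionaries with « Gender » and « Exited » keys. The handful of rows below are made up, and `churn_rate_by_group` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up rows for illustration: only the two columns needed here
# (the real file has 10 000 rows and many more columns).
rows = [
    {"Gender": "Female", "Exited": 1},
    {"Gender": "Female", "Exited": 0},
    {"Gender": "Female", "Exited": 0},
    {"Gender": "Female", "Exited": 1},
    {"Gender": "Male", "Exited": 0},
    {"Gender": "Male", "Exited": 1},
    {"Gender": "Male", "Exited": 0},
    {"Gender": "Male", "Exited": 0},
    {"Gender": "Male", "Exited": 0},
]

def churn_rate_by_group(rows, group_col="Gender", target_col="Exited"):
    """Return {group: (left, total, rate)}, where rate = left / total within that group."""
    left = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        group = row[group_col]
        total[group] += 1
        left[group] += row[target_col]  # Exited is 1 (left) or 0 (stayed)
    # Normalize by the size of each group, not by the whole sample
    return {g: (left[g], total[g], left[g] / total[g]) for g in total}

for group, (n_left, n_total, rate) in churn_rate_by_group(rows).items():
    print(f"{group}: {n_left}/{n_total} left -> {rate:.0%}")
```

Here the made-up data gives 2/4 (50%) for women and 1/5 (20%) for men. Comparing the raw counts (2 vs 1) would hide the fact that the groups have different sizes. If you read the real file with Python’s `csv.DictReader`, every value comes back as a string, so convert « Exited » with `int(...)` first.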

Once you have relevant results, you can present your report to the bank, and it should help you propose solutions. For example, if the report shows that women are leaving the bank in large numbers, there is a problem. You need to check whether the bank’s offer is right for women. Or maybe another bank has a much more attractive offer for women, or something else is going on.

You will learn how to investigate the dataset and find answers in the client information with A/B tests.

## Bad Fitness Tips

I read a Nerd Fitness article and learned some good stuff.

In the fitness world, there is a lot of contradictory information, and it’s hard to stay healthy in this mess.

That’s why I’ll share the advice my old friend Bobby gave me. So it’s Bobby talking, and Bobby says:

# Clothes

It’s important to have nice, expensive sneakers if you want results. You need the latest Nike/Reebok/Puma/Adidas/etc.

For clothes, you need spandex pants because they help burn fat in your thighs. You burn fat faster with less effort.

The gym is a fashion show.

# Material

Abs machines

You need a machine like the Ab Coaster, Ab Roller, Ab Chair or Hawaii Chair (it’s the best). Abs are life.

Training without getting exhausted

It stinks to be exhausted after a training session, which is why the Flex Belt is so practical. There’s something else too: weight loss wraps. You put them on your body and your body gets thinner. Cool.

Shake Weight

If you’re a lady and you want toned arms, the Shake Weight is for you. Use it 10 minutes a day and you’ll see results.

The 562-movement machine

Do you know the Nautilus machine? It’s the same thing. You sit down, move your arms and legs, and you’ll get a pro bodybuilder’s body. Don’t use free weights: they’re really heavy and really hard to use.

There’s also a machine that makes you feel like you’re walking up a hill. It’s better than walking outside.

If you want to buy everything, I advise you to take out a loan. Don’t worry, your bank will love it. Look at this new generation of treadmill.

# Training

Most expensive gym

I advise you to choose the most expensive gym in town, with as many machines as possible. Avoid free weights, because it’s hard to be around people who lift dumbbells and barbells.

Sign up for all the gym’s classes

You need to take part in all the gym’s classes, like jazzersize, boxersize, supersize, sweatin’ to the oldies, dub-steppin’ to the newies, etc. You can do just one session of each, it’s cool.

Free weight

This one is for the ladies. Don’t lift dumbbells or barbells, otherwise you’ll end up with a bodybuilder’s body. If you have to lift a weight, take the lightest one and do 50-60 repetitions. Careful: never lift a dumbbell heavier than 1.3 kg (3 lbs).

Change your program every 4 hours

That way, you shock your muscles. You need to change your training program very often so your body never gets used to it. There’s a 28-day workout split, it’s excellent. Tomorrow is « left arm » day.

# Diet

Don’t change your diet

You don’t need to change what you eat. The proof: some gyms offer pizza and bagels. To get an athletic body, you need to take lots of pills and shakes: diet pills, fat burner shakes, mass gainers, double up muscle building power, etc.

Supermarket

At the supermarket or grocery store, avoid all fruits and vegetables. Also avoid anything that doesn’t have a « low-fat » label. You need healthy prepared meals: if the label lists 50 unpronounceable ingredients, it’s good.

Take everything labeled « low-fat », like ice cream, bagels, bread, cookies, french fries, etc. Whatever is « low-fat » is healthy.

If you don’t like shopping, you can buy salads at fast-food restaurants. Salads are healthy: macaroni salad, potato salad, deep-fried ball of lard salad.

Energy drinks are excellent for your health, so you can drink as many as you want. They’re better than water.

That was my old friend Bobby’s advice. You’re smart, and I know you noticed that ALL THIS ADVICE IS CRAP. Sadly, a lot of people believe this advice is useful, and they go crazy when they don’t see results.

What is the worst fitness advice you have seen ?

-Steph

