## Create Bins and View Distributions

I just enrolled in a Data Science course on Udemy and I learned some good stuff.

Nice work, you finished the 1st part. Now we’re going to do a deeper Data Mining analysis of this bank’s dataset.

To dig deeper, we’ll take a more statistical approach.

To do that, we will create a new tab.

In this new tab, we want to understand how clients are distributed by age. Is there a majority of young or old people?

Move the variable « Age » to « Columns ».

As we want to see the distribution of client ages, we need the variable « Number of Records » to count the observations. Move the variable « Number of Records » to « Rows ».

Boom, we have a chart, but there is only one point at the top right. What happened is that Tableau took the sum of the ages of all the bank’s clients and the sum of all the « Number of Records », that is, the total number of clients: 10 000.

We’ll find a solution, but first we’ll change the format to see the chart better. Right-click in the middle of the chart and select « Format ».

For the font size, select « 12 ».

Here you can see that the total age is 39 218, but that’s not what we’re looking for. What we want to see is the number of clients for each age.

I’ll explain what’s going on. We took the aggregated sums of our variables. Aggregating means that Tableau adds up all the values of a variable into a single total. We added up the ages, but what we actually want is the number of observations for each age separately.

To get that, click the arrow on « SUM(Age) » in « Columns ».

Then select « Dimension ».

You see, Tableau no longer takes the aggregated sum of the ages; it treats each age separately. We have a curve that shows us the continuous distribution of our clients’ ages. That is to say, for each age, the curve gives us the number of clients of that age.
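If you like to see the idea in code, here is a minimal Python sketch of the difference between the aggregated sum and the count per age. The `ages` list is made-up sample data (not the bank’s dataset), and this only mirrors the logic; it is not how Tableau computes things internally.

```python
from collections import Counter

# Hypothetical sample of client ages (not the real dataset).
ages = [26, 29, 29, 30, 31, 31, 31, 40]

# Aggregated view: one total for the ages and one total for the records,
# which is why the first chart shows a single point.
total_age = sum(ages)
total_records = len(ages)

# Per-age view: one count of clients for each distinct age.
clients_per_age = Counter(ages)

print(total_age, total_records)  # 247 8
for age in sorted(clients_per_age):
    print(age, clients_per_age[age])
```

The first print is the single point of the first chart; the loop gives the points of the curve, one per age.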

Let’s look at the dataset. Right-click on « Churn Modelling » and select « View Data… ».

A window appears that shows us the data in detail. If you scroll to the right, you will find the column « Age ».

We see that the ages are rounded. Because all the ages are rounded, Tableau is able to group clients by age. By hovering the mouse over the curve, we can see that there are 200 clients who are 26 years old.

If the ages weren’t rounded in the dataset, you would have seen clients aged 26.5 or 26.3. That would create a lot of irregularity: there would be plenty of spikes with lots of variation.

Oooooh look, there is a variation that isn’t normal.

Let’s analyze it in detail. Around this peak, we see that there are 348 clients who are 29 years old.

Here, 404 clients who are 31 years old.

And this dip shows us that there are 327 clients who are 30 years old.

How can we explain this irregularity? It’s possible that many of the 29-year-olds are about to turn 30 and many of the 31-year-olds have only just turned 31. It’s chance that produces these inaccuracies. You may get other inaccuracies if your data isn’t precise or isn’t rounded. In our case, the ages are rounded, but we still want to get rid of the small irregularity we see on our curve.

There is a way to see our distribution without these irregularities: « bins ». « Bins » consist of grouping the information into different categories. That is, we’re going to group our clients into different age groups.

Right-click on « Age » in « Measures ». Select « Create » and select « Bins… ».

A window appears. We’ll group our clients in 5-year increments. In « Size of bins », enter « 5 » and click on the « OK » button.

As you can see, the variable « Age » has remained in « Measures » but there is a new variable in « Dimensions ». This is the variable we created, « Age(bins) ».

Our « Age(bins) » variable was correctly placed in « Dimensions » because it is a categorical variable: each category corresponds to a 5-year age group.
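To picture what a bin does, here is a small Python sketch. It assumes that each bin starts at a multiple of the bin size (so 20, 25, 30, ...), and the `ages` list and the `age_bin` helper are made up for the illustration.

```python
from collections import Counter


def age_bin(age, size=5):
    """Return the lower bound of the bin that contains `age`."""
    return (age // size) * size


# Hypothetical sample of client ages (not the real dataset).
ages = [22, 24, 25, 29, 30, 31, 34, 38]

# Count the clients in each 5-year bin.
size = 5
clients_per_bin = Counter(age_bin(age, size) for age in ages)

for start in sorted(clients_per_bin):
    print(f"{start}-{start + size - 1}: {clients_per_bin[start]}")
```

Each client lands in exactly one group, which is why the binned variable behaves like a category.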

For example, one category is the 20 to 24 age group. Now we’ll create a new distribution based on « bins ».

To do that, we’ll remove the variable « Age » from « Columns » by dragging it out of the shelf.

Then move the variable « Age(bins) » from « Dimensions » to « Columns ».

Note

In this case, it’s not possible to directly replace « Age » with « Age(bins) » by dropping it over « Age » on « Columns ». This is because « Age » is a measure and « Age(bins) » is a dimension.

That’s a nice distribution; it’s the type of distribution (chart) we usually see in economics or mathematics. The difference from the old chart is that this chart is discrete. It is discrete because the clients are grouped by age group, while the previous chart was continuous.

On this distribution (chart), each bar corresponds to an age range. For example, this bar corresponds to the 25-29 age group.

Now, we’ll change the colors.

From « Rows », drag « SUM(Number of Records) » to « Colors » while holding down the « Ctrl » or « Command » key on your keyboard.

We get our distribution in blue but we’ll change the color to red. Click on « Colors » and click on « Edit Colors ».

In the window that appears, click on the blue square on the right to display the color palette.

Select the red color and click on the « OK » button.

Click on the « OK » button of the « Edit Colors » window.

To make the bar chart easier to read, we’ll add the number of clients in each age group. From « Rows », drag « SUM(Number of Records) » to « Label » while holding the « Ctrl » or « Command » key on your keyboard.

That’s it, we can see how many clients there are in each age group.

We see that the dominant bar is the 35-39 age bracket and the second dominant bar is the 30-34 age bracket. Overall, we can see that most clients are between 25 and 40 years old, which seems consistent.

On our bar chart, we have absolute values. We’ll replace them with percentages. Click on the little arrow on « SUM(Number of Records) » in « Label ». You could select « Add Table Calculation… », but I’ll show you another way to do it.

Instead of clicking « Add Table Calculation… », click on « Quick Table Calculation » and select « Percent of total ».

Nice, we have the exact percentage of people in each age bracket. Now we can see that in the 25 to 40 age group, we have 20 + 23 + 17 = 60% of clients.
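The « Percent of total » idea is simple arithmetic: each count divided by the sum of all counts. Here is a rough Python sketch; the counts are invented for the example and the `percent_of_total` helper is a hypothetical name, not something from Tableau.

```python
def percent_of_total(counts):
    """Turn absolute counts into percentages of the overall total."""
    total = sum(counts.values())
    return {group: 100 * count / total for group, count in counts.items()}


# Hypothetical number of clients per age group (not the real dataset).
clients_per_bin = {"25-29": 200, "30-34": 300, "35-39": 500}

shares = percent_of_total(clients_per_bin)
for group, share in shares.items():
    print(f"{group}: {share:.0f}%")
```

Because every share uses the same total, the percentages always add up to 100.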

I’ll show you one last thing. You can change the size of the slices easily: just click on « Age(bins) » and select « Edit ».

In the window, you can change the size of the slices (bins). Enter « 10 » instead of « 5 » to get 10-year slices. Click on the « OK » button.

Now we have a distribution with fewer slices, and the dominant slice is 30 to 39 years old.

Well, that was just to show you how to change the size of the bins. To go back to the old distribution with the 5-year slices, click on the « Back » button.

As you can see, the values on the bars are percentages but the values on the axis are absolute values. Here is an exercise for you: « Put the values of the axis in percentages ». I’ll give you the answer in the next article.

-Steph

I watched a video by Olivier Roland and I learned some good stuff.

Here are the benefits of creating content frequently.

# Share techniques

I like to share the techniques, tactics and methods that I have tested and that worked for me and for others. As an athlete, I like to see how far my potential goes, I like to win and I like to see people around me win. It’s always frustrating to see someone who has talent and isn’t fighting to realize their dream.

Often, when I discover something in a book or someone shows me a technique that I test and that works, I want to say to the world: « Look, this technique works, try it right now ». That’s why I created my blog.

# Challenges

I watched a video from The Family (I wrote an article about it: Part 1, Part 2 and Part 3). I decided to publish 5 articles per week for several reasons:

As an athlete, I like to push my limits. Posting 5 articles per week has forced me to find themes that fascinate me, so I can enjoy learning new things and writing articles about them. Every day, I’m looking for new things. There are articles I write and don’t publish because I realize that they don’t fit my vision.

At the beginning of my blog, you can see that I wrote articles to introduce some bodybuilders. I stopped doing that because I decided not to talk about athletes who use steroids. Challenges allow me to refine the vision of my blog, and my failures are part of the learning process.

# Frequency

From 2014 to 2015, I published 1-2 articles per week. Toward the end of 2015, I watched a video about a 30-day challenge. After watching this video, I decided to write 5 articles a week for 30 days and then went back to publishing 1-2 articles a week. During the 30-day challenge, it was difficult for me to write articles (finding subjects and finding the time to write) but my blog’s traffic increased seriously. When I went back to the rhythm of 1-2 articles a week, I felt less stressed because I had less pressure, but my blog’s traffic decreased.

I started to get frustrated because one of the important things for a blog is to have traffic. So I took up the challenge of 5 articles per week for 30 days again, writing only on topics I have fun with. My blog’s traffic increased like the first time. Frankly, it’s cool because I know that with this frequency, I can reach more people. And that allows me to develop my business. Yes, it’s possible to bring value to the world while bringing value to yourself, and so make money. Both are compatible.

Today, I have prospects and clients coming from my blog. People read my articles for free, and then they send me emails or messages on social media to find out if I can solve one of their problems.

# Learn

Publishing at this frequency, 5 articles per week, allows me to accelerate my learning curve. You can see the difference in writing between my first articles and my latest ones. I improved my writing style. All that took me a long time and I know that I still have a lot of things to improve. I also learned that success is something that builds little by little. There is no success in 3 months or 1 year. Success comes in the long run. You learn to tell lies from truths so you can have a healthy life in all areas.

For the past 1-2 years, you can see I’ve been learning to make videos on Instagram and Snapchat. I make 5 videos a week. For about 6 months, I’ve been making an audio file (podcast) from my old exercise articles. In this audio file, I read the article so that you can get the information in a passive way. You can listen to my audio file while driving or working, instead of listening to music. Making videos and podcasts are things I’m learning to do for my blog.

# Have fun

Sharing content frequently, no matter the format (text, audio, video), is something that I do for fun. People ask me how I’ve kept doing this for years. The answer is that I have fun when I do it. It’s like when I train or dance Cuban salsa. When I do that, I have fun, and if I can learn something to improve myself, I do it.

I know there are people who create content to make money quickly, but it doesn’t happen that way. You need to do this for 3-5 years for free to build your reputation, and then you can afford to make money. That’s why it’s important to have fun when you do it: when the difficulties come, you see them from a different angle. There is no cheat code, so have fun.

-Steph

In the previous article, we learned how to work with aliases. Now we will learn how to add a reference line to our bar chart.

Before I start, I’ll show you a trick in Tableau.

In our bar chart we can see the labels in this order : percentage and below : « Stayed » or « Exited ».

We will reverse this order. Go to this rectangle.

Then place the label « Exited » above the label « SUM(Number of Records) ».

Look, the label « Stayed » is above the percentage.

With that, we can understand the bar chart more easily.

Let’s add a reference line. But first, I think you’d like to know why I’m talking to you about a reference line.

A reference line helps us to compare bar chart results with a benchmark. This benchmark is represented by this reference line.

In our case, the benchmark is the percentage of clients who left the bank in our sample of 10 000 people.

The first thing to do is find this percentage in our bar chart. To be able to do that, remove « Gender » from « Columns ».

Boom, we have a new bar chart.

Look, we only have the percentage of clients who left the bank and the percentage of clients who stayed with the bank.

We see that, in our sample of 10 000 people, 20% of the clients left the bank and 80% stayed. This means that the churn rate (client departure rate) is 20%.
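The churn rate is just the number of clients who left divided by the total number of clients. Here is a tiny Python sketch of that arithmetic. The figure of 2 000 departures is an assumption derived from the rounded 20% above, not an exact count from the dataset, and `churn_rate` is a hypothetical helper name.

```python
def churn_rate(exited, total):
    """Share of clients who left, as a fraction between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return exited / total


total_clients = 10_000
exited_clients = 2_000  # assumption: the rounded 20% read from the chart

rate = churn_rate(exited_clients, total_clients)
print(f"Churn rate: {rate:.0%}")  # Churn rate: 20%
```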

What we’re going to do is add this churn rate to our A/B test. To return to our A/B test, press Ctrl+Z or Command+Z twice, or click twice on the « Back » button in the menu bar.

Now we know that, on average, 20% of clients left the bank.

We will add a horizontal line on the Y axis (Y = 20%) to compare the 20% churn rate with the 2 categories, male and female.

Let’s go. Right-click on the vertical axis (Y axis) and select « Add Reference Line ».

A window appears with several options.

You have the choice to add a line, a band, a distribution or a box plot.

We will use a line for the entire table.

Click on the « Line » button and select the « Entire Table » option. In « Value », select « Constant ».

The constant is 20%, so you need to enter 0.20 in « Value ».

It’s possible to put a label on this reference line. For example, if the reference line corresponds to a formula, the label displays the formula. But in our case, the constant is 20% and it’s already displayed on the vertical axis, so we will select « None ».

For the format of the line, select the continuous line and click on the « OK » button.

Our reference line has been added to our chart.

Here is what we can see. Female clients are more likely to leave the bank than the average client. Male clients are less likely to leave the bank than the average client.
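What the reference line lets your eyes do can be sketched in a few lines of Python: compare each category’s churn rate with the 20% benchmark. The rates per gender below are hypothetical values picked for the illustration, not numbers read from the chart.

```python
REFERENCE = 0.20  # the constant used for the reference line


def compare_to_reference(rates, reference=REFERENCE):
    """Say whether each category sits above, below or on the benchmark."""
    result = {}
    for category, rate in rates.items():
        if rate > reference:
            result[category] = "above"
        elif rate < reference:
            result[category] = "below"
        else:
            result[category] = "equal"
    return result


# Hypothetical churn rates per gender (illustration only).
rates = {"Female": 0.25, "Male": 0.16}

comparison = compare_to_reference(rates)
print(comparison)
```

With more than 2 categories, this kind of comparison is where a reference line really saves time.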

In our case, it’s easy to see because there are only 2 categories, men and women.

Now you know how to add a reference line in a bar chart.

-Steph

## Manage Repetition When There Is A Lot Of Content

I watched a video by Olivier Roland and I learned some good stuff.

There is a secret to managing repetition when there is a lot of content. The secret is that you don’t need to manage your repetitions when your mission is to teach your audience something, to transfer knowledge to them.

Careful: if you repeat the same thing every day, you will make your audience run away. I think you know the limit yourself.

Here are 2 benefits of repeating your content: education and editorial.

# Education

One of the great keys to education is repetition. Being regularly confronted with knowledge makes it possible to memorize it and use it in everyday life.

# Editorial

1. You don’t have to reinvent the wheel all the time. You can recycle content that you published 6 months, 1 year, 2 years ago, etc.

If you look at a magazine and compare it with what the same magazine published in previous years, you will see that a lot of content repeats itself.

2. You can recycle this content in a different way. You can talk about this content with a different angle because your opinion has evolved (you have improved your knowledge on this subject).

3. You can recycle this content in a different format. There are 3 types of formats: text, audio and video.

Even if you are the best in your country, there will always be people who have never read the article you published 2 years ago.

# Transfer of knowledge

It’s necessary to repeat your content for people who are discovering what you do and haven’t consumed your content yet. If you write your articles thinking that your audience has taken the time to read all of your previous articles, you’re making a mistake and you could lose part of your audience.

Repetition is necessary in education because human beings forget things easily.

People in your audience who read one of your articles a year ago will have forgotten 3/4 of the content. And it’s the same for you when you read someone else’s content. That’s why recycling this content with a different angle or format allows your audience to learn better and memorize better.

If your mission is to transfer knowledge, repetition is a great way to motivate people to take action. It forces people in your audience to ask: « Did I use that knowledge? Did I act? ». I didn’t create my blog to do intellectual masturbation. I create and recycle my content to motivate people to take action.

It’s possible that I wrote an article and that part of my audience didn’t understand it. Recycling this article with a different angle or format may allow this part of my audience to understand it better. The more I write, the better I can explain, and that’s cool.

-Steph

## Label And Format

Our bar chart has colors by region, but imagine that this bar chart is on a wall in an open space or in a report.

With labels, we can make this bar chart clearer and easier to understand.

This bar chart contains all the necessary information: the representatives’ names, the regions where the representatives make sales and the total sales for each representative in Swiss francs.

But there is a problem. For example, if you ask someone how many sales Bill made, this person must find Bill and then read the value on the vertical axis to the left. Here we can see it’s 1750.

But if we take James’s case, we see that it’s between 1000 and 1500. James is far from the vertical axis and it’s difficult to tell the true value.

That means everyone needs to make an effort to extract the information from the bar chart.

This should not be the case, because a Data Scientist always looks for the best way to communicate information. The goal is to help people understand and extract the information as easily as possible.

The « Labels » button allows you to add text information to your bar chart.

You will add a label with the SUM(TotalSales) information.

To do this, click on SUM(TotalSales), press and hold the Ctrl or Command key on your keyboard, and drag and drop SUM(TotalSales) on « Label ».

Now you can see the total sales value at the top of each bar.

The bar chart is easier to read because there is the total value of sales for each representative.

Now use the « Rep » information. Click on « Rep », press and hold the « Ctrl » or « Command » key on your keyboard, and drag and drop « Rep » on « Label ».

Now you can read the representatives’ names at the top of the bars.

You can also add the region. I’ll show you another way to add « Region » to « Labels ». Click « Region » in « Dimensions » and drag and drop « Region » on « Labels ».

But it’s redundant, because you can already read the representatives’ names below and the regions at the top of the bar chart.

And each region has its own color. As it’s redundant, we remove « Rep » and « Region » from « Labels » by dragging them out.

That’s better: only SUM(TotalSales) remains.

Let’s go to the next level: we will edit our labels.

To do this, click on « Labels » and click on the « … » button.

It allows you to add your own text. For example, write « Sales : » and click on the « OK » button.

Now you can see that your text appears at the top of the bars.

Well, click on « Labels » again and click on the « … » button.

Delete the text « Sales : » and click on the « OK » button.

Now we will see how to format your bar chart. This is the last step before your bar chart goes into production.

You will change the size of the labels. Click on « Labels » and click on « Font ».

Select « 12 » and bold.

Oh, you can do the same thing by clicking on the « … » button.

You can also change the color, but we will keep the color black.

Now you’re going to change the label type. Right-click on SUM(TotalSales) and click on « Format… ».

In fact, the labels have their own format options, which you change by clicking on « Label »; for all the other things in Tableau, you get the format options with a right-click.

So when you click on « Format », you’ll see 2 tabs: « Axis » and « Pane ».

Select the « Pane » tab because that’s where the labels of our bar chart are.

By clicking on « Alignment », you can change the direction of the label text.

But what you can’t do with the « Labels » button is change the number format.

Back in the « Pane » tab, we’ll turn the numbers into currency. Click « Numbers » and select « Currency(custom) ». You can also change the currency type in « Prefix/Suffix ».

To keep it simple, remove the 2 decimals in « Decimal Places ».

As you can see on my bar chart, the SUM(TotalSales) is vertical at the top of each bar. To change the direction of the label text, click « Alignment » in the « Pane » tab.

But there is a problem. Some bars don’t show SUM(TotalSales). To fix this, right-click on each of these bars and select « Mark Label » and « Always Show ».

Now, the bar chart is more understandable.

Let’s put the units in thousands. Click on « Numbers » => « Currency(custom) » => « Units » => « Thousands (K) ».

Add a decimal in « Decimal Places ».

That’s better, we can see the sales in Swiss francs for each sales representative.
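To see what this number format does to a value, here is a rough Python sketch: divide by 1 000, keep the chosen number of decimals, add the prefix and the « K » suffix. The `format_thousands` helper and the « CHF » prefix with a trailing space are assumptions for the illustration; Tableau’s own rendering and rounding may differ in the details.

```python
def format_thousands(value, decimals=1, prefix="CHF "):
    """Format a number in thousands, e.g. 1234 becomes 'CHF 1.2K'."""
    return f"{prefix}{value / 1000:.{decimals}f}K"


print(format_thousands(1234))              # CHF 1.2K
print(format_thousands(2600, decimals=0))  # CHF 3K
```

The label gets shorter, at the cost of a little precision.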

Look, there’s something you need to know. You can’t change the size of the text in the « Pane » tab.

If you click on « Font » and change the size, it will not change anything on your bar chart.

This is because the font size set with the « Label » button overrides the font set in the « Pane » tab.

OK, we changed the label format. Now, let’s change the axis format.

To do this, right-click on the vertical axis and select « Format ».

Click on the « Axis » tab and change the text size to 12 with « Font ».

Then, right-click on the horizontal axis and select « Format ».

In the « Header » tab, change the text size to 12 with « Font ».

Oooh, do you see? Mathiew is cut off. To fix this, enlarge the bar chart by clicking and dragging its right edge.

Right-click on « Central » in the top axis and select « Format ».

Then change the text size to 12 and bold with « Font ».

Now, look at the top of the bar chart. The « Region/Rep » line is useless because we know that Central, East and West are the regions and the representatives’ names are at the bottom of the bar chart.

To hide it, right-click on « Region/Rep » and select « Hide Field Labels for Columns ».

If you want to improve the title « TotalSales » by adding a space, right-click on the vertical axis and select « Edit Axis ».

In the « General » tab, add a space in the title and click « OK ».

Let’s do one more thing. We’re going to put all the « Total Sales » values in Swiss francs. Right-click on the vertical axis and select « Format ».

Click on the « Axis » tab => « Numbers » => « Currency(custom) ».

In « Decimal Places », enter « 0 ». In « Units », select « Thousands (K) ». In « Prefix/Suffix », enter « CHF ».

Well, you did a good job. Now you know how to change the format of charts in Tableau.

-Steph

## Navigate In Tableau

We’ll explore Tableau’s tools.

From the connection manager, we’ll go into Tableau’s workspace.

Click on the « Sheet1 » tab at the bottom of the window.

Here is Tableau’s workspace.

The 2 important elements are « Data » on the left and the workspace on the right. It’s in the workspace that you’ll create tables and charts.

« Data » is divided into 2 zones: dimensions and measures.

Dimensions and measures are 2 different roles that will allow you to manipulate data.

Tableau puts the numerical values in « measures » and the categorical or qualitative variables in « dimensions ». This is Tableau’s default setting.

There is also another way to explain « dimensions » and « measures ». The « dimensions » are independent variables and the « measures » are dependent variables.

For example, « Units » is a measure: it’s the number of items sold per product. « Region » is a dimension: it’s the geographic region where the product was sold. With these 2 elements, we can find out how many items were sold by region. This means that « Region » is an independent variable and « Units » is a dependent variable because it will be grouped by region.

But if you don’t like it, you can move fields from dimensions to measures, and the other way around, by clicking and dragging.

In the menu bar at the top, there is « File », where you can open and save files.

« Data » to connect to new source files.

« Worksheet » is the workspace where you create analyses.

« Dashboard » is a combination of worksheets.

« Story » is a combination of worksheets and dashboards.

« Analysis » to specify how you want to do your analysis in your workspace.

« Map » to add maps to the workspace.

« Format » contains the formatting options.

Now, let’s study the workspace.

In the workspace, the main elements are « Columns » and « Rows ». This is where you decide which data goes in the columns and rows of your worksheet.

You can also choose different formatting for these elements, like colors, size, text, level of detail and tooltips (a useful, optional tool).

Let’s do a test. Use the data from « Region » (which is in « dimensions »). Drag and drop « Region » to the center of your workspace. Now, « Region » is in the « Rows » element.

A table appears in your workspace.

You put a dimension in your workspace. Now put a measure in your workspace.

Use the « Units » data. Drag and drop « Units » next to the « Region » column.

As you can see, Tableau automatically put « Region » in the « Rows » element and aggregated the « Units » data by region. In this way, you can tell how many items were sold by region.
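The table Tableau just built is a sum of the measure grouped by the dimension. Here is a minimal Python sketch of that idea. The rows are made-up sample data and `sum_units_by_region` is a hypothetical helper, not part of Tableau.

```python
from collections import defaultdict


def sum_units_by_region(rows):
    """Add up the units (measure) for each region (dimension)."""
    totals = defaultdict(int)
    for region, units in rows:
        totals[region] += units
    return dict(totals)


# Hypothetical sales rows: (region, units sold).
rows = [
    ("Central", 10),
    ("East", 5),
    ("Central", 7),
    ("West", 3),
    ("East", 8),
]

units_by_region = sum_units_by_region(rows)
print(units_by_region)  # {'Central': 17, 'East': 13, 'West': 3}
```

The dimension decides the groups and the measure is what gets added up inside each group.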

Now, what you can do is move « SUM(Units) » to the « Columns » element.

And then you have a « bar chart » showing how many items have been sold by region. You can enlarge the chart by clicking and dragging.

Let’s look at the tools that are in « Show Me » zone.

Click on « Pie chart » to get this type of chart.

Click on the « Size » icon and drag from left to right to increase the chart’s size.

In this chart, each region has a color and its share of the items sold.

You can also test the « bubble chart ». Tableau organizes the data automatically and everything is placed in « Marks ».

You can test the « Treemaps » chart. It’s the same principle as the « bubble chart » but with rectangles instead of circles.

As you can see in « Show Me », some charts are disabled. This is because you need certain elements in your data to be able to activate them.

For example, for the « Area chart », you need « date » data to activate it.