Create Bins and View Distributions

tableau, bins, bar, chart, distribution, age, data, science

I recently enrolled in a Data Science course on Udemy and learned some good stuff.

Great, you finished the first part. Now we’re going to do a deeper Data Mining analysis of this bank’s dataset.

To take this analysis further, we’ll use a more statistical approach.

To do that, we’ll create a new tab.

For this new tab, we want to understand how clients are distributed by age. Is there a majority of young or old people?

Move the variable « Age » to « Columns ».

As we want to see the distribution of client ages, we need the variable « Number of Records » to count the observations. Move the variable « Number of Records » to « Rows ».

Boom, we have a chart, but there is only one point, at the top right. What happened is that Tableau took the sum of the ages of all the bank’s clients and the sum of all the « Number of Records », which is the total number of clients: 10 000.

We’ll find a solution, but first we’ll change the format to see the chart better. Right-click in the middle of the chart and select « Format ».

For the font size, select « 12 ».

Here you can see that the total age is 39 218, but that’s not what we’re looking for. What we want to see is the number of clients for each age.

I’ll explain what’s going on. Tableau aggregated our variables: aggregating means adding up all the values of a variable within each category. Since there is no category here, Tableau added the ages of all the clients into a single total. What we want instead is the number of observations for each age separately.
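To see the difference outside Tableau, here is a minimal Python sketch on a handful of hypothetical ages (not the real « Churn Modelling » data). It assumes, as the « Number of Records » field does, that every row counts as 1, so summing that field gives the number of clients. Treating « Age » as a dimension then amounts to counting the records for each distinct age.

```python
from collections import Counter

# Hypothetical sample of client ages (not the real Churn Modelling data).
ages = [26, 29, 29, 30, 31, 31, 31, 42]

# Each row contributes 1 to "Number of Records", so its sum is the row count.
number_of_records = [1] * len(ages)

# Aggregated view: a single point, SUM(Age) against SUM(Number of Records).
total_age = sum(ages)
total_records = sum(number_of_records)

# Age as a dimension: one group per distinct age, counting the records in it.
clients_per_age = Counter(ages)

print(total_age, total_records)  # 249 8
print(sorted(clients_per_age.items()))  # [(26, 1), (29, 2), (30, 1), (31, 3), (42, 1)]
```

The single point in the first chart corresponds to `(total_age, total_records)`. The curve corresponds to `clients_per_age`.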

To get that, click the arrow on « SUM(Age) » in « Columns ».

Then select « Dimensions ».

You see, Tableau no longer takes the aggregated sum of the ages; it takes each age separately. We now have a curve that shows the continuous distribution of our clients’ ages: for each age, the curve gives the number of clients of that age.

Let’s look at the dataset. Right-click on « Churn Modelling » and select « View Data… ».

A window appears that shows the data in detail. If you scroll to the right, you will find the « Age » column.

We see that the ages are rounded to whole years. Since all ages are rounded, Tableau is able to group clients by age. By hovering the mouse over the curve, we can see that there are 200 clients who are 26 years old.

If the ages in the dataset weren’t rounded, you would see clients aged 26.5 or 26.3. That would create a lot of irregularity: the curve would have plenty of spikes and lots of variation.

Oooooh look, there is an unusual variation.

Let’s analyze it in detail. Around this peak, we see that there are 348 clients who are 29 years old.

Here, there are 404 clients who are 31 years old.

And this dip shows us that there are 327 clients who are 30 years old.

How can we explain this irregularity? It’s possible that many of the 29-year-olds are about to turn 30 and many of the 31-year-olds have just turned 31. It’s chance that creates these inaccuracies. You may get other inaccuracies if your data isn’t precise and rounded. In our case, the ages are rounded, but we still want to get rid of the small irregularity we see on our curve.

There is a way to see our distribution without these irregularities: « bins ». Binning consists of grouping the values into categories. That is, we’re going to group our clients into age groups.

Right-click on « Age » in « Measures ». Select « Create » and select « Bins… ».

A window appears. We’ll group our clients in 5-year increments. In « Size of bins », enter « 5 » and click the « OK » button.
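Conceptually, a bin of size 5 sends each age to the 5-year interval that contains it and labels that interval by its lower edge. For example, 26 lands in the 25 to 29 bin. The Python sketch below illustrates the idea on hypothetical ages. It assumes bins start at 0 and is not Tableau’s internal implementation.

```python
from collections import Counter

def age_bin(age: int, size: int = 5) -> int:
    """Return the lower edge of the bin containing `age` (bins assumed to start at 0)."""
    return (age // size) * size

def bin_counts(ages: list[int], size: int = 5) -> dict[int, int]:
    """Count how many clients fall into each bin, keyed by the bin's lower edge."""
    return dict(sorted(Counter(age_bin(a, size) for a in ages).items()))

# Hypothetical ages, for illustration only.
ages = [22, 24, 26, 29, 30, 31, 31, 37, 38, 42]
print(bin_counts(ages))  # {20: 2, 25: 2, 30: 3, 35: 2, 40: 1}
```

Because 29 and 30 fall into different bins while 30 and 31 share one, the single-year spikes get absorbed into 5-year totals.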

As you can see, the variable « Age » has remained in « Measures », but there is a new variable in « Dimensions »: « Age(bins) », the variable we just created.

Our « Age(bins) » variable was correctly placed in « Dimensions » because it is a categorical variable: each category corresponds to a 5-year age group.

For example, one category is the 20 to 24 age group. Now we’ll create a new distribution based on these bins.

To do that, we’ll remove the variable « Age » from « Columns » by clicking and dragging it outside.

Then move the variable « Age(bins) » from « Dimensions » to « Columns ».

Note

In this case, it’s not possible to replace « Age » directly by dropping « Age(bins) » on top of « Age » in « Columns ». This is because « Age » is a measure and « Age(bins) » is a dimension.

That’s a nice distribution; it’s the type of chart we usually see in economics or mathematics. The difference from the old chart is that this chart is discrete: the clients are grouped by age group, while the previous chart was continuous.

In this chart, each bar corresponds to an age range. For example, this bar corresponds to the 25-29 age group.

Now, we’ll change the colors.

In « Rows », hold down the « Ctrl » or « Command » key on your keyboard and drag « SUM(Number of Records) » to « Colors ».

We get our distribution in blue, but we’ll change the color to red. Click on « Colors », then on « Edit Colors ».

In the window that appears, click on the blue square on the right to display the color palette.

Select the red color and click on the « OK » button.

Click the « OK » button in the « Edit Colors » window.

To make the bar chart easier to read, we’ll add the number of clients in each age group. In « Rows », hold down the « Ctrl » or « Command » key on your keyboard and drag « SUM(Number of Records) » to « Label ».

That’s it, we can see how many clients there are in each age group.

We see that the dominant bar is the 35-39 age bracket and the second dominant bar is the 30-34 age bracket. Overall, we can see that most clients are between 25 and 40 years old, which seems consistent.

Our bar chart shows absolute values. We’ll replace them with percentages. You could click the little arrow on « SUM(Number of Records) » in « Label » and select « Add Table Calculation… », but I’ll show you another way to do it.

Instead of clicking « Add Table Calculation… », click on « Quick Table Calculation » and select « Percent of total ».
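« Percent of total » divides each bar’s value by the sum of all the bars. Here is a minimal Python sketch of that arithmetic, using hypothetical counts per age group (not the bank’s real figures). It assumes the percentage is computed over the whole chart, which is what we want here.

```python
def percent_of_total(counts: dict[str, int]) -> dict[str, float]:
    """Express each group's count as a percentage of the grand total."""
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}

# Hypothetical client counts per 5-year age group (not the real dataset).
counts = {"20-24": 50, "25-29": 150, "30-34": 200, "35-39": 100}
shares = percent_of_total(counts)
print(shares)  # {'20-24': 10.0, '25-29': 30.0, '30-34': 40.0, '35-39': 20.0}
```

Whatever the counts are, the percentages always add up to 100. That is why adding several bars gives the share of clients in a wider age range.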

Great, we have the exact percentage of clients in each age bracket. We can now see that the 25 to 39 age range contains 20 + 23 + 17 = 60% of clients.

I’ll show you one last thing. You can easily change the size of the slices (bins): just click on « Age(bins) » and select « Edit ».

In the window, you can change the size of the slices (bins). Enter « 10 » instead of « 5 » to get 10-year slices, then click the « OK » button.
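Changing the bin size only changes the width of each interval, so the same clients end up in fewer, wider bins. Here is a standalone Python sketch with hypothetical ages, again assuming bins start at 0 and labelling each bin by its lower edge.

```python
from collections import Counter

def bin_counts(ages, size):
    """Count clients per bin, each bin labelled by its lower edge (bins start at 0)."""
    return dict(sorted(Counter((a // size) * size for a in ages).items()))

# Hypothetical ages, for illustration only.
ages = [22, 24, 26, 29, 30, 31, 31, 37, 38, 42]

five_year = bin_counts(ages, 5)   # {20: 2, 25: 2, 30: 3, 35: 2, 40: 1}
ten_year = bin_counts(ages, 10)   # {20: 4, 30: 5, 40: 1}
print(len(five_year), len(ten_year))  # 5 3
```

Each 10-year bin is simply the merge of two neighbouring 5-year bins: 20 and 25 become 20, and 30 and 35 become 30.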

Now we have a distribution with fewer slices, and the dominant slice is 30 to 39 years old.

Well, that was just to show you how to change the size of the bins. To go back to the old distribution with 5-year slices, click the « Back » button.

As you can see, the values on the bars are percentages, but the values on the axis are absolute values. Here is an exercise for you: « Put the values of the axis in percentage ». I’ll give you the answer in the next article.

Share this article if you think it can help someone you know. Thank you.

-Steph

Add Color

In the last article, we created our calculated field « TotalSales », which you can see in the « Measures » zone.

In Tableau, calculated fields are used very often (almost every time) because, in most cases, the data doesn’t directly contain the value you want to show.

The calculated field « TotalSales » is a simple example to help you understand how it works, but know that you can do much more complex things. I’ll show you that later.

In this article, I’ll show you how to work with colors, because they are an important communication tool. With colors, people will understand more quickly what you want to explain to them.

Imagine that you have to show this bar chart to the manager who handles the bonuses. By adding a little color, a little art, you can make this bar chart easier to read.

To use colors, click on this button.

You can change the color by picking one of the basic colors.

Or you can have more colors by clicking here.

If you have a picture in the background, you can reduce the opacity to make the colors transparent.

You can add a border, change the border’s color, etc.

But what would be nice is to have bars in different colors.

To start, take « Rep » and move it to « Colors ».

With this, there is a unique color for each representative.

There is also another method. Instead of taking « Rep » from the data and moving it to « Colors », you can click « Rep » here, in the « Columns » zone.

If you move it to « Colors », you’ll break everything because « Rep » will no longer be in the « Columns » zone.

To avoid this, hold down « Ctrl » or « Command » on your keyboard and click « Rep »: a « + » sign appears. You are now moving a copy of « Rep », so drag it to « Colors ». It’s like copying and pasting « Rep » into « Colors ».

With this method, « Rep » stays in the « Columns » zone. This method is very practical when there are many dimensions.

You can change the representatives’ colors by clicking here.

As you can see, there are several choices of palettes.

You can try the « color blind » palette, which is very useful for color-blind people. To select this palette, click « Assign Palette », then « Apply ».

When a palette has fewer colors than there are representatives, a message warns you that some colors will be duplicated. This is not a problem here because the names are shown below the bars.
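One way to picture the duplication is a palette that starts over once it runs out of colors. The Python sketch below assigns palette colors to categories in order and wraps around with a modulo. The hex codes and representative names are hypothetical, and Tableau’s exact assignment order may differ.

```python
def assign_colors(categories: list[str], palette: list[str]) -> dict[str, str]:
    """Give each category a palette color, reusing colors once the palette runs out."""
    return {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}

# Hypothetical 3-color palette and 5 representatives.
palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]
reps = ["Rep A", "Rep B", "Rep C", "Rep D", "Rep E"]

colors = assign_colors(reps, palette)
print(colors["Rep D"] == colors["Rep A"])  # True: the 4th rep reuses the 1st color
```

With 5 categories and 3 colors, at least two pairs of bars must share a color. The labels under the bars are what keep the chart readable.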

Now we want to see something else with our bar chart. Hold down « Ctrl » or « Command » on your keyboard and click on « SUM(TotalSales) » to display the « + » sign. Then move « SUM(TotalSales) » to « Colors » to replace « Rep ».

As you can see, the bars now have different colors based on « SUM(TotalSales) ». The colors follow a continuous scale, which means that the more sales there are, the darker the color.
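A continuous color scale can be pictured as a linear interpolation between a light color and a dark color. The position of each value between the minimum and the maximum sets how far along that blend it goes. This Python sketch illustrates that idea. The RGB endpoints and sales totals are hypothetical, and Tableau’s actual palettes and interpolation may differ.

```python
def sequential_color(value, vmin, vmax, light=(222, 235, 247), dark=(8, 48, 107)):
    """Map value in [vmin, vmax] to an RGB tuple: the higher the value, the darker."""
    # Position of the value between min and max, from 0.0 (min) to 1.0 (max).
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    # Blend each RGB channel from the light color toward the dark color.
    return tuple(round(a + t * (b - a)) for a, b in zip(light, dark))

# Hypothetical total sales per representative.
sales = {"Rep A": 100.0, "Rep B": 550.0, "Rep C": 1000.0}
lo, hi = min(sales.values()), max(sales.values())
for rep, total in sales.items():
    print(rep, sequential_color(total, lo, hi))
```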

In our case, this is not very useful because the size of the bars already represents the sales, but in other situations it is.

The problem now is that some bars have the same or very similar colors, so the manager could misinterpret the results. We need an approach that makes sure the manager understands them.

The solution is to take « Region » (holding down « Ctrl » or « Command » on your keyboard) and move it to « Colors ».

You can also take « Region » (with « Ctrl » or « Command ») and drop it onto « SUM(TotalSales) » to replace « SUM(TotalSales) ».

With that, the bars are colored by region.

That way, you can clearly see the 3 regions through colors that are unique to each region, and you can see the total sales per representative through the size of the bars.

This is a small example so that you can understand the basics of manipulating colors in Tableau. There are more advanced techniques for managing colors that I will show you later.

Play with the colors so you can fully understand how they work. You might find your favorite palette and your own style. Have fun.

Navigate In Tableau

We’ll explore Tableau’s tools.

From the connection manager, we’ll go into Tableau’s workspace.

Click on the « Sheet1 » tab at the bottom of the window.

Here is Tableau’s workspace.

The 2 important elements of the screen are « Data » on the left and the workspace on the right. It’s in the workspace that you’ll create tables and charts.

We’ll start with « Data » on the left.

« Data » is divided into 2 zones: dimensions and measures.

Dimensions and measures are 2 different roles that allow you to manipulate data.

By default, Tableau puts numerical values in « measures » and categorical (qualitative) variables in « dimensions ».
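As a rough approximation of this default (Tableau’s actual rules are more nuanced), the Python sketch below sends numeric fields to measures and everything else to dimensions. The sample row is hypothetical and uses field names from this article.

```python
from numbers import Number

def split_fields(row: dict) -> tuple[list[str], list[str]]:
    """Rough default: numeric fields become measures, everything else dimensions."""
    dimensions, measures = [], []
    for name, value in row.items():
        # bool is a subclass of int in Python, so treat it as categorical here.
        if isinstance(value, Number) and not isinstance(value, bool):
            measures.append(name)
        else:
            dimensions.append(name)
    return dimensions, measures

# Hypothetical sample row, using field names from this article.
row = {"Region": "East", "Rep": "Rep A", "Units": 95}
dims, meas = split_fields(row)
print(dims, meas)  # ['Region', 'Rep'] ['Units']
```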

There is also another way to explain « dimensions » and « measures »: the « dimensions » are independent variables and the « measures » are dependent variables.

For example, « Units » is a measure: it’s the number of items sold per product. « Region » is a dimension: it’s the geographic region where the product was sold. With these 2 elements, we can find out how many items were sold per region. This means that « Region » is an independent variable and « Units » is a dependent variable, because it will be grouped by region.

If you don’t like the default, you can move a field from dimensions to measures, or the opposite, by clicking and dragging it.

In the menu bar, at the top, there is « File », where you can open and save files.

« Data », to connect to new data sources.

« Worksheet », for the worksheets where you create analyses.

« Dashboard », for dashboards, which combine several worksheets.

« Story », for stories, which combine worksheets and dashboards.

« Analysis », to specify how you want to carry out the analysis in your workspace.

« Map », to add maps to the workspace.

« Format », which contains formatting options.

Now, let’s study the workspace.

In the workspace, the main elements are « Columns » and « Rows ». This is where you decide which data goes in columns and rows in your worksheet.

You can also choose different formats for these elements, such as colors, size, text, level of detail and tooltips (optional information displayed when you hover over the data).

Let’s do a test with the data from « Region » (which is in « Dimensions »). Drag and drop « Region » into the center of your workspace. « Region » is now in « Rows ».

A table appears in your workspace.

You have put a dimension in your workspace. Now put a measure in your workspace.

Use the « Units » data: drag and drop « Units » next to the « Region » column.

As you can see, Tableau automatically put « Region » in « Rows » and aggregated the « Units » data by region. This way, you can tell how many items were sold in each region.
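Putting a measure next to a dimension is essentially a « group by » followed by a sum. Here is a minimal Python sketch of what « SUM(Units) » per « Region » computes. The rows are hypothetical and do not come from the article’s data.

```python
from collections import defaultdict

# Hypothetical sales rows: (Region, Units).
rows = [("East", 95), ("Central", 50), ("West", 36), ("East", 60), ("Central", 90)]

def sum_by(rows):
    """Group rows by their first field and sum the second, like SUM(Units) per Region."""
    totals = defaultdict(int)
    for region, units in rows:
        totals[region] += units
    return dict(totals)

units_by_region = sum_by(rows)
print(units_by_region)  # {'East': 155, 'Central': 140, 'West': 36}
```

The dimension decides the groups, and the measure is what gets added up inside each group.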

Now, move « SUM(Units) » to « Columns ».

And then you have a bar chart showing how many items were sold in each region. You can enlarge the chart by clicking and dragging.

Let’s look at the tools in the « Show Me » zone.

Click « Pie chart » to switch to this chart type.

Click the « Size » icon and drag the slider from left to right to increase the chart’s size.

In this chart, each region has its own color, and the slices show the proportion of items sold in each region.

You can also try the « bubble chart ». Tableau organizes the data automatically and places everything in « Marks ».

You can also try the « Treemaps » chart. It works on the same principle as the bubble chart, but with rectangles instead of circles.

As you can see in « Show Me », some charts are disabled. This is because they need certain kinds of fields in your data before they can be activated.

For example, the « Area chart » needs « date » data to be activated.
