Last Updated: March 2, 2020
At Anvil, we are knee-deep in data every day. We are constantly segmenting data to find groups that are performing well above the average and to find those that may be missing the mark. However, sometimes when one segment has a conversion rate that is only half a percent higher than another, it’s hard to say with confidence that it will consistently perform better.
What we needed was a way to quickly determine statistical significance and to visualize it in our Data Studio reports. Data Studio doesn’t have an out-of-the-box tool to help users calculate statistical significance, so we created our own community visualization.
What are Community Visualizations?
Data can be messy, and presenting it in a clear and concise manner is extremely important. As an analyst, you need to understand the story you want to tell before you begin to visualize the data. The story should dictate the visualizations used in a dashboard or report. If the wrong visualizations are used, the audience will struggle to interpret the data effectively and efficiently.
Google Data Studio is a reporting tool that easily integrates with countless digital marketing and data storage platforms. Data Studio has several visualization options readily available for users. Unlike many reporting tools, Data Studio allows users to import community visualizations. In other reporting tools, users are often limited to the visualizations that are available, letting the tool influence the story instead of the story being dictated by the data.
Calculating Statistical Significance
We have built several custom visualizations at Anvil, but the one we use most frequently is our statistical significance calculator. For our purposes, we have decided that statistical significance occurs when we are 95% (p<.05) confident that a relationship between categorical data exists because of something other than chance. In practical terms, what this means is that we can break users into categories and determine if their category impacts their likelihood to convert.
For example, we can place users into categories based on their device type. Then we can determine if the difference in conversion rates between categories is due to chance or if one will typically outperform others.
Our Stats Analyzer uses the Chi-Square method to determine statistical significance. Using p<.05 as the threshold for significance means that Chi-Square works best when there are only a relatively small number of categories to compare. The more rows in the calculation, the higher the likelihood the p-value will go below .05, whether or not the results show real significance.
- The more data you analyze (sessions, etc.), the more likely you are to reach statistical significance.
- Expected values are calculated in the context of the entire system; if you were to remove “tablet” from the example below, the expected values for “mobile” and “desktop” would change accordingly.
- If you’re interested in learning more, you can view a thorough description of how to calculate Chi-Square values (YouTube).
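For readers who want to check the numbers by hand, the following is a minimal Python sketch of a Chi-Square test of independence on a converted / not-converted table. It is not the Stats Analyzer's source code: the function names, the sample device counts, and the choice to include non-converting sessions as a second outcome column are all assumptions made for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability (p-value) of a chi-square statistic, integer df >= 1."""
    z = x / 2.0
    # Closed-form starting points: df = 2 (a = 1) or df = 1 (a = 0.5).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi_square_test(rows):
    """rows maps category -> (sessions, conversions). Returns (chi2, df, p_value)."""
    total_sessions = sum(s for s, _ in rows.values())
    total_conversions = sum(c for _, c in rows.values())
    overall_rate = total_conversions / total_sessions

    chi2 = 0.0
    for sessions, conversions in rows.values():
        expected_conv = sessions * overall_rate
        expected_non = sessions - expected_conv
        observed_non = sessions - conversions
        # Both outcome columns (converted, not converted) contribute to the statistic.
        chi2 += (conversions - expected_conv) ** 2 / expected_conv
        chi2 += (observed_non - expected_non) ** 2 / expected_non

    df = len(rows) - 1  # (categories - 1) * (2 outcome columns - 1)
    return chi2, df, chi2_sf(chi2, df)


# Hypothetical sample data, not taken from a real report.
devices = {
    "desktop": (10000, 300),
    "mobile": (8000, 200),
    "tablet": (2000, 50),
}

chi2, df, p_value = chi_square_test(devices)
print(f"chi2={chi2:.3f}, df={df}, p={p_value:.4f}")
print("significant at 95%" if p_value < 0.05 else "not significant at 95%")
```

With these sample numbers the p-value lands above .05, so the difference between device types would not be reported as significant.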
The message at the top of the visualization tells you if statistical significance has been achieved at the 95% confidence level. The table within the visualization breaks down the devices, the actual number of sessions, and the conversions per device type in the first three columns. The fourth column shows the expected number of conversions, which the tool calculates automatically. The last column represents the percent by which each category fell short of or exceeded the expected conversions (the expected value from Chi-Square divided by the actual count).
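The expected conversions column can be reproduced with a few lines of Python. This is a sketch under assumptions: the sample counts are hypothetical, the function name is ours, and the percent difference is computed as (actual - expected) / expected, which is one common convention and may not match the tool's exact formula.

```python
def expected_conversions(rows):
    """rows maps category -> (sessions, conversions). Returns category -> expected conversions."""
    total_sessions = sum(s for s, _ in rows.values())
    total_conversions = sum(c for _, c in rows.values())
    overall_rate = total_conversions / total_sessions
    # Each category is expected to convert at the overall rate of the whole table.
    return {name: sessions * overall_rate for name, (sessions, _) in rows.items()}


# Hypothetical sample data, not taken from a real report.
devices = {
    "desktop": (10000, 300),
    "mobile": (8000, 200),
    "tablet": (2000, 50),
}

expected = expected_conversions(devices)
for name, (sessions, conversions) in devices.items():
    exp = expected[name]
    pct_diff = (conversions - exp) / exp * 100  # assumed convention, see note above
    print(f"{name:8} {sessions:>6} {conversions:>4} {exp:>7.1f} {pct_diff:>+6.1f}%")

# Dropping a category changes the overall rate, so every expected value shifts.
without_tablet = {k: v for k, v in devices.items() if k != "tablet"}
expected_without_tablet = expected_conversions(without_tablet)
print(expected_without_tablet)
```

Because the expected values depend on the overall rate, removing "tablet" from the sample moves the expected conversions for "desktop" and "mobile" even though their own counts are unchanged.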
Importing the Statistical Significance Calculator
To add the calculator to your Data Studio report, complete the following steps:
- Click the icon with four squares next to Add a chart.
- Click the “Explore More” button. Find the Stats Analyzer.
- If the visualization displays an error after you add it, fix it by adding the correct data source to the visualization.
- Under dimensions, add the category you are analyzing, such as device type.
- Under metrics, add the conversion you are looking to analyze. This could be a goal conversion, an ecommerce transaction, or an event. In our example, we will use transactions.
- Next, you will need to add a second metric for the number of opportunities users had to convert. When using GA data, we often use sessions, like in our example. As long as the data source contains the dimension and two metrics from step 1, the visualization should appear. When adding the data source, make sure “Community visualization access” has been turned on. (This will only be an option if you own the data source. If you’re using the sample Google Analytics data, Community visualization access is already turned on.)
- Make sure to select the appropriate date range.
Changing the Design of the Statistical Significance Calculator
Many companies create their reports with their branding in mind, so we wanted to make the design customizable. To do this, click on the visualization and choose style at the top of the right pane.
- To change the subtitle (the second line under the warning label), click subtitle and type your desired subtitle.
- To change the background color of the calculator, click the first paint bucket and select your desired color.
- To change the background color of the warning label at the top, click the second paint bucket.
- To change the text color of the warning label, choose the first A with a dropdown.
- To change the text color of the table, choose the second A with a dropdown.
- To change the name of the title under the warning label, change the text where it says basic metric.
- To change the heading of the total and expected conversions, type the name of your conversion where it says converted.
- To change the name of the opportunities column, type a descriptive name of what is considered to be an opportunity in the next field.
See the style guide below for a labeled graphic.
Other Use Cases for the Statistical Significance Calculator
In the example above, we have shown how the statistical significance calculator can be applied to conversions from users who visit the site on different device types, but this is not its only use case.
- We have used the statistical significance calculator to determine if the different channels driving users to the site have an impact on conversion rates. (This works best when there are fewer than five channels driving users to the site.)
- Additionally, the statistical significance calculator can be hooked up to data sets other than Google Analytics data. For instance, we have connected it to Google Ads and used it to compare the performance of three or four campaigns at a time.
These are just a few examples, but there are countless other situations where the statistical significance calculator could be applied. Just remember, to get the best results, try to limit the number of rows. And the more data you have, the higher the chances that you will find statistical significance.
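To see why volume matters, here is a small Python sketch comparing two hypothetical segments with conversion rates of 3.0% and 3.5%, first at 1,000 sessions each and then at 20,000 sessions each. It uses the shortcut formula for a 2x2 Chi-Square test (one degree of freedom, no continuity correction); the function name and all counts are assumptions for illustration, not output from the calculator.

```python
import math


def two_segment_p_value(sessions_a, conv_a, sessions_b, conv_b):
    """p-value of a 2x2 chi-square test (df = 1, no continuity correction)."""
    a, b = conv_a, sessions_a - conv_a  # segment A: converted, not converted
    c, d = conv_b, sessions_b - conv_b  # segment B: converted, not converted
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1 the upper-tail probability reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Same 3.0% vs 3.5% conversion rates, two different volumes (hypothetical data).
p_small = two_segment_p_value(1000, 30, 1000, 35)
p_large = two_segment_p_value(20000, 600, 20000, 700)

for label, p in (("1,000 sessions each", p_small), ("20,000 sessions each", p_large)):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: p={p:.4f} ({verdict} at 95%)")
```

The rates are identical in both runs; only the volume changes, and only the larger sample clears the p<.05 threshold.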
Want to see it in action?