Last Updated: December 17, 2019
At Anvil, we are knee-deep in data every day. We are constantly segmenting data to find groups that are performing well above the average and to find those that may be missing the mark. However, sometimes when one segment has a conversion rate that is only half a percent higher than another, it’s hard to say with confidence that it will consistently perform better.
What we needed was a way to quickly determine statistical significance and to visualize it in our Data Studio reports. Data Studio doesn’t have an out-of-the-box tool for calculating statistical significance, so we created our own community visualization.
Data can be messy, and presenting it in a clear and concise manner is extremely important. As an analyst, you need to understand the story you want to tell before you begin to visualize the data. The story should dictate the visualizations used in a dashboard or report. If the wrong visualizations are used, the audience will struggle to interpret the data effectively and efficiently.
Google Data Studio is a reporting tool that easily integrates with countless digital marketing and data storage platforms. Data Studio has several visualization options readily available for users. Unlike many reporting tools, Data Studio allows users to import community visualizations. In other reporting tools, users are often limited to the visualizations that are available, letting the tool influence the story instead of the story being dictated by the data.
We have built several custom visualizations at Anvil, but the one we use most frequently is our statistical significance calculator. For our purposes, we have decided that statistical significance occurs when we are 95% (p<.05) confident that a relationship between categorical data exists because of something other than chance. In practical terms, this means that we can break users into categories and determine whether their category affects their likelihood of converting.
For example, we can place users into categories based on their device type. Then we can determine if the difference in conversion rates between categories is due to chance or if one will typically outperform others.
Our Stats Analyzer uses the Chi-Square method to determine statistical significance. Using p<.05 as the threshold for significance means that Chi-Square works best with a relatively small number of categories to compare. The more rows in the calculation, the higher the likelihood that the p-value will fall below .05, whether or not the results show real significance.
The message at the top of the visualization tells you whether statistical significance has been achieved at the 95% confidence level. The first three columns of the table within the visualization list the device types, the actual number of sessions, and the conversions per device type. The fourth column shows the expected number of conversions, which the tool calculates automatically. The last column represents the percent by which each category fell short of or exceeded the expected conversions (the expected value from Chi-Square divided by the actual count).
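To make the calculation behind the table concrete, here is a minimal Python sketch of a Chi-Square test on sessions and conversions per category. It is an illustration under stated assumptions, not the code that runs inside our visualization: the function names (`chi2_sf`, `chi_square_test`) and the sample numbers are hypothetical, and the expected conversions are assumed to be each category's sessions multiplied by the overall conversion rate.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability (p-value) of a chi-square statistic.

    Uses the recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with x = stat / 2.
    """
    x = stat / 2.0
    if df % 2:
        # Odd degrees of freedom start from Q(1/2, x) = erfc(sqrt(x)).
        a, q = 0.5, math.erfc(math.sqrt(x))
    else:
        # Even degrees of freedom start from Q(1, x) = exp(-x).
        a, q = 1.0, math.exp(-x)
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1)
        a += 1
    return q


def chi_square_test(rows):
    """rows: list of (category, sessions, conversions) tuples.

    Returns (statistic, p_value, table), where each table row is
    (category, sessions, conversions, expected_conversions).
    """
    total_sessions = sum(sessions for _, sessions, _ in rows)
    total_conversions = sum(conv for _, _, conv in rows)
    overall_rate = total_conversions / total_sessions

    stat = 0.0
    table = []
    for name, sessions, conv in rows:
        expected_conv = sessions * overall_rate
        expected_non = sessions - expected_conv
        # Both cells of the row contribute: converted and not converted.
        stat += (conv - expected_conv) ** 2 / expected_conv
        stat += ((sessions - conv) - expected_non) ** 2 / expected_non
        table.append((name, sessions, conv, expected_conv))

    df = len(rows) - 1  # (rows - 1) * (2 outcome columns - 1)
    return stat, chi2_sf(stat, df), table


# Hypothetical sample data, for illustration only.
sample = [("desktop", 1000, 60), ("mobile", 1000, 40)]
statistic, p_value, table = chi_square_test(sample)
print("significant at 95%" if p_value < 0.05 else "not significant at 95%")
for row in table:
    print(row)
```

The sketch sticks to the standard library so it can be run anywhere. It also assumes every category has enough sessions for the Chi-Square approximation to hold, which is one more reason to keep the number of rows small.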
To add the calculator to your Data Studio report, complete the following steps:
Many companies create their reports with their branding in mind, so we wanted to make the design customizable. To customize it, click on the visualization and choose Style at the top of the right pane.
See the style guide below for a labeled graphic.
In the example above, we have shown how the statistical significance calculator can be applied to conversions from users who visit the site on different device types, but this is not the only use case for the statistical significance calculator.
These are just a few examples, but there are countless other situations where the statistical significance calculator could be applied. Just remember: to get the best results, try to limit the number of rows. And the more data you have, the higher the chances that you will find statistical significance.