Funnel plots for count data in Tableau – brute force approach

Many years ago, my former employer (and precursor to Public Health England or PHE) – the Association of Public Health Observatories, released a series of Excel spreadsheets in which the basic statistical calculations for various funnel plots were described and demonstrated.  You can still find these resources online (proportions, rates and DSRsISRs and SMRs, counts).  If you don’t know what the difference is between a rate, proportion or a count (nor how to identify them), read the accompanying guide here.

Put simply, a funnel plot is a two-axis control chart that takes into account sample size when performing significance testing.  It places a measure of variation on the Y-axis and a measure of population (or sample size) on the X-axis.  The control limits become tighter as the associated population becomes larger – hence the funnel.  This principle can be applied to a variety of measures, albeit with different statistical underpinnings.

For example (courtesy of Rick Wicklin at SAS Blog):

Example of a funnel plot for proportions displaying the success of adoptions of children in Local Authorities. The success rate is along the Y-axis, the number of adoptions is along the X-axis.

A small number of other Tableau users have covered how to recreate these charts.  The most impressively detailed explanation is from Jonathan Drummey, located at his blog Drawing with Numbers.  His post does a few things really well that I won’t attempt to recreate:

  • he details the challenge of pulling together a dataset in Tableau that can display both the control limits and the marks;
  • he explains how this is constrained by how Tableau draws its elements (and why this poses a challenge); and
  • he also explains the various approaches to resolving these tensions.

In effect, the major problem that we have with funnel plots is that when designing these in Excel (or many other products) we are essentially using one arbitrary data range to generate the control limit funnel and an entirely separate and arbitrary data range for the marks.  Tableau doesn’t really work like that – it expects a unified and single table for each visualisation.  That table may be a result of joins and blends from multiple data sources, but it must be a single table in the end.

Jonathan also focused on the methodology behind funnel plots for proportions.  The advantage proportion-based data has is that Tableau can perform the calculations necessary to generate the funnel control limits.  To my knowledge, it cannot (as of v10.1), perform this same function for count-based data.

Its also worth pointing out that the control limits of count-based data are not dependent on the process central tendency, as with other funnels.  The limits are static and pre-defined, akin to a critical values table.

The Challenge

I needed to quickly develop a count-based funnel plot Tableau interface for professional purposes, but didn’t have the advantage of having an integrated R Server to hold the Poisson functions to call upon dynamically.  Effectively, this meant that the statistical testing must be done external to Tableau.

I could dynamically assign control limit values to each observation in the data source (via a SQL/Access UDF), but I also had the challenge that in some cases the number of observations/marks may be low (perhaps, three or four), so didn’t want to run into the high cardinality problem of jagged funnels.

Nothing worse than a jagged funnel.

So… how do you get Tableau to display a decent (non-jagged) funnel curve and a set of marks?  And how do you preserve the display of the funnel if you choose to apply a filter to the marks?

The How

By using some god-awful data structures, that’s how!

It is possible to ‘trick’ Tableau into displaying unrelated and arbitrary datasets in the same chart by boxing your data diagonally (this was a trick I picked up from Bill Lyons when trying to display customised geographical polygon and point data in a single map).

In effect, the dataset takes on an odd structure:

Example dataset with randomised data, arranged in diagonal blocks with a shared field.

This allows you to throw both arbitrary datasets onto separate ‘pills’ of a dual axis Tableau viz, but conform to Tableau’s need to have a single data table driving the viz.  Let’s walk through the process using the data found in PHE’s example Excel spreadsheet and technical guidance.

Data Preparation

For this example, I’ve prepared a quick spreadsheet demonstrating the principles above.

ExampleFunnelPlotCountsTableau

As you can see, the Poisson table containing the control limit values is on the top-left ‘quadrant’ and takes up the first 100 rows, with the Under-75 Lung Cancer mortality data in the bottom-right ‘quadrant’ taking up the rest of the table rows.  I’ve also added an additional field named “Grouping Type” – this will be used later to demonstrate a particular feature.  You will also notice a shared field which defines what the dataset belongs to.  I’ve chosen to use a binary field to define which is the funnel data.  This is not strictly necessary but makes things a little easier further on.

Example spreadsheet showing the point at which the two datasets intersect.

Couple of things here.  Firstly, the Poisson table in the PHE spreadsheet is scaled to the maximum value in the lung cancer data.  It uses a neat custom function to iterate gradually wider gaps between each point on the funnel, up to the maximum expected value, within the confines of 100 rows.  You may, however, want your funnel to have the potential to reach a greater length than 750 on the X-axis.  The Excel spreadsheet form PHE can generate a more comprehensive table if you modify the observed data.  I’ll update this post later with an adaptation of the embedded function that can generate a Poisson table of your choice.

Second thing, the example spreadsheet was obviously created by me copying and pasting a few ranges into Excel.  This is obviously not practical in an enterprise environment where data needs to be pulled on demand.  Doing this in a more automated SQL environment requires you to union the two tables, using null values.  Something along the lines of this:

Once we’ve done this, we can connect to it in Tableau…

Building the Visualisation

Let’s build the funnel…

  • Pull the “Measure Values” pill to the rows shelf and get rid of any values pills that aren’t the lower or upper limits (a “Measure Names” filter will be created automatically).
  • Pull the “Expected Events” pill to the columns shelf.
  • Pull the “Row Number” pill to the detail shelf on the marks card.
  • Switch the marks from “automatic” to “line”.
  • “Measure Names” will also have been automatically placed on the detail shelf.  Move it to colour instead.

You should end up with something like this:

Step 1: Building the basic funnel structures.

Let’s build the marks…

  • Create the Observed / Expected ratio by creating a calculated field named “O/E”:

( [Observed] [Expected] ) -1

  • Pull the “O/E” pill to the rows shelf.
  • Pull the “Expected” pill to the columns shelf.
  • Pull “Unit Detail” to the detail shelf (make sure you pull this to the detail shelf on the “Expected” marks card, not the “Expected Events” card).
  • Change the marks on the “Expected” marks card to “shape”, rather than “line” (if it defaulted to line).

You should have something that looks like this:

Step 2: Bringing in the marks.

  • Convert the columns shelf and the rows shelf to dual axis.  Read here if you’re unfamiliar with this.
  • Synchronise the axes.
  • Remove “Row Number” from the “Expected” marks card.

It should look like this:

Step 3: Making the dual axis (your marks are probably still green, this picture was taken slightly out of sequence).

We can clear up the formatting at this point

  • Reduce the control line thickness and colour them more appropriately.
  • Adjust the colour of the mark shapes
  • Reduce the size and opacity of the mark shapes to something more fitting.
  • Fix the minimum and maximum of the Y-axis to -1 and 1, respectively (we’ll come back to this later).

And, we have a funnel chart:

Step 4: Tidying up the defaults.

This works because the null values (in the rows under the control limit data and the null values above the lung cancer data) aren’t being rendered by Tableau on their respectively axes.  You can see this confirmed by the notification in the bottom-right of the viz.

At this point, we have an end product that is reasonably similar to the PHE product.  But let’s see if we can go a little further.  What if the lung cancer dataset had multiple time periods for each Local Authority?  What if it had separate calculations for males and females?  At some point we wouldn’t want to display all of this information in one viz… we’d want to be able to filter.

Introducing Interactivity

Trouble is, if we filter by any field in the lung cancer dataset, we’ll lose the funnel.

  • Try assigning “Grouping Type” to the filters shelf; and
  • for now, select all values.

You’ll note that along with the group names (County, London, etc) there is a “Null” option.  Those nulls are the first 100 rows in which the control limit are contained.

If the filter is set as a mutually exclusive option such as a radio button or dropdown, etc (and it’s reasonable to assume you’d want to do this if you’re exposing the filter to the end-user), filtering to “London” or any of the other groups will remove the control limits, along with the unwanted sub-group marks.

You could create a calculated field which replaces the nulls with the string “Control Limit” and change the filter to a multi-select type:

Step 5: Applying multi-select quick filters.

This would allow the user to show what they want (and the end-user would now explicitly understand what “Null” was), but it may not make the greatest sense if you wanted to force mutual exclusivity on the end-user when choosing the mark groupings to display.  You may also be dealing with a tech-nervous end-user audience who find multi-select options overwhelming.

Also, you may want to filter by multiple fields (like gender or time period), so this solution starts to get messy to the end-user.

One solution would be to abandon quick filters and make use of a parameter.  Try doing the following:

  • If you added the “Grouping Type” pill to the filter shelf earlier, remove it.
  • Create a parameter named “Group Type Selection”, using data type string and allowable values from “list”.
  • Add values from the field “Grouping Type”.
  • Add and extra value at the bottom as “(All)” and then drag it to the top of the list.

After choosing to display the parameter control, it should look like this:

Step 6: Parameter development which will drive our makeshift ‘filter’

Now let’s construct a calculated field which will act as our filtering field, based on our parameter selection:

  • Construct a calculated field named “Include Marks” with the following formula:

[Funnel Shared] = TRUE
OR
[Group Type Selection] = “(All)”
OR
[Grouping Type] = [Group Type Selection]

  • Pull the “Include Marks” pill onto the filters shelf.
  • Select the “True” values to be displayed.

Using the parameter will now act as a ‘filter’, while still retaining the control limit data.  Take a look:

Step 7: Filtered data and a very present funnel.

Great, but what if we have extreme values that exceed where we’ve fixed the Y-axis?  Or what if we want the horizontal extent of the funnel to shrink when we select a sub-group in the “Grouping Type” with lower expected values?

We’re going to construct two level of detail calculated fields, each of which calculate the maximum value of their respective axis measure.  We’ll also create another calculated field to act as a filter, based on the two level of detail calculations.  Let’s go:

  • Create a calculated field named “Expected Max LOD” as follows:

{ fixed [Include Marks] : MAX([Expected]) }

  • Create a calculated field named “O/E Max LOD” as follows:

{ fixed [Include Marks] : MAX([O/E]) }

  • Create a calculated field called “Include Funnel” as follows:

[Expected Events] < [Expected Max LOD] * 1.2
AND
[Upper3s] < [O/E Max LOD] * 1.2
OR
[Funnel Shared] = FALSE

  • Revert the Y-axis back to automatic, rather than the fixed limits that we set earlier.  You should now see a funnel where your data is in the far, left-bottom area.
  • Pull “Include Funnel” to the filters shelf.

You should now see a funnel that will only display the parts of the control limits that are (near to) below the maximum extent on both axes of the sub-group you choose in the parameter.  The “* 1.2” aspect of the LOD functions are not particularly magical, they just give enough ‘padding’ to ensure that both ends of the funnel extend just beyond the maximum axis values in the lung cancer dataset.

Step 8: Before and after the “Include Funnel” filter has been applied.

Great, but what about ‘filtering’ on multiple dimensions?  What if the dataset contained other dimensions like gender?  You can extend the principle of parameter filtering further.  For demonstration’s sake, I’ve chosen to simulate an additional dimension called “Grouping Type 2” by creating a calculated field (which reads the first letter of the “Unit Detail” field and groups accordingly) and an accompanying parameter created following earlier instructions.

The “Include Marks” calculated field then needs altering to take account of this new parameter result:

[Funnel Shared] = TRUE
OR

([Grouping Type] = [Group Type Selection]
AND

([Grouping Type 2] = [Group Type Selection 2]
OR
[Group Type Selection 2] = “(All)”)
)

OR

([Grouping Type 2] = [Group Type Selection 2]
AND

([Grouping Type] = [Group Type Selection]
OR
[Group Type Selection] = “(All)”)
)

OR

([Group Type Selection 2] = “(All)”
AND
[Group Type Selection] = “(All)”
)

You can experiment with this further, adding in additional dimensions and parameters.  You are given a fair warning that your brain may begin to melt with the clustered Boolean logic.

You may also want to do some minor formatting chores:

  • Disable tooltips on the control limits (you won’t be able to disable the tooltips on the “Expected” card – instead, do this by removing all text from the tooltip dialogue); and
  • Change the number format properties of the Y-axis to display as a percentage (right-click the axis and select “format”).
  • You can also pull the “Observed” field pill onto the “Expected” card and have it display in the tooltip.

Anyway, the final result:

Hope it helps.  Do let me know.

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Leave a reply by filling in your details below, or connect automatically with social media:



Your email address will not be published. Required fields are marked *