Tuesday, September 17, 2013

Tableau "Add to Context" Filters

I like to review ancient Tableau features from time to time, just to make sure things haven't changed. A Tableau "Add to context" filter is one of these.

The Feature called "Add to Context" is a great way to look at a subset of data inside of Tableau. It works by attempting to create a TEMP table in the database (which may or may not always succeed, depending on the database and your permissions - we are assuming that TEMP tables succeed for this blog post). As such, this feature can also be considered "dangerous" if it is not used correctly. Two areas where it is used incorrectly are:

1. Returning all the records from the original join or data connection.

Consider an imaginary table of data which is 100 million rows. The proper use of "add to context" is to filter down to 800k rows (for example), perhaps for one segment, customer, line of service, or whatever dimension you want. The improper use of "add to context" is to select everything in the filter. This has the effect of creating a new, second 100 million row table. If the originating data model was a multiple table join, the resulting context table is a single, denormalized, fully indexed Tableau TEMP table. A nice denormalizing effect, but largely useless if you are simply recreating the original 100 million rows. Now imagine you have Tableau Server thrown into the mix: each time an end user interacts with this sheet or dashboard, a new 100 million row TEMP table will be created. Not good... the solution is do not do this, please!Don't select all the field values in a "add to context" filter. It defeats the purpose of using this feature in the first place.

2. Returning a very wide table result when you only need some of the columns.

Let's continue to use the 100 million row example. But this time, we will stipulate that it contains 160 dimensions and 65 measurements. 225 columns in total. The feature "add to context" will dutifully create a TEMP table with all of these columns, and indexes on each column. This could take a long time to generate, before you even get to the analysis itself! If you do not need them, you can hide fields prior to using "add to context". Once hidden, they will not be included in the TEMP table output. This can have a huge performance impact on your use of "add to context".

In an actual test against a 40 million row table, we ran a context filter on a very wide and a more narrow Tableau result set. The wide data connection took 47.57 seconds to A) create the TEMP table, B) index the columns and C) query the "number of records". You can see all of this in the Tableau desktop logs. By the way, it's worth pointing out how amazing this fact is, that Tableau generates all of the correct SQL code for this. However, the result is huge and wide. This data connection had 50 dimensions and 32 measurements.

We ran the same exact "add to context" test against a second data connection which only had 11 dimensions and 4 measurements. This time the query took 13.98 seconds. This was a 340% reduction in query time!

Be careful when using this feature. It can be a huge time saver - as well as a huge pain...


Robert Morton said...

For some data sources like Oracle and Hadoop Hive, Tableau will use a subquery instead of a temporary table. This still has a denormalizing effect since the query optimizer may not be good about optimizing the interaction between the outer query and the subquery, but at least it does not materialize the results.

Another type of context that has even higher precedence than a Context Filter is a Data Source Filter. You can manipulate these through the Data menu, with the 'Edit Data Source Filters...' option on a given data source. Tableau prefers to use subqueries for data source filters, but may have to use a temporary table if the database cannot support certain operations within the subquery (like TOP N... ORDER BY).

Aside from the implementation differences, an important functionality difference between Context Filters and Data Source Filters is that the data source will hold on to its data source filters when you publish it to Tableau Server. This is particularly useful for embedding User Filters in the data source, so that a Desktop user can connect to the data source via Tableau Server (aka Data Server data source) and will be subject to the user filters. This makes for an easily authored data security layer. Just don't forget to set permissions on the published data source to allow queries but forbid downloading!


northwestcoder said...

Nice. Thank you for this...