Monday, March 22, 2010

Public Data - Seattle Restaurant Inspections

Forcing a Sort in Tableau

Note: this is largely depricated by Tableau v7.0 and later... Dynamic sorting is difficult in any application, regardless of the technology. There are business and logical reasons why this is the case. Using Tableau, there are often several ways to effectively "sort" information, and no one method is "better" or "worse" than other methods - it's purely the business situation which will dictate this. In the example below, there are several converging Tableau concepts at work, including: nested sorting, sets, hidden data, quick table calculations, calculated fields, and some light formatting. What is particularly interesting about this example is the use of hidden data to maintain accuracy with the "percent of total" calc, without losing an effective sorting mechanism.

View The Example | View The Explanation

Wednesday, March 10, 2010

Thematic Maps - Seattle Crime Statistics

Sometimes I get rather caught up in the various dross and droll of the geocoding and mapping subject (read: thrash) as it relates to analytics, and I forget that a good map does not have to reference actual geography to represent a theme of data... Seems like a "D'uh" moment. I must be getting old.

The data below comes from and this visual uses a simple police precinct image as the background.

The basic conclusion I can come to with this data is: overall, crime is quite a bit down year over year for 2008-2009. Nifty.

All Crime Data | Crime Decreases | Crime Increases

Monday, March 1, 2010

Forecast Modeling

Author's Note, Dec 2010:This blog post is now obsolete by virtue of the fact that Tableau 6.0 has "parameters" - the entire subject of data explosion or data fanning is no longer a requirement.

original blog post

Quick start cheat notes:

  • sample data is here 
  • make all joins "not equal to" 
  • set all filters to single-select (and make them global just for convenience)

Read the rest of the article below this dashboard, for the full details... 
Use the sliders to interact



A discussion about "predictive modeling", or "forecast analysis", or, "what if analysis"

This entry discusses steps required to use Tableau to create what if or hypothetical analysis models. While not currently a core feature of the product suite (with the typical resulting marketing and sales splash as such), it turns out that with a little thought, and by following a few simple rules, you can create forecast models in Tableau. For the purposes of this discussion, I have used Xcelsius as a comparison, a tool that is commonly thought of as excellent for this type of analysis.

Products that provide extensive forecasting and predictive modeling algorithms are great tools for creating "what if" or hypothetical analysis - there's a great deal of heavy lifting going on to make this magic work. Some of the technical challenges involved include (not limited to):

All of the heavy lifting or number crunching needs to occur somewhere or at some point in the overall process

The user experience and software design thereof needs to provide compelling and useful tools for managing the input and output of the modeling.

There are some good software packages out there that provide forecast modeling and predictive analysis. A few worthy mentions include SAS, SPSS, the open source "R" package, and SAP’s Xcelsius (formally owned by Business Objects, who in turn purchased it from Infommersion, Inc, see nice history link here). From anecdotal evidence it appears that Xcelsius attempts to operate on one very small amount of data at a time, and then "explodes" this data out into a much larger data set - most likely in memory. General approaches to this type of modeling scenario include pre-processing possible values in a database engine, or, performing other in-memory processing.

Tableau can achieve this "exploded" data concept by creating table joins on all possible values - essentially a Cartesian join showing all possible combinations of data. There are a few simple tricks to make this work:

  1. Create all joins as "table1.fieldX <> table2.fieldY" (i.e. use the "not equal" type). Make sure no values will match, or else this does not work! See the excel file which shows the four sheets to join; as well, you can download the workbook shown above to see the joins.
  2. Any time you create calculated fields, make sure that they are non-aggregation calculations. See the workbook for the calculation called "potential value". Basically, you want row-by-row calculations at all times! 
  3. And most importantly! Make sure you have single-select filters for all forecast or what-if variables which are controlling the "fact" data.

What does that last item 3 really mean? Let’s take a closer look at the sample excel data:

In the example excel file, we had three categories (K-12 School District, S&P 2000, and Federal Agencies) and 6 possible values (50k through 300k) for a total of 18 possible values.

By joining on a percentage between 1 and 100, the data gets exploded out to 1800. And by adding in a second variable called "blending", which ranges from -.21 percent to .21 percent (43 possible choices including zero), this further explodes the 1800 records out to 1800 times 43 = 77,400 records.

However, because our three Tableau filters for "blending", "percentage" and "variable" are all marked as single-value selections, we will only see 3 records at a time (the original three categories).

Exploring the workbook shown above by downloading it will help convey this point.

You can now start to guess why Xcelsius and possibly other products start to perform slowly when dealing with larger data. One of the reasons is the data explode factor involved. If instead of the above numbers, I wanted to apply a two-variable what if or forecast model on 500 types of values (500 customers showing sales, 500 products showing inventory, etc), and one variable ranged from 1 to 100, the second variable ranges from -30 to +30, then my data explode result would be 500 * 100 * 61 = 3,050,000 records! No matter how you slice and dice this problem - in memory vs. pre-calculated, oracle vs. sql server or some other database, java versus dot net- you will still be faced with this concept of data "explosion"...

Tableau can handle 3 million records, by the way...