Tuesday, February 11, 2014

Tableau Server Performance Monitoring

Analyzing Tableau Server performance is a complex beast, because no two installations are alike. Tableau Software provides some great built-in tools, including the performance recorder, as well as a white paper that benchmarks response times for a given installation.

But now what? A benchmark of someone else's installation doesn't help us much because, as mentioned, all systems are different, and a quick search on the web doesn't reveal much either. It turns out that there are a few disparate data sources that we can blend together into a unified view in order to analyze Tableau Server performance. They include:

  • Windows Performance Monitor
  • Tableau Server HTTP requests
  • Tableau Server audit events and
  • Tableau Server background tasks.

Here is a short list of steps to get this type of unified view. Once they are done, you can download the workbook shown below and swap in your own data sources using the replace data source feature.

  • Enable the Custom Administrative Views feature of Tableau Server.
  • Learn about and enable Windows Performance Monitor. Tableau has a KB article to read. The workbook shown below uses 5-second intervals.
  • Take note that Tableau Server uses GMT (UTC) for its timestamps. The workbook shown below uses a Tableau calculation called "Tableau Timestamp" to offset all relevant dates by 8 hours to Pacific time in order to match my Windows perfmon timestamps. You will need to change this offset to match your own time zone.
  • The CSV output of your Windows Performance Monitor data collector will have really ugly field header names like "\\machineName\processName\PrivateBytes" - I changed these inside the workbook shown below - you will need to ensure that you can correctly swap the field names out with your own data.
  • Also note that the workbook shown below came from my laptop. I extracted all the data sources and inside the extract I filtered to "today only" which was Feb 6th, 2014. Obviously, you will want to disable extracts, or re-extract accordingly.
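If you would rather clean up the perfmon CSV before it reaches Tableau, the header renaming and the time offset can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names (`short_name`, `read_perfmon`, `to_local`), the sample row, and the timestamp format are all made up here, and the header shape is the one quoted in the steps above. The fixed 8-hour offset mirrors the workbook's calculation and does not follow daylight saving time.

```python
import csv
import io
from datetime import datetime, timedelta


def short_name(header):
    # Split on backslashes, drop the empty pieces and the machine name,
    # then join what is left with underscores.
    parts = [p for p in header.split("\\") if p]
    return "_".join(parts[1:]) if len(parts) > 1 else header


def read_perfmon(csv_text):
    # Return the rows as dicts keyed by the shortened header names.
    reader = csv.reader(io.StringIO(csv_text))
    headers = [short_name(h) for h in next(reader)]
    return [dict(zip(headers, row)) for row in reader]


def to_local(utc_text, offset_hours=-8, fmt="%Y-%m-%d %H:%M:%S"):
    # Fixed offset (assumption: Pacific standard time); no daylight saving.
    return datetime.strptime(utc_text, fmt) + timedelta(hours=offset_hours)


# Made-up sample using the header shape described in the steps above.
sample = (
    r'"timestamp","\\LAPTOP\vizqlserver\PrivateBytes"' "\n"
    '"02/06/2014 09:00:05","1048576"\n'
)
rows = read_perfmon(sample)
print(rows[0])
print(to_local("2014-02-06 17:00:05"))
```

Change `offset_hours` to suit your own time zone, and adjust `short_name` if your counter paths have a different shape.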

What's the point? Well here's the thing, folks ... instead of trying to come up with some repeatable statistic e.g.

  • "Total private Vizql RAM divided by number of unique users logged in" - this fails because you cannot predict how long a particular user sits on a viz, or, what the viz looks like, or, what the data source looks like, or
  • "Total RAM divided by HTTP requests" - this fails because there are lots of http requests for a given viz, or
  • "Total RAM divided by distinct users" - this fails because not all users are doing the same thing at the same time, or
  • Any number of other wacko stats.

...Instead of trying to do any of that, I recommend simple "immersion analysis". Immersion analysis is covered from several different angles in Richards (Dick) Heuer's seminal work Psychology of Intelligence Analysis (it's a great read for numerous reasons, it's also free as a digital download, and no, I do not work for the CIA :). The basic premise of immersion analysis, as applied to Tableau Server performance, is that you should not be looking for a discrete and conclusive answer to the question of performance, e.g. "this thing times that thing must equal this other thing". Instead, you should be using the available data to generate hypotheses for further research. Specifically, you are looking for:

  • Time Patterns: is there a spike or peak at a recurring or periodic time? (this is achieved with any of the data sources shown below)
  • Content Patterns: is there a particularly painful workbook or dashboard? (this is achieved by cross-referencing RAM spikes against the audit tables to see which workbooks or dashboards were being viewed when the RAM spiked)
  • Task Patterns: Same as above, but this time you are cross-referencing background task details against RAM spikes (e.g. is a particular "refresh extract" pinning down Tableau Server?)
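
The cross-referencing described in the list above can also be roughed out outside of Tableau. The sketch below is not the workbook's method, just one way to do it under assumptions: a "spike" is taken to be any sample above the mean plus two standard deviations, an event counts as related if it happened up to 30 seconds before a spike, and the function names and workbook names are invented for the example.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean, pstdev


def find_spikes(samples, k=2.0):
    # samples: list of (timestamp, private_bytes) pairs.
    # Assumed rule: a spike is any value above mean + k * std dev.
    values = [value for _, value in samples]
    cutoff = mean(values) + k * pstdev(values)
    return [when for when, value in samples if value > cutoff]


def events_near_spikes(spikes, events, window=timedelta(seconds=30)):
    # events: list of (timestamp, name) pairs, where name is a workbook
    # or a background task. Count events that fall within `window`
    # before any spike.
    hits = Counter()
    for event_time, name in events:
        if any(timedelta(0) <= spike - event_time <= window for spike in spikes):
            hits[name] += 1
    return hits


# Made-up data: 5-second perfmon samples with one jump at the end.
start = datetime(2014, 2, 6, 9, 0, 0)
samples = [(start + timedelta(seconds=5 * i), 100) for i in range(9)]
samples.append((start + timedelta(seconds=45), 1000))

# Made-up audit events.
events = [
    (start + timedelta(seconds=35), "Sales Dashboard"),
    (start - timedelta(minutes=5), "Small Viz"),
]

spikes = find_spikes(samples)
print(events_near_spikes(spikes, events).most_common())
```

The counts are not an answer in themselves. They are a shortlist of hypotheses (this workbook, that extract refresh) to go and research further.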

Trying to come up with known-good and reproducible performance benchmarks for the Tableau platform might well be a fruitless task. Instead, you should gather information, come up with a hypothesis (e.g. a likely root cause for a performance spike), and then research the heck out of that hypothesis. Rinse, repeat, as needed. Download the workbook shown below to get started. Enjoy!

1 comment:

Alok Soni said...

This is a great publish.
I see that you have converted the wgserver\private bytes.

Are you factoring in multiple instances of wgserver and vizql, or using the bytes from just one of those?