Friday, February 22, 2013

Multivariate Analysis

Over the years I have heard people throw around the phrases 'multivariate analysis' and 'regression analysis' in situations where I don't think they know what they're talking about. It's just a checkbox on a list for them. Guess what? I don't really profess to understand all the functional math behind the many different subjects related to these phrases either. (Well, maybe just a little...) But what I can say is that I'm starting to suspect that all of the traditional modeling is becoming less and less of a requirement and more and more of a nice-to-have. It's also boring as hell. And it's also totally doable in Tableau if you are willing to change the way you look at information.


For example, Wikipedia has a nice sentence in its writeup on multivariate analysis: "Often, studies that wish to use multivariate analysis are stalled by the dimensionality of the problem." What? They are? Sad... this is not a problem in Tableau! Data can be measures, dimensions, discrete, continuous, dependent, independent, quantifiable, and categorical… all at once. We just don't even think about these kinds of challenges at all.


To the right, we are using one of the oldest books in the Tableau bag of witches. I don't want to beat a broken horse, but seriously? Tableau "measure names" and "measure values" are da bomb! We have also added Tableau parameters to allow selection of the primary variable; logarithmic scale to show widely different value ranges; and the use of color encoding to help identify variables. Tight, amarite? No, this paragraph was not run through a bro-speak filter. It's just your browser.



They also state: "regression analysis can be used to understand which among the independent variables are related to the dependent variable…" - Hello? Last time I checked, this was called a Tableau Bin or a Tableau Group. It's been around forever. Oh, and you can create ridiculously complicated Tableau Sets and calculations to drive various cohort and dimensional slicing analyses if you need that type of thing.


The interesting item on the viz to the left is the use of type-in filters to remove calculated outliers. These are not the originating data, but calculations on that data. The calculations in this case are rather simple: each is the difference between a per-student value (GPA or Hours) and the mean value across the entire data set. They could be as complex as you need them, I suppose. Where this gets into multivariate territory is the use of small multiples if needed - even picking the dimension for the small multiple itself. Play with the choices to see what's going on.
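If you want to see the outlier calculation outside of Tableau, here is a minimal Python sketch of the same idea: subtract the data set mean from each per-student value, then keep only the rows whose difference falls inside a typed-in range. The function names, the sample students, and the -1.0 to 1.0 range are all made up for illustration; this is not what Tableau does internally, just the arithmetic described above.

```python
from statistics import mean

def deviations_from_mean(values):
    """Return each value minus the mean of the whole list."""
    m = mean(values)
    return [v - m for v in values]

def filter_by_deviation(rows, field, low, high):
    """Keep rows whose deviation from the mean of `field` lies in [low, high].

    `low` and `high` play the role of the type-in filter bounds.
    """
    devs = deviations_from_mean([r[field] for r in rows])
    return [r for r, d in zip(rows, devs) if low <= d <= high]

# Hypothetical per-student records (not the data behind the viz).
students = [
    {"student": "A", "gpa": 3.0, "hours": 10},
    {"student": "B", "gpa": 3.5, "hours": 12},
    {"student": "C", "gpa": 1.0, "hours": 40},
    {"student": "D", "gpa": 2.5, "hours": 14},
]

# Mean GPA is 2.5, so student C (deviation -1.5) falls outside the range.
kept = filter_by_deviation(students, "gpa", -1.0, 1.0)
print([r["student"] for r in kept])
```

The same filter works on "hours" by swapping the field name, which is roughly what picking a different primary variable amounts to.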


One type of multivariate analysis is "bivariate analysis" which is the simplest form you can get away with in order to pass your entry level college stats class. You gotta have two variables to call it "multi" after all! In conversation, "bivariate" will get you into trouble because it's just not as cool-sounding as "multivariate". Although it does have an above-average coolness factor compared to most buzzwords.


On the viz to the right we are allowing bivariate analysis to explode into greater and greater detail. Why? At lower, blockier resolutions you could lasso an entire circle; view underlying data; and then do something with that data. Look for this icon after you lasso some data:

Or, simply increase the resolution ("Bin Size") to see the more-detailed pattern. Like the above examples, another way to say the word multivariate is to simply place an independent variable - in this case "low income flag" - onto the row or column shelf in Tableau.
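For the curious, the binning idea can be sketched in a few lines of Python. This is only an approximation of what the viz does, under assumed names: `bin_floor` and `binned_counts` are hypothetical helpers, the rows are invented, and the "low income flag" is treated as a plain True/False field that splits the counts the way a row or column shelf would.

```python
import math
from collections import Counter

def bin_floor(value, bin_size):
    """Snap a value down to the lower edge of its bin."""
    return math.floor(value / bin_size) * bin_size

def binned_counts(rows, x, y, bin_size_x, bin_size_y, split_by):
    """Count rows per (split value, x bin, y bin) cell."""
    counts = Counter()
    for r in rows:
        key = (r[split_by],
               bin_floor(r[x], bin_size_x),
               bin_floor(r[y], bin_size_y))
        counts[key] += 1
    return counts

# Hypothetical student rows (not the data behind the viz).
rows = [
    {"hours": 12, "gpa": 3.2, "low_income": False},
    {"hours": 18, "gpa": 3.7, "low_income": False},
    {"hours": 25, "gpa": 2.1, "low_income": True},
    {"hours": 11, "gpa": 3.9, "low_income": True},
]

# Blocky resolution: big bins, few circles, more students per circle.
coarse = binned_counts(rows, "hours", "gpa", 10, 1.0, "low_income")

# Finer resolution: smaller bins, more circles, more detail.
fine = binned_counts(rows, "hours", "gpa", 5, 0.5, "low_income")

print(len(coarse), len(fine))
```

Each key in the result is one circle on the chart, and the count is how many students you would get back if you lassoed it.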



I feel like a lot of this discussion comes down to syntax and dialect. When I speak with people who have a rigid way of looking at information from a didactic and terminology point of view, I want to shake them around a little bit and ask "have you even tried to look at your information in any form whatsoever, before asking about this advanced crap that no one really understands that well to begin with?" At least, that is what I am thinking. Perhaps I am just staring at you vaguely.

A scenario I see all the time is when someone wants a particular analysis - say, multivariate - but when I ask them why, they don't know. They were just told to do it. Or they read about it somewhere. Or a colleague insisted on it. Or any number of other inane reasons. I advise these people to take a moment to ask themselves "what am I trying to do here?"

Even worse is when someone truly knows what they want, but they don't want to spin up any cycles trying to use a tool to do it. If you are going to use Tableau or any piece of analytic software, take a moment to understand how it works. There's rarely a magic "multivariate" button. And if there is a claim of one, caveat emptor!