Give me six hours to chop down a tree and I will spend the first four sharpening the axe.
Lincoln’s wisdom is easily applied to data analysis. Very little of working with data is actually running the model. Most of it is defining the problem, gathering data, and preparing a presentation. Do not swing at your data with a dull axe! Here is how to analyze data like Lincoln:
1. Frame the problem.
Most business problems arrive unstructured. Our challenge might be to find a sales-boosting promotion, or a staffing pattern that improves customer satisfaction.
The first step is to define the problem. Offer a hypothesis. Define your dependent and independent variables. Make sure you clearly understand how the variables relate.
2. Define, and defend, your assumptions.
This one is difficult when you are modelling data on behalf of stakeholders who don’t know much about analytics. They want every exception and exclusion included in the model. Any assumption you propose gets dismissed as unrealistic.
Remember, though, that this is only a model. No model will ever be as complex or complete as real life. You just want something that will help you make sense of your data.
Assumptions allow you to hold some factors fixed so that you can test relationships between other variables. They are the closest thing we have in business to an experimental “control.”
You must make assumptions in your model because you are never going to capture everything in your data. Don’t try to make the model jibe with your data 100%. In fact, if you do, you’re going to overfit the model to your data.
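To make the overfitting warning concrete, here is a minimal sketch in plain Python. The numbers are hand-picked toy values, not real data, and the helper names (`fit_line`, `fit_memorizer`, `mse`) are hypothetical. It compares a model that matches the training data 100% (it simply memorizes the nearest training point) with a straight line that assumes a simple linear relationship.

```python
def mse(model, xs, ys):
    """Mean squared error of a model's predictions against observed values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys):
    """Ordinary least-squares straight line: a model built on a strong assumption."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_memorizer(xs, ys):
    """Returns the y of the nearest training x: it reproduces the training data exactly."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

# Toy training data (assumed): the trend is y = 2x, plus alternating noise of +1 / -1.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [3, 3, 7, 7, 11, 11]

# Toy held-out data (assumed): new points that follow the underlying trend.
test_x = [1.2, 2.2, 3.2, 4.2, 5.2]
test_y = [2.4, 4.4, 6.4, 8.4, 10.4]

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

results = {
    "line": (mse(line, train_x, train_y), mse(line, test_x, test_y)),
    "memorizer": (mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y)),
}

for name, (train_error, test_error) in results.items():
    print(f"{name}: train MSE = {train_error:.3f}, test MSE = {test_error:.3f}")
```

The memorizer has zero error on the data it was built from and a much larger error on the held-out points. The line never fits the training data perfectly, but it holds up far better on data it has not seen.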
A model is like a pair of glasses. Don’t look at the glasses to test value. Look through them.
3. Collect and validate necessary data.
Now you can gather data. Having all the data you need before starting your analysis will save time and frustration later on.
In many cases, this will be the most time-consuming part of your project. You will often need to assemble a data set from several sources.
This can get ugly. That’s why it’s called “data wrangling.” But carefully gathering data lets you check for discrepancies or irregularities. Are you missing something? Is some data formatted incorrectly? What about outliers? Is the data normally distributed? Answer these questions now, before you get too deep into your analysis.
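The checks above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the `validate_column` helper and the `daily_sales` figures are hypothetical, outliers are flagged with the common 1.5 × IQR rule of thumb, and comparing the mean with the median is a rough hint about skew, not a formal normality test.

```python
import statistics

def validate_column(values):
    """Summarise missing, malformed, and outlying entries in one column."""
    missing, malformed, numbers = 0, 0, []
    for value in values:
        # Treat None and blank strings as missing.
        if value is None or str(value).strip() == "":
            missing += 1
            continue
        # Anything that cannot be read as a number is counted as malformed.
        try:
            numbers.append(float(value))
        except (TypeError, ValueError):
            malformed += 1

    # Rule of thumb: flag points more than 1.5 * IQR outside the quartiles.
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in numbers if x < low or x > high]

    return {
        "missing": missing,
        "malformed": malformed,
        "outliers": outliers,
        # A mean far from the median suggests a skewed, non-normal column.
        "mean": statistics.mean(numbers),
        "median": statistics.median(numbers),
    }

# Hypothetical daily sales figures merged from several sources.
daily_sales = [120, 135, "128", None, 131, "n/a", 125, 980, 133, ""]

report = validate_column(daily_sales)
for check, result in report.items():
    print(f"{check}: {result}")
```

On this toy column the report counts two missing entries and one malformed entry, flags 980 as an outlier, and shows a mean well above the median. Each of those is a question to settle before any modelling starts.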
4. Conduct your analysis.
This is the “tree chopping” segment of our exercise. If you did everything else right, you will spend the least amount of time on this step.
5. Communicate results.
I like to think of this part as “reverse data wrangling.” You take your results and translate them into yet another format: actionable insights.
If you have developed a well-framed model with the proper assumptions and the necessary data, you should have a reasonable solution. It shouldn’t be unbelievable, but it should provide some new information.
Unless you are communicating with other analysts, don’t present in math-speak. Alfred Marshall, the 19th-century economist, had the right idea here:
(1) Use mathematics as a shorthand language, rather than an engine of inquiry. (2) Keep to them till you have done. (3) Translate into English. (4) Then illustrate by examples that are important in real life. (5) Burn the mathematics. (6) If you can’t succeed in (4), burn (3). This last I did often.
When given a problem, data analysts are tempted to start swinging the axe immediately. Instead, take a lesson from Abraham Lincoln. Spend your time sharpening that axe, and your data problem tree will fall much more easily.